feat: extract the common module of Transformer #115
Open
JYMiracle305 wants to merge 1 commit into master




This PR abstracts the construction architecture for Transformer-style models, so that GPT2 and LLaMA3 are built through a single unified flow.
Directory structure
…/core/
├── models/decode_only_transformer/
│   ├── layer_specs.h/.cc           # Declarations and implementations of the model-building functions
│   └── model.h                     # Declarations of components such as RMSNorm/NewGELU/SwiGLU
└── transformer/
    ├── spec_utils.h/.cc            # ModuleSpec construction helpers and module registration macros
    ├── transformer_block.h/.cc     # Base components such as TransformerBlock, plus their registration
    ├── transformer_builders.h/.cc  # Spec builder declarations and implementations (BuildNormSpec, BuildMLPSpec, etc.)
    ├── transformer_config.h        # TransformerConfig struct; replaces the former GPT2Config and LLaMA3Config
    └── transformer_layer.h/.cc     # TransformerFirstStage/Chunk/LastStage; replaces the former GPT2FirstStage/Chunk/LastStage and LLaMA3FirstStage/Chunk/LastStage
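As a rough illustration of how one config struct can stand in for two model-specific ones, here is a minimal sketch. The field names, enums, and helper functions are hypothetical and are not taken from this PR's `transformer_config.h`; only the architectural facts (GPT-2 uses LayerNorm, GELU, and biases; LLaMA 3 uses RMSNorm, SwiGLU, and no biases) are standard.

```cpp
#include <cassert>
#include <string>

namespace sketch {

// Hypothetical knobs that differ between the two model families.
enum class NormKind { kLayerNorm, kRMSNorm };
enum class ActivationKind { kNewGELU, kSwiGLU };

// Hypothetical unified config; the real TransformerConfig may look different.
struct TransformerConfig {
    int n_layer;
    int n_head;
    int n_embd;
    NormKind norm;
    ActivationKind activation;
    bool use_bias;
};

// GPT-2 (124M) shape: 12 layers, 12 heads, 768-wide embeddings.
inline TransformerConfig GPT2Like() {
    return {12, 12, 768, NormKind::kLayerNorm, ActivationKind::kNewGELU, true};
}

// LLaMA 3 (8B) shape: 32 layers, 32 heads, 4096-wide embeddings.
inline TransformerConfig LLaMA3Like() {
    return {32, 32, 4096, NormKind::kRMSNorm, ActivationKind::kSwiGLU, false};
}

// Per-head width; the embedding width must divide evenly across heads.
inline int HeadDim(const TransformerConfig &config) {
    assert(config.n_head > 0 && config.n_embd % config.n_head == 0);
    return config.n_embd / config.n_head;
}

// Picks the normalization type name from the config, in the spirit of a
// builder such as BuildNormSpec (whose real signature is not shown here).
inline std::string NormTypeName(const TransformerConfig &config) {
    return config.norm == NormKind::kRMSNorm ? "RMSNorm" : "LayerNorm";
}

}  // namespace sketch
```

With this shape, shared construction code branches on config fields rather than on the model class, which is one plausible way the per-model FirstStage/Chunk/LastStage classes could collapse into a single set.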
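Core mechanism
The ModuleSpec data structure declares a module's type and parameters. Concrete module implementations are registered centrally through ModuleRegistry. When a model is built, build_module() instantiates modules dynamically by looking up the registered implementation that matches each spec.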
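The spec/registry/build pattern described above can be sketched as follows. This is a simplified illustration under assumptions, not the PR's implementation: the `Module` base class, the `params` map, the `Factory` alias, and `RegisterSketchModules()` are hypothetical, and the actual `ModuleSpec`, `ModuleRegistry`, and `build_module()` signatures may differ (the PR registers modules through macros rather than an explicit function).

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical base class for anything the registry can build.
struct Module {
    virtual ~Module() = default;
    virtual std::string Name() const = 0;
};

// Declares what to build: a registered type name plus its parameters.
struct ModuleSpec {
    std::string type;
    std::map<std::string, double> params;
};

using Factory = std::function<std::unique_ptr<Module>(const ModuleSpec &)>;

// Maps type names to factories; implementations register themselves here.
class ModuleRegistry {
public:
    static ModuleRegistry &Instance() {
        static ModuleRegistry registry;
        return registry;
    }

    void Register(const std::string &type, Factory factory) {
        assert(factory != nullptr);
        factories_[type] = std::move(factory);
    }

    const Factory &Find(const std::string &type) const {
        auto it = factories_.find(type);
        if (it == factories_.end()) {
            throw std::runtime_error("unregistered module type: " + type);
        }
        return it->second;
    }

private:
    std::map<std::string, Factory> factories_;
};

// Resolves the spec against the registry and instantiates the module.
inline std::unique_ptr<Module> build_module(const ModuleSpec &spec) {
    return ModuleRegistry::Instance().Find(spec.type)(spec);
}

// Two illustrative implementations that differ only in their type name.
struct LayerNorm : Module {
    explicit LayerNorm(double eps_in) : eps(eps_in) {}
    std::string Name() const override { return "LayerNorm"; }
    double eps;
};

struct RMSNorm : Module {
    explicit RMSNorm(double eps_in) : eps(eps_in) {}
    std::string Name() const override { return "RMSNorm"; }
    double eps;
};

// Explicit registration for the sketch; a macro could do this at static-init time.
inline void RegisterSketchModules() {
    auto &registry = ModuleRegistry::Instance();
    registry.Register("LayerNorm", [](const ModuleSpec &spec) {
        return std::make_unique<LayerNorm>(spec.params.at("eps"));
    });
    registry.Register("RMSNorm", [](const ModuleSpec &spec) {
        return std::make_unique<RMSNorm>(spec.params.at("eps"));
    });
}

}  // namespace sketch
```

After calling `sketch::RegisterSketchModules()`, a caller builds a module by passing a spec such as `{"RMSNorm", {{"eps", 1e-5}}}` to `sketch::build_module()`; an unregistered type name raises `std::runtime_error`. The construction code therefore depends only on the spec, and swapping LayerNorm for RMSNorm is a data change rather than a code change.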