
feat: extract the common module of Transformer #115

Open
JYMiracle305 wants to merge 1 commit into master from feat/transformer
Conversation

JYMiracle305 (Contributor) commented Mar 13, 2026

This PR abstracts a common build architecture for Transformer-class models, unifying the GPT2 and LLaMA3 construction processes into a single flow.

  1. Directory structure
    …/core/
    ├── models/decode_only_transformer/
    │   ├── layer_specs.h/.cc           # declarations and implementations of the model build functions
    │   └── model.h                     # declarations of components such as RMSNorm/NewGELU/SwiGLU
    └── transformer/
        ├── spec_utils.h/.cc            # ModuleSpec construction utilities and module registration macros
        ├── transformer_block.h/.cc     # basic components such as TransformerBlock, with their registrations
        ├── transformer_builders.h/.cc  # spec builder declarations and implementations (BuildNormSpec, BuildMLPSpec, etc.)
        ├── transformer_config.h        # TransformerConfig struct, replacing the former GPT2Config and LLaMA3Config
        └── transformer_layer.h/.cc     # TransformerFirstStage/Chunk/LastStage, replacing the former GPT2FirstStage/Chunk/LastStage and LLaMA3FirstStage/Chunk/LastStage

  2. Core mechanism
    The ModuleSpec data structure declares a module's type and parameters; concrete module implementations are registered centrally through ModuleRegistry, and at model build time build_module() instantiates modules dynamically by resolving each spec against the registered implementations.

JYMiracle305 (Contributor, Author) commented Mar 16, 2026

Single-node multi-GPU results:
GPT2:
[screenshot]

LLaMA3:
[screenshot]

Multi-node training results:
GPT2:
[screenshot]

LLaMA3:
[screenshot]

@JYMiracle305 JYMiracle305 requested review from Chamberlain0w0, chen2021673 and kilinchange and removed request for Chamberlain0w0 March 16, 2026 05:42
@JYMiracle305 JYMiracle305 force-pushed the feat/transformer branch 2 times, most recently from 4281ea9 to dfdd913 Compare March 16, 2026 06:34