
[Training Camp] feat: integrate Flash Attention operator into InfiniTrain framework #119

Open
tangguochuan wants to merge 2 commits into InfiniTensor:master from tangguochuan:feat/flash-attention

Conversation

@tangguochuan

Add a self-contained Flash Attention forward/backward implementation (BLOCK_Q=64, BLOCK_KV=64, sm_80+, bf16 only) and wire it into the autograd/dispatcher system.

Key changes:

  • infini_train/include/autograd/flash_attention.h: FlashAttention autograd Function declaration
  • infini_train/src/autograd/flash_attention.cc: forward/backward with saved tensors {Q, K, V, O, L}; L (the per-row logsumexp) is passed through SetupContext
  • infini_train/src/kernels/cuda/flash_attention.cu: self-contained CUDA kernel (inlined tiling logic, MMA m16n8k16, online softmax); GQA supported (q_head != kv_head); must use the framework's NonBlocking stream
  • CMakeLists.txt: build flash_attention.cu as a separate sm_80;90 target (infini_train_flash_attention) to avoid compile failures on sm_75
  • example/gpt2, example/llama3: add a --flash flag to switch the attention path

Constraints: dtype=bfloat16 only, head_dim=64 only.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
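The split-architecture build described in the CMakeLists.txt bullet can be sketched roughly as follows (target and file names are taken from the PR description; the exact CMake wiring and the main target's name are assumptions). The idea is to compile only the Flash Attention translation unit for sm_80/90, so the rest of the project can still build for sm_75 without nvcc rejecting the bf16 MMA instructions:

```cmake
# Sketch only: build the Flash Attention kernel as its own CUDA library
# restricted to sm_80 and sm_90, then link it into the main library.
add_library(infini_train_flash_attention STATIC
    infini_train/src/kernels/cuda/flash_attention.cu)
set_target_properties(infini_train_flash_attention PROPERTIES
    CUDA_ARCHITECTURES "80;90")
# "infini_train" as the consuming target is an assumption for illustration.
target_link_libraries(infini_train PRIVATE infini_train_flash_attention)
```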
@kilinchange kilinchange changed the title feat: integrate Flash Attention operator into InfiniTrain framework [Training Camp] feat: integrate Flash Attention operator into InfiniTrain framework Mar 16, 2026
@kilinchange kilinchange self-requested a review March 16, 2026 07:08
@kilinchange
Collaborator

Please resolve the conflicts between this PR and master.

@tangguochuan
Author

I closed PR 118. I have resolved this PR's conflicts, which produced a new commit on this PR. Should I open another branch and submit a clean PR?

