[Training Camp] feat: integrate Flash Attention operator into InfiniTrain framework #119
Open
tangguochuan wants to merge 2 commits into InfiniTensor:master
Conversation
Add a self-contained Flash Attention forward/backward implementation
(BLOCK_Q=64, BLOCK_KV=64, sm_80+, bf16 only) and wire it into the
autograd/dispatcher system.
Key changes:
- infini_train/include/autograd/flash_attention.h: FlashAttention Function
- infini_train/src/autograd/flash_attention.cc: Forward/Backward with saved tensors {Q,K,V,O,L}; L (logsumexp) is passed through SetupContext (see the autograd sketch below)
- infini_train/src/kernels/cuda/flash_attention.cu: self-contained CUDA kernel (inlines the tiling logic, MMA m16n8k16, online softmax); GQA is supported (q_head != kv_head); the kernel must run on the framework's NonBlocking stream (see the online-softmax sketch below)
- CMakeLists.txt: build flash_attention.cu as a separate sm_80;90 target (infini_train_flash_attention) to avoid compile failures on sm_75
- example/gpt2, example/llama3: add a --flash flag to switch the attention path
Constraints: dtype=bfloat16 only, head_dim=64 only.
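
For reference, a minimal C++ sketch of the saved-tensor flow described above. Every name here (Tensor, Context, FlashAttentionFn, the method signatures) is a hypothetical placeholder, not InfiniTrain's actual autograd interface; it only illustrates {Q,K,V,O,L} being stashed in SetupContext and read back in Backward.

```cpp
// Illustrative only: Tensor, Context, and FlashAttentionFn are hypothetical
// stand-ins, not InfiniTrain's real types or signatures.
#include <string>
#include <tuple>
#include <unordered_map>
#include <utility>
#include <vector>

using Tensor = std::vector<float>;  // placeholder for a device tensor

struct Context {
    std::unordered_map<std::string, Tensor> saved;  // tensors kept for Backward
};

struct FlashAttentionFn {
    // Forward produces the attention output O and the per-row logsumexp L.
    static std::pair<Tensor, Tensor> Forward(const Tensor& Q, const Tensor& K,
                                             const Tensor& V) {
        Tensor O(Q.size(), 0.0f);  // would come from the forward kernel
        Tensor L(Q.size(), 0.0f);  // row-wise logsumexp of the scores
        return {O, L};
    }

    // SetupContext records {Q, K, V, O, L}; L travels this way rather than
    // being exposed as a user-visible output.
    static void SetupContext(Context& ctx, const Tensor& Q, const Tensor& K,
                             const Tensor& V, const Tensor& O, const Tensor& L) {
        ctx.saved = {{"Q", Q}, {"K", K}, {"V", V}, {"O", O}, {"L", L}};
    }

    // Backward reads the saved tensors; with L available, the softmax can be
    // recomputed tile by tile instead of storing the full attention matrix.
    static std::tuple<Tensor, Tensor, Tensor> Backward(Context& ctx, const Tensor& dO) {
        Tensor dQ(ctx.saved.at("Q").size(), 0.0f);
        Tensor dK(ctx.saved.at("K").size(), 0.0f);
        Tensor dV(ctx.saved.at("V").size(), 0.0f);
        return {dQ, dK, dV};
    }
};
```

Saving L instead of the attention matrix is what keeps the extra memory linear in sequence length: the backward pass rebuilds the softmax from {Q, K, L} on the fly.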
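
And a scalar C++ reference of the online-softmax recurrence the kernel applies tile by tile (a sketch only: the real kernel works on 64x64 bf16 tiles with MMA m16n8k16, and the 1/sqrt(head_dim) score scaling is omitted here).

```cpp
// Scalar reference for online softmax over one query row: a single pass over
// the keys, keeping a running max m, running sum l, and a rescaled accumulator.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// q: [d], K: [n][d], V: [n][d]. Returns the output row and its logsumexp L,
// which is exactly what the backward pass needs to recompute the softmax.
std::pair<std::vector<float>, float>
OnlineSoftmaxRow(const std::vector<float>& q,
                 const std::vector<std::vector<float>>& K,
                 const std::vector<std::vector<float>>& V) {
    const std::size_t d = q.size();
    float m = -INFINITY;              // running max of the scores seen so far
    float l = 0.0f;                   // running sum of exp(score - m)
    std::vector<float> acc(d, 0.0f);  // unnormalized output accumulator

    for (std::size_t j = 0; j < K.size(); ++j) {
        float s = 0.0f;
        for (std::size_t k = 0; k < d; ++k) s += q[k] * K[j][k];  // score q.K_j
        const float m_new = std::max(m, s);
        const float scale = std::exp(m - m_new);  // rescales old l and acc
        const float p = std::exp(s - m_new);      // weight of the new key
        l = l * scale + p;
        for (std::size_t k = 0; k < d; ++k) acc[k] = acc[k] * scale + p * V[j][k];
        m = m_new;
    }
    for (std::size_t k = 0; k < d; ++k) acc[k] /= l;  // normalize once at the end
    return {acc, m + std::log(l)};                    // L = logsumexp of the scores
}
```

In the kernel this loop becomes the per-tile update over BLOCK_KV=64 keys, with m, l, and the accumulator kept in registers across tiles.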
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Collaborator
Please resolve the conflicts between this PR and master.
Author
I closed PR #118. I resolved the conflicts on this PR, which added a new commit to it. Should I open a new branch and submit a clean PR?