
[Training Camp] support flash attention #118

Closed
tangguochuan wants to merge 18 commits into InfiniTensor:master from tangguochuan:master

Conversation

@tangguochuan

No description provided.

tangguochuan and others added 18 commits March 10, 2026 11:01
Upstream changes (791c75e..b1e4b03):
- feat: organize test cases by test_groups structure
- fix: add retry logic, compare utils, cleanup in scripts/

Conflict resolution for scripts/test_config.json:
- adopted upstream's test_groups structure
- retained our bfloat16+flash test entries (1_bfloat16_flash, 2_bfloat16_flash)

Backup branch: backup/before-upstream-merge

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
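The upstream `test_groups` layout adopted in `scripts/test_config.json` might look roughly like this. This is a hypothetical sketch: only the `test_groups` key and the two entry names `1_bfloat16_flash` and `2_bfloat16_flash` come from the commit message; every other field name is an assumption.

```json
{
  "test_groups": [
    { "name": "1_bfloat16_flash", "dtype": "bfloat16", "flash_attention": true },
    { "name": "2_bfloat16_flash", "dtype": "bfloat16", "flash_attention": true }
  ]
}
```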
- Delete my-flash-attention/ (kernels already merged into flash_attention.cu)
- Clean up corresponding .gitignore entries
- Fix flash_test_config.json: migrate to test_groups structure (upstream compat)
- Fix flash_attention_report.md: update run command to --test-config, log paths,
  and refresh all experiment data with actual measured values
- Add logs/flash/ with 8 training logs (30 steps each, seq128/512 × flash/no-flash)
- Update report_figures with freshly generated charts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
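The kernels merged into `flash_attention.cu` are not shown in this PR page, but the core technique they implement (tiled attention with an online softmax, so the full seq×seq score matrix is never materialized) can be sketched in NumPy. This is a reference sketch of the standard FlashAttention forward recurrence, not the actual kernel code:

```python
import numpy as np

def flash_attention(Q, K, V, block_size=64):
    """Tiled attention with online softmax: stream K/V in blocks,
    keeping only a running row-wise max (m) and denominator (l)."""
    seq_len, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros_like(Q, dtype=np.float64)  # unnormalized output accumulator
    m = np.full(seq_len, -np.inf)           # running max of scores per query row
    l = np.zeros(seq_len)                   # running softmax denominator per row
    for start in range(0, seq_len, block_size):
        Kb = K[start:start + block_size]
        Vb = V[start:start + block_size]
        S = (Q @ Kb.T) * scale                  # scores for this K/V block
        m_new = np.maximum(m, S.max(axis=1))
        P = np.exp(S - m_new[:, None])          # block softmax numerators
        correction = np.exp(m - m_new)          # rescale earlier partial sums
        l = l * correction + P.sum(axis=1)
        O = O * correction[:, None] + P @ Vb
        m = m_new
    return O / l[:, None]
```

The rescaling step is what makes the single pass exact: whenever a new block raises the running max, previously accumulated numerators are multiplied by `exp(m_old - m_new)` so all terms end up on a common scale.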
- Add flash_large_seq_test_config.json: seq1024 batch=2 for GPT2+LLaMA3
- Add logs/flash_large/: 8 experiment logs (30 steps each)
  - GPT2 seq1024: flash 1.21x speedup, 19.6% memory saving (best result)
  - LLaMA3 seq1024: flash 1.03x speedup, 10.9% memory saving
  - LLaMA3 seq2048/4096 batch=1: flash ~1.00x (GEMM-dominated, not attention-bound)
- Add plot_flash_large_report.py and report_figures/large_seq/ (6 charts)
- Update flash_attention_report.md: add section 1.4 with large-seq results,
  add section 2.5 with large-seq reproduction commands

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
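The "GEMM-dominated, not attention-bound" observation for seq2048/4096 at batch=1 follows from Amdahl's law: if attention accounts for only a small fraction of step time, even a large kernel speedup barely moves the end-to-end number. A minimal sketch (the fractions below are illustrative, not measured values from these logs):

```python
def end_to_end_speedup(attention_fraction, attention_speedup):
    """Amdahl's law: only the attention share of step time is accelerated."""
    return 1.0 / ((1.0 - attention_fraction)
                  + attention_fraction / attention_speedup)

# If attention is 5% of step time, a 3x faster attention kernel
# yields only ~1.03x end to end, consistent with a ~1.00x reading.
print(end_to_end_speedup(0.05, 3.0))
```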
@kilinchange kilinchange self-requested a review March 16, 2026 06:06
@kilinchange
Collaborator

kilinchange commented Mar 16, 2026

Please remove the unnecessary commits from this PR; it should contain only the code changes. Please send the project report and related materials as an email attachment instead.

@kilinchange kilinchange changed the title support flash attention [Training Camp] support flash attention Mar 16, 2026
@kilinchange
Collaborator

Please remove the duplicate PR and keep only one valid PR; a PR already exists: #119

@kilinchange kilinchange self-assigned this Mar 17, 2026