
[Training Camp] support flash attention #118

Closed
tangguochuan wants to merge 18 commits into InfiniTensor:master from tangguochuan:master

Conversation

@tangguochuan

No description provided.

tangguochuan and others added 18 commits March 10, 2026 11:01
Upstream changes (791c75e..b1e4b03):
- feat: organize test cases by test_groups structure
- fix: add retry logic, compare utils, cleanup in scripts/

Conflict resolution for scripts/test_config.json:
- adopted upstream's test_groups structure
- retained our bfloat16+flash test entries (1_bfloat16_flash, 2_bfloat16_flash)

Backup branch: backup/before-upstream-merge

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
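The upstream `test_groups` layout adopted in `scripts/test_config.json` might look roughly like this. This is a hypothetical sketch: only the `test_groups` key and the two entry names `1_bfloat16_flash` and `2_bfloat16_flash` come from the commit message; every other field name is an assumption.

```json
{
  "test_groups": [
    { "name": "1_bfloat16_flash", "dtype": "bfloat16", "flash_attention": true },
    { "name": "2_bfloat16_flash", "dtype": "bfloat16", "flash_attention": true }
  ]
}
```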
- Delete my-flash-attention/ (kernels already merged into flash_attention.cu)
- Clean up corresponding .gitignore entries
- Fix flash_test_config.json: migrate to test_groups structure (upstream compat)
- Fix flash_attention_report.md: update run command to --test-config, log paths,
  and refresh all experiment data with actual measured values
- Add logs/flash/ with 8 training logs (30 steps each, seq128/512 × flash/no-flash)
- Update report_figures with freshly generated charts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
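The kernels merged into `flash_attention.cu` are not shown in this PR page, but the core technique they implement (tiled attention with an online softmax, so the full seq×seq score matrix is never materialized) can be sketched in NumPy. This is a reference sketch of the standard FlashAttention forward recurrence, not the actual kernel code:

```python
import numpy as np

def flash_attention(Q, K, V, block_size=64):
    """Tiled attention with online softmax: stream K/V in blocks,
    keeping only a running row-wise max (m) and denominator (l)."""
    seq_len, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros_like(Q, dtype=np.float64)  # unnormalized output accumulator
    m = np.full(seq_len, -np.inf)           # running max of scores per query row
    l = np.zeros(seq_len)                   # running softmax denominator per row
    for start in range(0, seq_len, block_size):
        Kb = K[start:start + block_size]
        Vb = V[start:start + block_size]
        S = (Q @ Kb.T) * scale                  # scores for this K/V block
        m_new = np.maximum(m, S.max(axis=1))
        P = np.exp(S - m_new[:, None])          # block softmax numerators
        correction = np.exp(m - m_new)          # rescale earlier partial sums
        l = l * correction + P.sum(axis=1)
        O = O * correction[:, None] + P @ Vb
        m = m_new
    return O / l[:, None]
```

The rescaling step is what makes the single pass exact: whenever a new block raises the running max, previously accumulated numerators are multiplied by `exp(m_old - m_new)` so all terms end up on a common scale.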
- Add flash_large_seq_test_config.json: seq1024 batch=2 for GPT2+LLaMA3
- Add logs/flash_large/: 8 experiment logs (30 steps each)
  - GPT2 seq1024: flash 1.21x speedup, 19.6% memory saving (best result)
  - LLaMA3 seq1024: flash 1.03x speedup, 10.9% memory saving
  - LLaMA3 seq2048/4096 batch=1: flash ~1.00x (GEMM-dominated, not attention-bound)
- Add plot_flash_large_report.py and report_figures/large_seq/ (6 charts)
- Update flash_attention_report.md: add section 1.4 with large-seq results,
  add section 2.5 with large-seq reproduction commands

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
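The "GEMM-dominated, not attention-bound" observation for seq2048/4096 at batch=1 follows from Amdahl's law: if attention accounts for only a small fraction of step time, even a large kernel speedup barely moves the end-to-end number. A minimal sketch (the fractions below are illustrative, not measured values from these logs):

```python
def end_to_end_speedup(attention_fraction, attention_speedup):
    """Amdahl's law: only the attention share of step time is accelerated."""
    return 1.0 / ((1.0 - attention_fraction)
                  + attention_fraction / attention_speedup)

# If attention is 5% of step time, a 3x faster attention kernel
# yields only ~1.03x end to end, consistent with a ~1.00x reading.
print(end_to_end_speedup(0.05, 3.0))
```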
@kilinchange kilinchange self-requested a review March 16, 2026 06:06
@kilinchange
Collaborator

kilinchange commented Mar 16, 2026

Please remove the unnecessary commits from this PR; it should contain only the code changes. Please send the project report and related materials as an email attachment instead.

@kilinchange kilinchange changed the title support flash attention [Training Camp] support flash attention Mar 16, 2026
@kilinchange
Collaborator

Please remove the duplicate PR and keep only one valid PR; a PR already exists: #119

@kilinchange kilinchange self-assigned this Mar 17, 2026