[Training Camp] Support flash attention #118
Closed
tangguochuan wants to merge 18 commits into InfiniTensor:master
Upstream changes (791c75e..b1e4b03):
- feat: organize test cases by test_groups structure
- fix: add retry logic, compare utils, cleanup in scripts/

Conflict resolution for scripts/test_config.json:
- adopted upstream's test_groups structure
- retained our bfloat16+flash test entries (1_bfloat16_flash, 2_bfloat16_flash)

Backup branch: backup/before-upstream-merge

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Delete my-flash-attention/ (kernels already merged into flash_attention.cu)
- Clean up corresponding .gitignore entries
- Fix flash_test_config.json: migrate to test_groups structure (upstream compat)
- Fix flash_attention_report.md: update run command to --test-config, log paths, and refresh all experiment data with actual measured values
- Add logs/flash/ with 8 training logs (30 steps each, seq128/512 × flash/no-flash)
- Update report_figures with freshly generated charts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add flash_large_seq_test_config.json: seq1024 batch=2 for GPT2+LLaMA3
- Add logs/flash_large/: 8 experiment logs (30 steps each)
- GPT2 seq1024: flash 1.21x speedup, 19.6% memory saving (best result)
- LLaMA3 seq1024: flash 1.03x speedup, 10.9% memory saving
- LLaMA3 seq2048/4096 batch=1: flash ~1.00x (GEMM-dominated, not attention-bound)
- Add plot_flash_large_report.py and report_figures/large_seq/ (6 charts)
- Update flash_attention_report.md: add section 1.4 with large-seq results, add section 2.5 with large-seq reproduction commands

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
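Collaborator
Please remove the unnecessary commits from this PR. The PR should contain only the code changes; please send the project report and related materials as an email attachment.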
Collaborator
Please remove the duplicate PR and keep only one valid PR. Existing PR: #119
No description provided.