[CUDA] Heuristics for Hopper QMM by zcbenz · Pull Request #3173 · ml-explore/mlx

zcbenz · 2026-02-26T07:30:46Z

Add simple heuristics for QMM that dispatch to different tile shapes depending on M. This changes nothing for small-M problems and is intended to fix poor performance for large M.

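| M | N | K | CUTE (TFLOP/s) | CUBLAS (TFLOP/s) | Speedup (x) |
| --- | --- | --- | --- | --- | --- |
| 16 | 16384 | 16384 | 96.8 | 45.4 | 2.13 |
| 32 | 16384 | 16384 | 177.4 | 84.4 | 2.10 |
| 64 | 16384 | 16384 | 302.8 | 170.5 | 1.78 |
| 128 | 16384 | 16384 | 537.5 | 339.4 | 1.58 |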
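
As a rough illustration of what dispatching on M can look like, here is a minimal host-side sketch. It is not the code from this PR: `TileShape`, `select_tile_shape`, the thresholds, and the tile dimensions are all hypothetical and chosen only to show the shape of such a heuristic.

```cpp
// Hypothetical tile description; the kernel's real tile types are not shown here.
struct TileShape {
  int tile_m;
  int tile_n;
  int tile_k;
};

// Pick a tile shape from M alone. All thresholds and sizes are illustrative
// assumptions, not the values used by the PR.
constexpr TileShape select_tile_shape(int m) {
  if (m <= 16) {
    return TileShape{16, 128, 64};  // stand-in for the unchanged small-M path
  }
  if (m <= 64) {
    return TileShape{64, 128, 64};
  }
  return TileShape{128, 128, 64};  // larger tile for large M
}

// Number of tiles needed to cover M (ceiling division).
constexpr int tiles_along_m(int m, TileShape t) {
  return (m + t.tile_m - 1) / t.tile_m;
}

// Rows of padding computed but never used; a simple proxy for wasted work.
constexpr int padded_rows(int m, TileShape t) {
  return tiles_along_m(m, t) * t.tile_m - m;
}
```

In a real kernel launcher, the selected shape would typically choose between a small set of template instantiations, since tile shapes must be known at compile time.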

I also realized that fp quantizations cannot be implemented with the CUTLASS kernel because its minimum group size is 64, so I simplified the checks in qmm_sm90.
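
To make the constraint concrete, here is a minimal sketch of an eligibility check built on that limit. `kMinGroupSize` takes its value from the statement above; the function name and the idea of a single standalone predicate are hypothetical, and the actual checks in qmm_sm90 may differ.

```cpp
// Minimum quantization group size the kernel accepts, per the description above.
constexpr int kMinGroupSize = 64;

// Hypothetical predicate: can this group size go through the SM90 path?
// Smaller group sizes would have to fall back to another kernel.
constexpr bool group_size_supported_on_sm90(int group_size) {
  return group_size >= kMinGroupSize;
}
```

Under this sketch, any quantization mode whose group size is below 64 is rejected by the group-size test alone, which would be one reason a separate check on the mode becomes redundant.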

angeloskath

🙏

angeloskath approved these changes Feb 26, 2026

zcbenz force-pushed the qmm-sm90-dispatch branch 6 times, most recently from 1f49732 to 9c7b35f, on February 26, 2026 23:18

[CUDA] Heuristics for Hopper QMM

1675962

zcbenz force-pushed the qmm-sm90-dispatch branch from 9c7b35f to 1675962 on February 26, 2026 23:36

zcbenz merged commit 0c8107c into ml-explore:main Feb 27, 2026
16 checks passed

zcbenz deleted the qmm-sm90-dispatch branch February 27, 2026 00:22

zcbenz merged 1 commit into ml-explore:main from zcbenz:qmm-sm90-dispatch

zcbenz commented Feb 26, 2026 •

edited

Loading

Uh oh!

angeloskath left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

zcbenz commented Feb 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

angeloskath left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

zcbenz commented Feb 26, 2026 •

edited

Loading