
Basic ramtorch support #1272

Draft
maedtb wants to merge 2 commits into Nerogar:master from maedtb:ramtorch

Conversation

maedtb (Contributor) commented Feb 1, 2026

Proof of concept; probably needs to be revised.

dxqb (Collaborator) commented Feb 5, 2026

Thanks for the PR! Makes it much easier to try.

I know this is only a draft for experimentation. I'll leave some comments here anyway for future reference:

  • Only works for fine-tuning, not for LoRA: the LoRA adapter no longer finds the linear layers, because they have been replaced with a layer that is not nn.Linear.
  • Incompatible with quantization.
  • Incompatible with torch.compile:
torch._dynamo.exc.Unsupported: HigherOrderOperator: Mutating a variable not in the current scope (SideEffects)
  File ".../venv/lib/python3.10/site-packages/ramtorch/modules/linear.py", line 166, in forward
    state["forward_clk"] ^= 1
  • Incompatible with stochastic rounding (device mismatch).
  • As Nerogar has noted, the existing OneTrainer functionality for replacing linear layers should be used, not the RamTorch helper function (see the sketch below for the general idea).
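
For future reference, a minimal sketch of the kind of replace-linear pass meant here. The RamTorchLinear import path, its constructor signature, and the direct weight reassignment are assumptions for illustration only; OneTrainer's actual helper and RamTorch's real class will differ in the details. The sketch also shows why the LoRA adapter stops matching: after the swap, isinstance(module, nn.Linear) is False.

```python
import torch.nn as nn

# Assumed import: the traceback points at ramtorch/modules/linear.py,
# but the exported class name and constructor signature are guesses.
from ramtorch.modules.linear import Linear as RamTorchLinear


def replace_linear_layers(root: nn.Module) -> None:
    """Recursively swap every nn.Linear child for a RamTorch linear."""
    for name, child in root.named_children():
        if isinstance(child, nn.Linear):
            replacement = RamTorchLinear(
                child.in_features,
                child.out_features,
                bias=child.bias is not None,
            )
            # Assumes the replacement exposes compatible .weight/.bias parameters.
            replacement.weight = child.weight
            replacement.bias = child.bias
            setattr(root, name, replacement)
            # Note: isinstance(replacement, nn.Linear) is now False, which is
            # exactly why the LoRA adapter no longer finds these layers.
        else:
            replace_linear_layers(child)
```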

Training speed with an RTX 4070 16 GB on Flux1, 512 px, batch size 1:

  • 8.3 s/it with RamTorch
  • 3.8 s/it with OneTrainer offloading fraction 0.5

I guess RamTorch wasn't developed for small batch sizes, but that is what many OneTrainer users run, so it's currently not an alternative. It might be possible to improve it in that direction, though.

maedtb (Contributor, Author) commented Feb 5, 2026

I also saw that torch.compile doesn't work, but I'm not sure if that is actually a limitation of ramtorch, or if that's just an issue with this naive implementation.
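
One way to narrow that down could be to keep only the RamTorch forwards out of the compiled graph and see whether the rest of the model still compiles. A rough sketch, where the import path and class name are guessed from the traceback above:

```python
import torch
import torch.nn as nn

# Assumed import; see the comment in the earlier sketch.
from ramtorch.modules.linear import Linear as RamTorchLinear


def compile_around_ramtorch(model: nn.Module) -> nn.Module:
    """Compile the model while forcing RamTorch linears to run eagerly,
    so dynamo never traces the state["forward_clk"] mutation."""
    for module in model.modules():
        if isinstance(module, RamTorchLinear):
            # torch._dynamo.disable wraps the callable and inserts a graph break.
            module.forward = torch._dynamo.disable(module.forward)
    return torch.compile(model)
```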

I hadn't noticed the stochastic_rounding issue, as I have been running MUON_ADV with stochastic_rounding enabled successfully. I did, however, run into a tensor device mismatch in the MUON_ADV optimizer when resuming from a backup, which could be related to stochastic_rounding. I haven't dug into that yet with everything else I'm working on, but it's safe to assume that there are several problems that are hard-blockers on this being considered.
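
If the resume failure really is just restored optimizer state landing on the wrong device (an assumption about the cause, not a confirmed diagnosis), a generic post-load pass like the one below might paper over it. The helper name is made up and it is not specific to MUON_ADV:

```python
import torch


def align_optimizer_state_devices(optimizer: torch.optim.Optimizer) -> None:
    """After loading a checkpoint, move every state tensor (exp_avg, momentum
    buffers, etc.) onto the device where its parameter currently lives."""
    for group in optimizer.param_groups:
        for param in group["params"]:
            state = optimizer.state.get(param)
            if not state:
                continue
            for key, value in state.items():
                if torch.is_tensor(value):
                    state[key] = value.to(param.device)
```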

dxqb added the wontfix label (This will not be worked on) on Feb 6, 2026
dxqb (Collaborator) commented Feb 6, 2026

> but it's safe to assume that there are several problems that are hard-blockers on this being considered.

If it provided good performance across the board, all of these could be solved with some work, I guess.
But since RamTorch was designed for pre-training at high batch sizes, it currently performs much worse than our offloading at low batch sizes, so I don't think it's worth putting in that work right now.

