
Basic ramtorch support #1272

Draft
maedtb wants to merge 2 commits into Nerogar:master from maedtb:ramtorch

Conversation

maedtb (Contributor) commented Feb 1, 2026

Proof of concept; probably needs to be revised.

dxqb (Collaborator) commented Feb 5, 2026

Thanks for the PR! Makes it much easier to try.

I know this is only a draft for experimentation. I'll leave some comments here anyway for future reference:

  • Only works for fine-tuning, not for LoRA: the LoRA adapter no longer finds the linear layers, because they have been replaced with a layer that is not nn.Linear.
  • Incompatible with quantization.
  • Incompatible with torch.compile:
torch._dynamo.exc.Unsupported: HigherOrderOperator: Mutating a variable not in the current scope (SideEffects)
  File ".../venv/lib/python3.10/site-packages/ramtorch/modules/linear.py", line 166, in forward
    state["forward_clk"] ^= 1
  • Incompatible with stochastic rounding (device mismatch).
  • As Nerogar has noted, the existing OneTrainer functionality for replacing linear layers should be used, not the RamTorch helper function (see the sketch below for the general idea).
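
For future reference, a minimal sketch of the kind of replace-linear pass meant here. The RamTorchLinear import path, its constructor signature, and the direct weight reassignment are assumptions for illustration only; OneTrainer's actual helper and RamTorch's real class will differ in the details. The sketch also shows why the LoRA adapter stops matching: after the swap, isinstance(module, nn.Linear) is False.

```python
import torch.nn as nn

# Assumed import: the traceback points at ramtorch/modules/linear.py,
# but the exported class name and constructor signature are guesses.
from ramtorch.modules.linear import Linear as RamTorchLinear


def replace_linear_layers(root: nn.Module) -> None:
    """Recursively swap every nn.Linear child for a RamTorch linear."""
    for name, child in root.named_children():
        if isinstance(child, nn.Linear):
            replacement = RamTorchLinear(
                child.in_features,
                child.out_features,
                bias=child.bias is not None,
            )
            # Assumes the replacement exposes compatible .weight/.bias parameters.
            replacement.weight = child.weight
            replacement.bias = child.bias
            setattr(root, name, replacement)
            # Note: isinstance(replacement, nn.Linear) is now False, which is
            # exactly why the LoRA adapter no longer finds these layers.
        else:
            replace_linear_layers(child)
```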

Training speed with an RTX 4070 16 GB on Flux1, 512 px, batch size 1:

  • 8.3 s/it with RamTorch
  • 3.8 s/it with OneTrainer offloading fraction 0.5

I guess RamTorch wasn't developed for small batch sizes, but that is what many OneTrainer users run, so it's currently not an alternative. It might be possible to improve it in that direction, though.

maedtb (Contributor, Author) commented Feb 5, 2026

I also saw that torch.compile doesn't work, but I'm not sure if that is actually a limitation of ramtorch, or if that's just an issue with this naive implementation.
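
One way to narrow that down could be to keep only the RamTorch forwards out of the compiled graph and see whether the rest of the model still compiles. A rough sketch, where the import path and class name are guessed from the traceback above:

```python
import torch
import torch.nn as nn

# Assumed import; see the comment in the earlier sketch.
from ramtorch.modules.linear import Linear as RamTorchLinear


def compile_around_ramtorch(model: nn.Module) -> nn.Module:
    """Compile the model while forcing RamTorch linears to run eagerly,
    so dynamo never traces the state["forward_clk"] mutation."""
    for module in model.modules():
        if isinstance(module, RamTorchLinear):
            # torch._dynamo.disable wraps the callable and inserts a graph break.
            module.forward = torch._dynamo.disable(module.forward)
    return torch.compile(model)
```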

I hadn't noticed the stochastic_rounding issue, as I have been running MUON_ADV with stochastic_rounding enabled successfully. I did, however, run into a tensor device mismatch in the MUON_ADV optimizer when resuming from a backup, which could be related to stochastic_rounding. I haven't dug into that yet with everything else I'm working on, but it's safe to assume that there are several problems that are hard-blockers on this being considered.
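
If the resume failure really is just restored optimizer state landing on the wrong device (an assumption about the cause, not a confirmed diagnosis), a generic post-load pass like the one below might paper over it. The helper name is made up and it is not specific to MUON_ADV:

```python
import torch


def align_optimizer_state_devices(optimizer: torch.optim.Optimizer) -> None:
    """After loading a checkpoint, move every state tensor (exp_avg, momentum
    buffers, etc.) onto the device where its parameter currently lives."""
    for group in optimizer.param_groups:
        for param in group["params"]:
            state = optimizer.state.get(param)
            if not state:
                continue
            for key, value in state.items():
                if torch.is_tensor(value):
                    state[key] = value.to(param.device)
```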

dxqb added the wontfix label (This will not be worked on) on Feb 6, 2026
dxqb (Collaborator) commented Feb 6, 2026

> but it's safe to assume that there are several problems that are hard-blockers on this being considered.

If it provided good performance across the board, all of these could be solved with some work, I guess.
But since RamTorch was designed for pre-training at high batch sizes, it currently performs much worse than our offloading at low batch sizes, so I don't think it's worth putting in that work right now.

