Olaf-World: Orienting Latent Actions for Video World Modeling

Yuxin Jiang, Yuchao Gu, Ivor W. Tsang, and Mike Zheng Shou

📢 Updates

[04/2026] Released latent action model (LAM) training and inference code, along with pretrained weights.
[02/2026] Repo and project page initialized.

🛠️ Setup

conda create -n olaf-world python=3.10 -y
conda activate olaf-world

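# Install a PyTorch build that matches your CUDA version (the selector at
# https://pytorch.org/get-started/locally/ gives the right --index-url):
pip install torch torchvision torchaudio
# Then install the remaining dependencies:
pip install -r requirements.txt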
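A quick way to confirm that the install picked up a CUDA-enabled PyTorch is a small check like the one below. It is a hypothetical helper, not a script shipped in the repo; it assumes only that the `python` (or `python3`) on your PATH belongs to the `olaf-world` environment, and it reports missing packages instead of failing on them.

```shell
#!/usr/bin/env bash
# Hypothetical post-install check (not part of the repo).
# Reports whether the PyTorch packages are importable and whether CUDA is visible.
check_env() {
  local py
  # Prefer `python` (the conda env's interpreter), fall back to `python3`.
  py=$(command -v python || command -v python3) || { echo "no python on PATH" >&2; return 1; }
  "$py" - <<'EOF'
import importlib.util

for name in ("torch", "torchvision", "torchaudio"):
    status = "ok" if importlib.util.find_spec(name) else "MISSING"
    print(f"{name}: {status}")

if importlib.util.find_spec("torch"):
    import torch
    # torch.version.cuda is None for CPU-only builds.
    print("torch", torch.__version__, "| CUDA build:", torch.version.cuda,
          "| GPU visible:", torch.cuda.is_available())
EOF
}

check_env
```

If `CUDA build` prints `None` or `GPU visible` prints `False` on a GPU machine, reinstall PyTorch with the index URL that matches your driver before running anything else.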

🕹️ Quick Start

Zero-shot action transfer: a pretrained LAM encodes the motion in a reference video as a sequence of latent actions, and the world model rolls that sequence out from a target first frame, with no finetuning required. See docs/quickstart.md for the full guide.

# 1. Download checkpoints
hf download YuxinJ/Olaf-World --local-dir checkpoints
hf download Skywork/SkyReels-V2-I2V-1.3B-540P \
    --local-dir checkpoints/SkyReels-V2-I2V-1.3B-540P

# 2. Run
python world_model/inference/action_transfer.py \
    --checkpoint_path   checkpoints/world_model/pretrain/model.pt \
    --lam_ckpt          checkpoints/lam/lam_vjepa_align.ckpt \
    --lam_variant       align \
    --reference_video   assets/ref_videos/0.mp4 \
    --first_frame_image assets/images/0.png \
    --output_folder     outputs/action_transfer \
    --use_ema --save_side_by_side
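To run the Quick Start over several clips, a thin wrapper can loop the same command. This is a sketch, not part of the repo: the function name `run_all`, the `DRY_RUN` switch, and the convention that each reference video `N.mp4` pairs with a first frame `N.png` (as `0.mp4`/`0.png` do above) are assumptions. It also assumes `action_transfer.py` finds the SkyReels-V2 base model at its download location, since the command above does not pass that path, and it writes each pair to its own subfolder in case output file names collide.

```shell
#!/usr/bin/env bash
# Hypothetical batch wrapper around the Quick Start command (not part of the repo).
# Pairs REF_DIR/<stem>.mp4 with IMG_DIR/<stem>.png and runs action transfer on each.
# Every path can be overridden from the environment before sourcing.
CKPT=${CKPT:-checkpoints/world_model/pretrain/model.pt}
LAM_CKPT=${LAM_CKPT:-checkpoints/lam/lam_vjepa_align.ckpt}
BASE_MODEL_DIR=${BASE_MODEL_DIR:-checkpoints/SkyReels-V2-I2V-1.3B-540P}
REF_DIR=${REF_DIR:-assets/ref_videos}
IMG_DIR=${IMG_DIR:-assets/images}
OUT_DIR=${OUT_DIR:-outputs/action_transfer}

run_all() {
  # Fail early if anything from step 1 (download) is missing.
  local f
  for f in "$CKPT" "$LAM_CKPT"; do
    [[ -f "$f" ]] || { echo "missing checkpoint: $f" >&2; return 1; }
  done
  [[ -d "$BASE_MODEL_DIR" ]] || { echo "missing base model: $BASE_MODEL_DIR" >&2; return 1; }

  local video stem image
  for video in "$REF_DIR"/*.mp4; do
    [[ -e "$video" ]] || continue   # no matches: the glob stays literal
    stem=$(basename "$video" .mp4)
    image="$IMG_DIR/$stem.png"
    if [[ ! -f "$image" ]]; then
      echo "skip $stem: no $image" >&2
      continue
    fi
    # Same flags as the Quick Start command; one output subfolder per pair.
    local cmd=(python world_model/inference/action_transfer.py
      --checkpoint_path "$CKPT"
      --lam_ckpt "$LAM_CKPT"
      --lam_variant align
      --reference_video "$video"
      --first_frame_image "$image"
      --output_folder "$OUT_DIR/$stem"
      --use_ema --save_side_by_side)
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
      echo "${cmd[*]}"   # preview only
    else
      "${cmd[@]}"
    fi
  done
}
```

Source the file, then run `DRY_RUN=1 run_all` to preview the commands and plain `run_all` to execute them. The function deliberately avoids `set -e` so that sourcing it into an interactive shell cannot close that shell on an error.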

🏋️ Training

0️⃣ Dataset

The current release uses MiraData (3D Rendering and City Walking categories) for both LAM training and world-model pretraining, and MIND for world-model finetuning and evaluation.

1️⃣ LAM Training with SeqΔ-REPA

2️⃣ Olaf-World Pretraining

3️⃣ Olaf-World Finetuning

4️⃣ Self-Forcing Distillation

📌 TODO

Release the world-model pretraining and finetuning pipelines.
Release the evaluation code.
Release the Self-Forcing distillation pipeline.

📖 Citation

If you find this work useful, please cite:

@article{jiang2026olaf,
  title={Olaf-World: Orienting Latent Actions for Video World Modeling},
  author={Jiang, Yuxin and Gu, Yuchao and Tsang, Ivor W and Shou, Mike Zheng},
  journal={arXiv preprint arXiv:2602.10104},
  year={2026}
}

⭐ Acknowledgements

Olaf-World builds on several excellent open-source projects. Many thanks to AdaWorld and Self-Forcing, and to the pretrained visual foundation models whose checkpoints and code we rely on: V-JEPA 2, VideoMAEv2, and SkyReels-V2.
