Skip to content

showlab/Olaf-World

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Orienting Latent Actions for Video World Modeling

Yuxin Jiang, Yuchao Gu, Ivor W. Tsang, and Mike Zheng Shou

arXiv Project Page Code HuggingFace

Teaser


📢 Updates

  • [04/2026] LAM training and inference code, plus pretrained weights, are released.
  • [02/2026] Repo and project page initialized.

🛠️ Setup

conda create -n olaf-world python=3.10 -y
conda activate olaf-world

# Install PyTorch matching your CUDA version, then:
pip install torch torchvision torchaudio
pip install -r requirements.txt

🕹️ Quick Start

Zero-shot action transfer: a pretrained LAM and world model transfer motion (as latent action sequences) from a reference video onto a target first frame — no finetuning required. See docs/quickstart.md for the full guide.

# 1. Download checkpoints
hf download YuxinJ/Olaf-World --local-dir checkpoints
hf download Skywork/SkyReels-V2-I2V-1.3B-540P \
    --local-dir checkpoints/SkyReels-V2-I2V-1.3B-540P

# 2. Run
python world_model/inference/action_transfer.py \
    --checkpoint_path   checkpoints/world_model/pretrain/model.pt \
    --lam_ckpt          checkpoints/lam/lam_vjepa_align.ckpt \
    --lam_variant       align \
    --reference_video   assets/ref_videos/0.mp4 \
    --first_frame_image assets/images/0.png \
    --output_folder     outputs/action_transfer \
    --use_ema --save_side_by_side

🏋️ Training

0️⃣ Dataset

The current release is trained on MiraData (3D Rendering and City Walking categories) for both LAM and world-model pretraining, and on MIND for world-model finetuning and evaluation.

2️⃣ Olaf-World Pretraining

3️⃣ Olaf-World Finetuning

4️⃣ Self-Forcing Distillation

📌 TODO

  • Release world model pretraining and finetuning pipeline.
  • Release evaluation code.
  • Release distillation pipeline.

📖 Citation

If you find this work useful, please cite:

@article{jiang2026olaf,
  title={Olaf-World: Orienting Latent Actions for Video World Modeling},
  author={Jiang, Yuxin and Gu, Yuchao and Tsang, Ivor W and Shou, Mike Zheng},
  journal={arXiv preprint arXiv:2602.10104},
  year={2026}
}

⭐ Acknowledgements

Olaf-World builds on several excellent open-source projects. Many thanks to AdaWorld and Self-Forcing, and to the pretrained visual foundation models whose checkpoints and code we rely on: V-JEPA 2, VideoMAEv2, and SkyReels-V2.

Releases

No releases published

Packages

 
 
 

Contributors