Yuxin Jiang, Yuchao Gu, Ivor W. Tsang, and Mike Zheng Shou
- [04/2026] LAM training and inference code, plus pretrained weights, are released.
- [02/2026] Repo and project page initialized.
```bash
conda create -n olaf-world python=3.10 -y
conda activate olaf-world
# Install PyTorch matching your CUDA version, then:
pip install torch torchvision torchaudio
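# (Hypothetical sanity check, not part of the repo: confirm the PyTorch build
# sees your GPU before installing the remaining dependencies)
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"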
pip install -r requirements.txt
```

Zero-shot action transfer: a pretrained LAM and world model transfer motion (as latent action sequences) from a reference video onto a target first frame; no finetuning is required. See docs/quickstart.md for the full guide.
```bash
# 1. Download checkpoints
hf download YuxinJ/Olaf-World --local-dir checkpoints
hf download Skywork/SkyReels-V2-I2V-1.3B-540P \
--local-dir checkpoints/SkyReels-V2-I2V-1.3B-540P
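# (Optional, hypothetical check, not part of the repo: confirm the expected
# checkpoint folders exist before running)
ls checkpoints/world_model checkpoints/lam checkpoints/SkyReels-V2-I2V-1.3B-540P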
# 2. Run
python world_model/inference/action_transfer.py \
--checkpoint_path checkpoints/world_model/pretrain/model.pt \
--lam_ckpt checkpoints/lam/lam_vjepa_align.ckpt \
--lam_variant align \
--reference_video assets/ref_videos/0.mp4 \
--first_frame_image assets/images/0.png \
--output_folder outputs/action_transfer \
--use_ema --save_side_by_side
```

The current release is trained on MiraData (3D Rendering and City Walking categories) for both LAM and world-model pretraining, and on MIND for world-model finetuning and evaluation.
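For convenience, below is a hypothetical batch variant of the command above (not part of the released scripts): it loops over every reference clip in assets/ref_videos and writes each result to its own output subfolder, using only the flags shown above.

```bash
# Hypothetical batch sketch (not part of the repo): run action transfer for
# every reference clip in assets/ref_videos, one output subfolder per clip.
for ref in assets/ref_videos/*.mp4; do
  python world_model/inference/action_transfer.py \
    --checkpoint_path checkpoints/world_model/pretrain/model.pt \
    --lam_ckpt checkpoints/lam/lam_vjepa_align.ckpt \
    --lam_variant align \
    --reference_video "$ref" \
    --first_frame_image assets/images/0.png \
    --output_folder "outputs/action_transfer/$(basename "$ref" .mp4)" \
    --use_ema --save_side_by_side
done
```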
- Release world model pretraining and finetuning pipeline.
- Release evaluation code.
- Release distillation pipeline.
If you find this work useful, please cite:
```bibtex
@article{jiang2026olaf,
  title={Olaf-World: Orienting Latent Actions for Video World Modeling},
  author={Jiang, Yuxin and Gu, Yuchao and Tsang, Ivor W and Shou, Mike Zheng},
  journal={arXiv preprint arXiv:2602.10104},
  year={2026}
}
```

Olaf-World builds on several excellent open-source projects. Many thanks to AdaWorld and Self-Forcing, and to the pretrained visual foundation models whose checkpoints and code we rely on: V-JEPA 2, VideoMAEv2, and SkyReels-V2.
