Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment
Slot Attention (SA) with pretrained diffusion models has recently shown promise for object-centric learning (OCL), but suffers from slot entanglement and weak alignment between object slots and image content. We propose Contrastive Object-centric Diffusion Alignment (CODA), a simple extension that (i) introduces register slots to absorb residual attention and reduce interference between object slots, and (ii) applies a contrastive alignment loss to explicitly encourage slot–image correspondence.
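The exact form of CODA's contrastive alignment loss is not spelled out in this README. As an illustration only, a symmetric InfoNCE objective between pooled slot embeddings and image embeddings (a common choice for this kind of slot–image correspondence; the function name `info_nce`, the pooling to one vector per image, and the temperature value are all assumptions, and the sketch is written in NumPy for brevity) could look like:

```python
import numpy as np

def info_nce(slot_emb, img_emb, tau=0.07):
    """Symmetric InfoNCE between pooled slot embeddings and image embeddings.

    slot_emb, img_emb: (B, D) batch embeddings; matching indices are
    positives, all other pairs in the batch are negatives.
    """
    # L2-normalize so the dot product is a cosine similarity
    slot_emb = slot_emb / np.linalg.norm(slot_emb, axis=1, keepdims=True)
    img_emb = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    logits = slot_emb @ img_emb.T / tau  # (B, B) similarity matrix

    def xent(l):
        # cross-entropy with the diagonal (matching pairs) as targets
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    # average the slot-to-image and image-to-slot directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

The register slots, by contrast, need no extra loss: they are additional learned slots that compete in attention alongside the object slots and are simply discarded before decoding.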
The training and evaluation code requires PyTorch. Clone the repository, then install the dependencies from requirements.txt:

```shell
pip install -r requirements.txt
```
All datasets will be downloaded and placed under `$USER_DATA`. Run the following commands to get the data.

```shell
# define where to store data
export USER_DATA=...
# download the datasets
bash preprocess/download.sh voc coco movi-c movi-e
```

We use the following script for training.
```shell
bash scripts/train.sh <dataset>
```

where `<dataset>` can be one of `voc`, `coco`, `movi-c`, `movi-e`.
To enable logging with wandb, place your API key in a .key file.
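The README does not say where the `.key` file must live; assuming it sits at the repository root and your key is available in `$WANDB_API_KEY`, one way to create it is:

```shell
# write the wandb API key into .key (assumed repo-root location)
printf '%s' "$WANDB_API_KEY" > .key
# restrict permissions so the key is readable only by you
chmod 600 .key
```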
The diffusion pipeline can be loaded as follows.
```python
import torch

from src.model.pipeline import DiffusionPipeline

image = <image_tensor>
model_path = <path_to_pretrained_model>

model = DiffusionPipeline.from_pretrained(model_path).to("cuda")
with torch.no_grad():
    slots = model.encoder(image)
    image_rec = model.sample(slots, resolution=512)
```

We use the following script for evaluation.
```shell
bash scripts/eval.sh <dataset>
```

where `<dataset>` can be one of `voc`, `coco`, `movi-c`, `movi-e`.
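FG-ARI, reported in the results table below, is the adjusted Rand index computed over foreground pixels only. The repository's own metric code is not shown here; a self-contained sketch (the helper name `fg_ari` and the convention that ground-truth label 0 marks background are assumptions) is:

```python
from collections import Counter
from math import comb

def fg_ari(gt, pred, bg_label=0):
    """Adjusted Rand index over foreground pixels only (FG-ARI).

    gt, pred: flat sequences of integer segment labels, one per pixel.
    Pixels whose ground-truth label equals bg_label are excluded.
    """
    pairs = [(g, p) for g, p in zip(gt, pred) if g != bg_label]
    n = len(pairs)
    nij = Counter(pairs)              # contingency table of (gt, pred) cells
    a = Counter(g for g, _ in pairs)  # ground-truth segment sizes
    b = Counter(p for _, p in pairs)  # predicted segment sizes
    index = sum(comb(c, 2) for c in nij.values())
    sum_a = sum(comb(c, 2) for c in a.values())
    sum_b = sum(comb(c, 2) for c in b.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:         # degenerate labelings
        return 1.0
    return (index - expected) / (max_index - expected)
```

A perfect segmentation scores 1.0 regardless of how the predicted segment IDs are permuted, which is why ARI-style metrics suit unsupervised object discovery.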
📥 Pretrained models are available.
| Dataset | FG-ARI⬆️ | mBOi⬆️ | mBOc⬆️ | mIoUi⬆️ | mIoUc⬆️ | Download |
|---|---|---|---|---|---|---|
| MOVi-C | 59.19 | 46.55 | — | 51.94 | — | |
| MOVi-E | 59.04 | 43.45 | — | 45.21 | — | |
| VOC | 32.23 | 55.38 | 61.32 | 50.77 | 56.30 | |
| COCO | 47.54 | 36.61 | 41.43 | 36.41 | 42.60 | |
Please cite our paper if you find it useful in your research:
@inproceedings{nguyen2026coda,
title={Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment},
author={Bac Nguyen and Yuhta Takida and Naoki Murata and Chieh-Hsin Lai and Toshimitsu Uesaka and Stefano Ermon and Yuki Mitsufuji},
year={2026},
booktitle={The Fourteenth International Conference on Learning Representations},
}
We thank the authors of SlotDiffusion, Latent Slot Diffusion, and Latent Diffusion Models for making their implementations publicly available.
CODA is released under the Apache License 2.0. See the LICENSE file for more details.
