# CODA: Contrastive Object-centric Diffusion Alignment

**Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment**

Slot Attention (SA) with pretrained diffusion models has recently shown promise for object-centric learning (OCL), but suffers from slot entanglement and weak alignment between object slots and image content. We propose Contrastive Object-centric Diffusion Alignment (CODA), a simple extension that (i) introduces register slots to absorb residual attention and reduce interference between object slots, and (ii) applies a contrastive alignment loss to explicitly encourage slot–image correspondence.
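The contrastive alignment loss in (ii) can be pictured as a standard InfoNCE objective between pooled slot representations and a global image embedding. Below is a minimal sketch of that idea; the function name, mean-pooling choice, and temperature are illustrative assumptions, not CODA's exact formulation:

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(slots, image_feats, temperature=0.07):
    """InfoNCE-style loss aligning pooled slot and image representations.

    slots:       (B, K, D) slot vectors (object + register slots)
    image_feats: (B, D)    global image representation
    Hypothetical sketch -- illustrates the principle, not the paper's loss.
    """
    # Pool slots into one per-image vector and L2-normalise both sides.
    z_slots = F.normalize(slots.mean(dim=1), dim=-1)  # (B, D)
    z_img = F.normalize(image_feats, dim=-1)          # (B, D)

    # Similarity matrix: matching (slot, image) pairs lie on the diagonal.
    logits = z_slots @ z_img.t() / temperature        # (B, B)
    targets = torch.arange(slots.size(0), device=slots.device)

    # Symmetric cross-entropy over both matching directions.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```

Pulling the matched pair toward the diagonal while pushing apart other images in the batch is what "explicitly encourage slot–image correspondence" amounts to in this family of losses.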


## 🚀 Installation

The training and evaluation code requires PyTorch. Clone the repository, then install the dependencies from `requirements.txt`:

```bash
pip install -r requirements.txt
```

## Data preparation

All datasets are downloaded and placed under `$USER_DATA`. Run the following commands to fetch the data:

```bash
# define where to store the data
export USER_DATA=...

# download the datasets
bash preprocess/download.sh voc coco movi-c movi-e
```

## 🎮 Training

Use the following script for training:

```bash
bash scripts/train.sh <dataset>
```

where `<dataset>` is one of `voc`, `coco`, `movi-c`, `movi-e`.

To enable logging with `wandb`, place your API key in a `.key` file.
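The training script is presumably what reads the key from this file; a minimal sketch of that pattern, with `load_wandb_key` being a hypothetical helper (`wandb` itself picks up the `WANDB_API_KEY` environment variable on login):

```python
import os
from pathlib import Path

def load_wandb_key(path=".key"):
    """Hypothetical helper: read the API key from `path` and expose it
    via WANDB_API_KEY, the environment variable wandb checks on login."""
    key = Path(path).read_text().strip()
    os.environ["WANDB_API_KEY"] = key
    return key
```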

## 📝 Evaluation

The diffusion pipeline can be loaded as follows:

```python
import torch

from src.model.pipeline import DiffusionPipeline

image = <image_tensor>
model_path = <path_to_pretrained_model>
model = DiffusionPipeline.from_pretrained(model_path).to("cuda")

with torch.no_grad():
    slots = model.encoder(image)
    image_rec = model.sample(slots, resolution=512)
```

Use the following script for evaluation:

```bash
bash scripts/eval.sh <dataset>
```

where `<dataset>` is one of `voc`, `coco`, `movi-c`, `movi-e`.

## 📥 Pretrained models

Pretrained models are available for download:

| Dataset | FG-ARI ⬆️ | mBOi ⬆️ | mBOc ⬆️ | mIoUi ⬆️ | mIoUc ⬆️ | Download |
| --- | --- | --- | --- | --- | --- | --- |
| MOVi-C | 59.19 | 46.55 | 51.94 | – | – | Hugging Face |
| MOVi-E | 59.04 | 43.45 | 45.21 | – | – | Hugging Face |
| VOC | 32.23 | 55.38 | 61.32 | 50.77 | 56.30 | Hugging Face |
| COCO | 47.54 | 36.61 | 41.43 | 36.41 | 42.60 | Hugging Face |
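FG-ARI in the table is the Adjusted Rand Index restricted to foreground pixels. A minimal NumPy sketch of that metric, assuming label 0 marks background in the ground-truth masks; the repository's own evaluator may differ in details:

```python
import numpy as np

def fg_ari(true_masks, pred_masks):
    """Foreground Adjusted Rand Index between two integer label maps.

    true_masks, pred_masks: (H, W) integer segmentations. Label 0 in
    true_masks is treated as background and excluded, as is standard
    for FG-ARI. Degenerate cases (a single cluster) are not handled.
    """
    fg = true_masks.ravel() > 0
    t = true_masks.ravel()[fg]
    p = pred_masks.ravel()[fg]
    n = fg.sum()

    # Contingency table between ground-truth and predicted labels.
    cont = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(cont, (t, p), 1)

    # Standard ARI formula from pairwise co-assignment counts.
    sum_comb = (cont * (cont - 1) / 2).sum()
    a = cont.sum(axis=1)
    b = cont.sum(axis=0)
    comb_a = (a * (a - 1) / 2).sum()
    comb_b = (b * (b - 1) / 2).sum()
    expected = comb_a * comb_b / (n * (n - 1) / 2)
    max_index = 0.5 * (comb_a + comb_b)
    return (sum_comb - expected) / (max_index - expected)
```

A segmentation identical to the ground truth scores 1.0; random assignments score near 0.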

## 📝 Citation

Please cite our paper if you find it useful in your research:

```bibtex
@inproceedings{nguyen2026coda,
  title={Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment},
  author={Bac Nguyen and Yuhta Takida and Naoki Murata and Chieh-Hsin Lai and Toshimitsu Uesaka and Stefano Ermon and Yuki Mitsufuji},
  year={2026},
  booktitle={The Fourteenth International Conference on Learning Representations},
}
```

## Acknowledgement

We thank the authors of SlotDiffusion, Latent Slot Diffusion, and Latent Diffusion Models for making their implementations publicly available.

## License

CODA is released under the Apache License 2.0. See the LICENSE file for more details.

## About

Official PyTorch implementation of "Contrastive Object-centric Diffusion Alignment".
