# CODA: Contrastive Object-centric Diffusion Alignment

**Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment**

Slot Attention (SA) with pretrained diffusion models has recently shown promise for object-centric learning (OCL), but suffers from slot entanglement and weak alignment between object slots and image content. We propose Contrastive Object-centric Diffusion Alignment (CODA), a simple extension that (i) introduces register slots to absorb residual attention and reduce interference between object slots, and (ii) applies a contrastive alignment loss to explicitly encourage slot–image correspondence.
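The contrastive alignment loss in (ii) can be pictured as a standard InfoNCE objective between pooled slot representations and a global image embedding. Below is a minimal sketch of that idea; the function name, mean-pooling choice, and temperature are illustrative assumptions, not CODA's exact formulation:

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(slots, image_feats, temperature=0.07):
    """InfoNCE-style loss aligning pooled slot and image representations.

    slots:       (B, K, D) slot vectors (object + register slots)
    image_feats: (B, D)    global image representation
    Hypothetical sketch -- illustrates the principle, not the paper's loss.
    """
    # Pool slots into one per-image vector and L2-normalise both sides.
    z_slots = F.normalize(slots.mean(dim=1), dim=-1)  # (B, D)
    z_img = F.normalize(image_feats, dim=-1)          # (B, D)

    # Similarity matrix: matching (slot, image) pairs lie on the diagonal.
    logits = z_slots @ z_img.t() / temperature        # (B, B)
    targets = torch.arange(slots.size(0), device=slots.device)

    # Symmetric cross-entropy over both matching directions.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```

Pulling the matched pair toward the diagonal while pushing apart other images in the batch is what "explicitly encourage slot–image correspondence" amounts to in this family of losses.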


## 🚀 Installation

The training and evaluation code requires PyTorch. Clone the repository, then install the dependencies from `requirements.txt`:

```bash
pip install -r requirements.txt
```

## Data preparation

All datasets are downloaded and placed under `$USER_DATA`. Run the following commands to fetch the data:

```bash
# define where to store the data
export USER_DATA=...

# download the datasets
bash preprocess/download.sh voc coco movi-c movi-e
```

## 🎮 Training

Use the following script for training:

```bash
bash scripts/train.sh <dataset>
```

where `<dataset>` is one of `voc`, `coco`, `movi-c`, `movi-e`.

To enable logging with `wandb`, place your API key in a `.key` file.
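The training script is presumably what reads the key from this file; a minimal sketch of that pattern, with `load_wandb_key` being a hypothetical helper (`wandb` itself picks up the `WANDB_API_KEY` environment variable on login):

```python
import os
from pathlib import Path

def load_wandb_key(path=".key"):
    """Hypothetical helper: read the API key from `path` and expose it
    via WANDB_API_KEY, the environment variable wandb checks on login."""
    key = Path(path).read_text().strip()
    os.environ["WANDB_API_KEY"] = key
    return key
```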

## 📝 Evaluation

The diffusion pipeline can be loaded as follows:

```python
import torch

from src.model.pipeline import DiffusionPipeline

image = <image_tensor>
model_path = <path_to_pretrained_model>
model = DiffusionPipeline.from_pretrained(model_path).to("cuda")

with torch.no_grad():
    slots = model.encoder(image)
    image_rec = model.sample(slots, resolution=512)
```

Use the following script for evaluation:

```bash
bash scripts/eval.sh <dataset>
```

where `<dataset>` is one of `voc`, `coco`, `movi-c`, `movi-e`.

## 📥 Pretrained models

Pretrained models are available for download:

| Dataset | FG-ARI ⬆️ | mBOi ⬆️ | mBOc ⬆️ | mIoUi ⬆️ | mIoUc ⬆️ | Download |
| --- | --- | --- | --- | --- | --- | --- |
| MOVi-C | 59.19 | 46.55 | 51.94 | – | – | Hugging Face |
| MOVi-E | 59.04 | 43.45 | 45.21 | – | – | Hugging Face |
| VOC | 32.23 | 55.38 | 61.32 | 50.77 | 56.30 | Hugging Face |
| COCO | 47.54 | 36.61 | 41.43 | 36.41 | 42.60 | Hugging Face |
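FG-ARI in the table is the Adjusted Rand Index restricted to foreground pixels. A minimal NumPy sketch of that metric, assuming label 0 marks background in the ground-truth masks; the repository's own evaluator may differ in details:

```python
import numpy as np

def fg_ari(true_masks, pred_masks):
    """Foreground Adjusted Rand Index between two integer label maps.

    true_masks, pred_masks: (H, W) integer segmentations. Label 0 in
    true_masks is treated as background and excluded, as is standard
    for FG-ARI. Degenerate cases (a single cluster) are not handled.
    """
    fg = true_masks.ravel() > 0
    t = true_masks.ravel()[fg]
    p = pred_masks.ravel()[fg]
    n = fg.sum()

    # Contingency table between ground-truth and predicted labels.
    cont = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(cont, (t, p), 1)

    # Standard ARI formula from pairwise co-assignment counts.
    sum_comb = (cont * (cont - 1) / 2).sum()
    a = cont.sum(axis=1)
    b = cont.sum(axis=0)
    comb_a = (a * (a - 1) / 2).sum()
    comb_b = (b * (b - 1) / 2).sum()
    expected = comb_a * comb_b / (n * (n - 1) / 2)
    max_index = 0.5 * (comb_a + comb_b)
    return (sum_comb - expected) / (max_index - expected)
```

A segmentation identical to the ground truth scores 1.0; random assignments score near 0.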

## 📝 Citation

Please cite our paper if you find it useful in your research:

```bibtex
@inproceedings{nguyen2026coda,
  title={Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment},
  author={Bac Nguyen and Yuhta Takida and Naoki Murata and Chieh-Hsin Lai and Toshimitsu Uesaka and Stefano Ermon and Yuki Mitsufuji},
  year={2026},
  booktitle={The Fourteenth International Conference on Learning Representations},
}
```

## Acknowledgement

We thank the authors of SlotDiffusion, Latent Slot Diffusion, and Latent Diffusion Models for making their implementations publicly available.

## License

CODA is released under the Apache License 2.0. See the LICENSE file for more details.

## About

Official PyTorch implementation of "Contrastive Object-centric Diffusion Alignment".
