Imitation-Guided World Models for Multi-Agent Train Rescheduling

Max Bourgeat¹·², Antoine Legrain¹·²·³, Quentin Cappart¹·²·⁴

¹ Polytechnique Montréal, Montreal, Canada
² CIRRELT, Montreal, Canada
³ GERAD, Montreal, Canada
⁴ UCLouvain, Louvain-la-Neuve, Belgium

🏠 About

Managing railway disruptions is a complex multi-agent routing problem where a single train failure can propagate delays across the network. Traditional approaches rely on heuristic optimization solvers, which are effective but assume access to a global system view and require substantial expert design, limiting their applicability and generalization. Reinforcement learning (RL) offers an alternative by learning adaptive strategies from interactions with the environment. In this paper, we show that none of the available paradigms is sufficient in isolation: (i) heuristic solvers encode valuable global expertise but cannot be deployed directly, (ii) world models improve sample efficiency but struggle to leverage expert knowledge, and (iii) pure RL can adapt policies but often lacks stability without strong guidance. We propose a hybrid framework that integrates the strengths of these approaches. First, imitation learning transfers knowledge from a global expert solver to initialize a neural policy. Then, model-based RL fine-tunes this policy using the DreamerV2 world model to enhance generalization and responsiveness to local perturbations. Our method builds on the Multi-Agent Model-Based Architecture (MAMBA) to model agent interactions and addresses the challenge of transferring expertise from global solvers to decentralized agents operating on local latent observations. Experiments on a train rescheduling problem using the Flatland environment show that our method outperforms MAMBA, improving performance by up to 23% on difficult instances. This highlights the benefit of combining imitation learning with world-model-based multi-agent RL for complex transportation networks.

🔍 Overview

Framework

Instructions

This code accompanies the paper "Imitation-Guided World Models for Multi-Agent Train Rescheduling".

Installation

Python 3.9 is required.
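
One way to satisfy this is a dedicated virtual environment, assuming a python3.9 binary is on your PATH (the environment name below is arbitrary):

python3.9 -m venv .venv
source .venv/bin/activate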

pip install wheel
pip install flatland-2.2.2/
pip install -r requirements.txt 

Installing the solver

To install the solver, follow the procedure described here:
https://github.com/Jiaoyang-Li/Flatland
Once it is installed, move the libPythonCBS.so file into agent/solver/.
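
For example, assuming the compiled library ends up in the solver repository's build directory (the exact path depends on your build):

cp Flatland/build/libPythonCBS.so agent/solver/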

Usage

All configuration files are already set up to replicate the main results of the paper. To replicate the ablation studies, modify configs/dreamer/DreamerControllerConfig.py and configs/dreamer/DreamerLearnerConfig.py as follows (a sketch of the edited files appears after this list):

  • To switch from the solver expert to the random or greedy expert, replace self.EXPERT_TO_FOLLOW = "solver" with self.EXPERT_TO_FOLLOW = "random" or self.EXPERT_TO_FOLLOW = "greedy" in configs/dreamer/DreamerControllerConfig.py.
  • To remove the world model (DreamerV2), set self.ABLATION_WM to True in both configs/dreamer/DreamerControllerConfig.py and configs/dreamer/DreamerLearnerConfig.py.
  • To train in a pure imitation learning fashion, without reinforcement learning, set self.STEPS_SEQUENTIAL_RL_PHASE to 0 in configs/dreamer/DreamerLearnerConfig.py.
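
For illustration, the edited attributes might look like this. The class and attribute names come from this repo, but the surrounding code is abbreviated and the values shown are only examples:

# configs/dreamer/DreamerControllerConfig.py (excerpt, illustrative)
class DreamerControllerConfig:
    def __init__(self):
        ...
        self.EXPERT_TO_FOLLOW = "greedy"  # expert to imitate: "solver", "random", or "greedy"
        self.ABLATION_WM = False          # True removes the DreamerV2 world model

# configs/dreamer/DreamerLearnerConfig.py (excerpt, illustrative)
class DreamerLearnerConfig:
    def __init__(self):
        ...
        self.ABLATION_WM = False                 # keep in sync with DreamerControllerConfig
        self.STEPS_SEQUENTIAL_RL_PHASE = 50000   # 0 disables the RL phase (pure imitation learning)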

To train the model from scratch:

python3 train.py --n_workers 2 --env flatland --env_name 5_agents --time_limit <time_limit_in_seconds> --steps <steps_in_the_env>
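
For example, with two workers on the 5-agent instances (the time limit and step budget below are illustrative values, not the paper's settings):

python3 train.py --n_workers 2 --env flatland --env_name 5_agents --time_limit 36000 --steps 1000000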

In this repo but not in the paper

You'll find some options not mentioned in the paper; they correspond to methods we tried that did not perform well. These include the DAgger algorithm, which failed because of the expert solver's sensitivity to random actions: the solver does not perform well when we periodically follow a policy that differs from it. We also tried feeding the expert action, coupled with the observation, as an input to the world model. Finally, we tried a variant of DAgger that replaces the sequential approach (imitation learning, then reinforcement learning) with a hybrid approach that optimizes both the imitation and reinforcement losses at the same time, with a decreasing weight on the imitation loss; this, too, performed poorly. A sketch of this hybrid objective follows.
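
A minimal sketch of that hybrid objective, assuming discrete actions and a linearly decaying imitation weight (the function and variable names below are illustrative, not this repo's API):

import torch
import torch.nn.functional as F

def hybrid_loss(policy_logits, expert_actions, rl_loss, step, decay_steps=100_000):
    # Imitation term: cross-entropy between the policy's action logits
    # and the actions chosen by the expert.
    imitation_loss = F.cross_entropy(policy_logits, expert_actions)
    # Linearly anneal the imitation weight from 1 to 0 over decay_steps.
    w = max(0.0, 1.0 - step / decay_steps)
    # Both losses are optimized jointly; only the imitation term decays.
    return w * imitation_loss + rl_loss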

Code Structure

  • agent contains the implementation of MAMBA
    • controllers contains logic for inference
    • learners contains logic for training the agent
    • memory contains the buffer implementation
    • models contains the MAMBA architecture
    • optim contains logic for optimizing loss functions
    • runners contains logic for running multiple workers
    • solver contains logic for using the solver
    • utils contains helper functions
    • workers contains logic for interacting with the environment
  • env contains the environment logic
  • networks contains the neural network architectures

👏 Acknowledgements

  • MAMBA: Our codebase is built upon MAMBA.
  • MAPF Solver: Our codebase uses the winning solver of the Flatland challenge.
