This repository contains the code for the paper "Benchmarking the geographic generalization of deep learning models for precipitation downscaling", published in Scientific Reports and available here.
The codebase covers training and evaluation of deep learning models (ResNet, GAN and diffusion) for precipitation downscaling from ERA5 to IMERG.
See the visualization below for an overview of the experimental setup.
The goal is to investigate how these models generalize to unseen geographic regions and to support the development of downscaling methods that remain robust to geographic distribution shifts.
This project requires Python 3.10–3.12 and Poetry for dependency management.
Install Poetry as a standalone tool by following the install instructions.
Then install the project dependencies from the repository root:
```shell
poetry install
```

Optionally complete the following one-time setup steps before running any experiments:
- Configure distributed training for your environment:
  ```shell
  accelerate config
  ```
- Log in to Weights & Biases (only required if using W&B tracking):
  ```shell
  wandb login
  ```
- Download the dataset (see Data below).
The dataset is publicly available on Hugging Face at RainShift/rainshift and provides ERA5 and IMERG data.
Once downloaded, update the `data_path` variable in your sbatch file to point to the local dataset location:

```shell
data_path='/path/to/rainshift/data/'
```

To verify the setup works without downloading the full dataset, a minimal test configuration is provided using a small data subset called `northamerica-west-mini`, which contains 1,000 training samples.
- Script: `scripts/templates/train_cnn_mini.sh`
- Config: `config/templates/train_cnn_mini.yaml`
- Data: `tests/example_data/northamerica-west-mini` (included in the repository, but requires unzipping)
First, unzip the example data (only needed once):
```shell
cd tests/example_data/northamerica-west-mini && unzip train_data_in.zip && unzip train_data_out.zip
```

Before submitting the job, make the following updates:
In `scripts/templates/train_cnn_mini.sh`:
- Set `PROJECT_DIR` to the path of your local repository clone
In `config/templates/train_cnn_mini.yaml`:
- Set `model_path` to the directory where the model weights should be saved during training.
- Set either `wandb_dir` (if `wandb` is in `mode`) or `result_dir` (if not using W&B) — plots are saved to one or the other depending on the mode.
The dataset area (`areas` field in the config) must be set to the target area, here `northamerica-west-mini`, which matches the included example data.
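Putting the edits above together, the adapted fields in `config/templates/train_cnn_mini.yaml` might look roughly like this. Note this is a sketch: the paths are placeholders, and the exact shape of the `areas` field is an assumption — check the template itself.

```yaml
# Placeholder paths — adapt to your setup.
mode: 'train_wandb_benchmark'           # contains 'wandb', so wandb_dir is used
model_path: '/path/to/save/weights/'
wandb_dir: '/path/to/wandb/logs/'
areas: ['northamerica-west-mini']       # must match the included example data
```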
Then submit the job with:
```shell
sbatch scripts/templates/train_cnn_mini.sh
```

Template sbatch scripts for running experiments are provided in `scripts/templates/`. Corresponding config files are available in `config/templates/` and will need to be adapted before use.
The following configs are available:
| Config | Purpose |
|---|---|
| `train_cnn.yaml` | Train CNN model |
| `train_cnn_resume.yaml` | Resume CNN training |
| `train_gan.yaml` | Train GAN model |
| `train_dm.yaml` | Train Diffusion model |
| `eval_cnn.yaml` | Evaluate CNN model |
| `eval_gan.yaml` | Evaluate GAN model |
| `eval_dm.yaml` | Evaluate Diffusion model |
At minimum, each config requires the following fields to be set:
```yaml
mode: 'train_wandb_benchmark'  # include 'wandb' for W&B tracking
experiments_dir: ''            # path to save experiment outputs
model_path: ''                 # path to save model weights (training) or load them (eval)
wandb_dir: ''                  # path for W&B logs (use if 'wandb' is in mode)
result_dir: ''                 # path to save plots locally (use if 'wandb' is not in mode)
```

Note: The provided scripts are designed for SLURM-managed HPC clusters and rely on SLURM environment variables. They will not work as-is on a local machine.
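The `wandb`-in-`mode` convention described above can be illustrated with a simple substring check. This is a sketch of the documented behavior, not code from the repository:

```shell
# Sketch of the documented convention (not repo code): W&B tracking is active
# exactly when the substring 'wandb' appears in the mode string.
wandb_enabled() {
  case "$1" in
    *wandb*) return 0 ;;  # e.g. 'train_wandb_benchmark' -> wandb_dir is used
    *)       return 1 ;;  # any mode without 'wandb'     -> result_dir is used
  esac
}

wandb_enabled 'train_wandb_benchmark' && echo "W&B tracking on: set wandb_dir"
```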
- `train_cnn_tmp.sh` — Train the CNN or GAN model
- `test_cnn_tmp.sh` — Evaluate the CNN or GAN model
- `train_dm_tmp.sh` — Train the diffusion model
- `test_dm_tmp_01.sh` — Evaluate on a target area (first index range)
- `test_dm_tmp_02.sh` — Evaluate on a target area (second index range)
- `test_dm_tmp_all.sh` — Evaluate across all target areas in a single job
Note:
- `test_dm_tmp_01.sh` and `test_dm_tmp_02.sh` split evaluation into two index ranges and must be run separately per target area.
- `test_dm_tmp_all.sh` loops over all target areas and can be submitted as a single job.
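The per-area, per-range submission pattern can be sketched with a small wrapper. This helper is hypothetical (not part of the repository), and the area names are placeholders; replace `echo` with a real `sbatch` invocation once the variables inside each script are set:

```shell
# Hypothetical helper (not in the repo): prints the sbatch calls needed to
# cover both index ranges for each target area.
submit_dm_eval() {
  local area script
  for area in "$@"; do
    for script in test_dm_tmp_01.sh test_dm_tmp_02.sh; do
      echo "sbatch scripts/templates/$script  # target_area=$area"
    done
  done
}

submit_dm_eval northamerica-west europe  # placeholder area names
```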
| Variable | Description |
|---|---|
| `PROJECT_DIR` | Path to your local project directory clone |
| `config_path` | Path to the directory containing your config file |
| `config_name` | Name of your config file |
| `run_name` | Name of your run |
| `data_path` | Path to your data |
| `target_area` | Area to train/evaluate on |
| `start_ind` / `end_ind` | Index range (only for `test_dm_tmp_01.sh` and `test_dm_tmp_02.sh`) |
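As an illustrative sketch, a template script sets these variables near its top. Every path and value below is a placeholder, not taken from the repository:

```shell
# Placeholder values — adapt each one to your setup.
PROJECT_DIR="/path/to/rainshift-repo"          # local clone of this repository
config_path="$PROJECT_DIR/config/templates"    # directory containing the config
config_name="train_cnn.yaml"                   # config file to use
run_name="cnn-baseline"                        # free-form run identifier
data_path="/path/to/rainshift/data/"           # downloaded dataset location
target_area="northamerica-west"                # area to train/evaluate on

echo "run: $run_name  config: $config_path/$config_name  area: $target_area"
```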
To enable experiment tracking with Weights & Biases, first log in:
```shell
wandb login
```

Then ensure your config has `wandb` included in the `mode` field and `wandb_dir` set:
```yaml
mode: 'train_wandb_benchmark'
wandb_dir: '/path/to/wandb/logs'
```

This project uses 🤗 Accelerate for distributed training. Before running any training script, configure it for your environment:

```shell
accelerate config
```

Training runtimes and hardware requirements vary by model and training area size:
| Model | Hardware | Training Time |
|---|---|---|
| GAN / Diffusion Model | 4× NVIDIA A100 (effective batch size 128) | 60–168 hours |
| ResNet (CNN) | 1× NVIDIA RTX8000 | 45–280 hours |
