An xarray-based package for the alignment of meteorological datasets
mxalign is an xarray-based package designed for the alignment and verification of meteorological datasets. It standardizes operations across datasets by attaching properties along three main axes:
- Space: Grid or point-based data
- Time: Forecasts, observations, or climatology
- Uncertainty: Deterministic, ensemble, or quantile forecasts
Currently, mxalign also acts as a full execution engine. It can load datasets (e.g., Anemoi inference outputs, observation datasets), apply transformations, align datasets in both space and time to match a reference, safely broadcast NaNs, and execute verification metrics on scaled Dask clusters (Local or Slurm).
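One of the steps above, "safely broadcast NaNs", amounts to masking the forecast wherever the reference is missing, so that metrics only compare points where both datasets hold valid data. A minimal pure-Python sketch of the idea (mxalign performs this on xarray objects; `broadcast_nans` here is a hypothetical helper for illustration, not part of the mxalign API):

```python
import math

def broadcast_nans(obs, fcst):
    """Propagate NaNs from the observations into the forecast so that
    downstream metrics only see points where both series are valid.
    (Concept sketch only; not the mxalign implementation.)"""
    return [math.nan if math.isnan(o) else f for o, f in zip(obs, fcst)]

obs = [1.0, math.nan, 3.0]
fcst = [1.1, 2.2, 2.9]
masked = broadcast_nans(obs, fcst)  # the NaN at index 1 is propagated
```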
⚠️ Roadmap & Future Architecture Changes (planned for v0.2.0): Currently, mxalign handles both alignment and the execution of the verification tooling pipeline, including loading and validation. In the upcoming v0.2.0 release, this architecture will be refactored:
- Loading will be split out into `mlwp-data-loaders`.
- Validation of loaded `xr.Dataset`s will be moved to `mlwp-data-specs`, which will contain the requirements for each of the dataset traits and the validation logic.
- Execution of the full verification pipeline (loading, transformations, alignment, and verification) from configuration files may be moved to a separate package in future releases.
- Tests will be added to `mxalign` (building on test datasets already integrated into `mlwp-data-loaders`) to ensure that all alignment operations work correctly. Testing notebook execution inside `mxalign` is explicitly excluded from the current roadmap.
mxalign provides building blocks for manual alignment, transformations, and interpolations of xarray datasets. This is ideal for interactive use in Jupyter notebooks or custom Python scripts.
```python
import xarray as xr
from mxalign import load, align_space, align_time, transform

# Load datasets (using registered loaders)
ds_obs = load(name="observations_loader", files=["obs.nc"])
ds_fcst = load(name="anemoi_inference", files=["forecast.nc"])

# Align the forecast spatially to match the observation reference
ds_fcst_aligned_space = align_space(ds_fcst, reference=ds_obs, method="interpolation")

# Align datasets temporally
datasets = {"obs": ds_obs, "fcst": ds_fcst_aligned_space}
aligned_datasets = align_time(datasets, method="intersection")
```

For a more comprehensive interactive example, check out the introductory notebook.
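To make the "intersection" time-alignment method concrete: it keeps only the timestamps present on every time axis. A small pure-Python sketch of that idea (`intersect_times` is a hypothetical illustration, not the mxalign implementation):

```python
from datetime import datetime, timedelta

def intersect_times(*time_axes):
    """Return the sorted timestamps common to all given time axes --
    the idea behind align_time(..., method="intersection")."""
    common = set(time_axes[0]).intersection(*(set(ax) for ax in time_axes[1:]))
    return sorted(common)

t0 = datetime(2024, 1, 1)
obs_times = [t0 + timedelta(hours=h) for h in range(0, 24, 6)]    # 00, 06, 12, 18 UTC
fcst_times = [t0 + timedelta(hours=h) for h in range(0, 24, 12)]  # 00, 12 UTC
common = intersect_times(obs_times, fcst_times)  # only 00 and 12 UTC remain
```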
For full verification pipeline execution, mxalign uses a YAML configuration file. This allows you to declaratively define how datasets are loaded, transformed, aligned, and verified.
The configuration file is divided into several main sections:
```yaml
datasets:
  # Define datasets to load, specifying the loader, files, and variables
  obs_data:
    loader: observations_loader
    files: ["obs.nc"]
  fcst_data:
    loader: anemoi_inference
    files: ["forecast.nc"]

transformations:
  # Apply transformations to loaded datasets

alignment:
  # Define the reference dataset and alignment methods (space, time, NaN broadcasting)
  reference: obs_data
  time:
    method: intersection

verification:
  # Specify the reference dataset and the metrics to calculate
  reference: obs_data
  metrics:
    # define metrics here
```

The CLI uses Dask to distribute the workload and supports both local execution and execution on Slurm-managed HPC clusters.
Local Execution

Run the pipeline on a local Dask cluster:

```shell
mxalign local path/to/config.yaml --n_workers 4 --threads_per_worker 1
```

Slurm Execution

Run the pipeline on a Slurm cluster:

```shell
mxalign slurm path/to/config.yaml --account your_account --queue your_queue --cores 8 --memory 64GB
```

You can also execute the entire configuration-driven pipeline directly from Python using the Runner class.
```python
from mxalign.runner import Runner

# Initialize the runner with a YAML config file or a dictionary
runner = Runner("path/to/config.yaml")

# Execute the pipeline: loads, transforms, aligns, and verifies the datasets
runner.run()

# The resulting aligned datasets and computed metrics are accessible via:
aligned_datasets = runner.datasets
metrics = runner.metrics
```
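Since the Runner also accepts a dictionary in place of a file path, the YAML configuration maps one-to-one onto plain Python structures. A sketch of that form (the keys mirror the YAML example above; the exact schema accepted by Runner is an assumption):

```python
# Equivalent configuration expressed as a dictionary instead of a YAML file.
config = {
    "datasets": {
        "obs_data": {"loader": "observations_loader", "files": ["obs.nc"]},
        "fcst_data": {"loader": "anemoi_inference", "files": ["forecast.nc"]},
    },
    "alignment": {"reference": "obs_data", "time": {"method": "intersection"}},
    "verification": {"reference": "obs_data", "metrics": {}},
}
# runner = Runner(config)  # assumed equivalent to Runner("path/to/config.yaml")
```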