Tool to run CABLE configurations without payu

# Why

We would like a lightweight runner for CABLE configurations designed for [payu](https://github.com/payu-org/payu). We want a tool other than `payu` for two reasons:

1. `payu` only operates on Gadi, and that's unlikely to change anytime soon. We want to be able to run on any machine with a standard Python installation.
2. `payu` is fairly heavy. We want to make something very light that people can fire off without thinking.

# What

What should the tool actually be able to do?

1. Run a configuration in the same way as `payu` would, at least at the surface level (it reads the `config.yaml` and creates the same input and output directories), e.g. with `run-config` from the configuration directory.
2. Add metadata about the configuration being run to the output files of the run. The metadata should consist of:
  * The configuration repository
  * The configuration commit hash (with a `-dirty` for a modified config?)
  * The date of the run
  * Brief record of the inputs used

What it doesn't need to do:

1. Run multi-stage configurations. Each configuration consists of a single call to `cable`.
2. Proper provenance tracking of inputs, beyond recording file names. While this is a nice-to-have, I don't think it is feasible while still being lightweight and portable. Minimal proper provenance would require checksum calculations, which can be a significant task for inputs spread over many files, as atmosphere forcing often is. Some machines may have official catalogues with DOIs and such, but relying on this is certainly not portable.

## On recording the commit hash

Anyone wanting to run in a different location will inevitably have to modify the configuration to point to the correct inputs on that machine. This will lead to a "dirty" repository. Is this desirable? Do we need to provide information about the diff? If we don't, then it's not possible to determine whether a `dirty` represents a change in the input paths or a change in the science configuration. I would be in favour of preventing the `config.yaml` from modifying the namelist altogether, so that it only configures the input locations and job submission. Then, if the repository is "dirty", we can provide a diff excluding the `config.yaml`? Or perhaps exclude the `config.yaml` from the "dirty" check?

My choice would be to ignore `config.yaml` changes in the "dirty" check, *assuming we exclude namelist changes from the `config.yaml`*. This way the "dirty" strictly relates to science choices.
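As a rough sketch of what that check could look like (not a settled design), the snippet below asks `git` for the commit hash and for the list of modified tracked files, using git's `:(exclude)` pathspec to leave `config.yaml` out. The function names `label_hash` and `configuration_hash` are hypothetical, and ignoring untracked files (as `git describe --dirty` does) is an assumption.

```python
import subprocess

# Assumption: config.yaml is the only file excluded from the "dirty" check.
IGNORED_FILE = "config.yaml"


def _git(repo, *args):
    """Run a git command inside `repo` and return its standard output."""
    result = subprocess.run(
        ["git", "-C", str(repo), *args],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def label_hash(commit_hash, status_output):
    """Append '-dirty' if `git status --porcelain` reported any change."""
    return commit_hash + ("-dirty" if status_output.strip() else "")


def configuration_hash(repo="."):
    """Return the commit hash, with '-dirty' for changes outside config.yaml."""
    commit = _git(repo, "rev-parse", "HEAD").strip()
    status = _git(
        repo,
        "status",
        "--porcelain",
        "--untracked-files=no",  # assumption: untracked files do not count
        "--",
        ".",
        f":(exclude){IGNORED_FILE}",
    )
    return label_hash(commit, status)
```

Calling `configuration_hash()` from the configuration directory would then give the value to record, on the assumption that `git` is available on the machine.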

## On tracking inputs

As stated above, I think computing checksums on inputs is outside the remit of this tool, as it can easily get expensive. What we can do very cheaply is record the "date last modified" of each input. This way, if configuration results change unexpectedly, we can quickly look at the "date last modified" of the inputs and determine whether they were a source of the change.
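A minimal sketch of how cheap this is, using only the standard library: the modification time comes from a single `stat` call per file. The function names are hypothetical, and reporting the time in UTC is an assumption.

```python
from datetime import datetime, timezone
from pathlib import Path


def last_modified_iso(path):
    """Return the file's modification time as an ISO 8601 string (UTC)."""
    mtime = Path(path).stat().st_mtime
    return datetime.fromtimestamp(mtime, tz=timezone.utc).isoformat(timespec="seconds")


def inputs_record(paths):
    """Build one '<path> <date_last_modified>' line per input file."""
    return "".join(f"{path} {last_modified_iso(path)}\n" for path in paths)
```

No file contents are read, so the cost does not depend on the size of the forcing files.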

# Summary

The tool will allow someone to call `config-run` from a `payu` configuration directory to run the configuration like `payu run` would. The tool will run on any machine that has a standard installation of Python. The model outputs would be the same as with `payu`, with the following additional metadata applied to each output:

* `"configuration_repo": "<repo_URL>"`
* `"configuration_hash": "<commit_hash with optional -dirty>"`
* `"date_created": "<ISO 8601 format date>"`
* `"inputs": "<path_to_input> <date_last_modified ISO 8601 format>\n<path_to_input> <date_last_modified ISO 8601 format>\n"`
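To make the proposed layout concrete, here is a small sketch that assembles these four attributes into a dictionary. The function name `build_metadata` and its parameters are hypothetical, and how the attributes are then attached to the output files is left open.

```python
from datetime import datetime, timezone


def build_metadata(repo_url, config_hash, inputs, now=None):
    """Assemble the proposed attributes.

    `inputs` is a sequence of (path, date_last_modified) pairs, with the
    date already formatted as an ISO 8601 string.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return {
        "configuration_repo": repo_url,
        "configuration_hash": config_hash,
        "date_created": now.isoformat(timespec="seconds"),
        "inputs": "".join(f"{path} {modified}\n" for path, modified in inputs),
    }
```

Storing `inputs` as a single newline-separated string follows the format listed above; a list-valued attribute would be an alternative if the output format supports it.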
                    
@gabsun @SeanBryan51 How does this sound? Does this meet the specifications we outlined in our meeting? My only remaining question is how to distribute this: should it be a standalone package installable via `pip` or `conda`, or should it ship with CABLE directly?



