feat(units): add an expression parser by HaoZeke · Pull Request #173 · metatensor/metatomic

HaoZeke · 2026-03-04T11:09:57Z

Closes #154.

Replaced the per-quantity lookup tables with a Shunting-Yard expression parser
that works on arbitrary compound unit strings in the spirit of lumol.

"kJ/mol/A^2"  -->  tokenize  -->  shunting-yard  -->  AST  -->  eval
                   [kJ,/,mol,     [kJ,mol,/,         tree     {factor, dim}
                    /,A,^,2]       A,2,^,/]
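
The first two stages of the diagram can be sketched in Python as follows. This is a minimal illustration, not the code from this PR: the regular expression, the precedence table, and the names `tokenize` and `to_rpn` are assumptions made for the example.

```python
import re

# Assumed for this sketch: '^' binds tighter than '*' and '/', and is right-associative.
PRECEDENCE = {"^": 3, "*": 2, "/": 2}
RIGHT_ASSOC = {"^"}

# Unit names, numeric literals, or a single operator/parenthesis; leading whitespace is skipped.
TOKEN_RE = re.compile(r"\s*([A-Za-z_][A-Za-z_0-9]*|\d+(?:\.\d+)?|[*/^()])")


def tokenize(expr):
    """Split a unit expression into names, numbers, operators and parentheses."""
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN_RE.match(expr, pos)
        if match is None:
            raise ValueError(f"unexpected character {expr[pos]!r} at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def to_rpn(tokens):
    """Reorder infix tokens into reverse Polish notation (shunting-yard)."""
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Pop operators that bind at least as tightly (strictly tighter for right-assoc).
            while stack and stack[-1] in PRECEDENCE:
                top = stack[-1]
                tighter = PRECEDENCE[top] > PRECEDENCE[tok]
                same_left = PRECEDENCE[top] == PRECEDENCE[tok] and tok not in RIGHT_ASSOC
                if not (tighter or same_left):
                    break
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("unbalanced ')'")
            stack.pop()  # discard the matching '('
        else:
            output.append(tok)  # unit name or number
    while stack:
        op = stack.pop()
        if op == "(":
            raise ValueError("unbalanced '('")
        output.append(op)
    return output
```

With these definitions, `to_rpn(tokenize("kJ/mol/A^2"))` gives the same postfix order as the diagram: `kJ, mol, /, A, 2, ^, /`.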

Each token resolves to an SI conversion factor and a 5-element dimension vector
[L, T, M, Q, Theta]. The parser composes these through multiplication, division,
and exponentiation. The conversion factor between two expressions is the ratio
of their SI factors, computed after verifying that their dimensions match.
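
A rough Python sketch of this evaluation step, operating on an already-postfix token list. The `UNITS` table is a small illustrative subset with constants typed in for the example (not copied from the PR), and `eval_rpn` / `conversion_factor` are hypothetical names, not the library API.

```python
from fractions import Fraction

# Illustrative subset: lowercase name -> (SI factor, [L, T, M, Q, Theta]).
UNITS = {
    "a": (1e-10, (1, 0, 0, 0, 0)),             # angstrom
    "fs": (1e-15, (0, 1, 0, 0, 0)),
    "u": (1.66053906660e-27, (0, 0, 1, 0, 0)),
    "ev": (1.602176634e-19, (2, -2, 1, 0, 0)),
    "mev": (1.602176634e-22, (2, -2, 1, 0, 0)),
}
NO_DIM = (0, 0, 0, 0, 0)


def eval_rpn(rpn):
    """Evaluate postfix tokens to a (SI factor, dimension vector) pair."""
    stack = []
    for tok in rpn:
        if tok in ("*", "/", "^"):
            right_f, right_d = stack.pop()
            left_f, left_d = stack.pop()
            if tok == "*":
                value = (left_f * right_f, tuple(a + b for a, b in zip(left_d, right_d)))
            elif tok == "/":
                value = (left_f / right_f, tuple(a - b for a, b in zip(left_d, right_d)))
            else:
                # The exponent must be a pure number built only from numeric literals.
                if any(right_d) or not isinstance(right_f, Fraction):
                    raise ValueError("exponent must be a dimensionless number")
                value = (float(left_f) ** float(right_f),
                         tuple(a * right_f for a in left_d))
            stack.append(value)
        elif tok[0].isdigit():
            stack.append((Fraction(tok), NO_DIM))  # exact, so ^(1/2) halves dimensions exactly
        else:
            stack.append(UNITS[tok.lower()])       # case-insensitive lookup
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


def conversion_factor(from_rpn, to_rpn):
    """Ratio of SI factors, after checking that both dimension vectors agree."""
    f_from, d_from = eval_rpn(from_rpn)
    f_to, d_to = eval_rpn(to_rpn)
    if tuple(d_from) != tuple(d_to):
        raise ValueError(f"dimension mismatch: {d_from} vs {d_to}")
    return float(f_from) / float(f_to)
```

Keeping numeric literals as exact fractions means a fractional exponent such as `^(1/2)` scales the dimension vector without rounding, so the equality check stays exact.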

API changes

| Before (3-arg)                                  | After (2-arg)                                    |
|-------------------------------------------------+--------------------------------------------------|
| ~unit_conversion_factor("energy", "eV", "meV")~   | ~unit_conversion_factor("eV", "meV")~              |
| ~unit_conversion_factor("force", "eV/A", "eV/A")~ | ~unit_conversion_factor("eV/A", "eV/A")~           |
| Not possible                                    | ~unit_conversion_factor("(eV*u)^(1/2)", "u*A/fs")~ |
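
As a worked check of the last row, assuming the usual SI dimensions for eV (energy) and u (mass) in the [L, T, M, Q, Theta] ordering, and CODATA values for the two constants (neither is spelled out in this description):

```latex
\begin{align*}
\dim(\mathrm{eV}) &= [2,-2,1,0,0], \qquad \dim(\mathrm{u}) = [0,0,1,0,0] \\
\dim\!\big((\mathrm{eV}\cdot\mathrm{u})^{1/2}\big) &= \tfrac{1}{2}\,[2,-2,2,0,0] = [1,-1,1,0,0] \\
\dim(\mathrm{u}\cdot\mathrm{A}/\mathrm{fs}) &= [0,0,1,0,0] + [1,0,0,0,0] - [0,1,0,0,0] = [1,-1,1,0,0] \\
\text{factor} &= \frac{\sqrt{(1.602176634\times 10^{-19})\,(1.66053906660\times 10^{-27})}}
                      {(1.66053906660\times 10^{-27})\,(10^{-10})/(10^{-15})} \approx 0.0982
\end{align*}
```

Both sides reduce to a momentum-like dimension, so the ratio of SI factors is well defined.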

Expression syntax

Operators: * (multiply), / (divide), ^ (power), () (grouping).
Whitespace ignored. Case-insensitive. Numeric literals allowed in exponents.
Fractional exponents via parenthesized division: ^(1/2).

Token table

Single flat unordered_map with 30+ entries covering length (angstrom, bohr, nm,
m, cm, mm, um), energy (eV, meV, hartree, ry, joule, kcal, kJ), time (fs, ps),
mass (u, kg, g, electron_mass), charge (e, coulomb), dimensionless (mol), and
derived (hbar).

Notes

kelvin is NOT in the token table because temperature conversions between
offset-based scales (Celsius, Fahrenheit) are non-multiplicative.
DIM_TEMPERATURE exists as dimension [0,0,0,0,1] for potential future use, but
no tokens currently carry it. (This could be revisited at the next API break, maybe during mini-metatomic.)
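
To illustrate why offset-based scales do not fit a factor-only parser, here is a small self-contained Python check. The helper `looks_multiplicative` is made up for this example and is not part of the PR.

```python
def celsius_to_fahrenheit(c):
    # Affine map: a scale factor of 9/5 plus an offset of 32.
    return c * 9.0 / 5.0 + 32.0


def ev_to_mev(x):
    # Purely multiplicative map, like every conversion the parser supports.
    return x * 1000.0


def looks_multiplicative(convert, samples=(1.0, 10.0, 100.0)):
    """Return True if convert(x) / x is the same for every sample value."""
    ratios = [convert(x) / x for x in samples]
    return all(abs(r - ratios[0]) < 1e-12 for r in ratios)
```

`looks_multiplicative(ev_to_mev)` holds, while `looks_multiplicative(celsius_to_fahrenheit)` does not: no single factor reproduces the offset, so such a conversion cannot be expressed as a ratio of SI factors.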

Contributor (creator of pull-request) checklist

Tests updated (for new features and bugfixes)?
Documentation updated (for new features)?
Issue referenced (for PRs that solve an issue)?

Reviewer checklist

CHANGELOG updated with public API or any other important changes?

GardevoirX · 2026-03-04T12:46:24Z

Thanks a lot! I think it would be better if we could use the new functionality to check that the quantity and unit match when initializing ModelOutput here:
https://github.com/HaoZeke/metatomic/blob/0fe0ef73e82b089811278472acba56267578b80c/metatomic-torch/include/metatomic/torch/model.hpp#L48-L61

metatomic-torch/src/model.cpp

python/metatomic_torch/tests/units.py

Address PR metatensor#173 review feedback from GardevoirX:

- Add s, second, ms, us, ns, ps with full-word aliases to time tokens
- Add tests verifying ModelOutput rejects mismatched quantity/unit dims
- Add tests for standalone micro sign (U+00B5) -> Dalton resolution
- Update docs token table and doxygen with new time unit coverage
- Fix stray dash in RST list-table Dimensionless row

GardevoirX

LGTM, love it!

Luthaf · 2026-03-09T10:15:15Z