feat(units): add an expression parser#173
Conversation
92bf24c to
0fe0ef7
Compare
|
Thanks a lot! I think it would be better if we can use the new functionality to check if the |
Address PR metatensor#173 review feedback from GardevoirX: - Add s, second, ms, us, ns, ps with full-word aliases to time tokens - Add tests verifying ModelOutput rejects mismatched quantity/unit dims - Add tests for standalone micro sign (U+00B5) -> Dalton resolution - Update docs token table and doxygen with new time unit coverage - Fix stray dash in RST list-table Dimensionless row
docs/src/torch/reference/misc.rst
Outdated
| Dimensional compatibility is verified automatically; no ``quantity`` parameter | ||
| is needed. Token lookup is case-insensitive, and whitespace is ignored. |
There was a problem hiding this comment.
Should this be documented as part of unit_conversion_factor directly?
docs/src/torch/reference/misc.rst
Outdated
| - Tokens | ||
| - Notes | ||
| * - **Length** | ||
| - ``angstrom`` (``a``), ``bohr``, ``nm`` (``nanometer``), ``meter`` (``m``), ``cm`` (``centimeter``), ``mm`` (``millimeter``), ``um`` (``micrometer``) |
There was a problem hiding this comment.
| - ``angstrom`` (``a``), ``bohr``, ``nm`` (``nanometer``), ``meter`` (``m``), ``cm`` (``centimeter``), ``mm`` (``millimeter``), ``um`` (``micrometer``) | |
| - ``angstrom`` (``A``), ``bohr``, ``nm`` (``nanometer``), ``meter`` (``m``), ``cm`` (``centimeter``), ``mm`` (``millimeter``), ``um`` (``micrometer``) |
This should explicitly list µm as well. I'm also not sure why the main unit is sometimes meter sometimes cm. I would use the full name every time, and put the abbreviation in parenthesis.
docs/src/torch/reference/misc.rst
Outdated
| - ``angstrom`` (``a``), ``bohr``, ``nm`` (``nanometer``), ``meter`` (``m``), ``cm`` (``centimeter``), ``mm`` (``millimeter``), ``um`` (``micrometer``) | ||
| - | ||
| * - **Energy** | ||
| - ``ev``, ``mev``, ``hartree``, ``ry`` (``rydberg``), ``joule`` (``j``), ``kcal``, ``kj`` |
There was a problem hiding this comment.
| - ``ev``, ``mev``, ``hartree``, ``ry`` (``rydberg``), ``joule`` (``j``), ``kcal``, ``kj`` | |
| - ``eV``, ``meV``, ``Hartree``, ``Ry`` (``rydberg``), ``Joule`` (``J``), ``kcal``, ``kJ`` |
let's use the usual upper case version of the units in the docs
docs/src/torch/reference/misc.rst
Outdated
| - ``ev``, ``mev``, ``hartree``, ``ry`` (``rydberg``), ``joule`` (``j``), ``kcal``, ``kj`` | ||
| - ``kcal`` and ``kj`` are bare (not per-mol); write ``kcal/mol`` for the per-mole unit | ||
| * - **Time** | ||
| - ``s`` (``second``), ``ms`` (``millisecond``), ``us`` (``microsecond``), ``ns`` (``nanosecond``), ``ps`` (``picosecond``), ``fs`` (``femtosecond``) |
docs/src/torch/reference/misc.rst
Outdated
| Known quantities | ||
| ^^^^^^^^^^^^^^^^ |
There was a problem hiding this comment.
Long term, I think we'll want to remove the user-provided quantity, and just check that an "energy" output has an unit with the dimension of an energy.
metatomic-torch/tests/models.cpp
Outdated
|
|
||
| CHECK_THROWS_WITH(options->set_length_unit("unknown"), | ||
| StartsWith("unknown unit 'unknown' for length") | ||
| StartsWith("unknown unit token 'unknown'") |
There was a problem hiding this comment.
I don't think user care about tokens, so why mention it in the error message?
metatomic-torch/CHANGELOG.md
Outdated
|
|
||
| ### Changed | ||
|
|
||
| - `validate_unit` now accepts arbitrary parseable expressions and validates |
There was a problem hiding this comment.
validate_unit is not public and should not be mentioned here =)
| _unit_conversion_factor_v2 = torch.ops.metatomic.unit_conversion_factor_v2 | ||
| _unit_conversion_factor_v1 = torch.ops.metatomic.unit_conversion_factor | ||
|
|
||
| def unit_conversion_factor(*args, **kwargs): |
There was a problem hiding this comment.
does this work with TorchScript? I thought *args, **kwargs is not supported
Replace the old string-matching unit conversion with a Shunting-Yard expression parser that supports *, /, ^, and parentheses. Unit code extracted to units.hpp/cpp per review feedback. Micro sign handling uses explicit base_units entries instead of global normalization.
Remove standalone micro sign test (no longer normalizes globally),
add micro sign microsecond tests instead. Update error message
assertions ("unknown unit" not "unknown unit token"). Add comment
explaining why models.cpp test uses valid quantity/unit strings.
Move full unit expression documentation to the Doxygen comment on unit_conversion_factor in units.hpp (renders via autofunction). Remove redundant standalone sections from misc.rst, keeping only the known-quantities table and deprecation note.
01198d7 to
eb8e28a
Compare
Remove validate_unit from CHANGELOG (not public API). Move unit_conversion_factor docstring from documentation.py to the Python function in __init__.py (documentation.py is only for C++ ops).
to_lower() now skips non-ASCII bytes, preventing macOS locale from mangling UTF-8 micro sign (0xB5 -> 'u'). Restore unit_conversion_factor in documentation.py since __init__.py imports it for Sphinx builds.
Closes #154.
Replaced the per-quantity lookup tables with a Shunting-Yard expression parser
that works on arbitrary compound unit strings in the spirit of lumol.
Each token resolves to an SI conversion factor and a 5-element dimension vector
[L, T, M, Q, Theta]. The parser composes these through multiplication, division,
and exponentiation. Conversion factor between two expressions = ratio of their
SI factors after verifying dimension equality.
API changes
Expression syntax
Operators:
*(multiply),/(divide),^(power),()(grouping).Whitespace ignored. Case-insensitive. Numeric literals allowed in exponents.
Fractional exponents via parenthesized division:
^(1/2).Token table
Single flat
unordered_mapwith 30+ entries covering length (angstrom, bohr, nm,m, cm, mm, um), energy (eV, meV, hartree, ry, joule, kcal, kJ), time (fs, ps),
mass (u, kg, g, electronmass), charge (e, coulomb), dimensionless (mol), and
derived (hbar).
Notes
kelvinis NOT in the token table because temperature conversions betweenoffset-based scales (Celsius, Fahrenheit) are non-multiplicative.
DIM_TEMPERATUREexists as dimension [0,0,0,0,1] for potential future use butno tokens currently carry it. (maybe once we do an API break, can revisit during
mini-metatomic)Contributor (creator of pull-request) checklist
Reviewer checklist