Skip to content

Allow unrecognized units#188

Merged
rprospero merged 11 commits intorefactor_24from
172_unrecognized_units
Mar 3, 2026
Merged

Allow unrecognized units#188
rprospero merged 11 commits intorefactor_24from
172_unrecognized_units

Conversation

@rprospero
Copy link
Contributor

As mentioned in #172, users might want to track units that aren't necessarily SI measurements (e.g. Slices per Pizza, Pages per Book). This PR adds an ArbitraryUnits class which can track these sorts of units while still interactive with the regular Units and NamedUnits` classes.

It also contains a modernisation of the utest_units.py test file so that the tests can be properly searched and controlled through pytest.

codescene-delta-analysis[bot]

This comment was marked as outdated.

codescene-delta-analysis[bot]

This comment was marked as outdated.

codescene-delta-analysis[bot]

This comment was marked as outdated.

@rprospero rprospero force-pushed the 172_unrecognized_units branch from 662433f to 6e85365 Compare February 20, 2026 12:43
Copy link

@codescene-delta-analysis codescene-delta-analysis bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No quality gates enabled for this code.

See analysis details in CodeScene

Quality Gate Profile: Custom Configuration
Install CodeScene MCP: safeguard and uplift AI-generated code. Catch issues early with our IDE extension and CLI tool.

@rprospero rprospero marked this pull request as ready for review February 20, 2026 13:14
#
#

class ArbitraryUnit(NamedUnit):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arbitrary unit (a.u.) is used for SAS data not on absolute scale. I think UnknownUnit or similar would better match the current naming scheme and not cause confusion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right about the confusion. I've switched to UnknownUnit

Comment on lines +329 to +330
numerator: str | list[str] | dict[str, int],
denominator: None | list[str] | dict[str, int]= None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should these strings and/or lists look like? Documentation on this would be helpful. Currently, the unit converter allows a*a, a**2, and a^2. What happens if a^2/b is passed as the numerator? Does it matter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely it won't matter, since UnknownUnit should only be called after a failed parse that has already split on those characters. However, just in case, I've added validation that throws a runtime error if invalid characters are in any part of the unit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is contained in the _valid_name static method

Comment on lines +332 to +344
case str():
self._numerator = {numerator: 1}
case list():
self._numerator = {}
for key in numerator:
if key in self._numerator:
self._numerator[key] += 1
else:
self._numerator[key] = 1
case dict():
self._numerator = numerator
case _:
raise TypeError
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This match/case block is almost exactly the same as the one for denominator. Could this be made into a separate function that returns the resulting dictionary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactoring out the repetition also made the string validation simpler. The separate function is now the _parse_arg class method.

case (_, []):
return " ".join(num)
case ([], _):
return "1 / " + " ".join(den)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will give something like 1 / A B C which will create ambiguity for B and C. Maybe return "1 / (" + " ".join(den) + ")"? I would suggest something similar in the default case as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added parentheses, but only in the case where there are multiple terms in the denominator. Thus, it will appear as "1 / (A B C)", but the single term will still appear as "1 / A"

return str(self)

@staticmethod
def parse(unit_string: str) -> "Unit":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

standardize_units and _format_unit_structure in the existing sasdata.data_util.nxsunit module already has a basic version of this you might want to look at.

@rprospero rprospero merged commit 6e85365 into refactor_24 Mar 3, 2026
11 checks passed
@rprospero rprospero deleted the 172_unrecognized_units branch March 3, 2026 13:26
@rprospero
Copy link
Contributor Author

This has NOT been merged, despite what GitHub claims. Rather, the discussion continues in #190.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants