Problem
_parse_vlm_output_to_action converts fractions to pixels with int() truncation (trainer.py:767-768), and _format_action_as_text converts them back to fractions formatted to 2 decimal places (trainer.py:707). Models from the external RL training use case emit coordinates with 3 decimal places (e.g., 0.461), so the roundtrip loses precision:
0.461 * 1920 = 885.12 -> int(885.12) = 885
885 / 1920 = 0.4609375 -> "0.46"
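The loss can be reproduced in isolation. The snippet below is a minimal sketch, not the code in trainer.py: fraction_to_pixel and pixel_to_text are hypothetical stand-ins that only mirror the int() truncation and 2-decimal formatting described above, and the 1920-pixel width is taken from the example.

```python
def fraction_to_pixel(frac: float, size: int) -> int:
    # Hypothetical stand-in for the truncating conversion described above.
    return int(frac * size)


def pixel_to_text(pixel: int, size: int, decimals: int = 2) -> str:
    # Hypothetical stand-in for the 2-decimal formatting described above.
    return f"{pixel / size:.{decimals}f}"


width = 1920
original = 0.461

pixel = fraction_to_pixel(original, width)  # 885.12 is truncated to 885
text = pixel_to_text(pixel, width)          # 885 / 1920 = 0.4609375

print(f"{original} -> {pixel} px -> {text!r}")
```

The third decimal place of the model's output is gone after a single parse/format cycle.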
Proposed Fix
Consider storing coordinates as fractions throughout and only converting to pixels at the adapter boundary.
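One possible shape for this, as a rough sketch only: FractionalPoint, to_pixels, and to_text are hypothetical names that do not exist in the codebase, and the 3-decimal default is an assumption chosen to match the external models' output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FractionalPoint:
    """Hypothetical container that keeps coordinates as fractions in [0, 1]."""

    x: float
    y: float

    def to_pixels(self, width: int, height: int) -> tuple[int, int]:
        # Called only at the adapter boundary, where pixels are required.
        # int() truncation is kept here to match the current behavior.
        return int(self.x * width), int(self.y * height)

    def to_text(self, decimals: int = 3) -> str:
        # Formats the stored fraction directly; no pixel roundtrip involved.
        # The 3-decimal default is an assumption, not an existing setting.
        return f"({self.x:.{decimals}f}, {self.y:.{decimals}f})"


point = FractionalPoint(0.461, 0.250)
print(point.to_text())               # text fed back to the model
print(point.to_pixels(1920, 1080))   # pixels for the adapter
```

Because the text is produced from the stored fraction rather than from a truncated pixel value, the model's original 0.461 is preserved, while the adapter still receives integer pixels.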