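RLEnvironment.collect_rollout is sequential. GRPO requires a group of N=8 parallel rollouts that all start from the same VM state. Sequential rollouts violate this same-state assumption: each rollout mutates the VM, so every rollout after the first starts from whatever state the previous one left behind.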
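A toy illustration of the problem, not the project's code: `ToyVM` and both helper functions are hypothetical stand-ins, and `copy.deepcopy` only plays the role of a VM snapshot. It shows that back-to-back rollouts on one mutable VM each observe a different starting state, while rollouts started from copies of one snapshot all observe the same one.

```python
import copy


class ToyVM:
    """Hypothetical stand-in for the real VM: each rollout mutates its state."""

    def __init__(self):
        self.files_written = 0

    def run_rollout(self):
        start_state = self.files_written  # state this rollout observed at start
        self.files_written += 1           # side effect left behind for the next one
        return start_state


def sequential_start_states(vm, n):
    # Rollouts run one after another on the same VM, with no reset in between.
    return [vm.run_rollout() for _ in range(n)]


def snapshot_start_states(vm, n):
    # Take one snapshot, then start every rollout from a fresh copy of it.
    snapshot = copy.deepcopy(vm)
    return [copy.deepcopy(snapshot).run_rollout() for _ in range(n)]


seq = sequential_start_states(ToyVM(), 8)
snap = snapshot_start_states(ToyVM(), 8)
print("sequential:", seq)
print("snapshot:  ", snap)
```

In the sequential case only the first rollout sees the intended starting state, so the group is not a valid GRPO group.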
Needed: a ParallelRolloutCollector that uses PoolManager with either VM snapshots or synchronized resets, so that all N rollouts in a group start from an identical state. This is currently documented as "future work" in rollout_collector.py.
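One possible shape for the snapshot variant, as a minimal sketch only. The real PoolManager and VM interfaces are not shown in this note, so `FakePoolManager`, `FakeVM`, and the methods `acquire`, `release`, `restore`, `run_episode`, and `collect_group` are all assumptions made up for illustration. The idea is that each worker checks a VM out of the pool, restores the shared snapshot, runs one rollout, and returns the VM.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


class FakeVM:
    """Hypothetical VM with snapshot restore; not the project's real VM API."""

    def __init__(self, vm_id):
        self.vm_id = vm_id
        self.state = {}

    def restore(self, snapshot):
        self.state = dict(snapshot)  # copy, so rollouts never share state

    def run_episode(self, policy):
        start = dict(self.state)
        self.state["steps"] = self.state.get("steps", 0) + 1  # rollout mutates the VM
        return {"vm_id": self.vm_id, "start_state": start, "reward": policy(start)}


class FakePoolManager:
    """Hypothetical pool; the real PoolManager interface is assumed, not known."""

    def __init__(self, size):
        self._free = queue.Queue()
        for i in range(size):
            self._free.put(FakeVM(i))

    def acquire(self):
        return self._free.get()  # blocks until a VM is free

    def release(self, vm):
        self._free.put(vm)


class ParallelRolloutCollector:
    def __init__(self, pool, group_size=8):
        self.pool = pool
        self.group_size = group_size

    def _one_rollout(self, snapshot, policy):
        vm = self.pool.acquire()
        try:
            vm.restore(snapshot)  # every rollout starts from the same state
            return vm.run_episode(policy)
        finally:
            self.pool.release(vm)  # return the VM even if the rollout fails

    def collect_group(self, snapshot, policy):
        with ThreadPoolExecutor(max_workers=self.group_size) as executor:
            futures = [
                executor.submit(self._one_rollout, snapshot, policy)
                for _ in range(self.group_size)
            ]
            return [f.result() for f in futures]


pool = FakePoolManager(size=4)  # fewer VMs than rollouts: workers wait their turn
collector = ParallelRolloutCollector(pool, group_size=8)
group = collector.collect_group({"steps": 0}, policy=lambda state: 1.0)
```

Restoring the snapshot at checkout rather than at release means a VM reused within the same group is still reset before its next rollout. The synchronized-reset alternative would replace `restore` with a reset call and a barrier so all VMs reach the start state before any rollout begins.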