feat(rl-env): support parallel rollout collection via VM pool

`RLEnvironment.collect_rollout` runs rollouts sequentially. GRPO needs a group of N=8 rollouts that all start from the same VM state. Collecting them one after another on a single VM violates that assumption, because each rollout mutates the VM before the next one starts.

Add a `ParallelRolloutCollector` that uses `PoolManager` to run the rollouts on separate VMs, each brought to the same starting state via VM snapshots or synchronized resets. This is currently documented as "future work" in `rollout_collector.py`.
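
A rough sketch of the intended shape, to make the same-state requirement concrete. The real `PoolManager` interface is not shown here, so `FakePoolManager`, `FakeVM`, `acquire_all`, `restore`, and the `rollout_fn(vm, idx)` signature are all hypothetical stand-ins, and threads are only one possible way to run the rollouts concurrently:

```python
import copy
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class FakeVM:
    """Hypothetical stand-in for a pooled VM; its state is a plain dict."""
    state: dict = field(default_factory=dict)

    def restore(self, snapshot: dict) -> None:
        # Deep-copy so no two VMs share mutable state with each other
        # or with the snapshot itself.
        self.state = copy.deepcopy(snapshot)


class FakePoolManager:
    """Hypothetical stand-in for PoolManager; the real API may differ."""

    def __init__(self, size: int) -> None:
        self._vms = [FakeVM() for _ in range(size)]

    def acquire_all(self) -> list:
        return list(self._vms)


class ParallelRolloutCollector:
    """Runs n_rollouts rollouts, each on its own VM restored from one snapshot."""

    def __init__(self, pool, rollout_fn, n_rollouts: int = 8) -> None:
        self.pool = pool
        self.rollout_fn = rollout_fn  # assumed signature: rollout_fn(vm, idx)
        self.n_rollouts = n_rollouts

    def collect(self, snapshot: dict) -> list:
        vms = self.pool.acquire_all()[: self.n_rollouts]
        if len(vms) < self.n_rollouts:
            raise RuntimeError("pool has fewer VMs than requested rollouts")
        # Bring every VM to the same starting state before any rollout runs.
        for vm in vms:
            vm.restore(snapshot)
        # One worker per VM; map() returns results in rollout-index order.
        with ThreadPoolExecutor(max_workers=self.n_rollouts) as executor:
            return list(executor.map(self.rollout_fn, vms, range(self.n_rollouts)))


def demo_rollout(vm: FakeVM, idx: int) -> dict:
    """Toy rollout that mutates its VM, as a real rollout would."""
    start = vm.state["counter"]
    vm.state["counter"] += idx + 1
    return {"rollout": idx, "start": start, "end": vm.state["counter"]}
```

With this layout each rollout mutates only its own VM, so every member of the group observes the same starting state regardless of scheduling order.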