[R3] Enable R3 with new inference #1428
…nference codepath

Adds a custom `/skyrl/v1/generate` endpoint to `VLLMServerActor` that calls the vLLM engine directly and returns `routed_experts` alongside token output. The standard `/inference/v1/generate` endpoint's `GenerateResponseChoice` does not include `routed_experts` (it is only available on the Python `CompletionOutput` object), so a custom endpoint is required.

Changes:
- `vllm_server_actor.py`: Add the `/skyrl/v1/generate` endpoint with correct logprobs serialisation (placeholder `-9999.0` for missing entries, matching vLLM's `ChatCompletionLogProb` default) and `routed_experts` extraction. Raises `NotImplementedError` if LoRA is enabled.
- `remote_inference_client.py`: Switch `_generate_single` to `/skyrl/v1/generate`; extract and propagate `routed_experts` through to `InferenceEngineOutput.rollout_expert_indices`.
- `inference_servers/utils.py`: Pass `enable_return_routed_experts` to the vLLM CLI args so the engine computes routed experts.
- `train/utils/utils.py`: Gate the `mp` backend assertion for R3 behind `if not _SKYRL_USE_NEW_INFERENCE` (the new path uses the ray backend); remove the `ValueError` blocking R3 on the new inference path; add startup validation that LoRA + R3 cannot be combined on the new path.
- `main_base.py`, `tests/gpu/utils.py`: Pass `enable_return_routed_experts` when constructing `RemoteInferenceClient`.
- `test_remote_inference_client.py`: Update the mock endpoint to `/skyrl/v1/generate` returning a single choice.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
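The logprobs serialisation described above can be sketched roughly as follows. This is a minimal sketch, not the actual `vllm_server_actor.py` code: `serialize_logprobs` and the exact input shape are assumptions, based on vLLM storing per-token logprobs as optional dicts keyed by token id.

```python
# Placeholder for missing logprob entries, matching vLLM's
# ChatCompletionLogProb default mentioned in the PR description.
PLACEHOLDER_LOGPROB = -9999.0

def serialize_logprobs(token_ids, logprob_dicts):
    """Flatten per-token logprob dicts into a JSON-serialisable list.

    `logprob_dicts` is assumed to be a list parallel to `token_ids`,
    where each entry is either None or a dict mapping token id -> logprob.
    """
    out = []
    for tok, lp in zip(token_ids, logprob_dicts):
        if lp is None or tok not in lp:
            out.append(PLACEHOLDER_LOGPROB)  # missing entry -> placeholder
        else:
            out.append(lp[tok])
    return out
```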
…pported

R3 requires the mp backend to avoid hangs, but mp is not yet supported on the new inference path (tracked in NovaSky-AI#1309). Restore the `ValueError` blocking R3 on new inference, and un-gate the mp assertion so that it applies to both the old and new inference paths consistently. The infrastructure changes (the `/skyrl/v1/generate` endpoint, `RemoteInferenceClient` propagation) remain as pre-work for when mp support lands.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Made-with: Cursor

# Conflicts:
#	skyrl/backends/skyrl_train/inference_servers/remote_inference_client.py
#	skyrl/train/entrypoints/main_base.py
#	tests/backends/skyrl_train/gpu/utils.py
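The restored guard described in the commit message could look roughly like the sketch below. `validate_r3_config` and its parameters are hypothetical names for illustration, not the actual `train/utils/utils.py` code.

```python
def validate_r3_config(enable_r3: bool, backend: str, use_new_inference: bool) -> None:
    """Startup validation sketch for R3 (names are assumptions)."""
    if not enable_r3:
        return
    # Restored ValueError: R3 stays blocked on the new inference path
    # until mp backend support lands there (tracked in NovaSky-AI#1309).
    if use_new_inference:
        raise ValueError("R3 is not supported on the new inference path yet.")
    # Un-gated assertion: R3 requires the mp backend on every path.
    assert backend == "mp", "R3 requires the mp backend to avoid hangs."
```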
```diff
 }

-@app.post("/inference/v1/generate")
+@app.post("/skyrl/v1/generate")
```
🔴 Mock test server removed /inference/v1/generate endpoint, breaking sample() tests
The mock server endpoint was renamed from /inference/v1/generate to /skyrl/v1/generate, but the sample() method in RemoteInferenceClient still uses /inference/v1/generate (skyrl/backends/skyrl_train/inference_servers/remote_inference_client.py:442). The test_sample and test_sample_n2 tests call client.sample(), which will hit the now-nonexistent /inference/v1/generate on the mock server, causing a 404 error. The mock server needs both endpoints: /skyrl/v1/generate for the _generate_single() path and /inference/v1/generate for the sample() path.
Suggested change:
```diff
 @app.post("/skyrl/v1/generate")
+@app.post("/inference/v1/generate")
```
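A dependency-free sketch of why stacked route decorators fix this: a toy route table stands in for FastAPI's `@app.post` here, and all names are illustrative. Stacking both decorators registers one handler under both paths, so the `_generate_single()` and `sample()` codepaths each find their endpoint on the mock server.

```python
# Toy route table standing in for FastAPI's routing machinery.
ROUTES = {}

def post(path):
    """Register the decorated function under `path`, like @app.post."""
    def register(fn):
        ROUTES[path] = fn
        return fn
    return register

# Stacking both decorators registers the SAME handler under both paths.
@post("/skyrl/v1/generate")
@post("/inference/v1/generate")
def mock_generate():
    return {"choices": [{"token_ids": [1, 2, 3]}]}
```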
@hao-aaron this is a great catch. Can you ensure that the sample method is also using our generate API endpoint?
```python
async def _skyrl_generate(request: Request):
    """SkyRL generate endpoint that returns routed_experts alongside token output."""
    if getattr(cli_args, "enable_lora", False):
        raise NotImplementedError("/skyrl/v1/generate does not support LoRA.")
```
🔴 NotImplementedError instead of HTTPException causes 30 silent retries and lost error message
The _skyrl_generate endpoint raises a bare NotImplementedError for the LoRA guard clause. FastAPI converts unhandled exceptions to a plain-text HTTP 500 ("Internal Server Error"). The client's _post method (remote_inference_client.py:227-253) then fails to parse the plain-text body as JSON; since 500 is outside the 4xx range the error is treated as transient and retried 30 times (~30 seconds wasted), ultimately surfacing a confusing JSON-decode error instead of the real "LoRA not supported" message. The correct pattern is already used two lines below at line 369 (raise HTTPException(status_code=500, ...)), so this is an inconsistency within the same function.
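The retry behaviour described above (4xx treated as permanent, everything else retried) can be sketched as follows; `should_retry` is a hypothetical name for illustration, not the actual `_post` implementation.

```python
def should_retry(status_code: int) -> bool:
    """Sketch of the client retry policy: 4xx responses are permanent
    errors (no retry); anything else, including 500, is retried -- which
    is why a bare NotImplementedError (rendered as a plain-text 500)
    gets retried repeatedly before surfacing a confusing error."""
    return not (400 <= status_code < 500)
```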
Suggested change:
```diff
-raise NotImplementedError("/skyrl/v1/generate does not support LoRA.")
+raise HTTPException(status_code=400, detail="/skyrl/v1/generate does not support LoRA.")
```
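The difference the suggestion makes can be sketched without the FastAPI dependency. The `HTTPException` below is a local stand-in for `fastapi.HTTPException`, which FastAPI serialises to a JSON `{"detail": ...}` body with the chosen status code, so the client sees the real message and does not retry.

```python
class HTTPException(Exception):
    """Local stand-in for fastapi.HTTPException (status + structured detail)."""
    def __init__(self, status_code: int, detail: str):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail

def guard_lora(enable_lora: bool) -> None:
    # A structured 400 surfaces the real message immediately and is not
    # retried, unlike an opaque plain-text 500 from a bare exception.
    if enable_lora:
        raise HTTPException(status_code=400, detail="/skyrl/v1/generate does not support LoRA.")
```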
For reference: We ran the Moonlight-16B script with the old and the new inference and got matching curves: https://api.wandb.ai/links/sky-posttraining-uc-berkeley/lwaaqy73
SumanthRH left a comment
Let's address the issue with the sample API
```diff
+export _SKYRL_USE_NEW_INFERENCE=1
```

Revert?

Suggested change: remove the `export _SKYRL_USE_NEW_INFERENCE=1` line.


Uh oh!
There was an error while loading. Please reload this page.