feat: make vllm-metal backend installation opt-in #652
Conversation
Summary of Changes

Hello @doringeman, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request refactors the backend installation process to make the vllm-metal backend opt-in rather than a default installation. This improves the user experience by allowing users to install only the backends they need, reducing initial setup overhead and potential conflicts, especially for Docker Desktop environments. The changes involve introducing new API endpoints and updating the internal scheduling and installation mechanisms to support this deferred installation model.
Activity
Code Review
The pull request introduces an opt-in installation mechanism for the vllm-metal backend, addressing the need for users, especially Docker Desktop users, to avoid default installations. This change involves modifying several Go files to include vllm-metal in backend lists, adding a new InstallBackend function in the desktop client, and updating the scheduler to handle deferred backend installations. Documentation files for install-runner, reinstall-runner, and start-runner commands have also been updated to reflect the inclusion of vllm-metal as a backend option. The changes improve flexibility for users by allowing them to explicitly install vllm-metal when needed, while maintaining the existing backend management for others.
Hey - I've found 4 issues, and left some high level feedback:

- In `handleOpenAIInference` there's a leftover `Warnln("ALOHA", err)` debug log that should be removed or made more descriptive before merge.
- In `installer.run`, when a deferred backend has on-disk data but `Install` returns an error, the error is silently ignored and the status channels remain open; consider recording the failure on `status.failed` so callers of `wait` get a concrete error instead of treating the backend as never-installed.
- The new `/install-backend` endpoint currently wraps any non-`ErrBackendNotFound` error as a 500 with the raw error string; consider distinguishing transient install failures (e.g., network issues) with a 5xx status but a more user-facing message, and optionally exposing the underlying error via logs instead of the HTTP response (a hedged sketch follows this list).
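To make that last point concrete, here is a minimal sketch of one way the handler could log the underlying error while returning a generic, stable message. The handler shape, request decoding, and the `installBackend` call are assumptions for illustration, not the PR's actual code:

```go
// Hypothetical sketch: request shape and helper signatures are assumed.
func (h *httpHandler) InstallBackend(w http.ResponseWriter, r *http.Request) {
	backendName := r.URL.Query().Get("backend") // assumed request shape
	err := h.scheduler.installer.installBackend(r.Context(), backendName)
	switch {
	case err == nil:
		w.WriteHeader(http.StatusOK)
	case errors.Is(err, ErrBackendNotFound):
		http.Error(w, "unknown backend: "+backendName, http.StatusNotFound)
	default:
		// Keep the raw error in the server logs; hand the client a
		// user-facing message with a 5xx status.
		h.scheduler.log.Warnf("installing backend %q failed: %v", backendName, err)
		http.Error(w, "backend installation failed; see server logs", http.StatusInternalServerError)
	}
}
```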
## Individual Comments
### Comment 1
<location> `pkg/inference/scheduling/scheduler.go:114-117` </location>
<code_context>
}
if config.GetFormat() == types.FormatSafetensors {
- // Prefer vllm-metal for safetensors models on macOS (most feature-rich for Metal)
+ // Prefer vllm-metal for safetensors models on macOS (most feature-rich for Metal),
+ // but only if it has been installed.
if vllmMetalBackend, ok := s.backends[vllmmetal.Name]; ok && vllmMetalBackend != nil {
+ if s.installer.isInstalled(vllmmetal.Name) {
+ return vllmMetalBackend
</code_context>
<issue_to_address>
**issue (bug_risk):** vllm-metal branch returns the backend even when it is not installed, contradicting the comment and log message.
If `isInstalled` returns false, the code still returns `vllmMetalBackend` right after logging the warning, so requests are routed to an uninstalled backend, contradicting the comment and log message and likely failing later.
Instead, only return vllm-metal when `isInstalled` is true and otherwise fall through to the existing MLX/other backend selection, e.g.:
```go
if vllmMetalBackend, ok := s.backends[vllmmetal.Name]; ok && vllmMetalBackend != nil {
if s.installer.isInstalled(vllmmetal.Name) {
return vllmMetalBackend
}
s.log.Infof("vllm-metal backend is available but not installed. To install, run: docker model install-runner --backend %s", vllmmetal.Name)
// fall through to MLX/other fallback
}
```
</issue_to_address>
### Comment 2
<location> `pkg/inference/scheduling/installer.go:105-110` </location>
<code_context>
// ubiquitous backend and mlx as a relatively lightweight backend (on macOS
// only), this granularity is probably less of a concern.
for name, backend := range i.backends {
+ // For deferred backends, check if they are already installed on disk
+ // from a previous session. Only call Install() (which verifies the
+ // existing installation) when files are present, to avoid triggering
+ // a download.
+ if i.deferredBackends[name] {
+ status := i.statuses[name]
+ if diskUsage, err := backend.GetDiskUsage(); err == nil && diskUsage > 0 {
</code_context>
<issue_to_address>
**issue (bug_risk):** Deferred-backend Install errors during run() are silently ignored and never surface via wait().
In the deferred-backend branch, when `GetDiskUsage` > 0 you call `Install`, but if `Install` returns an error you just `continue`, leaving both `installed` and `failed` channels open. As a result:
* `wait()` always returns `errBackendNotInstalled`, hiding the real install error.
* There’s no logging or propagation of the failure.
Please handle this like the normal install path: set `status.err`, close `status.failed`, and (optionally) log the error so `wait()` callers see the actual failure and state is consistent with other install flows.
</issue_to_address>
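For illustration, the deferred branch could fail the same way the normal install path does. This sketch assumes the `installStatus` fields (`err`, `failed`, `installed`) and the loop variables described above; `Install`'s arguments are placeholders:

```go
if i.deferredBackends[name] {
	status := i.statuses[name]
	if diskUsage, err := backend.GetDiskUsage(); err == nil && diskUsage > 0 {
		// Arguments to Install are assumed to match the normal install path.
		if err := backend.Install(ctx, client); err != nil {
			// Record the failure so wait() surfaces the real error
			// instead of errBackendNotInstalled.
			status.err = err
			close(status.failed)
			continue
		}
		close(status.installed)
	}
	continue
}
```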
### Comment 3
<location> `pkg/inference/scheduling/installer.go:237-207` </location>
<code_context>
+
+// isInstalled returns true if the given backend has completed installation.
+// It is non-blocking.
+func (i *installer) isInstalled(name string) bool {
+ status, ok := i.statuses[name]
+ if !ok {
+ return false
+ }
+ select {
+ case <-status.installed:
+ return true
</code_context>
<issue_to_address>
**issue (bug_risk):** isInstalled reads statuses without synchronization while installBackend can mutate the map entry, which is a potential data race.
`installBackend` replaces entries in `i.statuses` under `i.mu`, but `isInstalled` reads both `i.statuses` and `status.installed` without any locking. This creates a data race between these methods when used concurrently.
Consider either:
- Taking `i.mu` in `isInstalled` while accessing `statuses`, or
- Reusing the existing `installStatus` instance instead of replacing it, or
- Protecting `statuses` with a dedicated sync primitive (e.g., `sync.RWMutex`).
Any of these would make the concurrent interaction between these methods safe.
</issue_to_address>
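As a concrete form of the first option, `isInstalled` could take `i.mu` around the map read, mirroring what `installBackend` already does when it replaces entries; a sketch using only the names shown above:

```go
func (i *installer) isInstalled(name string) bool {
	// Guard the map read against concurrent replacement in installBackend.
	i.mu.Lock()
	status, ok := i.statuses[name]
	i.mu.Unlock()
	if !ok {
		return false
	}
	select {
	case <-status.installed:
		return true
	default:
		return false
	}
}
```

Note that a caller can still observe a stale `installStatus` if `installBackend` swaps the entry concurrently; reusing the existing instance (the second option above) avoids that edge entirely.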
### Comment 4
<location> `pkg/inference/scheduling/http_handler.go:204` </location>
<code_context>
// don't allow any requests to be scheduled for a backend until it has
// completed installation.
if err := h.scheduler.installer.wait(r.Context(), backend.Name()); err != nil {
+ h.scheduler.log.Warnln("ALOHA", err)
+
</code_context>
<issue_to_address>
**suggestion:** Leftover debug log message (“ALOHA”) is noisy and not actionable in production.
The `Warnln("ALOHA", err)` call appears to be temporary debug logging and emits an opaque message on every backend readiness error. Please remove it or replace it with a structured, descriptive log (e.g., including backend name and context) that follows existing logging conventions.
Suggested implementation:
```golang
if err := h.scheduler.installer.wait(r.Context(), backend.Name()); err != nil {
h.scheduler.log.
WithError(err).
WithField("backend", backend.Name()).
Warn("failed waiting for backend installation to complete")
if errors.Is(err, ErrBackendNotFound) {
```
If `h.scheduler.log` does not yet support `WithError` / `WithField` (e.g., if it's not a logrus-style logger), you should:
1. Replace the logging call with whatever structured logging helpers are used elsewhere in this file or package (for example, `h.scheduler.log.Warnf("failed waiting for backend %s installation to complete: %v", backend.Name(), err)`).
2. Keep the message similarly descriptive and include the backend name and the error.
</issue_to_address>
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
Force-pushed from a895020 to e24fb26
…allBackend handler Signed-off-by: Dorin Geman <dorin.geman@docker.com>
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
```go
}

// Backends whose installation is deferred until explicitly requested.
var deferredBackends []string
```
I like this `deferredBackends` idea 👏
| m["GET "+inference.InferencePrefix+"/v1/models"] = h.handleModels | ||
| m["GET "+inference.InferencePrefix+"/v1/models/{name...}"] = h.handleModels | ||
|
|
||
| m["POST "+inference.InferencePrefix+"/install-backend"] = h.InstallBackend |
Should we use only a POST request to `/engines` to make the API more RESTful, or keep verbs in the path like we do for actions such as `ps` and `df`?
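For comparison, the two styles under discussion would register roughly like this; only the `/install-backend` line exists in the PR, the `/engines` route is hypothetical:

```go
// Verb-in-path style, as in this PR:
m["POST "+inference.InferencePrefix+"/install-backend"] = h.InstallBackend
// Resource-oriented alternative floated above (hypothetical):
m["POST "+inference.InferencePrefix+"/engines"] = h.InstallBackend
```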
I'd rather we auto-pulled it on-demand, tbh. I want the huggingface help, for example, to be only this: … If a user wants to Ctrl-C that's fine; we should auto-pull and install like a "docker run". If we have to change the huggingface help from a single "docker run" to multiple DMR commands, we have failed.
This is needed as people might not want to have `vllm-metal` installed by default, especially Docker Desktop users. Eventually we'll switch to an approach where all the backends are self-contained and independently installable.
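For example, with this change the backend can be pulled in explicitly via the runner command referenced in the review's suggested log message (exact flags may differ):

```
# Opt in to the vllm-metal backend on demand:
docker model install-runner --backend vllm-metal
```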
To test: