(lib) Add support for vLLM-backed text embedder #1494
charlesbluca wants to merge 85 commits into NVIDIA:main from
Conversation
…embeddings-1 Made-with: Cursor # Conflicts: # nemo_retriever/src/nemo_retriever/examples/batch_pipeline.py
- Add BeirConfig.use_vllm (default False), forward as embed_use_vllm
- batch_pipeline: pass embed_use_vllm into BeirConfig for BEIR eval
- Extend test_evaluate_lancedb_beir to assert Retriever kwargs
Made-with: Cursor
The upstream refactor replaced ingest_modes/ with text_embed/operators.py. Add vLLM branching to _BatchEmbedGPUActor.__init__() so that embed_use_vllm=True in EmbedParams routes through create_local_embedder() with use_vllm=True, matching the interface used for recall queries. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
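A rough sketch of the branching this commit describes (the actor and factory names come from the commit text; the import path, config field access, and factory signature are assumptions, and later commits in this PR replace the `use_vllm` toggle with explicit backend strings):

```python
from nemo_retriever.model import create_local_embedder  # import path assumed


class _BatchEmbedGPUActor:
    def __init__(self, embed_params):
        # embed_params is an EmbedParams-like object carrying embed_use_vllm.
        use_vllm = bool(getattr(embed_params, "embed_use_vllm", False))
        # Route through the same local-embedder interface used for recall queries.
        self._embedder = create_local_embedder(use_vllm=use_vllm)
```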
Greptile Summary
This PR adds vLLM-backed local text embedding as an alternative to the existing HuggingFace path, introducing the following changes:
| Filename | Overview |
|---|---|
| nemo_retriever/src/nemo_retriever/text_embed/vllm.py | New module for vLLM batched embedding inference — clean, well-structured, with proper batching, empty-slot handling, and multimodal support (see the sketch after this table). |
| nemo_retriever/src/nemo_retriever/text_embed/processor.py | Injects a local embedder into task_config; passes prefix="" kwarg to LlamaNemotronEmbed1BV2HFEmbedder.embed() which doesn't accept it — crashes the HF ingest backend. |
| nemo_retriever/src/nemo_retriever/model/__init__.py | Factory functions updated to route VL and non-VL models to vLLM or HF backends; create_local_query_embedder correctly defaults to "hf" for queries. |
| nemo_retriever/src/nemo_retriever/model/local/llama_nemotron_embed_1b_v2_embedder.py | vLLM-backed embedder with _finalize_vectors padding; still loads eagerly in __post_init__ (known issue from prior review rounds). |
| nemo_retriever/src/nemo_retriever/model/local/llama_nemotron_embed_1b_v2_hf_embedder.py | New HF-only embedder with correct lazy loading via _ensure_loaded(); prefix logic handled internally without a prefix kwarg. |
| nemo_retriever/src/nemo_retriever/model/local/llama_nemotron_embed_vl_1b_v2_embedder.py | Adds LlamaNemotronEmbedVL1BV2VLLMEmbedder; existing HF VL embedder still loads model eagerly in __post_init__, violating model-lazy-loading. |
| nemo_retriever/src/nemo_retriever/text_embed/gpu_operator.py | Refactored to use create_local_embedder; device kwarg is not forwarded, silently ignoring user-configured device for the HF backend. |
| nemo_retriever/tests/test_vllm_embed.py | Thorough unit tests for embed_with_vllm_llm, embed_multimodal_with_vllm_llm, and the VL vLLM embedder; covers batching, None embedding slots, and L2 normalization. |
| nemo_retriever/tests/test_create_local_embedder.py | Factory routing tests cover all four embedder classes including the new VL vLLM class; fixture correctly stubs both LlamaNemotronEmbedVL1BV2Embedder and LlamaNemotronEmbedVL1BV2VLLMEmbedder. |
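As a rough illustration of the batching behaviour described for vllm.py above and exercised by test_vllm_embed.py, a helper along these lines might look as follows. The function name comes from the tests; vLLM's `LLM.embed()` call shape and the zero-fill choice for empty slots are assumptions, not necessarily the module's actual implementation.

```python
import numpy as np


def embed_with_vllm_llm(llm, texts, batch_size=64):
    """Batch-embed texts with a vLLM LLM and return L2-normalized vectors."""
    vectors = []
    for start in range(0, len(texts), batch_size):
        outputs = llm.embed(texts[start : start + batch_size])
        for out in outputs:  # one EmbeddingRequestOutput per prompt
            emb = out.outputs.embedding
            vectors.append(None if emb is None else np.asarray(emb, dtype=np.float32))
    if not vectors:
        return np.zeros((0, 0), dtype=np.float32)
    dim = next((v.shape[0] for v in vectors if v is not None), 0)
    # Empty slots become zero vectors so row order stays aligned with the inputs.
    stacked = np.stack([v if v is not None else np.zeros(dim, dtype=np.float32) for v in vectors])
    norms = np.linalg.norm(stacked, axis=1, keepdims=True)
    return stacked / np.clip(norms, 1e-12, None)
```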
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[embed request] --> B{has endpoint?}
    B -- yes --> C[Remote NIM HTTP]
    B -- no --> D[maybe_inject_local_hf_embedder]
    D --> E{local_ingest_backend}
    E -- vllm --> F[create_local_embedder backend=vllm]
    E -- hf --> G[create_local_embedder backend=hf]
    F --> H{is_vl_embed_model?}
    G --> I{is_vl_embed_model?}
    H -- yes --> J[LlamaNemotronEmbedVL1BV2VLLMEmbedder]
    H -- no --> K[LlamaNemotronEmbed1BV2Embedder]
    I -- yes --> L[LlamaNemotronEmbedVL1BV2Embedder]
    I -- no --> M[LlamaNemotronEmbed1BV2HFEmbedder]
    K --> N["_embed closure: embed(prefixed, prefix='')"]
    M --> O["_embed closure: embed(prefixed, prefix='') ❌ TypeError"]
    J --> P["_embed closure: embed(texts) ✓"]
    L --> Q["_embed closure: embed(texts) ✓"]
```
Prompt To Fix All With AI
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/text_embed/processor.py
Line: 124-129
Comment:
**`prefix=""` kwarg crashes HF non-VL ingest backend**
When `local_ingest_backend="hf"`, `create_local_embedder` returns a `LlamaNemotronEmbed1BV2HFEmbedder`. The `_embed` closure then calls `embedder_instance.embed(prefixed, batch_size=local_batch_size, prefix="")`, but `LlamaNemotronEmbed1BV2HFEmbedder.embed()` is declared as `def embed(self, texts, *, batch_size=64)` with no `prefix` parameter. Python raises `TypeError: embed() got an unexpected keyword argument 'prefix'` immediately, crashing the ingest pipeline for any user who sets `local_ingest_backend="hf"`.
The `LlamaNemotronEmbed1BV2HFEmbedder.embed()` already prepends the `"passage: "` prefix internally, so the texts should be passed unprefixed and without `prefix=`:
```python
def _embed(texts):
    if ingest_backend == "hf":
        # HF embedder manages its own prefix internally
        vecs = embedder_instance.embed(texts, batch_size=local_batch_size)
    else:
        prefixed = [prefix + t for t in texts] if prefix else texts
        vecs = embedder_instance.embed(prefixed, batch_size=local_batch_size, prefix="")
    return vecs.tolist()
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/model/local/llama_nemotron_embed_vl_1b_v2_embedder.py
Line: 51-86
Comment:
**Eager model load in `__post_init__` violates `model-lazy-loading` rule**
`LlamaNemotronEmbedVL1BV2Embedder.__post_init__` calls `AutoModel.from_pretrained()` (plus a `.to(dev)` CUDA transfer) unconditionally at construction time. The `model-lazy-loading` rule requires models to load on first use, not at construction. Any code path that constructs this class — e.g., `create_local_embedder(..., backend="hf")` for a VL model inside `gpu_operator.py` — will immediately trigger a full HF download and GPU transfer before any embedding call is made.
Apply the same `_ensure_loaded()` pattern used by `LlamaNemotronEmbed1BV2HFEmbedder`: set `self._model = None` in `__post_init__`, and defer the `AutoModel.from_pretrained` call to a private `_ensure_loaded()` helper invoked in `embed()`, `embed_queries()`, `embed_images()`, and `embed_text_image()`.
**Rule Used:** Models must be loaded lazily (on first use or expl... ([source](https://app.greptile.com/review/custom-context?memory=model-lazy-loading))
How can I resolve this? If you propose a fix, please make it concise.
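A minimal sketch of the `_ensure_loaded()` pattern suggested above, using a simplified stand-in class (field names and defaults here are placeholders, not the repo's actual code):

```python
from dataclasses import dataclass, field

from transformers import AutoModel


@dataclass
class LazyVLEmbedder:  # simplified stand-in, not the repo's actual class
    model_name: str
    device: str = "cuda"
    _model: object = field(default=None, init=False, repr=False)

    def __post_init__(self):
        # No download or GPU transfer at construction time.
        self._model = None

    def _ensure_loaded(self):
        # Heavy work happens only on first use.
        if self._model is None:
            self._model = AutoModel.from_pretrained(self.model_name).to(self.device).eval()
        return self._model

    def embed(self, texts, *, batch_size=64):
        model = self._ensure_loaded()
        ...  # forward pass elided; every public entry point calls _ensure_loaded() first
```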
The torch 2.10 / vLLM 0.17 bump requires additional work not yet ready. Revert to the upstream baseline (vLLM 0.16.0, torch~=2.9.1, torchvision >=0.24,<0.25, flashinfer 0.6.3). The VLM embedder vLLM code path is preserved but now raises a clear RuntimeError at init time if the installed vLLM is < 0.17.0, since that version is required for VLM pooling support. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Plumbs embed_use_vllm through to EmbedParams (ingest) and RecallConfig (query), matching the same flag added to the old inprocess/fused examples. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Text embedding via vLLM works on 0.16.0; only the VLM embedder requires 0.17.0 for pooling support. Move the version check accordingly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
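A sketch of where such a guard might live (helper name and error wording are assumptions; the 0.17.0 threshold is from the commit):

```python
from packaging.version import Version

import vllm


def _require_vllm_vlm_pooling() -> None:
    """Fail fast if the installed vLLM is too old for VLM pooling support."""
    installed = Version(vllm.__version__)
    if installed < Version("0.17.0"):
        raise RuntimeError(
            f"VL embedding via vLLM requires vLLM >= 0.17.0 for pooling support; "
            f"found {installed}. Text-only embedding still works on 0.16.0."
        )
```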
Both nemotron_vlm_captioner and nemotron_parse_v1_2 translate device to CUDA_VISIBLE_DEVICES and call configure_global_hf_cache_base() before constructing LLM(). The vLLM path in both local embedders was silently ignoring these fields. Apply the same pattern. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
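Roughly, the shared pattern looks like this (configure_global_hf_cache_base is the project helper named above; its import path and the remaining LLM() arguments are assumptions):

```python
import os

from vllm import LLM

from nemo_retriever.model import configure_global_hf_cache_base  # import path assumed


def _build_embedding_llm(model_name: str, device=None) -> LLM:
    # Pin the worker to the requested GPU before vLLM initializes CUDA.
    if device is not None:
        os.environ["CUDA_VISIBLE_DEVICES"] = str(device)
    # Use the same shared HF cache location as the other vLLM-backed models.
    configure_global_hf_cache_base()
    return LLM(model=model_name)  # embedding-specific arguments elided
```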
…pattern Separate HF and vLLM embedder classes rather than toggling `use_vllm` on a single class. `create_local_embedder()` dispatches to the appropriate class. `_BatchEmbedGPUActor` simplified to a single `create_local_embedder()` call. VLM vLLM embedder retains the vLLM >= 0.17.0 runtime guard. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ard HF_HUB_OFFLINE to Ray workers graph_pipeline.py was missing the --embed-local-ingest-backend option that run.py was already generating; wires the flag through to EmbedParams. graph_ingestor.py initialized Ray before executor.py could run, bypassing the HF_HUB_OFFLINE forwarding in executor.py; adds the env var directly to graph_ingestor's ray_env_vars dict. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Charles Blackmon-Luca <charlesb@nvidia.com> Signed-off-by: Charles Blackmon-Luca <20627856+charlesbluca@users.noreply.github.com>
HarnessConfig:
- reranker, reranker_model_name, rerank_modality, embed_local_ingest_backend, embed_local_query_backend fields + validation + env overrides
- fix preset/dataset merge order: dataset now wins over preset as intended
- load_nightly_config: expose sweep-level overrides dict
harness/run.py:
- _build_command: pass reranker flags and embed/query backend to graph_pipeline
- _run_single: include new fields in run metadata
- _wait_for_gpu_clear: poll nvidia-smi between runs to prevent vLLM CUDA poisoning
- execute_runs: global_overrides merged under per-run overrides; GPU drain before first run and between each subsequent run
recall/beir.py:
- BeirConfig.rerank_modality field plumbed through to Retriever
Sweep files (vllm_bo767_sweep + jp20_vl_sweep): 18-run design covering text model and VL model × text/text_image modality × HF/vLLM ingest × HF/vLLM query × reranker on/off. test_configs.yaml updated with all 36 dataset configs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ation The flag does not exist in graph_pipeline.py — rerank modality inherits from embed_modality by default, which is the correct behavior. Passing an unrecognized flag caused all reranked harness runs to fail. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Charles Blackmon-Luca <20627856+charlesbluca@users.noreply.github.com>
huggingface_hub reads HF_HUB_OFFLINE as a module-level constant at import time. Ray workers import the module during startup before runtime_env env_vars are injected, so setting the flag only in runtime_env has no effect on workers. Setting os.environ before ray.init() ensures worker processes inherit the value at the OS level (fork/exec), suppressing model_info() network calls that cause 429 errors when the HF token is absent or rate-limited. Fixed in all four Ray initialization sites: harness/run.py, harness/runner.py, graph/executor.py, graph_ingestor.py. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Charles Blackmon-Luca <20627856+charlesbluca@users.noreply.github.com>
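The ordering that matters, in sketch form (the flag value "1" is assumed; the key point from the commit is that it must be in os.environ before ray.init()):

```python
import os

import ray

# huggingface_hub reads HF_HUB_OFFLINE once at import time, and Ray workers
# import it during startup before runtime_env env_vars are applied, so the
# flag must already be in the parent's OS environment when workers are spawned.
os.environ["HF_HUB_OFFLINE"] = "1"

ray.init()  # workers now inherit HF_HUB_OFFLINE at the OS level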
Removes reranker/backend config fields, _build_command wiring, _wait_for_gpu_clear, sweep YAML files, and test updates added in the previous three commits. These will land separately once the core vLLM embedder work is merged. Retains the os.environ["HF_HUB_OFFLINE"] fix in graph/executor.py and graph_ingestor.py, which are core pipeline files, not harness. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Charles Blackmon-Luca <20627856+charlesbluca@users.noreply.github.com>
Remove jp20_vl_sweep.yaml and restore test_configs.yaml to upstream/main state. These harness configs will land in a follow-up PR alongside the harness feature work. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Charles Blackmon-Luca <20627856+charlesbluca@users.noreply.github.com>
… default to HF Drop the implicit "auto" backend alias in favour of explicit "vllm" / "hf" strings throughout create_local_embedder, create_local_query_embedder, Retriever, and the CLI. Defaults are now vllm for ingest and hf for query, matching the recommended mixed-backend configuration. Add HF text-embed support to create_local_embedder (previously non-VL always used vLLM regardless of backend). This lets gpu_operator.py consolidate to a single create_local_embedder call instead of branching on create_local_query_embedder for the ingest-HF path, and collapses create_local_query_embedder to a thin validation wrapper that delegates entirely to create_local_embedder. Update test_create_local_embedder to cover all four class paths (text-vllm, text-hf, vl-hf, vl-vllm), explicit/default backend selection for both factory functions, invalid backend rejection, and guard the torch-dependent smoke test with pytest.importorskip. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
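The resulting dispatch, sketched below (class names are the ones in this PR; the factory signature and the is_vl_embed_model helper are assumptions about the surrounding module, where the four embedder classes are already defined):

```python
def create_local_embedder(model_name: str, *, backend: str = "vllm", **kwargs):
    # Explicit backends only; the old "auto" alias is gone.
    if backend not in ("vllm", "hf"):
        raise ValueError(f"backend must be 'vllm' or 'hf', got {backend!r}")
    if is_vl_embed_model(model_name):
        cls = LlamaNemotronEmbedVL1BV2VLLMEmbedder if backend == "vllm" else LlamaNemotronEmbedVL1BV2Embedder
    else:
        cls = LlamaNemotronEmbed1BV2Embedder if backend == "vllm" else LlamaNemotronEmbed1BV2HFEmbedder
    return cls(model_name=model_name, **kwargs)


def create_local_query_embedder(model_name: str, *, backend: str = "hf", **kwargs):
    # Thin validation wrapper: queries default to HF, ingest defaults to vLLM.
    return create_local_embedder(model_name, backend=backend, **kwargs)
```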
…ear edge cases Address code-review feedback: lazy-load the HF embedder via _ensure_loaded() so GPU allocation is deferred to the first embed call, add _to_bool() to correctly handle string "false"/"0" from env/task-config dictionaries, and log rather than silently swallow the cuda.empty_cache() exception on error. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
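A minimal sketch of such a helper (the exact set of accepted spellings is an assumption):

```python
def _to_bool(value, default: bool = False) -> bool:
    """Interpret env/task-config values like "false", "0", or True as a real bool."""
    if value is None:
        return default
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() not in ("", "0", "false", "no", "off")
```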
RecallConfig.local_query_embed_backend defaults to "auto", but Retriever._get_local_embedder only accepts "hf" or "vllm", causing a ValueError at query time with the default config. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ults vllm 'auto' was inconsistently defined (vLLM in the CLI, HF in Retriever) and added no real value. Replace with explicit defaults: query embed → 'hf', ingest embed → 'vllm' (already the default in pipeline/__main__.py). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove false claim that VL models always use HF regardless of local_query_embed_backend — the code does not enforce this. Fix stacklevel=2 → 4 in LlamaNemotronEmbed1BV2Embedder.__post_init__ so the DeprecationWarning points to the caller, not the dataclass __init__. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
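For context, a simplified illustration of the stacklevel fix (class body and warning text are placeholders; the point is that with the dataclass-generated __init__ and the intervening factory frame, stacklevel=4 lands the DeprecationWarning on the caller):

```python
import warnings
from dataclasses import dataclass


@dataclass
class LlamaNemotronEmbed1BV2Embedder:  # heavily simplified stand-in
    model_name: str = ""

    def __post_init__(self):
        warnings.warn(
            "placeholder deprecation message",
            DeprecationWarning,
            stacklevel=4,  # skip __post_init__, the generated __init__, and the factory frame
        )
```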
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tests Forward normalize and max_length from task kwargs in _BatchEmbedActor so user-configured values are not silently dropped. Add pytest.importorskip guard to TestCreateVllmLlm so it skips cleanly when vLLM is not installed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
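The skip guard mentioned here, in sketch form (the test method name is a placeholder; TestCreateVllmLlm is the class named in the commit):

```python
import pytest


class TestCreateVllmLlm:
    def test_create_vllm_llm_smoke(self):
        pytest.importorskip("vllm")  # skip cleanly when vLLM is not installed
        ...
```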
…LLM ingest / HF recall VL models bypass the local_ingest_backend setting and always load via LlamaNemotronEmbedVL1BV2Embedder (HF). Non-VL models keep their existing defaults: vLLM for ingest, HF for recall. Updated comments and CLI help text to reflect the new behaviour. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…vllm The VL branch was unconditionally using LlamaNemotronEmbedVL1BV2Embedder (HF) regardless of the backend argument. Now it dispatches to the vllm class by default (matching non-VL behaviour) and falls back to the HF class only when backend="hf" is explicit. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…mbedder Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
maybe_inject_local_hf_embedder was hardcoding backend="vllm", ignoring EmbedParams.local_ingest_backend. Reads the value from task_config now, falling back to "vllm" if unset. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
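In sketch form (the task_config key name is an assumption beyond what the commit states):

```python
def _resolve_local_ingest_backend(task_config: dict) -> str:
    # Honour EmbedParams.local_ingest_backend instead of hardcoding "vllm";
    # fall back to "vllm" when the key is unset or empty.
    return task_config.get("local_ingest_backend") or "vllm"
```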
… torch import Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Code under review in nemo_retriever/src/nemo_retriever/text_embed/processor.py (lines 124-129):

```python
else:

    def _embed(texts):
        prefixed = [prefix + t for t in texts] if prefix else texts
        vecs = embedder_instance.embed(prefixed, batch_size=local_batch_size, prefix="")
        return vecs.tolist()
```
Co-authored-by: polazilber <pzilberman@nvidia.com>
Co-authored-by: Yuval Shkolar <yshkolar@nvidia.com>
LlamaNemotronEmbed1BV2HFEmbedder.embed() has no prefix parameter and manages its own prefix internally; passing prefix="" raises TypeError for any user with local_ingest_backend="hf". Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>