Updated files per bugs#1622

Merged
kheiss-uwzoo merged 2 commits into NVIDIA:26.03 from kheiss-uwzoo:kheiss/qa-review4
Apr 22, 2026

Conversation

@kheiss-uwzoo
Collaborator

Overview

This branch applies documentation and copy fixes from QA review: it updates the extraction docs to match current behavior, fixes broken or misleading links, corrects examples, and aligns prerequisites with the support matrix.

Changes (11 files in docs/docs/extraction/)

Prerequisites & support matrix

prerequisites.md (Bug 5965601): Align GPU examples with the support matrix. Removed V100 from examples (not in support matrix; 16 GB variant doesn’t meet 24 GB VRAM). GPU line now points to the Support Matrix and uses only supported examples (e.g., A100, A10G, L40S).
support-matrix.md: Fix link text typo: "NVIDIA NVIDIA Riva" → "NVIDIA Riva".

Python API & VLM

python-api-reference.md: Update VLM caption docs to current API and model: default model nvidia/llama-3.1-nemotron-nano-vl-8b-v1 → nvidia/nemotron-nano-12b-v2-vl; replace caption_prompt/caption_system_prompt with prompt/reasoning (bool).
v2-api-guide.md: Clarify that the Python API ingest() returns a list of document results (not a dict). Add note on Python client vs HTTP API and update FAQ about detecting PDF splitting when using the Python client.
vlm-embed.md: Fix overview link from external Perplexity URL to relative overview.md.

Quickstart

quickstart-guide.md: Switch instructions from Conda to uv/venv; add "Step 2: Install the Client" and a docker ps example; correct MIG example to use nvidia/nemotron-page-elements-v3; update tip to refer to nv-ingest-dev instead of nemo_retriever.
quickstart-library-mode.md: Use VLM model nvidia/nemotron-nano-12b-v2-vl in the example; fix run_pipeline parameter table (e.g. pipeline_config, run_in_subprocess, block marked as not required where applicable); correct return type for subprocess case to RayPipelineSubprocessInterface.
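
The quickstart tip about confirming that the `nv-ingest-dev` environment is active can also be checked from inside Python. The following is a minimal sketch, not part of the docs being changed; the helper names `in_virtualenv` and `active_env_name` are hypothetical, and the check relies only on the standard `sys.prefix` / `sys.base_prefix` distinction that `venv`-style environments provide.

```python
import sys
from pathlib import Path


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a virtual environment."""
    # In a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def active_env_name() -> str:
    """Return the directory name of the active environment (or of the base install)."""
    return Path(sys.prefix).name


if __name__ == "__main__":
    print(f"virtual environment active: {in_virtualenv()}")
    print(f"environment directory: {active_env_name()}")
```

When run from an activated environment created as `nv-ingest-dev`, the second line should report that directory name; outside any environment the first check reports `False`.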

Release notes & references

releasenotes-nv-ingest.md: Fix 24.12.1 and 24.12.0 doc links so they point to 24.12.1 and 24.12.0 paths instead of 25.3.0.

Telemetry, troubleshooting, benchmarking

telemetry.md: Correct Prometheus URLs (remove erroneous /zipkin/ path).
troubleshoot.md: Fix ulimit examples: use 10000 instead of 10,000 (comma invalid in shell).
benchmarking.md: Update example config to current behavior (e.g. api_version: v2, compose.profiles, sparse: false, extract_infographics: true for bo20); fix harness paths from nemo_retriever_harness to nv_ingest_harness; align YAML and dataset table with current defaults.
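
The `ulimit` fix in troubleshoot.md is the kind of error a small doc lint could catch. The following is a rough sketch under stated assumptions: the function name `find_comma_limits` is hypothetical, and the sample commands (including the `-n` flag) are illustrative rather than copied from troubleshoot.md.

```python
import re

# Matches a ulimit command whose numeric argument contains a thousands separator,
# for example "10,000" instead of "10000".
COMMA_NUMBER = re.compile(r"\bulimit\b[^\n]*\b\d{1,3}(?:,\d{3})+\b")


def find_comma_limits(text: str) -> list[str]:
    """Return the ulimit lines that use a comma-grouped number."""
    return [line.strip() for line in text.splitlines() if COMMA_NUMBER.search(line)]


# Hypothetical sample: one invalid and one valid form.
sample = """\
ulimit -n 10,000
ulimit -n 10000
"""

flagged = find_comma_limits(sample)
print(flagged)
```

Only the comma-grouped line is reported; the plain-digit form passes.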

Testing

Doc-only changes; no code or test changes in this commit.
Verified that GPU examples in prerequisites match the support matrix and that corrected links and commands are consistent with the rest of the docs.

Related

@kheiss-uwzoo kheiss-uwzoo requested a review from a team as a code owner March 13, 2026 22:36
@kheiss-uwzoo kheiss-uwzoo requested review from jdye64, jioffe502 and sosahi and removed request for a team March 13, 2026 22:36
@kheiss-uwzoo kheiss-uwzoo added the doc Improvements or additions to documentation label Mar 13, 2026
@kheiss-uwzoo kheiss-uwzoo merged commit a6b790d into NVIDIA:26.03 Apr 22, 2026
3 of 5 checks passed
@greptile-apps
Contributor

greptile-apps Bot commented Apr 22, 2026

Greptile Summary

This PR applies documentation fixes to the extraction section: updating GPU examples in prerequisites, clarifying the Python vs HTTP API return types, correcting VLM caption parameters, expanding library-mode quickstart examples, and fixing quickstart guide installation steps.

  • quickstart-library-mode.md: Both new code examples import NemoRetrieverClient but instantiate the undefined NvIngestClient — a NameError for any user who copies the code.
  • python-api-reference.md: The caption example chains .ingest() inside the builder and assigns the list return value to a variable named ingestor, which will cause AttributeError if used as an Ingestor in follow-on calls.
  • quickstart-guide.md: A new ## Step 2 H2 was inserted mid-way through an existing lettered ordered list, breaking rendering, leaking the old docker ps table outside its code fence, duplicating the activation tip, and presenting conflicting package versions (26.1.2 vs 26.3.0-RC4).

Confidence Score: 3/5

Not safe to merge — three P1 issues cause NameErrors in runnable code examples and broken page rendering in the quickstart guide.

Two separate code examples throw NameError on execution due to undefined NvIngestClient; the caption example misnames the result variable causing AttributeError in follow-on usage; and the quickstart guide has structural markdown breakage presenting conflicting instructions to users on the primary documentation path.

quickstart-library-mode.md (lines 25, 309), python-api-reference.md (lines 532–541), and quickstart-guide.md (lines 95–146) all require fixes before merge.

Important Files Changed

Filename Overview
docs/docs/extraction/quickstart-library-mode.md Major expansion of library-mode examples, but both code examples use NvIngestClient (undefined) instead of the imported NemoRetrieverClient, causing a NameError for any user who runs them.
docs/docs/extraction/quickstart-guide.md New "Step 2" H2 inserted mid-list breaks ordered-list structure; leaves old docker ps table outside a code fence, duplicates the tip text and docker exec steps, and presents conflicting package versions (26.1.2 vs 26.3.0-RC4).
docs/docs/extraction/python-api-reference.md New "Caption Images" section added with correct parameter docs, but the code example chains .ingest() inside the builder and assigns the list result to ingestor, making the variable name misleading and likely to cause AttributeError in follow-on usage.
docs/docs/extraction/prerequisites.md Minor GPU example simplification — removed H100/H200 NVL to align with support matrix. Clean, correct change.
docs/docs/extraction/v2-api-guide.md Adds useful clarification note distinguishing Python client vs HTTP API return shapes, and corrects the FAQ answer about detecting PDF splitting. Clean, accurate additions.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User follows docs] --> B{Which quickstart?}
    B --> |Service mode| C[quickstart-guide.md]
    B --> |Library mode| D[quickstart-library-mode.md]
    B --> |Python API| E[python-api-reference.md]

    C --> C1[Step 1: Deploy services via Docker Compose]
    C1 --> C2["## Step 2 new — install 26.3.0-RC4 breaks list structure"]
    C2 --> C3["Step i — install 26.1.2 version conflict and duplicate tip"]
    C3 --> C4[Step j — docker exec]

    D --> D1[run_pipeline in subprocess]
    D1 --> D2["NvIngestClient not imported — NameError"]
    D2 --> D3[Ingestor → ingest]

    E --> E1[Ingestor builder chain]
    E1 --> E2[".caption().ingest() returns list stored as ingestor"]
    E2 --> E3["Follow-on ingestor.method() raises AttributeError"]
Prompt To Fix All With AI
This is a comment left during a code review.
Path: docs/docs/extraction/quickstart-library-mode.md
Line: 25

Comment:
**`NvIngestClient` not imported — NameError on execution**

Both code examples import `NemoRetrieverClient` but then instantiate `NvIngestClient`, which is never imported. Any user who runs this example will immediately hit `NameError: name 'NvIngestClient' is not defined`. The same issue is present at line 309 in the `launch_libmode_and_run_ingestor.py` example. Replace `NvIngestClient` with `NemoRetrieverClient` in both locations.

```suggestion
    client = NemoRetrieverClient(
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: docs/docs/extraction/quickstart-library-mode.md
Line: 309

Comment:
**`NvIngestClient` not imported in second example — NameError on execution**

`launch_libmode_and_run_ingestor.py` imports `NemoRetrieverClient` (line 292) but instantiates `NvIngestClient` here, which is undefined.

```suggestion
    client = NemoRetrieverClient(
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: docs/docs/extraction/python-api-reference.md
Line: 532-541

Comment:
**`.ingest()` chained inside builder — result is a list, not an `Ingestor`**

The example chains `.ingest()` inside the parenthesized builder expression and assigns the return value to `ingestor`. Because `ingest()` returns a list of document results, `ingestor` will hold a list, not an `Ingestor` object. Users who follow this pattern and then call `ingestor.embed()` or any other method will get `AttributeError`. Separate the `ingest()` call and store its result in a distinct variable.

```suggestion
ingestor = (
    Ingestor()
    .files("path/to/doc-with-images.pdf")
    .extract(extract_images=True)
    .caption(
        prompt="Caption the content of this image:",
        reasoning=True,  # or False
    )
)
results = ingestor.ingest()
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: docs/docs/extraction/quickstart-guide.md
Line: 95-146

Comment:
**New "Step 2" section inserted mid-list, causing structural and rendering breakage**

An H2 heading (`## Step 2: Install the Client`) was inserted inside an ordered list that continues with lettered sub-steps (h, i, j). This breaks the ordered list structure and causes several downstream issues: the old `docker ps` table (lines 104–118) falls outside any code fence and renders as unstyled text; the activation hint appears twice (plain text at line 122 and as a `!!! tip` at line 129); step i installs `26.1.2` while Step 2 installs `26.3.0-RC4`, presenting conflicting instructions; and the new `docker exec` block (lines 137–145) duplicates step j without proper indentation. The new content should be integrated into the existing ordered-list steps rather than injected as a top-level heading mid-sequence.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "Merge branch '26.03' into kheiss/qa-revi..."

# Start the pipeline subprocess for library mode
run_pipeline(block=False, disable_dynamic_scaling=True, run_in_subprocess=True)

client = NvIngestClient(

P1 NvIngestClient not imported — NameError on execution

Both code examples import NemoRetrieverClient but then instantiate NvIngestClient, which is never imported. Any user who runs this example will immediately hit NameError: name 'NvIngestClient' is not defined. The same issue is present at line 309 in the launch_libmode_and_run_ingestor.py example. Replace NvIngestClient with NemoRetrieverClient in both locations.

Suggested change
- client = NvIngestClient(
+ client = NemoRetrieverClient(
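
A minimal reproduction of this failure mode, assuming nothing about the real library: `NemoRetrieverClient` below is a stand-in class with a hypothetical body, and the `hostname` argument is illustrative only. It shows that the error appears only when the line runs, which is why the mismatch is easy to miss in a doc example.

```python
class NemoRetrieverClient:
    """Stand-in for the imported client class (hypothetical body)."""

    def __init__(self, **options):
        self.options = options


def build_client_as_documented():
    # Mirrors the doc example: this name was never imported or defined,
    # so the lookup fails when the function is called.
    return NvIngestClient(hostname="localhost")  # noqa: F821


def build_client_fixed():
    # Uses the name that is actually in scope.
    return NemoRetrieverClient(hostname="localhost")


try:
    build_client_as_documented()
except NameError as exc:
    failure = str(exc)
    print(f"NameError: {failure}")

client = build_client_fixed()
print(type(client).__name__)
```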

Set up and run the ingestion process to send traffic against the pipeline.
"""
logger.info("Setting up Ingestor client...")
client = NvIngestClient(

P1 NvIngestClient not imported in second example — NameError on execution

launch_libmode_and_run_ingestor.py imports NemoRetrieverClient (line 292) but instantiates NvIngestClient here, which is undefined.

Suggested change
- client = NvIngestClient(
+ client = NemoRetrieverClient(

Comment on lines +532 to +541
ingestor = (
    Ingestor()
    .files("path/to/doc-with-images.pdf")
    .extract(extract_images=True)
    .caption(
        prompt="Caption the content of this image:",
        reasoning=True,  # or False
    )
    .ingest()
)

P1 .ingest() chained inside builder — result is a list, not an Ingestor

The example chains .ingest() inside the parenthesized builder expression and assigns the return value to ingestor. Because ingest() returns a list of document results, ingestor will hold a list, not an Ingestor object. Users who follow this pattern and then call ingestor.embed() or any other method will get AttributeError. Separate the ingest() call and store its result in a distinct variable.

Suggested change
- ingestor = (
-     Ingestor()
-     .files("path/to/doc-with-images.pdf")
-     .extract(extract_images=True)
-     .caption(
-         prompt="Caption the content of this image:",
-         reasoning=True,  # or False
-     )
-     .ingest()
- )
+ ingestor = (
+     Ingestor()
+     .files("path/to/doc-with-images.pdf")
+     .extract(extract_images=True)
+     .caption(
+         prompt="Caption the content of this image:",
+         reasoning=True,  # or False
+     )
+ )
+ results = ingestor.ingest()
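
The distinction between a builder and its terminal call can be sketched without the real library. `SketchIngestor` below is a hypothetical stand-in, not the library's `Ingestor`; its method bodies and the shape of the returned list are assumptions made only to show why the variable name matters.

```python
class SketchIngestor:
    """Hypothetical fluent builder; not the library's Ingestor."""

    def __init__(self):
        self.steps = []

    def files(self, path):
        self.steps.append(("files", path))
        return self  # builder methods return the builder itself

    def caption(self, prompt, reasoning=False):
        self.steps.append(("caption", prompt, reasoning))
        return self

    def ingest(self):
        # Terminal call: returns results (a list), not the builder.
        return [{"source": step[1]} for step in self.steps if step[0] == "files"]


# Chaining ingest() inside the expression leaves a list in the variable.
chained = SketchIngestor().files("doc.pdf").caption(prompt="Describe:").ingest()
print(type(chained).__name__, hasattr(chained, "caption"))

# Keeping the builder and the results separate avoids the confusion.
ingestor = SketchIngestor().files("doc.pdf").caption(prompt="Describe:")
results = ingestor.ingest()
print(type(ingestor).__name__, results)
```

With the two-variable form, `ingestor` can still accept further builder calls, while `results` is clearly the output.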

Comment on lines 95 to 146
@@ -103,6 +119,7 @@ h. Run the command `docker ps`. You should see output similar to the following.

i. To run the NeMo Retriever Library Python client from your host machine, Python 3.12 or later is required. Create a virtual environment and install the client packages:

To confirm that you have activated your virtual environment, run `which pip` and `which python`, and confirm that you see `nv-ingest-dev` in the result. You can do this before any pip or python command that you run.
```shell
uv venv --python 3.12 nv-ingest-dev
source nv-ingest-dev/bin/activate
@@ -117,6 +134,15 @@ i. To run the NeMo Retriever Library Python client from your host machine, Pytho

Interaction from the host requires the appropriate port to be exposed from the `nv-ingest` runtime container, as defined in the `docker-compose.yaml` file. If you prefer, you can disable this port and interact directly from within the container.

```bash
docker exec -it nemo-retriever-ms-runtime-1 bash
```
This command opens a shell in the `/workspace` directory, where the `DATASET_ROOT` from your `.env` file is mounted at `./data`. The pre-configured Python environment in the container includes all necessary Python client libraries. You should see a prompt similar to the following.

```bash
root@your-computer-name:/workspace#
```
From this prompt, you can run the `nemo-retriever` CLI and Python examples.
j. To work inside the container, run the following code.

P1 New "Step 2" section inserted mid-list, causing structural and rendering breakage

An H2 heading (## Step 2: Install the Client) was inserted inside an ordered list that continues with lettered sub-steps (h, i, j). This breaks the ordered list structure and causes several downstream issues: the old docker ps table (lines 104–118) falls outside any code fence and renders as unstyled text; the activation hint appears twice (plain text at line 122 and as a !!! tip at line 129); step i installs 26.1.2 while Step 2 installs 26.3.0-RC4, presenting conflicting instructions; and the new docker exec block (lines 137–145) duplicates step j without proper indentation. The new content should be integrated into the existing ordered-list steps rather than injected as a top-level heading mid-sequence.
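
This class of breakage can be flagged mechanically. The following is a rough sketch, assuming lettered steps start at column zero in the form `h. ...`; the function name `headings_inside_lettered_list` and the sample lines (other than the heading text quoted above) are hypothetical.

```python
import re

LETTERED_STEP = re.compile(r"^[a-z]\.\s")  # e.g. "h. Run the command ..."
HEADING = re.compile(r"^#{1,6}\s")         # any ATX heading


def headings_inside_lettered_list(lines):
    """Return (line_number, text) for headings that sit between two lettered steps."""
    step_lines = [i for i, line in enumerate(lines) if LETTERED_STEP.match(line)]
    if len(step_lines) < 2:
        return []
    first, last = step_lines[0], step_lines[-1]
    return [
        (i + 1, line.rstrip())
        for i, line in enumerate(lines)
        if first < i < last and HEADING.match(line)
    ]


# Hypothetical excerpt with a heading injected between steps g and h.
sample = [
    "g. Start the services.",
    "## Step 2: Install the Client",
    "h. Run the command `docker ps`.",
    "i. Create a virtual environment.",
]

problems = headings_inside_lettered_list(sample)
print(problems)
```

The sketch does not skip fenced code blocks, so a shell comment beginning with `# ` inside a fence would be reported as well; a real check would need to track fence state.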

