Greptile Summary
This PR applies documentation fixes to the extraction section: updating GPU examples in prerequisites, clarifying the Python vs HTTP API return types, correcting VLM caption parameters, expanding library-mode quickstart examples, and fixing quickstart guide installation steps.
| Filename | Overview |
|---|---|
| docs/docs/extraction/quickstart-library-mode.md | Major expansion of library-mode examples, but both code examples use NvIngestClient (undefined) instead of the imported NemoRetrieverClient, causing a NameError for any user who runs them. |
| docs/docs/extraction/quickstart-guide.md | New "Step 2" H2 inserted mid-list breaks ordered-list structure; leaves old docker ps table outside a code fence, duplicates the tip text and docker exec steps, and presents conflicting package versions (26.1.2 vs 26.3.0-RC4). |
| docs/docs/extraction/python-api-reference.md | New "Caption Images" section added with correct parameter docs, but the code example chains .ingest() inside the builder and assigns the list result to ingestor, making the variable name misleading and likely to cause AttributeError in follow-on usage. |
| docs/docs/extraction/prerequisites.md | Minor GPU example simplification — removed H100/H200 NVL to align with support matrix. Clean, correct change. |
| docs/docs/extraction/v2-api-guide.md | Adds useful clarification note distinguishing Python client vs HTTP API return shapes, and corrects the FAQ answer about detecting PDF splitting. Clean, accurate additions. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[User follows docs] --> B{Which quickstart?}
B --> |Service mode| C[quickstart-guide.md]
B --> |Library mode| D[quickstart-library-mode.md]
B --> |Python API| E[python-api-reference.md]
C --> C1[Step 1: Deploy services via Docker Compose]
C1 --> C2["## Step 2 new — install 26.3.0-RC4 breaks list structure"]
C2 --> C3["Step i — install 26.1.2 version conflict and duplicate tip"]
C3 --> C4[Step j — docker exec]
D --> D1[run_pipeline in subprocess]
D1 --> D2["NvIngestClient not imported — NameError"]
D2 --> D3[Ingestor → ingest]
E --> E1[Ingestor builder chain]
E1 --> E2[".caption().ingest() returns list stored as ingestor"]
E2 --> E3["Follow-on ingestor.method() raises AttributeError"]
Prompt To Fix All With AI
This is a comment left during a code review.
Path: docs/docs/extraction/quickstart-library-mode.md
Line: 25
Comment:
**`NvIngestClient` not imported — NameError on execution**
Both code examples import `NemoRetrieverClient` but then instantiate `NvIngestClient`, which is never imported. Any user who runs this example will immediately hit `NameError: name 'NvIngestClient' is not defined`. The same issue is present at line 309 in the `launch_libmode_and_run_ingestor.py` example. Replace `NvIngestClient` with `NemoRetrieverClient` in both locations.
```suggestion
client = NemoRetrieverClient(
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: docs/docs/extraction/quickstart-library-mode.md
Line: 309
Comment:
**`NvIngestClient` not imported in second example — NameError on execution**
`launch_libmode_and_run_ingestor.py` imports `NemoRetrieverClient` (line 292) but instantiates `NvIngestClient` here, which is undefined.
```suggestion
client = NemoRetrieverClient(
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: docs/docs/extraction/python-api-reference.md
Line: 532-541
Comment:
**`.ingest()` chained inside builder — result is a list, not an `Ingestor`**
The example chains `.ingest()` inside the parenthesized builder expression and assigns the return value to `ingestor`. Because `ingest()` returns a list of document results, `ingestor` will hold a list, not an `Ingestor` object. Users who follow this pattern and then call `ingestor.embed()` or any other method will get `AttributeError`. Separate the `ingest()` call and store its result in a distinct variable.
```suggestion
ingestor = (
    Ingestor()
    .files("path/to/doc-with-images.pdf")
    .extract(extract_images=True)
    .caption(
        prompt="Caption the content of this image:",
        reasoning=True,  # or False
    )
)
results = ingestor.ingest()
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: docs/docs/extraction/quickstart-guide.md
Line: 95-146
Comment:
**New "Step 2" section inserted mid-list, causing structural and rendering breakage**
An H2 heading (`## Step 2: Install the Client`) was inserted inside an ordered list that continues with lettered sub-steps (h, i, j). This breaks the ordered list structure and causes several downstream issues: the old `docker ps` table (lines 104–118) falls outside any code fence and renders as unstyled text; the activation hint appears twice (plain text at line 122 and as a `!!! tip` at line 129); step i installs `26.1.2` while Step 2 installs `26.3.0-RC4`, presenting conflicting instructions; and the new `docker exec` block (lines 137–145) duplicates step j without proper indentation. The new content should be integrated into the existing ordered-list steps rather than injected as a top-level heading mid-sequence.
How can I resolve this? If you propose a fix, please make it concise.
Reviews (1): Last reviewed commit: "Merge branch '26.03' into kheiss/qa-revi..."
| # Start the pipeline subprocess for library mode | ||
| run_pipeline(block=False, disable_dynamic_scaling=True, run_in_subprocess=True) | ||
|
|
||
| client = NvIngestClient( |
NvIngestClient not imported — NameError on execution
Both code examples import NemoRetrieverClient but then instantiate NvIngestClient, which is never imported. Any user who runs this example will immediately hit NameError: name 'NvIngestClient' is not defined. The same issue is present at line 309 in the launch_libmode_and_run_ingestor.py example. Replace NvIngestClient with NemoRetrieverClient in both locations.
- client = NvIngestClient(
+ client = NemoRetrieverClient(
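A mismatch like this can be caught before a doc example ships. The following is a minimal sketch, not part of the PR, that uses the standard-library `ast` module to list names a snippet calls but never imports or defines. The helper name `undefined_calls` and the module name `example_client` are hypothetical; the snippet is only parsed, never executed, so the placeholder module does not need to exist.

```python
import ast
import builtins


def undefined_calls(source: str) -> set[str]:
    """Return names that are called in `source` but never imported, assigned, or defined."""
    tree = ast.parse(source)
    defined = set(dir(builtins))
    called = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            # `import a.b` binds `a`; `from m import x as y` binds `y`.
            for alias in node.names:
                defined.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.arg):
            defined.add(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            defined.add(node.id)
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            called.add(node.func.id)
    return called - defined


# Hypothetical snippet mirroring the mismatch described in the comment above.
broken = """
from example_client import NemoRetrieverClient

client = NvIngestClient()
"""

# The same snippet with the instantiated name matching the import.
fixed = broken.replace("NvIngestClient", "NemoRetrieverClient")

print(undefined_calls(broken))
print(undefined_calls(fixed))
```

The check ignores scoping and attribute calls, so it is a rough filter rather than a full linter, but it is enough to flag an instantiated class that has no matching import.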
| Set up and run the ingestion process to send traffic against the pipeline. | ||
| """ | ||
| logger.info("Setting up Ingestor client...") | ||
| client = NvIngestClient( |
NvIngestClient not imported in second example — NameError on execution
launch_libmode_and_run_ingestor.py imports NemoRetrieverClient (line 292) but instantiates NvIngestClient here, which is undefined.
- client = NvIngestClient(
+ client = NemoRetrieverClient(
| ingestor = ( | ||
| Ingestor() | ||
| .files("path/to/doc-with-images.pdf") | ||
| .extract(extract_images=True) | ||
| .caption( | ||
| prompt="Caption the content of this image:", | ||
| reasoning=True, # or False | ||
| ) | ||
| .ingest() | ||
| ) |
.ingest() chained inside builder — result is a list, not an Ingestor
The example chains .ingest() inside the parenthesized builder expression and assigns the return value to ingestor. Because ingest() returns a list of document results, ingestor will hold a list, not an Ingestor object. Users who follow this pattern and then call ingestor.embed() or any other method will get AttributeError. Separate the ingest() call and store its result in a distinct variable.
- ingestor = (
-     Ingestor()
-     .files("path/to/doc-with-images.pdf")
-     .extract(extract_images=True)
-     .caption(
-         prompt="Caption the content of this image:",
-         reasoning=True,  # or False
-     )
-     .ingest()
- )
+ ingestor = (
+     Ingestor()
+     .files("path/to/doc-with-images.pdf")
+     .extract(extract_images=True)
+     .caption(
+         prompt="Caption the content of this image:",
+         reasoning=True,  # or False
+     )
+ )
+ results = ingestor.ingest()
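The pitfall is easy to reproduce without the real library. The sketch below uses a hypothetical stand-in class, `SketchIngestor`, whose builder methods return `self` and whose terminal `ingest()` returns a list, as the comment describes. It is not the actual `Ingestor` implementation, and the method signatures are simplified for illustration.

```python
class SketchIngestor:
    """Hypothetical stand-in for a fluent builder; not the real Ingestor class."""

    def __init__(self):
        self.steps = []

    def files(self, path):
        self.steps.append(("files", path))
        return self  # builder methods return the builder so calls can chain

    def caption(self, prompt):
        self.steps.append(("caption", prompt))
        return self

    def ingest(self):
        # Terminal call: returns results (a list), not the builder.
        return [{"steps": list(self.steps)}]


# Chaining the terminal call inside the expression binds the list to the name.
mislabeled = SketchIngestor().files("doc.pdf").caption("Describe the image").ingest()
try:
    mislabeled.caption("again")
    chained_error = None
except AttributeError as exc:
    chained_error = exc

# Keeping the builder and the results in separate variables avoids the confusion.
ingestor = SketchIngestor().files("doc.pdf").caption("Describe the image")
results = ingestor.ingest()

print(type(mislabeled).__name__, type(ingestor).__name__, type(results).__name__)
```

Separating the two names also leaves the builder available for further configuration before a second `ingest()` call, which the chained form rules out.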
| @@ -103,6 +119,7 @@ h. Run the command `docker ps`. You should see output similar to the following. | |||
|
|
|||
| i. To run the NeMo Retriever Library Python client from your host machine, Python 3.12 or later is required. Create a virtual environment and install the client packages: | |||
|
|
|||
| To confirm that you have activated your virtual environment, run `which pip` and `which python`, and confirm that you see `nv-ingest-dev` in the result. You can do this before any pip or python command that you run. | |||
| ```shell | |||
| uv venv --python 3.12 nv-ingest-dev | |||
| source nv-ingest-dev/bin/activate | |||
| @@ -117,6 +134,15 @@ i. To run the NeMo Retriever Library Python client from your host machine, Pytho | |||
|
|
|||
| Interaction from the host requires the appropriate port to be exposed from the `nv-ingest` runtime container, as defined in the `docker-compose.yaml` file. If you prefer, you can disable this port and interact directly from within the container. | |||
|
|
|||
| ```bash | |||
| docker exec -it nemo-retriever-ms-runtime-1 bash | |||
| ``` | |||
| This command opens a shell in the `/workspace` directory, where the `DATASET_ROOT` from your `.env` file is mounted at `./data`. The pre-configured Python environment in the container includes all necessary Python client libraries. You should see a prompt similar to the following. | |||
|
|
|||
| ```bash | |||
| root@your-computer-name:/workspace# | |||
| ``` | |||
| From this prompt, you can run the `nemo-retriever` CLI and Python examples. | |||
| j. To work inside the container, run the following code. | |||
New "Step 2" section inserted mid-list, causing structural and rendering breakage
An H2 heading (## Step 2: Install the Client) was inserted inside an ordered list that continues with lettered sub-steps (h, i, j). This breaks the ordered list structure and causes several downstream issues: the old docker ps table (lines 104–118) falls outside any code fence and renders as unstyled text; the activation hint appears twice (plain text at line 122 and as a !!! tip at line 129); step i installs 26.1.2 while Step 2 installs 26.3.0-RC4, presenting conflicting instructions; and the new docker exec block (lines 137–145) duplicates step j without proper indentation. The new content should be integrated into the existing ordered-list steps rather than injected as a top-level heading mid-sequence.
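The conflicting version instructions are the kind of problem a small script can surface. The following is a rough sketch that scans Markdown text for `name==version` pins and reports any package pinned to more than one version. It assumes the guide pins versions with `==`, which the review does not confirm, and the package name `example-client` and the helper `conflicting_pins` are hypothetical.

```python
import re
from collections import defaultdict

# Matches pins such as `example-client==26.1.2`.
PIN = re.compile(r"([A-Za-z0-9_.\-]+)==([A-Za-z0-9_.\-]+)")


def conflicting_pins(markdown: str) -> dict[str, set[str]]:
    """Map each package pinned to more than one version to the versions found."""
    versions = defaultdict(set)
    for name, version in PIN.findall(markdown):
        versions[name.lower()].add(version)
    return {name: found for name, found in versions.items() if len(found) > 1}


# Hypothetical excerpt with the two versions named in the comment above.
sample = """
## Step 2: Install the Client
pip install example-client==26.3.0-RC4

i. Create a virtual environment and install the client packages:
pip install example-client==26.1.2
"""

print(conflicting_pins(sample))
```

Running a check like this over the whole docs directory would flag any page where two install steps disagree, regardless of how the steps are laid out.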
Overview
This branch applies documentation and copy fixes from QA review: it updates the extraction docs to match current behavior, fixes broken or misleading links, corrects examples, and aligns prerequisites with the support matrix.
Changes (11 files in docs/docs/extraction/)
Prerequisites & support matrix
prerequisites.md (Bug 5965601): Align GPU examples with the support matrix. Removed V100 from examples (not in support matrix; 16 GB variant doesn’t meet 24 GB VRAM). GPU line now points to the Support Matrix and uses only supported examples (e.g., A100, A10G, L40S).
support-matrix.md: Fix link text typo: "NVIDIA NVIDIA Riva" → "NVIDIA Riva".
Python API & VLM
python-api-reference.md: Update VLM caption docs to current API and model: default model nvidia/llama-3.1-nemotron-nano-vl-8b-v1 → nvidia/nemotron-nano-12b-v2-vl; replace caption_prompt/caption_system_prompt with prompt/reasoning (bool).
v2-api-guide.md: Clarify that the Python API ingest() returns a list of document results (not a dict). Add note on Python client vs HTTP API and update FAQ about detecting PDF splitting when using the Python client.
vlm-embed.md: Fix overview link from external Perplexity URL to relative overview.md.
Quickstart
quickstart-guide.md: Switch instructions from Conda to uv/venv; add "Step 2: Install the Client" and a docker ps example; correct MIG example to use nvidia/nemotron-page-elements-v3; update tip to refer to nv-ingest-dev instead of nemo_retriever.
quickstart-library-mode.md: Use VLM model nvidia/nemotron-nano-12b-v2-vl in the example; fix run_pipeline parameter table (e.g. pipeline_config, run_in_subprocess, block marked as not required where applicable); correct return type for subprocess case to RayPipelineSubprocessInterface.
Release notes & references
releasenotes-nv-ingest.md: Fix 24.12.1 and 24.12.0 doc links so they point to 24.12.1 and 24.12.0 paths instead of 25.3.0.
Telemetry, troubleshooting, benchmarking
telemetry.md: Correct Prometheus URLs (remove erroneous /zipkin/ path).
troubleshoot.md: Fix ulimit examples: use 10000 instead of 10,000 (comma invalid in shell).
benchmarking.md: Update example config to current behavior (e.g. api_version: v2, compose.profiles, sparse: false, extract_infographics: true for bo20); fix harness paths from nemo_retriever_harness to nv_ingest_harness; align YAML and dataset table with current defaults.
Testing
Doc-only changes; no code or test changes in this commit.
Verified that GPU examples in prerequisites match the support matrix and that corrected links and commands are consistent with the rest of the docs.
Related