Conversation
📝 WalkthroughWalkthroughThis PR makes collection responses conditional: CollectionPublic now carries optional LLM fields or knowledge-base fields, with validators and JSON serialization to include only the relevant subset. A new to_collection_public helper produces the public representation; routes, job payloads, docs, and tests were updated accordingly. Changes
Sequence DiagramsequenceDiagram
participant Client
participant API_Route as API Route
participant DB as Collection Model
participant Helper as to_collection_public
participant Public as CollectionPublic
Client->>API_Route: GET /collections/{id}?include_url=...
API_Route->>DB: fetch collection record
DB-->>API_Route: collection (DB model)
API_Route->>Helper: to_collection_public(collection)
Helper->>Helper: decide Assistant vs Vector Store
Helper->>Public: build CollectionPublic with relevant fields
Public->>Public: validate_service_fields()
Public->>API_Route: CollectionPublic instance
API_Route->>Public: model_dump(mode="json", exclude_none=True)
Public-->>API_Route: serialized JSON (pruned fields)
API_Route-->>Client: HTTP response with conditional fields
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
backend/app/api/routes/collection_job.py (1)
35-53:⚠️ Potential issue | 🟡 MinorAdd return type annotation for
collection_job_info.The function is missing an explicit return type annotation. Although the FastAPI decorator specifies
response_model=APIResponse[CollectionJobPublic], Python type checkers require explicit->syntax on the function signature to recognize the return type.Proposed fix
def collection_job_info( session: SessionDep, current_user: AuthContextDep, job_id: UUID = FastPath(description="Collection job to retrieve"), -): +) -> APIResponse[CollectionJobPublic]:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/routes/collection_job.py` around lines 35 - 53, The function collection_job_info is missing an explicit return type annotation, so add a return type of APIResponse[CollectionJobPublic] to its signature (i.e., change def collection_job_info(...) to include -> APIResponse[CollectionJobPublic]); ensure APIResponse and CollectionJobPublic are imported/available in the module and return values conform to that type (job_out wrapped in the APIResponse shape) so static type checkers recognize the response type.backend/app/api/routes/documents.py (1)
107-122:⚠️ Potential issue | 🟡 MinorAdd return type annotation for
upload_doc.The function lacks an explicit return type annotation. Although the decorator specifies
response_model=APIResponse[DocumentUploadResponse], the coding guidelines require explicit type hints on function signatures.Proposed fix
async def upload_doc( session: SessionDep, current_user: AuthContextDep, src: UploadFile = File(...), target_format: str | None = Form( None, description="Desired output format for the uploaded document", ), transformer: str | None = Form( None, description="Name of the transformer to apply when converting." ), callback_url: str | None = Form(None, description="URL to call to report doc transformation status"), -): +) -> APIResponse[DocumentUploadResponse]:Per coding guidelines: "Always add type hints to all function parameters and return values in Python code".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/routes/documents.py` around lines 107 - 122, The upload_doc function is missing an explicit return type annotation; update the signature of upload_doc to include the appropriate return type matching the route decorator (APIResponse[DocumentUploadResponse]) and ensure any required typing imports (e.g., APIResponse, DocumentUploadResponse, Optional from typing if needed) are present; locate the upload_doc definition (parameters: session: SessionDep, current_user: AuthContextDep, src: UploadFile, target_format, transformer, callback_url) and add the return type hint to the function signature so the signature explicitly declares -> APIResponse[DocumentUploadResponse].backend/app/tests/api/routes/collections/test_collection_list.py (1)
82-84:⚠️ Potential issue | 🟡 MinorStale docstring — now describes the wrong fields
The docstring still says "expose the expected LLM fields", but the test body was updated to assert
knowledge_base_*fields and the absence of LLM fields.✏️ Proposed fix
- """ - Ensure that vector-store-style collections created via get_vector_store_collection - appear in the list and expose the expected LLM fields. - """ + """ + Ensure that vector-store-style collections created via get_vector_store_collection + appear in the list and expose knowledge_base_provider / knowledge_base_id fields, + and do NOT expose llm_service_name / llm_service_id. + """🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/tests/api/routes/collections/test_collection_list.py` around lines 82 - 84, The test docstring is stale: while the test (in test_collection_list.py) now asserts for knowledge_base_* fields and absence of LLM fields, the docstring still says "expose the expected LLM fields"; update the docstring for the test (the test that uses get_vector_store_collection / collection list assertions) to describe that vector-store-style collections should appear in the list and expose the expected knowledge_base_* fields and explicitly note LLM fields should be absent, so it matches the current assertions.
🧹 Nitpick comments (2)
backend/app/tests/api/routes/collections/test_collection_list.py (1)
105-110: Missing symmetric assertions for the assistant-collection testThe vector-store test now verifies the full conditional-field contract (knowledge_base fields present, LLM fields absent). The
test_list_collections_includes_assistant_collectiontest (lines 37–73) still only checks that the collection ID appears in the list. The symmetric side — thatllm_service_name/llm_service_idare present andknowledge_base_*fields are not — is never asserted. Without this, a regression that accidentally surfaces LLM fields for assistant collections would go undetected.Consider adding the complementary assertions to
test_list_collections_includes_assistant_collection:# in test_list_collections_includes_assistant_collection, after after_ids assertion row = next(c for c in after_payload if c["id"] == str(collection.id)) assert "llm_service_name" in row assert "llm_service_id" in row assert "knowledge_base_provider" not in row assert "knowledge_base_id" not in row🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/tests/api/routes/collections/test_collection_list.py` around lines 105 - 110, In test_list_collections_includes_assistant_collection add the complementary assertions that validate the LLM-field branch: locate the collection row from after_payload (use the existing after_payload and collection variables, e.g. row = next(c for c in after_payload if c["id"] == str(collection.id))) and assert that "llm_service_name" and "llm_service_id" are present and that "knowledge_base_provider" and "knowledge_base_id" are absent so the test symmetrically verifies the conditional-field contract.backend/app/api/routes/collections.py (1)
67-77: Missing return type annotations on modified route functionsBoth
list_collections(line 67) andcollection_info(line 187) lack return type annotations. As per coding guidelines, all function parameters and return values should carry type hints.✏️ Suggested additions
def list_collections( session: SessionDep, current_user: AuthContextDep, -): +) -> APIResponse[list[CollectionPublic]]:def collection_info( session: SessionDep, current_user: AuthContextDep, collection_id: UUID = FastPath(description="Collection to retrieve"), include_docs: bool = Query(...), include_url: bool = Query(...), limit: int | None = Query(...), -): +) -> APIResponse[CollectionWithDocsPublic]:As per coding guidelines: "Always add type hints to all function parameters and return values in Python code."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/routes/collections.py` around lines 67 - 77, Add explicit return type annotations to the two route handlers: change the signatures of list_collections and collection_info to include a return type (e.g. -> APIResponse) so they conform to the project typing rules; update any necessary imports (ensure APIResponse is imported from the module that defines it) and keep existing parameter types (SessionDep, AuthContextDep) intact. Locate the functions by name (list_collections and collection_info) and modify their def lines to add the return type, and add or adjust the import for APIResponse if it’s not already present.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/api/routes/collections.py`:
- Around line 195-197: The change made include_url: bool = Query(False, ...) is
a breaking API change; revert the default back to True for the include_url
parameter in the collection_info endpoint (parameter name include_url in
function collection_info) to preserve existing behavior and update release
notes/consumers as needed, and add missing return type annotations to the
functions list_collections and collection_info to satisfy the type-hinting
guideline (ensure both their signatures include explicit return types matching
their current returned values).
In `@backend/app/models/collection.py`:
- Around line 236-284: Add type hints to the two methods: change
validate_service_fields to have signature def validate_service_fields(self: Any)
-> Any (or more specific Collection if available) and change _serialize_model to
def _serialize_model(self: Any, serializer: Any, info: Any) -> Any (or ->
dict[str, Any] if you prefer JSON object typing); keep existing decorators
(`@model_validator` and `@model_serializer`) and body logic unchanged—just add the
parameter and return annotations to the validate_service_fields and
_serialize_model methods to satisfy the typing guideline.
In `@backend/app/services/collections/helpers.py`:
- Around line 123-155: The equality check in to_collection_public incorrectly
performs a case-sensitive/whitespace-sensitive comparison against
get_service_name(collection.provider), which can misclassify vector stores;
update the comparison to normalize both sides (e.g., strip() and lower()) and
handle None safely before comparing (use something like
normalized_llm_service_name and normalized_expected_name) so is_vector_store is
computed from the normalized values; keep the rest of to_collection_public logic
unchanged and refer to get_service_name(collection.provider),
collection.llm_service_name, and collection.llm_service_id when implementing the
normalized comparison.
---
Outside diff comments:
In `@backend/app/api/routes/collection_job.py`:
- Around line 35-53: The function collection_job_info is missing an explicit
return type annotation, so add a return type of APIResponse[CollectionJobPublic]
to its signature (i.e., change def collection_job_info(...) to include ->
APIResponse[CollectionJobPublic]); ensure APIResponse and CollectionJobPublic
are imported/available in the module and return values conform to that type
(job_out wrapped in the APIResponse shape) so static type checkers recognize the
response type.
In `@backend/app/api/routes/documents.py`:
- Around line 107-122: The upload_doc function is missing an explicit return
type annotation; update the signature of upload_doc to include the appropriate
return type matching the route decorator (APIResponse[DocumentUploadResponse])
and ensure any required typing imports (e.g., APIResponse,
DocumentUploadResponse, Optional from typing if needed) are present; locate the
upload_doc definition (parameters: session: SessionDep, current_user:
AuthContextDep, src: UploadFile, target_format, transformer, callback_url) and
add the return type hint to the function signature so the signature explicitly
declares -> APIResponse[DocumentUploadResponse].
In `@backend/app/tests/api/routes/collections/test_collection_list.py`:
- Around line 82-84: The test docstring is stale: while the test (in
test_collection_list.py) now asserts for knowledge_base_* fields and absence of
LLM fields, the docstring still says "expose the expected LLM fields"; update
the docstring for the test (the test that uses get_vector_store_collection /
collection list assertions) to describe that vector-store-style collections
should appear in the list and expose the expected knowledge_base_* fields and
explicitly note LLM fields should be absent, so it matches the current
assertions.
---
Nitpick comments:
In `@backend/app/api/routes/collections.py`:
- Around line 67-77: Add explicit return type annotations to the two route
handlers: change the signatures of list_collections and collection_info to
include a return type (e.g. -> APIResponse) so they conform to the project
typing rules; update any necessary imports (ensure APIResponse is imported from
the module that defines it) and keep existing parameter types (SessionDep,
AuthContextDep) intact. Locate the functions by name (list_collections and
collection_info) and modify their def lines to add the return type, and add or
adjust the import for APIResponse if it’s not already present.
In `@backend/app/tests/api/routes/collections/test_collection_list.py`:
- Around line 105-110: In test_list_collections_includes_assistant_collection
add the complementary assertions that validate the LLM-field branch: locate the
collection row from after_payload (use the existing after_payload and collection
variables, e.g. row = next(c for c in after_payload if c["id"] ==
str(collection.id))) and assert that "llm_service_name" and "llm_service_id" are
present and that "knowledge_base_provider" and "knowledge_base_id" are absent so
the test symmetrically verifies the conditional-field contract.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
backend/app/api/docs/collections/info.mdbackend/app/api/docs/collections/job_info.mdbackend/app/api/docs/collections/list.mdbackend/app/api/routes/collection_job.pybackend/app/api/routes/collections.pybackend/app/api/routes/documents.pybackend/app/models/collection.pybackend/app/services/collections/create_collection.pybackend/app/services/collections/helpers.pybackend/app/tests/api/routes/collections/test_collection_info.pybackend/app/tests/api/routes/collections/test_collection_list.py
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
♻️ Duplicate comments (1)
backend/app/models/collection.py (1)
270-284: LGTM — type annotations resolved; notewhen_used="json"scopeThe previous review's type-hint concern is now addressed. ✓
One thing to be aware of:
when_used="json"means callers using plainmodel_dump()(withoutmode="json") will receive all four optional fields, including the unused pair asNone, unfiltered. This is intentional for the API-response use case (which always uses JSON mode), but any non-API serialization path (e.g., internal logging, caching serialization) would expose the unusedNonefields without pruning.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/models/collection.py` around lines 270 - 284, The serializer currently only runs for JSON mode (model_serializer(..., when_used="json")), so model_dump() (python mode) still returns the unused fields as None; update the serialization so both modes prune unused pairs: either remove the when_used argument to apply the same logic to all modes, or add a second serializer with the same body but model_serializer(mode="wrap", when_used="python") (or change to when_used=None) so the _serialize_model logic (referenced by the method name _serialize_model and the field names llm_service_id, llm_service_name, knowledge_base_id, knowledge_base_provider) runs for non-JSON dumps as well.
🧹 Nitpick comments (2)
backend/app/models/collection.py (1)
226-229: Nit:knowledge_base_providerdescription says "provider name" but stores service namePer
to_collection_public,knowledge_base_provideris populated fromcollection.llm_service_name(the specific LLM service identifier, e.g.vs_openaior similar fromget_service_name), not the provider type (e.g."openai"). The description "Knowledge base provider name" is potentially misleading.✏️ Proposed clarification
- description="Knowledge base provider name when only vector store was created", + description="Knowledge base service name (e.g., 'openai-vs') when only vector store was created",Based on learnings: "treat
provideras the LLM provider name (e.g., 'openai') andllm_service_nameas the specific service from that provider. These fields serve different purposes and should remain non-redundant."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/models/collection.py` around lines 226 - 229, The Field description for knowledge_base_provider is misleading: it currently says "provider name" but the value is actually populated from collection.llm_service_name (the specific LLM service identifier). Update the description on the knowledge_base_provider Field to state it stores the LLM service name (e.g., 'vs_openai' or other service identifier) and reference its relation to to_collection_public/collection.llm_service_name so callers understand it is the service identifier rather than the provider type.backend/app/services/collections/create_collection.py (1)
81-87: Unnecessary round-trip: serialize to dict → re-validate → serialize again
to_collection_publicalready returns a valid, fully-validatedCollectionPublicinstance. Serializing it to a JSON dict (model_dump(mode="json", exclude_none=True)) only to have Pydantic immediately re-parse and re-validate it (running type coercion andvalidate_service_fieldsa second time) back into aCollectionPublicwhenCollectionJobPublic.model_validateprocesses theupdatearg is wasteful._serialize_modelthen fires a second time during the finalmodel_dump(mode="json")on the API response.Passing the instance directly is cleaner:
♻️ Proposed simplification
- collection_public = to_collection_public(collection) - collection_dict = collection_public.model_dump(mode="json", exclude_none=True) - job_public = CollectionJobPublic.model_validate( collection_job, - update={"collection": collection_dict}, + update={"collection": to_collection_public(collection)}, )Pydantic v2 accepts a model instance directly for a union-typed field (
CollectionPublic | CollectionIDPublic | None) without re-serializing or re-validating, and_serialize_modelfires exactly once during the finalAPIResponse...model_dump(mode="json").🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/collections/create_collection.py` around lines 81 - 87, The code currently serializes a validated CollectionPublic instance by calling to_collection_public(...) then model_dump(...) into collection_dict and passes that dict into CollectionJobPublic.model_validate(update={"collection": collection_dict}), causing an unnecessary re-parse/re-validation and duplicated _serialize_model calls; instead pass the CollectionPublic instance directly (collection_public) into the update payload so CollectionJobPublic.model_validate(...) receives the model instance and avoids the extra model_dump/model_validate round-trip (update the usage around to_collection_public, collection_public and CollectionJobPublic.model_validate to use the instance rather than collection_dict).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@backend/app/models/collection.py`:
- Around line 270-284: The serializer currently only runs for JSON mode
(model_serializer(..., when_used="json")), so model_dump() (python mode) still
returns the unused fields as None; update the serialization so both modes prune
unused pairs: either remove the when_used argument to apply the same logic to
all modes, or add a second serializer with the same body but
model_serializer(mode="wrap", when_used="python") (or change to when_used=None)
so the _serialize_model logic (referenced by the method name _serialize_model
and the field names llm_service_id, llm_service_name, knowledge_base_id,
knowledge_base_provider) runs for non-JSON dumps as well.
---
Nitpick comments:
In `@backend/app/models/collection.py`:
- Around line 226-229: The Field description for knowledge_base_provider is
misleading: it currently says "provider name" but the value is actually
populated from collection.llm_service_name (the specific LLM service
identifier). Update the description on the knowledge_base_provider Field to
state it stores the LLM service name (e.g., 'vs_openai' or other service
identifier) and reference its relation to
to_collection_public/collection.llm_service_name so callers understand it is the
service identifier rather than the provider type.
In `@backend/app/services/collections/create_collection.py`:
- Around line 81-87: The code currently serializes a validated CollectionPublic
instance by calling to_collection_public(...) then model_dump(...) into
collection_dict and passes that dict into
CollectionJobPublic.model_validate(update={"collection": collection_dict}),
causing an unnecessary re-parse/re-validation and duplicated _serialize_model
calls; instead pass the CollectionPublic instance directly (collection_public)
into the update payload so CollectionJobPublic.model_validate(...) receives the
model instance and avoids the extra model_dump/model_validate round-trip (update
the usage around to_collection_public, collection_public and
CollectionJobPublic.model_validate to use the instance rather than
collection_dict).
Summary
Target issue is #604
Checklist
Before submitting a pull request, please ensure that you mark these task.
fastapi run --reload app/main.pyordocker compose upin the repository root and test.Notes
Important : Since we are pushing these changes to production immediately, changing the parameter name from "llm service id" to "knowledge base id" will be a breaking change for dalgo. This is the reason why we are returning both llm service id or knowledge base id depending on if an assistant was created or vector store was created, respectively.
Although, everything related to assistant creation and management will be completely removed from our code by the end of march, and so the "llm service id" and "llm service name" will permanently become "knowledge base id" and "knowledge base provider" in output parameters as well as db column names. So, the validator, serialiser and "to_collection_public" function will eventually get removed.
Documentation
New Features
Changed Behavior