Conversation
📝 Walkthrough

Refactors STT evaluation to enqueue batch submission as a Celery low-priority job (traceable via `correlation_id`) instead of submitting it synchronously; removes per-sample result CRUD and pending-run helpers; adds a Celery-executable `execute_batch_submission` service; shifts to concurrent signed-URL generation and bulk-insert result persistence.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant API as API Route
    participant Queue as Celery Queue
    participant Worker as Batch Worker
    participant DB as Database
    participant Service as Batch Service
    participant Provider as External Provider

    Client->>API: POST start STT evaluation
    API->>DB: create STT run (status: pending)
    DB-->>API: run created
    API->>Queue: enqueue start_low_priority_job(job_id, run_id, dataset_id, org_id, trace_id)
    Queue-->>API: returns task_id
    API-->>Client: 202 Accepted (run_id, status: pending, task_id)
    Queue->>Worker: deliver task (execute_batch_submission)
    Worker->>DB: fetch run by id/org/project
    Worker->>DB: fetch samples (limit = run.total_items)
    Worker->>Service: start_stt_evaluation_batch(run, samples)
    Service->>Service: generate signed URLs concurrently (ThreadPoolExecutor)
    Service->>Provider: submit batch job
    Provider-->>Service: batch response (keys/status)
    Service->>DB: bulk_insert STT results (chunked)
    Service->>DB: update run status (submitted/processing/failed)
```
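For concreteness, a sketch of the route-side enqueue step from the diagram, assuming a `create_stt_run` CRUD helper and the `start_low_priority_job` signature shown above (the module path, helper names, and exact keyword arguments are assumptions, not the actual code):

```python
from fastapi.responses import JSONResponse

def start_evaluation_endpoint(session, organization_id: int, dataset_id: int) -> JSONResponse:
    """Hypothetical POST handler body; names follow the diagram above."""
    run = create_stt_run(session, dataset_id=dataset_id, status="pending")  # assumed CRUD helper
    task_id = start_low_priority_job(
        function_path="app.services.stt_evaluations.batch_job.execute_batch_submission",  # assumed path
        job_id=str(run.id),  # the worker converts this back with int(job_id)
        organization_id=organization_id,
        dataset_id=dataset_id,
    )
    # Return immediately; the worker performs the actual batch submission.
    return JSONResponse(
        status_code=202,
        content={"run_id": run.id, "status": "pending", "task_id": task_id},
    )
```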
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

✅ Passed checks (2 passed)
Codecov Report: ❌ Patch coverage is …
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
backend/app/crud/stt_evaluations/cron.py (2)
118-118: ⚠️ Potential issue | 🟡 Minor

Dead code path: the `"processed"` action is never returned by `poll_stt_run`.

`poll_stt_run` returns the action values `"completed"`, `"failed"`, or `"no_change"` — never `"processed"`. The `"processed"` check here is unreachable, which means either the intent is wrong or this is leftover from a prior version.

Suggested fix

```diff
-    if result["action"] in ("completed", "processed"):
+    if result["action"] == "completed":
```
354-359: 🛠️ Refactor suggestion | 🟠 Major

`batch_job` parameter typed as `Any` — should be `BatchJob`.

The coding guidelines require type hints on all function parameters. `batch_job` is used as a `BatchJob` throughout (accessing `.id`, `.config`, `.provider_batch_id`), so the type annotation should reflect that.

Suggested fix

```diff
 async def process_completed_stt_batch(
     session: Session,
     run: EvaluationRun,
-    batch_job: Any,
+    batch_job: BatchJob,
     batch_provider: GeminiBatchProvider,
 ) -> None:
```

As per coding guidelines, "Always add type hints to all function parameters and return values in Python code".
🤖 Fix all issues with AI agents
In `@backend/app/api/routes/stt_evaluations/evaluation.py`:
- Around line 108-111: The HTTPException currently returns internal exception
text via detail=f"Failed to queue batch submission: {e}"; change this to a
generic message (e.g. detail="Failed to queue batch submission") so internal
error details are not leaked to clients, keep the status_code=500, and ensure
the original exception is still logged (the existing logger call earlier in the
surrounding function in evaluation.py should remain unchanged); locate the raise
HTTPException(...) in the batch submission handler in evaluation.py and remove
the interpolated exception from the detail string.
In `@backend/app/crud/stt_evaluations/cron.py`:
- Around line 394-404: In process_completed_stt_batch's loop over
batch_responses replace the direct indexing response["custom_id"] with
response.get("custom_id", None) so a missing key yields None; keep the existing
try/except around int(raw_sample_id) (which will catch the None as TypeError)
and ensure you still log the same warning with batch_job.id and the
raw_sample_id value, increment failure_count and continue on error; reference
variables/functions: batch_responses, response, raw_sample_id, stt_sample_id,
batch_job.id, failure_count, and the surrounding for loop in cron.py.
In `@backend/app/services/stt_evaluations/batch_job.py`:
- Line 40: The direct conversion run_id = int(job_id) can raise ValueError for
non-numeric job_id; wrap the conversion in a try/except ValueError around the
run_id = int(job_id) statement, validate job_id first if you prefer, and in the
except block log the error (including the offending job_id), update the run/task
status to a failed/error state and exit/return the Celery task early so the run
record is updated instead of letting the task crash unhandled.
🧹 Nitpick comments (6)
backend/app/crud/stt_evaluations/dataset.py (1)
298-299: Misleading comment: limit is still applied, only the default was removed.

The comment says "removing limit" but the query still applies `.limit(limit)` on line 324. Consider rewording to clarify that the default was removed to force callers to be explicit about how many samples they want.

Suggested wording

```diff
-    # removing limit for now since we need all samples for batch job, can add pagination later if needed
-    limit: int,
+    limit: int,  # no default; callers must specify explicitly
```

backend/app/crud/stt_evaluations/batch.py (1)
107-107: Consider making `max_workers` configurable.

The hardcoded `max_workers=10` is reasonable for typical workloads, but for large datasets with hundreds of samples, tuning may be needed. A constant or parameter would make this easier to adjust.
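A minimal sketch of that suggestion, assuming the signed-URL fan-out in `batch.py` looks roughly like the helper below (`generate_signed_url` is an illustrative name, not the actual function):

```python
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor

# Module-level constant (or a settings value) instead of a hardcoded literal.
SIGNED_URL_MAX_WORKERS = 10

def generate_signed_urls_concurrently(
    paths: list[str],
    generate_signed_url: Callable[[str], str],
    max_workers: int = SIGNED_URL_MAX_WORKERS,
) -> list[str]:
    """Fan out signed-URL generation; the worker count is tunable in one place."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map preserves input order, so results align with `paths`.
        return list(executor.map(generate_signed_url, paths))
```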
backend/app/services/stt_evaluations/batch_job.py (1)

15-23: Add type hint for `task_instance` and return type.

`task_instance` lacks a type annotation. Per coding guidelines, all function parameters and return values should have type hints. Also, the return type `dict` could be narrowed.

Suggested fix

```diff
+from typing import Any
+from celery import Task
+
 def execute_batch_submission(
     project_id: int,
     job_id: str,
     task_id: str,
-    task_instance,
+    task_instance: Task,
     organization_id: int,
     dataset_id: int,
     **kwargs,
-) -> dict:
+) -> dict[str, Any]:
```

As per coding guidelines, "Always add type hints to all function parameters and return values in Python code".
backend/app/tests/api/routes/test_stt_evaluation.py (1)
545-585: Test correctly updated for Celery offload flow.

The patch target, mock return value, and assertions align well with the route changes.

Consider using `assert_called_once_with(...)` to verify the correct `function_path`, `job_id`, `organization_id`, and `dataset_id` are passed to `start_low_priority_job`, which would catch wiring bugs.
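A minimal sketch of that assertion, assuming the test patches `start_low_priority_job` as `mock_start_job` and that these keyword arguments match the route's actual call (both are assumptions to verify against the code):

```python
mock_start_job.assert_called_once_with(
    function_path="app.services.stt_evaluations.batch_job.execute_batch_submission",  # assumed path
    job_id=str(run_id),
    organization_id=org_id,
    dataset_id=dataset_id,
)
```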
backend/app/crud/stt_evaluations/cron.py (2)

371-374: Log format inconsistency: mixed `=` and `:` separators.

Lines 372–373 and 443–446 use `key=value` format (e.g., `run_id={run.id}`), while other functions in this file use `key: {value}` format (e.g., line 128: `run_id: {run.id}`). Consider unifying to one style for consistent structured log parsing.

Also applies to: 443-446
427-433: Use modern SQLAlchemy 2.0 `session.execute(insert(...))` instead of legacy `bulk_insert_mappings()`.

`Session.bulk_insert_mappings()` is a legacy feature deprecated in SQLAlchemy 2.0 in favor of `session.execute(insert(Model), list_of_dicts)`, which also provides better performance through insertmanyvalues batching. Additionally, if an exception occurs after chunks are flushed but before `session.commit()`, the partially flushed data remains in the session — the exception handler currently re-raises without explicit rollback, delegating recovery to the caller.

Replace with:

```python
from sqlalchemy import insert

# Bulk insert in batches of 200
insert_batch_size = 200
for i in range(0, len(stt_result_rows), insert_batch_size):
    chunk = stt_result_rows[i : i + insert_batch_size]
    session.execute(insert(STTResult), chunk)

if stt_result_rows:
    session.commit()
```
```python
raise HTTPException(
    status_code=500,
    detail=f"Failed to queue batch submission: {e}",
)
```
Avoid leaking internal exception details in the HTTP response.
detail=f"Failed to queue batch submission: {e}" exposes internal error information to the API client. Use a generic message instead; the detailed error is already logged on line 99.
Suggested fix
```diff
     raise HTTPException(
         status_code=500,
-        detail=f"Failed to queue batch submission: {e}",
+        detail="Failed to start evaluation. Please try again later.",
     )
```
🤖 Prompt for AI Agents
In `@backend/app/api/routes/stt_evaluations/evaluation.py` around lines 108 - 111,
The HTTPException currently returns internal exception text via detail=f"Failed
to queue batch submission: {e}"; change this to a generic message (e.g.
detail="Failed to queue batch submission") so internal error details are not
leaked to clients, keep the status_code=500, and ensure the original exception
is still logged (the existing logger call earlier in the surrounding function in
evaluation.py should remain unchanged); locate the raise HTTPException(...) in
the batch submission handler in evaluation.py and remove the interpolated
exception from the detail string.
```diff
 for response in batch_responses:
+    raw_sample_id = response["custom_id"]
     try:
-        sample_id = int(custom_id)
+        stt_sample_id = int(raw_sample_id)
     except (ValueError, TypeError):
         logger.warning(
             f"[process_completed_stt_batch] Invalid custom_id | "
-            f"batch_job_id={batch_job.id}, custom_id={custom_id}"
+            f"batch_job_id={batch_job.id}, custom_id={raw_sample_id}"
         )
-        failed_count += 1
+        failure_count += 1
         continue
```
response["custom_id"] will raise KeyError if the key is missing.
If a batch response lacks "custom_id", this line throws an unhandled KeyError that propagates up and aborts processing of the entire batch. Use .get() with a fallback and handle the missing-key case alongside the existing ValueError/TypeError guard.
Suggested fix
```diff
 for response in batch_responses:
-    raw_sample_id = response["custom_id"]
+    raw_sample_id = response.get("custom_id")
     try:
         stt_sample_id = int(raw_sample_id)
     except (ValueError, TypeError):
         logger.warning(
             f"[process_completed_stt_batch] Invalid custom_id | "
             f"batch_job_id={batch_job.id}, custom_id={raw_sample_id}"
         )
         failure_count += 1
         continue
```

With this change, a `None` value from `.get()` will be caught by the `TypeError` handler below, logging a warning and continuing gracefully.
🤖 Prompt for AI Agents
In `@backend/app/crud/stt_evaluations/cron.py` around lines 394 - 404, In
process_completed_stt_batch's loop over batch_responses replace the direct
indexing response["custom_id"] with response.get("custom_id", None) so a missing
key yields None; keep the existing try/except around int(raw_sample_id) (which
will catch the None as TypeError) and ensure you still log the same warning with
batch_job.id and the raw_sample_id value, increment failure_count and continue
on error; reference variables/functions: batch_responses, response,
raw_sample_id, stt_sample_id, batch_job.id, failure_count, and the surrounding
for loop in cron.py.
```python
    Returns:
        dict: Result summary with batch job info
    """
    run_id = int(job_id)
```
Unhandled ValueError if job_id is not a valid integer.
int(job_id) will raise ValueError for non-numeric strings. Since this runs in a Celery worker, an unhandled exception will cause the task to fail without updating the run status. Consider wrapping in a try/except or validating before conversion.
Suggested fix
```diff
-    run_id = int(job_id)
+    try:
+        run_id = int(job_id)
+    except ValueError:
+        logger.error(f"[execute_batch_submission] Invalid job_id: {job_id}")
+        return {"success": False, "error": f"Invalid job_id: {job_id}"}
```
🤖 Prompt for AI Agents
In `@backend/app/services/stt_evaluations/batch_job.py` at line 40, The direct
conversion run_id = int(job_id) can raise ValueError for non-numeric job_id;
wrap the conversion in a try/except ValueError around the run_id = int(job_id)
statement, validate job_id first if you prefer, and in the except block log the
error (including the offending job_id), update the run/task status to a
failed/error state and exit/return the Celery task early so the run record is
updated instead of letting the task crash unhandled.
```python
stt_result_rows: list[dict] = []

for response in batch_responses:
    raw_sample_id = response["custom_id"]
```
Let's avoid magic strings like `"custom_id"`; a quick sketch follows.
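For illustration, a minimal sketch of the constant extraction this asks for; the later review round shows the PR settled on a `BATCH_KEY` constant in `app.core.batch.base`, re-exported from the package, though the exact definition below is an assumption:

```python
# app/core/batch/base.py (assumed definition)
BATCH_KEY = "custom_id"  # key in provider batch responses carrying the sample id

# Call sites then reference the constant instead of the literal string:
raw_sample_id = response.get(BATCH_KEY)
```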
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
backend/app/crud/stt_evaluations/cron.py (2)
355-360: ⚠️ Potential issue | 🟡 Minor

`batch_job: Any` should be typed as `BatchJob`.

`BatchJob` is already imported. Using `Any` here silently discards static-analysis coverage for the entire function body.

```diff
 async def process_completed_stt_batch(
     session: Session,
     run: EvaluationRun,
-    batch_job: Any,
+    batch_job: BatchJob,
     batch_provider: GeminiBatchProvider,
 ) -> None:
```

As per coding guidelines, "Always add type hints to all function parameters and return values in Python code."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/stt_evaluations/cron.py` around lines 355 - 360, The function process_completed_stt_batch currently types the batch_job parameter as Any which disables static analysis; change the signature to use the BatchJob type (already imported) for batch_job and update any docstrings or callers if necessary so the parameter is consistently annotated as BatchJob, leaving the return type as -> None; ensure imports remain and run type checks to verify no downstream typing errors in process_completed_stt_batch.
265-277: ⚠️ Potential issue | 🟠 Major

Permanent result loss if `process_completed_stt_batch` fails after the batch reaches SUCCEEDED.

Once `poll_batch_status` persists SUCCEEDED to `batch_job.provider_status`, a raised exception from `process_completed_stt_batch` (e.g., a transient download failure) propagates unhandled through `poll_stt_run` to the outer catch at line 126, which marks the run `"failed"`. Because subsequent cron executions query `status == "processing"`, the run is never retried. The batch result data is permanently lost even though Gemini completed successfully.

With the old pre-created PENDING-row approach the rows survived to allow manual or automatic recovery. The new lazy-insert design removed that safety net. The misleading comment on line 268 ("already been processed") reinforces the assumption that SUCCEEDED always implies results were written.

Consider one of the following (a sketch of the first option follows the AI-agent prompt below):

- Persisting a per-batch-job "results_inserted" flag (or checking result row count) before skipping.
- Catching the exception from `process_completed_stt_batch` in `poll_stt_run` and leaving the run `"processing"` with an incremented retry counter so it can be reattempted.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/stt_evaluations/cron.py` around lines 265 - 277, The SUCCEEDED terminal state is being treated as "already processed" which causes loss of results if process_completed_stt_batch(...) later fails; modify poll_batch_status / poll_stt_run logic so SUCCEEDED does not unconditionally skip processing: either persist a per-batch flag (e.g., add and check batch_job.results_inserted) or verify result rows exist before skipping, and/or catch exceptions from process_completed_stt_batch(batch_job, ...) inside poll_stt_run to leave the run status as "processing" and increment a retry counter instead of letting the outer handler mark the whole run failed; use the identifiers BatchJobState, TERMINAL_STATES, process_completed_stt_batch, poll_batch_status, and poll_stt_run when making changes.
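A minimal sketch of the first option, checking for existing result rows before skipping a SUCCEEDED batch (`STTResult` appears elsewhere in this review; the `batch_job_id` column and import path are assumptions):

```python
from sqlalchemy import func, select
from sqlalchemy.orm import Session

# from app.models import STTResult  # assumed import path for the result model

def results_already_inserted(session: Session, batch_job) -> bool:
    """Return True only if result rows for this batch job actually exist.

    Guards against treating SUCCEEDED as "already processed" when a prior
    process_completed_stt_batch attempt failed after the status was persisted.
    """
    count = session.execute(
        select(func.count())
        .select_from(STTResult)
        .where(STTResult.batch_job_id == batch_job.id)  # assumed FK column name
    ).scalar_one()
    return count > 0
```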
♻️ Duplicate comments (1)
backend/app/crud/stt_evaluations/cron.py (1)
395-405: `response[BATCH_KEY]` still raises `KeyError` if the key is absent.

Using `BATCH_KEY` as a constant addresses the magic-string concern, but the direct subscript access on line 396 still raises an unhandled `KeyError` for any response that omits the field, aborting processing for the entire batch. The fix from the prior review still applies — use `response.get(BATCH_KEY)` so a missing key falls through as `None` and is caught by the existing `TypeError` guard.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/stt_evaluations/cron.py` around lines 395 - 405, The loop over batch_responses uses direct subscript access response[BATCH_KEY] which can raise KeyError if the key is missing; change it to use response.get(BATCH_KEY) (assigning to raw_sample_id) so missing keys yield None and are then handled by the existing except (ValueError, TypeError) block; update the reference in the block that logs invalid values (the f-string using {BATCH_KEY} and raw_sample_id) but no other control flow changes are needed in process_completed_stt_batch.
🧹 Nitpick comments (3)
backend/app/crud/evaluations/processing.py (1)
28-28: Prefer importing `BATCH_KEY` from the package rather than directly from the submodule.

`BATCH_KEY` is re-exported via `app.core.batch.__init__.__all__`, and all other batch symbols in this file (lines 22–27) are already imported from `app.core.batch`. Importing directly from `.base` bypasses the intended public API surface and creates an inconsistent import style within the same file.

♻️ Suggested consolidation

```diff
 from app.core.batch import (
+    BATCH_KEY,
     OpenAIBatchProvider,
     download_batch_results,
     poll_batch_status,
     upload_batch_results_to_object_store,
 )
-from app.core.batch.base import BATCH_KEY
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/evaluations/processing.py` at line 28, The file imports BATCH_KEY directly from app.core.batch.base which bypasses the package public API and is inconsistent with the other batch imports; change the import to pull BATCH_KEY from the package export (use from app.core.batch import BATCH_KEY) so it aligns with the existing imports of other batch symbols and uses the package's public surface (update the import that currently references app.core.batch.base to the package-level import).

backend/app/crud/evaluations/batch.py (1)
17-19: Consolidate `BATCH_KEY` import with the existing `app.core.batch` import block.

`BATCH_KEY` is available from the package level (`app.core.batch.__all__`), so it can be imported alongside `OpenAIBatchProvider` and `start_batch_job` instead of importing directly from the submodule. The current split also introduces an unnecessary blank line between adjacent local imports.

♻️ Suggested consolidation

```diff
-from app.core.batch.base import BATCH_KEY
-
-from app.core.batch import OpenAIBatchProvider, start_batch_job
+from app.core.batch import BATCH_KEY, OpenAIBatchProvider, start_batch_job
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/evaluations/batch.py` around lines 17 - 19, Consolidate the BATCH_KEY import into the existing package import by removing the separate "from app.core.batch.base import BATCH_KEY" and adding BATCH_KEY to the existing "from app.core.batch import OpenAIBatchProvider, start_batch_job" so it becomes "from app.core.batch import BATCH_KEY, OpenAIBatchProvider, start_batch_job", and remove the extra blank line between adjacent imports to keep imports grouped cleanly.

backend/app/crud/evaluations/embeddings.py (1)
19-19: Same import inconsistency as `processing.py` and `batch.py` — `BATCH_KEY` is re-exported from `app.core.batch` but imported directly from the submodule here. Consolidating to `from app.core.batch import BATCH_KEY, OpenAIBatchProvider, start_batch_job` keeps all batch imports at the package level.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/evaluations/embeddings.py` at line 19, Replace the submodule import of BATCH_KEY in embeddings.py with the package-level batch export to match other files: import BATCH_KEY from app.core.batch (not app.core.batch.base) and, if this module also uses OpenAIBatchProvider or start_batch_job elsewhere in the file, consolidate those to a single "from app.core.batch import BATCH_KEY, OpenAIBatchProvider, start_batch_job" so all batch-related symbols are imported from the package export (look for the BATCH_KEY usage in embeddings.py to confirm).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/crud/evaluations/embeddings.py`:
- Line 162: The log call in parse_embedding_results currently uses
logger.warning(f"Line {line_num}: No {BATCH_KEY} found, skipping") and violates
the guideline requiring a function-name prefix; update that call to include the
prefix "[parse_embedding_results]" at the start of the message so it becomes
logger.warning(f"[parse_embedding_results] Line {line_num}: No {BATCH_KEY}
found, skipping"), preserving the existing variables (line_num, BATCH_KEY) and
message text.
---
Outside diff comments:
In `@backend/app/crud/stt_evaluations/cron.py`:
- Around line 355-360: The function process_completed_stt_batch currently types
the batch_job parameter as Any which disables static analysis; change the
signature to use the BatchJob type (already imported) for batch_job and update
any docstrings or callers if necessary so the parameter is consistently
annotated as BatchJob, leaving the return type as -> None; ensure imports remain
and run type checks to verify no downstream typing errors in
process_completed_stt_batch.
- Around line 265-277: The SUCCEEDED terminal state is being treated as "already
processed" which causes loss of results if process_completed_stt_batch(...)
later fails; modify poll_batch_status / poll_stt_run logic so SUCCEEDED does not
unconditionally skip processing: either persist a per-batch flag (e.g., add and
check batch_job.results_inserted) or verify result rows exist before skipping,
and/or catch exceptions from process_completed_stt_batch(batch_job, ...) inside
poll_stt_run to leave the run status as "processing" and increment a retry
counter instead of letting the outer handler mark the whole run failed; use the
identifiers BatchJobState, TERMINAL_STATES, process_completed_stt_batch,
poll_batch_status, and poll_stt_run when making changes.
---
Duplicate comments:
In `@backend/app/crud/stt_evaluations/cron.py`:
- Around line 395-405: The loop over batch_responses uses direct subscript
access response[BATCH_KEY] which can raise KeyError if the key is missing;
change it to use response.get(BATCH_KEY) (assigning to raw_sample_id) so missing
keys yield None and are then handled by the existing except (ValueError,
TypeError) block; update the reference in the block that logs invalid values
(the f-string using {BATCH_KEY} and raw_sample_id) but no other control flow
changes are needed in process_completed_stt_batch.
---
Nitpick comments:
In `@backend/app/crud/evaluations/batch.py`:
- Around line 17-19: Consolidate the BATCH_KEY import into the existing package
import by removing the separate "from app.core.batch.base import BATCH_KEY" and
adding BATCH_KEY to the existing "from app.core.batch import
OpenAIBatchProvider, start_batch_job" so it becomes "from app.core.batch import
BATCH_KEY, OpenAIBatchProvider, start_batch_job", and remove the extra blank
line between adjacent imports to keep imports grouped cleanly.
In `@backend/app/crud/evaluations/embeddings.py`:
- Line 19: Replace the submodule import of BATCH_KEY in embeddings.py with the
package-level batch export to match other files: import BATCH_KEY from
app.core.batch (not app.core.batch.base) and, if this module also uses
OpenAIBatchProvider or start_batch_job elsewhere in the file, consolidate those
to a single "from app.core.batch import BATCH_KEY, OpenAIBatchProvider,
start_batch_job" so all batch-related symbols are imported from the package
export (look for the BATCH_KEY usage in embeddings.py to confirm).
In `@backend/app/crud/evaluations/processing.py`:
- Line 28: The file imports BATCH_KEY directly from app.core.batch.base which
bypasses the package public API and is inconsistent with the other batch
imports; change the import to pull BATCH_KEY from the package export (use from
app.core.batch import BATCH_KEY) so it aligns with the existing imports of other
batch symbols and uses the package's public surface (update the import that
currently references app.core.batch.base to the package-level import).
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- backend/app/core/batch/__init__.py
- backend/app/core/batch/base.py
- backend/app/core/batch/gemini.py
- backend/app/core/batch/openai.py
- backend/app/crud/evaluations/batch.py
- backend/app/crud/evaluations/embeddings.py
- backend/app/crud/evaluations/processing.py
- backend/app/crud/stt_evaluations/cron.py
```diff
 trace_id = response.get(BATCH_KEY)
 if not trace_id:
-    logger.warning(f"Line {line_num}: No custom_id found, skipping")
+    logger.warning(f"Line {line_num}: No {BATCH_KEY} found, skipping")
```
Missing [function_name] log prefix — violates coding guideline.
The changed log message at line 162 is missing the required [parse_embedding_results] prefix.
🛠️ Proposed fix
- logger.warning(f"Line {line_num}: No {BATCH_KEY} found, skipping")
+ logger.warning(f"[parse_embedding_results] Line {line_num}: No {BATCH_KEY} found, skipping")As per coding guidelines: "Prefix all log messages with the function name in square brackets: logger.info(f"[function_name] Message ...")".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/crud/evaluations/embeddings.py` at line 162, The log call in
parse_embedding_results currently uses logger.warning(f"Line {line_num}: No
{BATCH_KEY} found, skipping") and violates the guideline requiring a
function-name prefix; update that call to include the prefix
"[parse_embedding_results]" at the start of the message so it becomes
logger.warning(f"[parse_embedding_results] Line {line_num}: No {BATCH_KEY}
found, skipping"), preserving the existing variables (line_num, BATCH_KEY) and
message text.
Summary
Target issue is #620
Checklist
Before submitting a pull request, please ensure that you mark these tasks.
- Run `fastapi run --reload app/main.py` or `docker compose up` in the repository root and test.

Notes
Refactor
Performance
Reliability
Bug Fixes
individually marked each pending result as FAILED. This is gone since results don't exist yet at that stage.
final_status = "completed" when all batch jobs are done (since there can't be orphaned pending results anymore)