
STT Evaluation: Refactor#601

Merged
AkhileshNegi merged 11 commits into main from refactor/stt-evaluation
Feb 24, 2026

Conversation


@AkhileshNegi AkhileshNegi commented Feb 12, 2026

Summary

Target issue is #620

Checklist

Before submitting a pull request, please ensure that you complete these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested.
  • If you've fixed a bug or added code, ensure it is covered by test cases.

Notes

  • Refactor

    • STT evaluation submission now queues work as async background jobs for more scalable, responsive processing
    • Result processing optimized with bulk inserts for faster completion
  • Performance

    • Concurrent generation of signed audio URLs to speed batch preparation
  • Reliability

    • Improved logging and error handling for clearer run status and failure messages
  • Bug Fixes

    • Removed create_stt_results — no more pre-creating result rows with status=PENDING at batch submission.
    • Removed get_pending_results_for_run — no longer needed since there are no pending result rows to query/fail on errors.
    • Removed inline error-to-pending-result propagation — the old code caught exceptions during URL signing or batch submission and
      individually marked each pending result as FAILED. This is gone since results don't exist yet at that stage.
    • Simplified run finalization — poll_stt_run no longer checks for remaining pending results to decide the final status. It just sets
      final_status = "completed" when all batch jobs are done (since there can't be orphaned pending results anymore).

Summary by CodeRabbit

  • New Features

    • Transitioned STT evaluations to asynchronous background job processing for faster response times.
    • Added concurrent audio file URL generation to improve batch processing performance.
  • Bug Fixes

    • Improved error handling and status tracking for failed evaluations.
  • Refactor

    • Optimized batch result handling with efficient bulk operations instead of per-item updates.
    • Standardized batch processing identifiers across providers.

@AkhileshNegi AkhileshNegi self-assigned this Feb 12, 2026
@AkhileshNegi AkhileshNegi added the enhancement New feature or request label Feb 12, 2026

coderabbitai bot commented Feb 12, 2026

📝 Walkthrough

Walkthrough

Refactors STT evaluation to enqueue batch submission as a Celery low-priority job (traceable via correlation_id) instead of synchronous submission; removes per-sample result CRUD and pending-run helpers; adds a Celery-executable execute_batch_submission service; shifts to concurrent signed-URL generation and bulk-insert result persistence.

Changes

Cohort / File(s) Summary
API Route & Tests
backend/app/api/routes/stt_evaluations/evaluation.py, backend/app/tests/api/routes/test_stt_evaluation.py
Replaced synchronous batch submission with start_low_priority_job enqueue; use correlation_id for trace_id; removed in-route per-sample result creation and DB refresh; tests updated to expect task ID and strict "pending" status.
Result Feedback API
backend/app/api/routes/stt_evaluations/result.py
Removed pre-check that fetched an existing result before calling update_human_feedback; now delegates directly to updater.
CRUD surface removals
backend/app/crud/stt_evaluations/__init__.py, backend/app/crud/stt_evaluations/result.py, backend/app/crud/stt_evaluations/run.py
Removed exports/implementations: create_stt_results, update_stt_result, get_pending_results_for_run, and get_pending_stt_runs, reducing per-sample CRUD primitives.
Batch signing & submission
backend/app/crud/stt_evaluations/batch.py
Added concurrent signed-URL generation via ThreadPoolExecutor with per-task error capture; aggregate success/failure handling; removed in-loop commits and legacy per-result failure updates; raise only if all signings fail.
Cron / Polling & Result Processing
backend/app/crud/stt_evaluations/cron.py
Simplified Gemini client creation per-project; standardized logging; when batches complete, construct in-memory result rows and perform chunked bulk inserts (bulk_insert_mappings), with updated success/failure counters and messages.
Dataset queries
backend/app/crud/stt_evaluations/dataset.py
Removed STTSamplePublic import; made limit parameter required for get_samples_by_dataset_id (removed default).
Services: Batch Job, Audio, Dataset
backend/app/services/stt_evaluations/batch_job.py, backend/app/services/stt_evaluations/audio.py, backend/app/services/stt_evaluations/dataset.py
Added execute_batch_submission Celery-target function (loads run, fetches samples, calls batch submission inside a Session); audio upload now derives size from file metadata; logging reformatted to explicit key-value style.
Batch key standardization
backend/app/core/batch/..., backend/app/crud/evaluations/..., backend/app/crud/evaluations/embeddings.py, backend/app/crud/evaluations/processing.py
Introduced BATCH_KEY constant and replaced literal custom_id usages across batch providers and evaluation/embedding processing to standardize item identifier key.
Public exports & small docs/logging
backend/app/core/batch/__init__.py, backend/app/core/batch/base.py, backend/app/core/batch/gemini.py, backend/app/core/batch/openai.py
Re-exported BATCH_KEY, added constant BATCH_KEY = "custom_id", and updated provider docstrings/returns to use BATCH_KEY.
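The concurrent signing pattern summarized above (ThreadPoolExecutor with per-task error capture, raising only when every signing fails) can be sketched roughly like this; sign_url and the sample identifiers are placeholders, not the repository's actual helpers:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def sign_urls_concurrently(sample_ids, sign_url, max_workers: int = 10):
    """Sign one URL per sample, capturing per-task errors instead of aborting."""
    signed: dict = {}
    errors: dict = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(sign_url, sid): sid for sid in sample_ids}
        for future in as_completed(futures):
            sid = futures[future]
            try:
                signed[sid] = future.result()
            except Exception as exc:  # per-task error capture
                errors[sid] = exc
    if sample_ids and not signed:
        # mirror the "raise only if all signings fail" behavior
        raise RuntimeError("all URL signings failed")
    return signed, errors
```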

Sequence Diagram

sequenceDiagram
    participant Client
    participant API as API Route
    participant Queue as Celery Queue
    participant Worker as Batch Worker
    participant DB as Database
    participant Service as Batch Service
    participant Provider as External Provider

    Client->>API: POST start STT evaluation
    API->>DB: create STT run (status: pending)
    DB-->>API: run created
    API->>Queue: enqueue start_low_priority_job(job_id, run_id, dataset_id, org_id, trace_id)
    Queue-->>API: returns task_id
    API-->>Client: 202 Accepted (run_id, status: pending, task_id)

    Queue->>Worker: deliver task (execute_batch_submission)
    Worker->>DB: fetch run by id/org/project
    Worker->>DB: fetch samples (limit = run.total_items)
    Worker->>Service: start_stt_evaluation_batch(run, samples)
    Service->>Service: generate signed URLs concurrently (ThreadPoolExecutor)
    Service->>Provider: submit batch job
    Provider-->>Service: batch response (keys/status)
    Service->>DB: bulk_insert STT results (chunked)
    Service->>DB: update run status (submitted/processing/failed)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • Evaluation: STT #571 — Related: earlier STT evaluation CRUD and batch submission flow that this PR refactors/replaces.
  • Evaluation #405 — Related: introduces batch job primitives and Celery batch execution infrastructure used by the new async workflow.
  • Evaluation: Refactor cron #594 — Related: polling/cron refactors that overlap with updated polling and failure handling in this PR.

Suggested reviewers

  • kartpop
  • Prajna1999
  • vprashrex

Poem

🐇 Hopping queues where tasks take flight,

Celery hums through dusk and light,
Signed URLs dash, concurrent and spry,
Bulk rows tumble softly, oh my!
A rabbit applauds the async sky.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'STT Evaluation: Refactor' is vague and generic, using a non-descriptive term ('Refactor') that does not convey the specific, meaningful changes involved in the PR. Use a more specific title that highlights the primary change, such as 'STT Evaluation: Convert to async background jobs with bulk result processing' or 'STT Evaluation: Refactor submission to Celery-based async workflow'.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.



codecov bot commented Feb 12, 2026

Codecov Report

❌ Patch coverage is 30.55556% with 50 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
backend/app/crud/stt_evaluations/cron.py 9.37% 29 Missing ⚠️
backend/app/crud/stt_evaluations/batch.py 5.00% 19 Missing ⚠️
...ckend/app/api/routes/stt_evaluations/evaluation.py 80.00% 1 Missing ⚠️
backend/app/crud/evaluations/embeddings.py 66.66% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

@AkhileshNegi AkhileshNegi marked this pull request as ready for review February 12, 2026 16:48
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
backend/app/crud/stt_evaluations/cron.py (2)

118-118: ⚠️ Potential issue | 🟡 Minor

Dead code path: "processed" action is never returned by poll_stt_run.

poll_stt_run returns action values "completed", "failed", or "no_change" — never "processed". The "processed" check here is unreachable, which means the intent might be wrong or this is leftover from a prior version.

Suggested fix
-                    if result["action"] in ("completed", "processed"):
+                    if result["action"] == "completed":

354-359: 🛠️ Refactor suggestion | 🟠 Major

batch_job parameter typed as Any — should be BatchJob.

The coding guidelines require type hints on all function parameters. batch_job is used as a BatchJob throughout (accessing .id, .config, .provider_batch_id), so the type annotation should reflect that.

Suggested fix
 async def process_completed_stt_batch(
     session: Session,
     run: EvaluationRun,
-    batch_job: Any,
+    batch_job: BatchJob,
     batch_provider: GeminiBatchProvider,
 ) -> None:

As per coding guidelines, "Always add type hints to all function parameters and return values in Python code".

🤖 Fix all issues with AI agents
In `@backend/app/api/routes/stt_evaluations/evaluation.py`:
- Around line 108-111: The HTTPException currently returns internal exception
text via detail=f"Failed to queue batch submission: {e}"; change this to a
generic message (e.g. detail="Failed to queue batch submission") so internal
error details are not leaked to clients, keep the status_code=500, and ensure
the original exception is still logged (the existing logger call earlier in the
surrounding function in evaluation.py should remain unchanged); locate the raise
HTTPException(...) in the batch submission handler in evaluation.py and remove
the interpolated exception from the detail string.

In `@backend/app/crud/stt_evaluations/cron.py`:
- Around line 394-404: In process_completed_stt_batch's loop over
batch_responses replace the direct indexing response["custom_id"] with
response.get("custom_id", None) so a missing key yields None; keep the existing
try/except around int(raw_sample_id) (which will catch the None as TypeError)
and ensure you still log the same warning with batch_job.id and the
raw_sample_id value, increment failure_count and continue on error; reference
variables/functions: batch_responses, response, raw_sample_id, stt_sample_id,
batch_job.id, failure_count, and the surrounding for loop in cron.py.

In `@backend/app/services/stt_evaluations/batch_job.py`:
- Line 40: The direct conversion run_id = int(job_id) can raise ValueError for
non-numeric job_id; wrap the conversion in a try/except ValueError around the
run_id = int(job_id) statement, validate job_id first if you prefer, and in the
except block log the error (including the offending job_id), update the run/task
status to a failed/error state and exit/return the Celery task early so the run
record is updated instead of letting the task crash unhandled.
🧹 Nitpick comments (6)
backend/app/crud/stt_evaluations/dataset.py (1)

298-299: Misleading comment: limit is still applied, only the default was removed.

The comment says "removing limit" but the query still applies .limit(limit) on line 324. Consider rewording to clarify that the default was removed to force callers to be explicit about how many samples they want.

Suggested wording
-    # removing limit for now since we need all samples for batch job, can add pagination later if needed
-    limit: int,
+    limit: int,  # no default; callers must specify explicitly
backend/app/crud/stt_evaluations/batch.py (1)

107-107: Consider making max_workers configurable.

The hardcoded max_workers=10 is reasonable for typical workloads, but for large datasets with hundreds of samples, tuning may be needed. A constant or parameter would make this easier to adjust.
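A minimal way to implement this suggestion, assuming an environment variable named STT_SIGNING_MAX_WORKERS (the variable name and defaults are hypothetical, not an existing setting):

```python
import os

# Sketch: lift the hardcoded max_workers=10 into a constant that can be
# overridden via an environment variable (name assumed for illustration).
DEFAULT_SIGNING_WORKERS = 10

def signing_worker_count() -> int:
    raw = os.environ.get("STT_SIGNING_MAX_WORKERS", "")
    try:
        # clamp to at least one worker; fall back to the default on bad input
        return max(1, int(raw)) if raw else DEFAULT_SIGNING_WORKERS
    except ValueError:
        return DEFAULT_SIGNING_WORKERS
```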

backend/app/services/stt_evaluations/batch_job.py (1)

15-23: Add type hint for task_instance and return type.

task_instance lacks a type annotation. Per coding guidelines, all function parameters and return values should have type hints. Also, the return type dict could be narrowed.

Suggested fix
+from typing import Any
+from celery import Task
+
 def execute_batch_submission(
     project_id: int,
     job_id: str,
     task_id: str,
-    task_instance,
+    task_instance: Task,
     organization_id: int,
     dataset_id: int,
     **kwargs,
-) -> dict:
+) -> dict[str, Any]:

As per coding guidelines, "Always add type hints to all function parameters and return values in Python code".

backend/app/tests/api/routes/test_stt_evaluation.py (1)

545-585: Test correctly updated for Celery offload flow.

The patch target, mock return value, and assertions align well with the route changes.

Consider using assert_called_once_with(...) to verify the correct function_path, job_id, organization_id, and dataset_id are passed to start_low_priority_job, which would catch wiring bugs.
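The strengthened assertion would look roughly like this; it uses a bare MagicMock for illustration, and the argument values are stand-ins (the real test would patch start_low_priority_job where the route imports it):

```python
from unittest.mock import MagicMock

# Illustrative only: argument values are stand-ins, not the route's real kwargs.
def demo_assert_wiring() -> str:
    enqueue = MagicMock(return_value="task-123")
    # The route under test would enqueue like this:
    task_id = enqueue(job_id="42", organization_id=1, dataset_id=5)
    # Verifies the exact kwargs, catching wiring bugs a bare call count misses.
    enqueue.assert_called_once_with(job_id="42", organization_id=1, dataset_id=5)
    return task_id
```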

backend/app/crud/stt_evaluations/cron.py (2)

371-374: Log format inconsistency: mixed = and : separators.

Lines 372–373 and 443–446 use key=value format (e.g., run_id={run.id}), while other functions in this file use key: {value} format (e.g., Line 128 run_id: {run.id}). Consider unifying to one style for consistent structured log parsing.

Also applies to: 443-446


427-433: Use modern SQLAlchemy 2.0 session.execute(insert(...)) instead of legacy bulk_insert_mappings().

Session.bulk_insert_mappings() is a legacy feature deprecated in SQLAlchemy 2.0 in favor of session.execute(insert(Model), list_of_dicts), which also provides better performance through insertmanyvalues batching. Additionally, if an exception occurs after chunks are flushed but before session.commit(), the partially flushed data remains in the session—the exception handler currently re-raises without explicit rollback, delegating recovery to the caller.

Replace with:

from sqlalchemy import insert

# Bulk insert in batches of 200
insert_batch_size = 200
for i in range(0, len(stt_result_rows), insert_batch_size):
    chunk = stt_result_rows[i : i + insert_batch_size]
    session.execute(insert(STTResult), chunk)
if stt_result_rows:
    session.commit()

Comment on lines +108 to 111
raise HTTPException(
    status_code=500,
    detail=f"Failed to queue batch submission: {e}",
)

⚠️ Potential issue | 🟡 Minor

Avoid leaking internal exception details in the HTTP response.

detail=f"Failed to queue batch submission: {e}" exposes internal error information to the API client. Use a generic message instead; the detailed error is already logged on line 99.

Suggested fix
         raise HTTPException(
             status_code=500,
-            detail=f"Failed to queue batch submission: {e}",
+            detail="Failed to start evaluation. Please try again later.",
         )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
raise HTTPException(
    status_code=500,
    detail="Failed to start evaluation. Please try again later.",
)
🤖 Prompt for AI Agents
In `@backend/app/api/routes/stt_evaluations/evaluation.py` around lines 108 - 111,
The HTTPException currently returns internal exception text via detail=f"Failed
to queue batch submission: {e}"; change this to a generic message (e.g.
detail="Failed to queue batch submission") so internal error details are not
leaked to clients, keep the status_code=500, and ensure the original exception
is still logged (the existing logger call earlier in the surrounding function in
evaluation.py should remain unchanged); locate the raise HTTPException(...) in
the batch submission handler in evaluation.py and remove the interpolated
exception from the detail string.

Comment on lines 394 to 404
for response in batch_responses:
    raw_sample_id = response["custom_id"]
    try:
        stt_sample_id = int(raw_sample_id)
    except (ValueError, TypeError):
        logger.warning(
            f"[process_completed_stt_batch] Invalid custom_id | "
            f"batch_job_id={batch_job.id}, custom_id={raw_sample_id}"
        )
        failure_count += 1
        continue

⚠️ Potential issue | 🟠 Major

response["custom_id"] will raise KeyError if the key is missing.

If a batch response lacks "custom_id", this line throws an unhandled KeyError that propagates up and aborts processing of the entire batch. Use .get() with a fallback and handle the missing-key case alongside the existing ValueError/TypeError guard.

Suggested fix
         for response in batch_responses:
-            raw_sample_id = response["custom_id"]
+            raw_sample_id = response.get("custom_id")
             try:
                 stt_sample_id = int(raw_sample_id)
             except (ValueError, TypeError):
                 logger.warning(
                     f"[process_completed_stt_batch] Invalid custom_id | "
                     f"batch_job_id={batch_job.id}, custom_id={raw_sample_id}"
                 )
                 failure_count += 1
                 continue

With this change, a None value from .get() will be caught by the TypeError handler below, logging a warning and continuing gracefully.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for response in batch_responses:
    raw_sample_id = response.get("custom_id")
    try:
        stt_sample_id = int(raw_sample_id)
    except (ValueError, TypeError):
        logger.warning(
            f"[process_completed_stt_batch] Invalid custom_id | "
            f"batch_job_id={batch_job.id}, custom_id={raw_sample_id}"
        )
        failure_count += 1
        continue
🤖 Prompt for AI Agents
In `@backend/app/crud/stt_evaluations/cron.py` around lines 394 - 404, In
process_completed_stt_batch's loop over batch_responses replace the direct
indexing response["custom_id"] with response.get("custom_id", None) so a missing
key yields None; keep the existing try/except around int(raw_sample_id) (which
will catch the None as TypeError) and ensure you still log the same warning with
batch_job.id and the raw_sample_id value, increment failure_count and continue
on error; reference variables/functions: batch_responses, response,
raw_sample_id, stt_sample_id, batch_job.id, failure_count, and the surrounding
for loop in cron.py.

    Returns:
        dict: Result summary with batch job info
    """
    run_id = int(job_id)

⚠️ Potential issue | 🟡 Minor

Unhandled ValueError if job_id is not a valid integer.

int(job_id) will raise ValueError for non-numeric strings. Since this runs in a Celery worker, an unhandled exception will cause the task to fail without updating the run status. Consider wrapping in a try/except or validating before conversion.

Suggested fix
-    run_id = int(job_id)
+    try:
+        run_id = int(job_id)
+    except ValueError:
+        logger.error(f"[execute_batch_submission] Invalid job_id: {job_id}")
+        return {"success": False, "error": f"Invalid job_id: {job_id}"}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try:
    run_id = int(job_id)
except ValueError:
    logger.error(f"[execute_batch_submission] Invalid job_id: {job_id}")
    return {"success": False, "error": f"Invalid job_id: {job_id}"}
🤖 Prompt for AI Agents
In `@backend/app/services/stt_evaluations/batch_job.py` at line 40, The direct
conversion run_id = int(job_id) can raise ValueError for non-numeric job_id;
wrap the conversion in a try/except ValueError around the run_id = int(job_id)
statement, validate job_id first if you prefer, and in the except block log the
error (including the offending job_id), update the run/task status to a
failed/error state and exit/return the Celery task early so the run record is
updated instead of letting the task crash unhandled.

@vprashrex vprashrex self-requested a review February 16, 2026 03:56
@AkhileshNegi AkhileshNegi requested a review from kartpop February 23, 2026 17:06
@kartpop kartpop (Collaborator) left a comment

approved with comments

stt_result_rows: list[dict] = []

for response in batch_responses:
    raw_sample_id = response["custom_id"]

Let's avoid magic strings like "custom_id".
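A shared constant, as this PR later introduces with BATCH_KEY = "custom_id" in app.core.batch, removes the magic string. A minimal sketch (build_batch_item is a hypothetical helper, used only for illustration):

```python
# Sketch of replacing the magic string with a shared constant; the value
# "custom_id" matches the BATCH_KEY constant introduced in this PR.
BATCH_KEY = "custom_id"

def build_batch_item(sample_id: int, payload: dict) -> dict:
    # Every producer and consumer references BATCH_KEY instead of the literal.
    return {BATCH_KEY: str(sample_id), **payload}
```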

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
backend/app/crud/stt_evaluations/cron.py (2)

355-360: ⚠️ Potential issue | 🟡 Minor

batch_job: Any should be typed as BatchJob.

BatchJob is already imported. Using Any here silently discards static-analysis coverage for the entire function body.

 async def process_completed_stt_batch(
     session: Session,
     run: EvaluationRun,
-    batch_job: Any,
+    batch_job: BatchJob,
     batch_provider: GeminiBatchProvider,
 ) -> None:

As per coding guidelines, "Always add type hints to all function parameters and return values in Python code."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/crud/stt_evaluations/cron.py` around lines 355 - 360, The
function process_completed_stt_batch currently types the batch_job parameter as
Any which disables static analysis; change the signature to use the BatchJob
type (already imported) for batch_job and update any docstrings or callers if
necessary so the parameter is consistently annotated as BatchJob, leaving the
return type as -> None; ensure imports remain and run type checks to verify no
downstream typing errors in process_completed_stt_batch.

265-277: ⚠️ Potential issue | 🟠 Major

Permanent result loss if process_completed_stt_batch fails after batch reaches SUCCEEDED.

Once poll_batch_status persists SUCCEEDED to batch_job.provider_status, a raised exception from process_completed_stt_batch (e.g., transient download failure) propagates unhandled through poll_stt_run to the outer catch at line 126, which marks the run "failed". Because subsequent cron executions query status == "processing", the run is never retried. The batch result data is permanently lost even though Gemini completed successfully.

With the old pre-created PENDING-row approach the rows survived to allow manual or automatic recovery. The new lazy-insert design removed that safety net. The misleading comment on line 268 ("already been processed") reinforces the assumption that SUCCEEDED always implies results were written.

Consider one of:

  • Persisting a per-batch-job "results_inserted" flag (or checking result row count) before skipping.
  • Catching the exception from process_completed_stt_batch in poll_stt_run and leaving the run "processing" with an incremented retry counter so it can be reattempted.
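The retry-counter option could be sketched as follows; MAX_RESULT_RETRIES and the returned (status, retries) tuple are assumptions for illustration, not fields that exist in the codebase today:

```python
# Sketch of the retry-counter recovery option described above; the constant
# and return shape are hypothetical, not existing schema fields.
MAX_RESULT_RETRIES = 3

def next_run_state(current_retries: int) -> tuple[str, int]:
    """Keep the run retryable until the retry budget is exhausted."""
    retries = current_retries + 1
    if retries >= MAX_RESULT_RETRIES:
        return "failed", retries
    return "processing", retries  # cron re-polls runs still marked processing
```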
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/crud/stt_evaluations/cron.py` around lines 265 - 277, The
SUCCEEDED terminal state is being treated as "already processed" which causes
loss of results if process_completed_stt_batch(...) later fails; modify
poll_batch_status / poll_stt_run logic so SUCCEEDED does not unconditionally
skip processing: either persist a per-batch flag (e.g., add and check
batch_job.results_inserted) or verify result rows exist before skipping, and/or
catch exceptions from process_completed_stt_batch(batch_job, ...) inside
poll_stt_run to leave the run status as "processing" and increment a retry
counter instead of letting the outer handler mark the whole run failed; use the
identifiers BatchJobState, TERMINAL_STATES, process_completed_stt_batch,
poll_batch_status, and poll_stt_run when making changes.
♻️ Duplicate comments (1)
backend/app/crud/stt_evaluations/cron.py (1)

395-405: response[BATCH_KEY] still raises KeyError if the key is absent.

Using BATCH_KEY as a constant addresses the magic-string concern, but the direct subscript access on line 396 still raises an unhandled KeyError for any response that omits the field, aborting processing for the entire batch. The fix from the prior review still applies — use response.get(BATCH_KEY) so a missing key falls through as None and is caught by the existing TypeError guard.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/crud/stt_evaluations/cron.py` around lines 395 - 405, The loop
over batch_responses uses direct subscript access response[BATCH_KEY] which can
raise KeyError if the key is missing; change it to use response.get(BATCH_KEY)
(assigning to raw_sample_id) so missing keys yield None and are then handled by
the existing except (ValueError, TypeError) block; update the reference in the
block that logs invalid values (the f-string using {BATCH_KEY} and
raw_sample_id) but no other control flow changes are needed in
process_completed_stt_batch.
🧹 Nitpick comments (3)
backend/app/crud/evaluations/processing.py (1)

28-28: Prefer importing BATCH_KEY from the package rather than directly from the submodule.

BATCH_KEY is re-exported via app.core.batch.__init__.__all__, and all other batch symbols in this file (line 22–27) are already imported from app.core.batch. Importing directly from .base bypasses the intended public API surface and creates an inconsistent import style within the same file.

♻️ Suggested consolidation
 from app.core.batch import (
+    BATCH_KEY,
     OpenAIBatchProvider,
     download_batch_results,
     poll_batch_status,
     upload_batch_results_to_object_store,
 )
-from app.core.batch.base import BATCH_KEY
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/crud/evaluations/processing.py` at line 28, The file imports
BATCH_KEY directly from app.core.batch.base which bypasses the package public
API and is inconsistent with the other batch imports; change the import to pull
BATCH_KEY from the package export (use from app.core.batch import BATCH_KEY) so
it aligns with the existing imports of other batch symbols and uses the
package's public surface (update the import that currently references
app.core.batch.base to the package-level import).
backend/app/crud/evaluations/batch.py (1)

17-19: Consolidate BATCH_KEY import with the existing app.core.batch import block.

BATCH_KEY is available from the package level (app.core.batch.__all__), so it can be imported alongside OpenAIBatchProvider and start_batch_job instead of importing directly from the submodule. The current split also introduces an unnecessary blank line between adjacent third-party-to-local imports.

♻️ Suggested consolidation
-from app.core.batch.base import BATCH_KEY
-
-from app.core.batch import OpenAIBatchProvider, start_batch_job
+from app.core.batch import BATCH_KEY, OpenAIBatchProvider, start_batch_job
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/crud/evaluations/batch.py` around lines 17 - 19, Consolidate the
BATCH_KEY import into the existing package import by removing the separate "from
app.core.batch.base import BATCH_KEY" and adding BATCH_KEY to the existing "from
app.core.batch import OpenAIBatchProvider, start_batch_job" so it becomes "from
app.core.batch import BATCH_KEY, OpenAIBatchProvider, start_batch_job", and
remove the extra blank line between adjacent imports to keep imports grouped
cleanly.
backend/app/crud/evaluations/embeddings.py (1)

19-19: Same import inconsistency as processing.py and batch.py: BATCH_KEY is re-exported from app.core.batch but imported directly from the submodule here. Consolidating to from app.core.batch import BATCH_KEY, OpenAIBatchProvider, start_batch_job keeps all batch imports at the package level.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/crud/evaluations/embeddings.py` at line 19, Replace the submodule
import of BATCH_KEY in embeddings.py with the package-level batch export to
match other files: import BATCH_KEY from app.core.batch (not
app.core.batch.base) and, if this module also uses OpenAIBatchProvider or
start_batch_job elsewhere in the file, consolidate those to a single "from
app.core.batch import BATCH_KEY, OpenAIBatchProvider, start_batch_job" so all
batch-related symbols are imported from the package export (look for the
BATCH_KEY usage in embeddings.py to confirm).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/crud/evaluations/embeddings.py`:
- Line 162: The log call in parse_embedding_results currently uses
logger.warning(f"Line {line_num}: No {BATCH_KEY} found, skipping") and violates
the guideline requiring a function-name prefix; update that call to include the
prefix "[parse_embedding_results]" at the start of the message so it becomes
logger.warning(f"[parse_embedding_results] Line {line_num}: No {BATCH_KEY}
found, skipping"), preserving the existing variables (line_num, BATCH_KEY) and
message text.

---

Outside diff comments:
In `@backend/app/crud/stt_evaluations/cron.py`:
- Around line 355-360: The function process_completed_stt_batch currently types
the batch_job parameter as Any which disables static analysis; change the
signature to use the BatchJob type (already imported) for batch_job and update
any docstrings or callers if necessary so the parameter is consistently
annotated as BatchJob, leaving the return type as -> None; ensure imports remain
and run type checks to verify no downstream typing errors in
process_completed_stt_batch.
- Around line 265-277: The SUCCEEDED terminal state is being treated as "already
processed" which causes loss of results if process_completed_stt_batch(...)
later fails; modify poll_batch_status / poll_stt_run logic so SUCCEEDED does not
unconditionally skip processing: either persist a per-batch flag (e.g., add and
check batch_job.results_inserted) or verify result rows exist before skipping,
and/or catch exceptions from process_completed_stt_batch(batch_job, ...) inside
poll_stt_run to leave the run status as "processing" and increment a retry
counter instead of letting the outer handler mark the whole run failed; use the
identifiers BatchJobState, TERMINAL_STATES, process_completed_stt_batch,
poll_batch_status, and poll_stt_run when making changes.

---

Duplicate comments:
In `@backend/app/crud/stt_evaluations/cron.py`:
- Around line 395-405: The loop over batch_responses uses direct subscript
access response[BATCH_KEY] which can raise KeyError if the key is missing;
change it to use response.get(BATCH_KEY) (assigning to raw_sample_id) so missing
keys yield None and are then handled by the existing except (ValueError,
TypeError) block; update the reference in the block that logs invalid values
(the f-string using {BATCH_KEY} and raw_sample_id) but no other control flow
changes are needed in process_completed_stt_batch.

---

Nitpick comments:
In `@backend/app/crud/evaluations/processing.py`:
- Line 28: The file imports BATCH_KEY directly from app.core.batch.base which
bypasses the package public API and is inconsistent with the other batch
imports; change the import to pull BATCH_KEY from the package export (use from
app.core.batch import BATCH_KEY) so it aligns with the existing imports of other
batch symbols and uses the package's public surface (update the import that
currently references app.core.batch.base to the package-level import).

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a035718 and e5b44ed.

📒 Files selected for processing (8)
  • backend/app/core/batch/__init__.py
  • backend/app/core/batch/base.py
  • backend/app/core/batch/gemini.py
  • backend/app/core/batch/openai.py
  • backend/app/crud/evaluations/batch.py
  • backend/app/crud/evaluations/embeddings.py
  • backend/app/crud/evaluations/processing.py
  • backend/app/crud/stt_evaluations/cron.py

trace_id = response.get(BATCH_KEY)
if not trace_id:
-    logger.warning(f"Line {line_num}: No custom_id found, skipping")
+    logger.warning(f"Line {line_num}: No {BATCH_KEY} found, skipping")

⚠️ Potential issue | 🟡 Minor

Missing [function_name] log prefix — violates coding guideline.

The changed log message at line 162 is missing the required [parse_embedding_results] prefix.

🛠️ Proposed fix
-                logger.warning(f"Line {line_num}: No {BATCH_KEY} found, skipping")
+                logger.warning(f"[parse_embedding_results] Line {line_num}: No {BATCH_KEY} found, skipping")

As per coding guidelines: "Prefix all log messages with the function name in square brackets: logger.info(f"[function_name] Message ...")".


@AkhileshNegi AkhileshNegi merged commit 25f6738 into main Feb 24, 2026
2 of 3 checks passed
@AkhileshNegi AkhileshNegi deleted the refactor/stt-evaluation branch February 24, 2026 04:49
@AkhileshNegi AkhileshNegi linked an issue Feb 24, 2026 that may be closed by this pull request
@coderabbitai coderabbitai bot mentioned this pull request Feb 24, 2026
2 tasks

Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

STT Evaluation: Refactor

3 participants