
TTS/STT: Gemini-TTS Model Integration with unified API #574

Merged
Prajna1999 merged 65 commits into main from feature/unified-api-tts
Feb 23, 2026

Conversation


@Prajna1999 Prajna1999 commented Feb 2, 2026

Summary

Target issue is #528
Integrate Gemini-pro-2.5-preview-tts models with the unified API

Checklist

Before submitting a pull request, please ensure that you mark these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested.
  • If you've fixed a bug or added code, ensure it is tested and has test cases.

Notes

  1. Currently tested only with Hindi and English as the output languages.
  2. The ad hoc main function used to test the TTS scripts will be deleted before the final push.

Summary by CodeRabbit

  • New Features

    • Added Text-to-Speech support (Google Gemini) with configurable model, voice, language, and output formats (WAV/MP3/OGG).
    • Audio conversion options (MP3, OGG) with format selection and MIME-aware outputs.
  • Improvements

    • Better audio input validation and accurate audio size calculations.
    • Consolidated input resolution and temporary-audio handling for text and audio inputs.
  • Chores

    • Added pydub dependency for audio processing.
  • Tests

    • Extensive STT and TTS test coverage added.

Prajna1999 and others added 28 commits January 20, 2026 21:57
…-stt-new' into feature/unified-api-stt-new

merge upstream

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (2)
backend/app/services/llm/providers/gai.py (1)

283-286: f-string without placeholders on line 285.

Static analysis (Ruff F541) flags this. Remove the f prefix.

Proposed fix
-            logger.warning(
-                f"[GoogleAIProvider._execute_tts] Response missing usage_metadata, using zeros"
-            )
+            logger.warning(
+                "[GoogleAIProvider._execute_tts] Response missing usage_metadata, using zeros"
+            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 283 - 286, The
logger.warning call inside GoogleAIProvider._execute_tts uses an unnecessary
f-string (no placeholders); remove the leading "f" so the message is a plain
string literal (update the logger.warning invocation that currently contains
"[GoogleAIProvider._execute_tts] Response missing usage_metadata, using zeros").
Ensure no other f-strings without placeholders remain in the _execute_tts
method.
backend/app/services/llm/mappers.py (1)

102-106: Use the already-validated model variable instead of re-fetching from dict.

Line 102 already extracts and validates model = kaapi_params.get("model"), but line 106 calls kaapi_params.get("model") again. Use the local variable for consistency and to avoid redundant dict lookups.

Proposed fix
     model = kaapi_params.get("model")
     if not model:
         return {}, ["Missing required 'model' parameter"]
 
-    google_params["model"] = kaapi_params.get("model")
+    google_params["model"] = model
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/mappers.py` around lines 102 - 106, The code
validates model with model = kaapi_params.get("model") but then redundantly
re-reads kaapi_params.get("model") when setting google_params["model"]; update
that assignment to use the already-validated local variable model instead of
re-fetching from kaapi_params to avoid redundant lookups and ensure consistency
(refer to the variables model, kaapi_params, and google_params in the mapper
function).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/models/llm/request.py`:
- Around line 69-76: The AudioContent model's mime_type is annotated as
non-nullable str but defaults to None; update the type annotation for the
mime_type field on the AudioContent class to allow None (e.g., make mime_type of
type str | None) so it matches the default and the downstream usage (see places
referencing query_input.content.mime_type or "audio/wav") and adjust the
Field(...) declaration to keep the same default None and description.

In `@backend/app/services/llm/providers/gai.py`:
- Line 222: The code uses an undefined symbol types.GenerateContentConfig
causing a NameError; replace the incorrect reference with the already-imported
GenerateContentConfig by changing the line that sets config (currently using
types.GenerateContentConfig) to use GenerateContentConfig(**config_kwargs),
ensuring the GenerateContentConfig symbol (imported at top) is used instead of
the nonexistent types module.
- Around line 239-275: The "Audio conversion successful" log is emitted even
when no conversion occurs; inside GoogleAIProvider._execute_tts, move or
duplicate the success log so it only runs after a real conversion, i.e., place
the logger.info call into the response_format == "mp3" branch after
convert_pcm_to_mp3 succeeds and into the response_format == "ogg" branch after
convert_pcm_to_ogg succeeds (use convert_pcm_to_mp3, convert_pcm_to_ogg,
actual_format and converted_bytes to build the message), and remove or guard the
existing unconditional success log so unsupported-format fallback (where
response_format is reset to "wav") does not log success.

In `@backend/app/utils.py`:
- Around line 426-444: The temp-file leak occurs in resolve_audio_base64 when
base64 decoding succeeds but writing to NamedTemporaryFile fails; fix by
tracking temp_path (initialize to None before the try), and in the outer except
for the write error check if temp_path is set and the file exists then remove it
(os.remove) before returning the error; ensure resolve_audio_base64 imports os
and sets temp_path only after tmp.name is assigned so cleanup is safe, or
alternatively use a try/finally around the NamedTemporaryFile write to always
unlink on error while still returning the error string.
- Around line 447-474: The function resolve_input is missing a type annotation
for its query_input parameter; update its signature to type-hint query_input as
the request module's QueryInput (or appropriate union of TextInput|AudioInput)
using a forward/string annotation or by importing QueryInput under a
TYPE_CHECKING guard to avoid circular imports, and keep the existing return type
tuple[str, str | None]; ensure references in the body (TextInput, AudioInput,
resolve_audio_base64) remain unchanged.

---

Duplicate comments:
In `@backend/app/crud/llm.py`:
- Around line 68-91: The code incorrectly uses getattr/hasattr on
completion_config.params which may be a dict; update logic in the
completion_type and model resolution to handle both dict and object shapes: for
completion_type, check if completion_config.params is a dict and use
completion_config.params.get("type", "text"), otherwise use
getattr(completion_config.params, "type", "text"); for model, similarly if
params is a dict use params.get("model", "") else use
getattr(completion_config.params, "model", completion_config.params.get("model",
"") if hasattr(completion_config.params, "get") else ""); adjust references to
completion_config, completion_type, model, and params accordingly so both
dict-based (NativeCompletionConfig) and object-based params work.
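The dict-vs-object handling described above boils down to a single access helper; a minimal sketch under that assumption (`params_get` is a hypothetical name, not a function in the repo):

```python
def params_get(params, key: str, default=""):
    """Read a key from completion params whether they arrive as a dict
    (e.g. NativeCompletionConfig) or as an object with attributes."""
    if isinstance(params, dict):
        return params.get(key, default)
    return getattr(params, key, default)
```

Both resolution sites (`completion_type` and `model`) could then call `params_get(completion_config.params, "type", "text")` and `params_get(completion_config.params, "model", "")` without branching on shape.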

In `@backend/app/models/llm/request.py`:
- Around line 4-8: There are duplicate imports of Field and SQLModel (two
sqlmodel import lines); remove the redundant import and consolidate all symbols
imported from sqlmodel into a single import statement so only one import
supplies Field, SQLModel, Index, and text; keep the other imports
(Discriminator, model_validator, HttpUrl, datetime, JSONB) as-is and ensure no
name collisions remain in request.py.

In `@backend/app/services/llm/providers/gai.py`:
- Around line 96-97: The uploaded Gemini file (assigned to gemini_file via
self.client.files.upload(file=resolved_input)) is never deleted, causing storage
accumulation; wrap the upload + generate_content flow in try/finally and in the
finally block call the provider client file-delete API (e.g.,
self.client.files.delete(...) using the gemini_file identifier or name) to
remove the uploaded artifact regardless of success/failure, ensuring any
exceptions from delete are caught/logged so cleanup does not mask the original
error.
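The upload/delete lifecycle described above is a plain try/finally pattern. A minimal sketch against a stub file API (`StubFiles` is hypothetical — the real client exposes upload/delete under `client.files`):

```python
class StubFiles:
    """Hypothetical stand-in for the provider's file API (client.files)."""
    def __init__(self):
        self.live = set()

    def upload(self, file: str) -> str:
        self.live.add(file)
        return file

    def delete(self, name: str) -> None:
        self.live.discard(name)


def transcribe_with_cleanup(files: StubFiles, resolved_input: str, generate):
    """Upload the audio, run generation, and always delete the uploaded
    artifact -- without letting a delete failure mask the original error."""
    gemini_file = files.upload(file=resolved_input)
    try:
        return generate(gemini_file)
    finally:
        try:
            files.delete(name=gemini_file)
        except Exception:
            pass  # real code should log this rather than swallow it silently
```

The inner try/except in the `finally` block is what keeps a failing delete from shadowing an exception raised by `generate`.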

---

Nitpick comments:
In `@backend/app/services/llm/mappers.py`:
- Around line 102-106: The code validates model with model =
kaapi_params.get("model") but then redundantly re-reads
kaapi_params.get("model") when setting google_params["model"]; update that
assignment to use the already-validated local variable model instead of
re-fetching from kaapi_params to avoid redundant lookups and ensure consistency
(refer to the variables model, kaapi_params, and google_params in the mapper
function).

In `@backend/app/services/llm/providers/gai.py`:
- Around line 283-286: The logger.warning call inside
GoogleAIProvider._execute_tts uses an unnecessary f-string (no placeholders);
remove the leading "f" so the message is a plain string literal (update the
logger.warning invocation that currently contains
"[GoogleAIProvider._execute_tts] Response missing usage_metadata, using zeros").
Ensure no other f-strings without placeholders remain in the _execute_tts
method.

Comment on lines 69 to 76
class AudioContent(SQLModel):
format: Literal["base64"] = "base64"
value: str = Field(..., min_length=1, description="Base64 encoded audio")
value: str = Field(..., description="Base64 encoded audio")
# keeping the mime_type liberal here, since does not affect transcription type
mime_type: str | None = Field(
mime_type: str = Field(
None,
description="MIME type of the audio (e.g., audio/wav, audio/mp3, audio/ogg)",
)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the actual code in request.py at the specified lines
cat -n backend/app/models/llm/request.py | sed -n '69,76p'

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 482


🏁 Script executed:

# Check how mime_type is used downstream, especially in utils.py
rg -n "mime_type" --type=py -C3

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 16516


🏁 Script executed:

# Specifically check the utils.py line 466 mentioned in the comment
if [ -f backend/app/utils.py ] || [ -f backend/utils.py ] || [ -f app/utils.py ]; then
  find . -name "utils.py" -path "*/backend/*" | head -1 | xargs sed -n '460,472p' -
fi

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 57


🏁 Script executed:

# Let's also check if there are any type checking or validation issues
# Check how AudioContent is instantiated or used
rg -n "AudioContent" --type=py -C2

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 3900


🏁 Script executed:

# Check SQLModel/Pydantic version and configuration
find . -name "pyproject.toml" -o -name "requirements.txt" | head -5 | xargs grep -l "sqlmodel\|pydantic" 2>/dev/null

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 99


🏁 Script executed:

# Check if there's any model validation configuration or model_config in the request.py file
cat -n backend/app/models/llm/request.py | head -100

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 3402


🏁 Script executed:

# Check if there are any validation errors or issues reported in tests related to AudioContent
rg -n "AudioContent" --type=py backend/app/tests/ -C2 | grep -A5 -B5 "error\|exception\|validate"

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 1053


🏁 Script executed:

# Check the pyproject.toml to see Pydantic/SQLModel versions
cat backend/pyproject.toml | grep -A5 -B5 "pydantic\|sqlmodel"

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 699


🏁 Script executed:

# Test if AudioContent can actually be instantiated with None for mime_type
# and check if there are any validation issues in the codebase
rg -n "AudioContent\(" --type=py | grep -v "mime_type=" | head -20

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 229


🏁 Script executed:

# Check for any Pydantic ConfigDict or model_config settings that might affect validation
rg -n "model_config\|ConfigDict" --type=py backend/app/models/llm/

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 57


🏁 Script executed:

# Let's check if there are any instances where AudioContent is created without mime_type explicitly
rg -n "AudioContent\(" --type=py -C1

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 1109


🏁 Script executed:

# Check if there's any instance where mime_type might be set to None after instantiation
rg -n "mime_type\s*=" --type=py -C2 | grep -v test | head -40

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 2447


Fix type annotation for mime_type field to match actual usage pattern.

mime_type: str = Field(None, ...) declares a non-nullable str type but defaults to None. Pydantic v2 will accept None at instantiation, but the type annotation is misleading. Downstream code at utils.py line 466 explicitly handles the None case with query_input.content.mime_type or "audio/wav", confirming None is an expected value.

Change to: mime_type: str | None = Field(None, ...)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/models/llm/request.py` around lines 69 - 76, The AudioContent
model's mime_type is annotated as non-nullable str but defaults to None; update
the type annotation for the mime_type field on the AudioContent class to allow
None (e.g., make mime_type of type str | None) so it matches the default and the
downstream usage (see places referencing query_input.content.mime_type or
"audio/wav") and adjust the Field(...) declaration to keep the same default None
and description.

if director_notes:
config_kwargs["system_instruction"] = director_notes

config = types.GenerateContentConfig(**config_kwargs)

⚠️ Potential issue | 🔴 Critical

types is undefined — this will crash at runtime with NameError.

Line 222 references types.GenerateContentConfig, but types is never imported. GenerateContentConfig is already imported directly on line 7. This is a guaranteed runtime crash for every TTS request.

Proposed fix
-        config = types.GenerateContentConfig(**config_kwargs)
+        config = GenerateContentConfig(**config_kwargs)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
config = types.GenerateContentConfig(**config_kwargs)
config = GenerateContentConfig(**config_kwargs)
🧰 Tools
🪛 Ruff (0.15.0)

[error] 222-222: Undefined name types

(F821)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` at line 222, The code uses an
undefined symbol types.GenerateContentConfig causing a NameError; replace the
incorrect reference with the already-imported GenerateContentConfig by changing
the line that sets config (currently using types.GenerateContentConfig) to use
GenerateContentConfig(**config_kwargs), ensuring the GenerateContentConfig
symbol (imported at top) is used instead of the nonexistent types module.

Comment on lines +239 to +275
# Post-process audio format conversion if needed
# Gemini TTS natively outputs 24kHz 16-bit PCM (WAV format)
actual_format = "wav" # Native Gemini TTS output format
encoded_content = base64.b64encode(raw_audio_bytes or b"").decode("ascii")

if response_format and response_format != "wav":
# Need to convert from WAV to requested format
logger.info(
f"[GoogleAIProvider._execute_tts] Converting audio from WAV to {response_format}"
)

if response_format == "mp3":
converted_bytes, convert_error = convert_pcm_to_mp3(raw_audio_bytes)
if convert_error:
return None, f"Failed to convert audio to MP3: {convert_error}"
encoded_content = base64.b64encode(converted_bytes or b"").decode(
"ascii"
)
actual_format = "mp3"

elif response_format == "ogg":
converted_bytes, convert_error = convert_pcm_to_ogg(raw_audio_bytes)
if convert_error:
return None, f"Failed to convert audio to OGG: {convert_error}"
encoded_content = base64.b64encode(converted_bytes or b"").decode(
"ascii"
)
actual_format = "ogg"
else:
logger.warning(
f"[GoogleAIProvider._execute_tts] Unsupported response_format '{response_format}', returning native WAV"
)
response_format = "wav"
logger.info(
f"[GoogleAIProvider._execute_tts] Audio conversion successful: {actual_format.upper()} ({len(raw_audio_bytes)} bytes)"
)
response_mime_type = f"audio/{response_format}"

⚠️ Potential issue | 🟡 Minor

"Audio conversion successful" log is misleading for the unsupported-format fallback path.

When response_format is unsupported (e.g., "flac"), line 268-271 logs a warning and resets to "wav", but the code then falls through to line 272-274 and logs "Audio conversion successful" even though no conversion occurred. Move the success log inside the mp3/ogg branches, or guard it.

Also, response_mime_type on line 275 now correctly uses response_format (which was reset to "wav" in the fallback), so the mime type issue from the prior review is resolved.

Proposed fix
             elif response_format == "ogg":
                 converted_bytes, convert_error = convert_pcm_to_ogg(raw_audio_bytes)
                 if convert_error:
                     return None, f"Failed to convert audio to OGG: {convert_error}"
                 encoded_content = base64.b64encode(converted_bytes or b"").decode(
                     "ascii"
                 )
                 actual_format = "ogg"
             else:
                 logger.warning(
                     f"[GoogleAIProvider._execute_tts] Unsupported response_format '{response_format}', returning native WAV"
                 )
                 response_format = "wav"
-            logger.info(
-                f"[GoogleAIProvider._execute_tts] Audio conversion successful: {actual_format.upper()} ({len(raw_audio_bytes)} bytes)"
-            )
+
+            if actual_format != "wav":
+                logger.info(
+                    f"[GoogleAIProvider._execute_tts] Audio conversion successful: {actual_format.upper()} ({len(raw_audio_bytes)} bytes)"
+                )
         response_mime_type = f"audio/{response_format}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 239 - 275, The "Audio
conversion successful" log is emitted even when no conversion occurs; inside
GoogleAIProvider._execute_tts, move or duplicate the success log so it only runs
after a real conversion, i.e., place the logger.info call into the
response_format == "mp3" branch after convert_pcm_to_mp3 succeeds and into the
response_format == "ogg" branch after convert_pcm_to_ogg succeeds (use
convert_pcm_to_mp3, convert_pcm_to_ogg, actual_format and converted_bytes to
build the message), and remove or guard the existing unconditional success log
so unsupported-format fallback (where response_format is reset to "wav") does
not log success.
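For background on the conversion branches above: the "native WAV" path wraps raw 24kHz 16-bit mono PCM in a WAV container, which the stdlib can produce directly. A minimal sketch (the repo's convert_pcm_to_mp3/convert_pcm_to_ogg helpers presumably do the format-specific equivalent via pydub):

```python
import io
import wave

def pcm_to_wav(pcm_bytes: bytes, rate: int = 24000) -> bytes:
    """Wrap raw 16-bit mono PCM -- Gemini TTS's native output -- in a WAV
    container using only the stdlib."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)     # mono
        wf.setsampwidth(2)     # 16-bit samples
        wf.setframerate(rate)  # 24 kHz native rate
        wf.writeframes(pcm_bytes)
    return buf.getvalue()
```

Since the WAV path involves no lossy conversion, it is also the branch where a "conversion successful" log would be most misleading.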

Comment on lines +426 to +444
def resolve_audio_base64(data: str, mime_type: str) -> tuple[str, str | None]:
"""Decode base64 audio and write to temp file. Returns (file_path, error)."""
try:
audio_bytes = base64.b64decode(data)
except Exception as e:
return "", f"Invalid base64 audio data: {str(e)}"

ext = get_file_extension(mime_type)
try:
with tempfile.NamedTemporaryFile(
suffix=ext, delete=False, prefix="audio_"
) as tmp:
tmp.write(audio_bytes)
temp_path = tmp.name

logger.info(f"[resolve_audio_base64] Wrote audio to temp file: {temp_path}")
return temp_path, None
except Exception as e:
return "", f"Failed to write audio to temp file: {str(e)}"

⚠️ Potential issue | 🟡 Minor

Temp files are never cleaned up on the error path inside resolve_audio_base64.

If base64.b64decode succeeds but the subsequent NamedTemporaryFile write fails partway through, the temp file (created with delete=False) may be left behind. The current except on line 443 returns an error string, but any partially-created file is leaked. This is a minor concern since NamedTemporaryFile creation and a single write are unlikely to fail partially, but worth noting.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/utils.py` around lines 426 - 444, The temp-file leak occurs in
resolve_audio_base64 when base64 decoding succeeds but writing to
NamedTemporaryFile fails; fix by tracking temp_path (initialize to None before
the try), and in the outer except for the write error check if temp_path is set
and the file exists then remove it (os.remove) before returning the error;
ensure resolve_audio_base64 imports os and sets temp_path only after tmp.name is
assigned so cleanup is safe, or alternatively use a try/finally around the
NamedTemporaryFile write to always unlink on error while still returning the
error string.

Comment on lines 447 to 474
def resolve_input(query_input) -> tuple[str, str | None]:
"""Resolve discriminated union input to content string.

Args:
query_input: The input from QueryParams (TextInput or AudioInput)

Returns:
(content_string, None) on success - for text returns content value, for audio returns temp file path
("", error_message) on failure
"""
# Import here to avoid circular imports
from app.models.llm.request import TextInput, AudioInput

try:
if isinstance(query_input, TextInput):
return query_input.content.value, None

elif isinstance(query_input, AudioInput):
# AudioInput content is base64-encoded audio
mime_type = query_input.content.mime_type or "audio/wav"
return resolve_audio_base64(query_input.content.value, mime_type)

else:
return "", f"Unknown input type: {type(query_input)}"

except Exception as e:
logger.error(f"[resolve_input] Failed to resolve input: {e}", exc_info=True)
return "", f"Failed to resolve input: {str(e)}"

🛠️ Refactor suggestion | 🟠 Major

Missing type hint on query_input parameter.

Per coding guidelines, all function parameters must have type hints. Since the local import already brings in TextInput and AudioInput, the annotation should use QueryInput (or the union type) from the request module. You can use a string annotation or TYPE_CHECKING to avoid circular imports at runtime.

Proposed fix
+from __future__ import annotations
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from app.models.llm.request import QueryInput
+
 ...
 
-def resolve_input(query_input) -> tuple[str, str | None]:
+def resolve_input(query_input: QueryInput) -> tuple[str, str | None]:

Alternatively, if from __future__ import annotations would be too broad a change for this file, use a string literal:

-def resolve_input(query_input) -> tuple[str, str | None]:
+def resolve_input(query_input: "TextInput | AudioInput") -> tuple[str, str | None]:

As per coding guidelines: "Always add type hints to all function parameters and return values in Python code".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/utils.py` around lines 447 - 474, The function resolve_input is
missing a type annotation for its query_input parameter; update its signature to
type-hint query_input as the request module's QueryInput (or appropriate union
of TextInput|AudioInput) using a forward/string annotation or by importing
QueryInput under a TYPE_CHECKING guard to avoid circular imports, and keep the
existing return type tuple[str, str | None]; ensure references in the body
(TextInput, AudioInput, resolve_audio_base64) remain unchanged.

job = session.get(Job, job_id)
if not job:
# Log all jobs to see what's in the database
from sqlmodel import select
Collaborator

@Prajna1999 this comment was not addressed

return load_description(Path(filename))


# Input resolver functions moved from app.services.llm.input_resolver
Collaborator

In my opinion, the common utils file is supposed to hold generic functions, and all the audio-related functions here seem domain-specific. But if you and others think this is fine, then I guess it's okay for now.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/services/llm/jobs.py (1)

446-453: ⚠️ Potential issue | 🟠 Major

error=request.query.input passes a Pydantic model where str is expected.

APIResponse.failure_response has error: str | list. Passing the QueryInput Pydantic object will either cause a Pydantic v2 coercion to its str() representation (producing a useless internal model dump as the error message) or a validation error. The analogous block in the input-guardrails path (dead code at line 189) correctly passes safe_input["data"]["safe_text"].

🐛 Proposed fix
-                        callback_response = APIResponse.failure_response(
-                            error=request.query.input,
-                            metadata=request.request_metadata,
-                        )
+                        callback_response = APIResponse.failure_response(
+                            error=safe_output["data"]["safe_text"],
+                            metadata=request.request_metadata,
+                        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/jobs.py` around lines 446 - 453, The failure branch
uses APIResponse.failure_response(error=request.query.input) which passes a
Pydantic QueryInput object where a string is expected; change it to pass the
safe text string (like safe_input["data"]["safe_text"] used in the
input-guardrails path) so error is a plain str or list of str, then call
handle_job_error(job_id, request.callback_url, callback_response) as before;
update the block that checks safe_output["data"]["rephrase_needed"] to build
callback_response with error set to the safe text rather than
request.query.input.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/services/llm/jobs.py`:
- Around line 169-204: Remove the unreachable pre-session guardrails block that
begins with the "if input_guardrails:" check (the code that calls
call_guardrails(...) and manipulates safe_input/request.query.input) because
input_guardrails is only populated later in the session; instead rely on the
existing working guardrails implementation already present later (the
session-based block that uses list_validators_config and call_guardrails).
Delete the entire pre-session block to eliminate dead code and the undefined
symbol input_query and ensure only the session-based guardrail handling is
executed.
- Around line 411-412: The isinstance check is using the imported function name
resolve_input instead of the resolved value; replace the check to test the
resolved_input variable (i.e., use isinstance(resolved_input, AudioInput)) and
call cleanup_temp_file(resolved_input) so audio temp files are actually cleaned
up. Locate the block around the resolve_input usage in jobs.py where
resolved_input is set and update the conditional to reference resolved_input and
pass that object to cleanup_temp_file.
- Around line 22-24: Remove the stray closing parenthesis and the stale import:
delete the lone ")" on line 24 and replace the import of call_guardrails with
the correct active guardrails symbol; remove or change the import statement
referencing call_guardrails and ensure the file only imports
run_guardrails_validation (the function actually used in the active guardrails
path) so that the dead pre-session reference to call_guardrails is eliminated
and the module parses cleanly.
- Around line 466-475: Remove the orphaned redundant else block that assigns
response.response.output.content.value = safe_output["error"] and constructs
callback_response using APIResponse.failure_response, then calls
handle_job_error; this block references undefined safe_output and duplicates
logic already handled in the inner else, so delete that entire orphaned else
section (the duplicated assignment and return calling handle_job_error) to
restore valid Python syntax and avoid NameError while keeping the existing error
handling in the inner else branch intact.

---

Outside diff comments:
In `@backend/app/services/llm/jobs.py`:
- Around line 446-453: The failure branch uses
APIResponse.failure_response(error=request.query.input) which passes a Pydantic
QueryInput object where a string is expected; change it to pass the safe text
string (like safe_input["data"]["safe_text"] used in the input-guardrails path)
so error is a plain str or list of str, then call handle_job_error(job_id,
request.callback_url, callback_response) as before; update the block that checks
safe_output["data"]["rephrase_needed"] to build callback_response with error set
to the safe text rather than request.query.input.

---

Duplicate comments:
In `@backend/app/models/llm/request.py`:
- Around line 4-8: Remove the duplicate imports of Field and SQLModel by
consolidating into a single import statement that includes all sqlmodel symbols
used (Field, SQLModel, Index, text) and keep the other imports (Discriminator,
model_validator, HttpUrl, datetime, JSONB) separate; update the import block so
only one "from sqlmodel import ..." line exists to eliminate the F811 duplicate
import for Field/SQLModel.
- Around line 73-76: The mime_type field's type annotation disagrees with its
None default — update the annotation for mime_type back to Optional[str] (and
import Optional from typing if missing) so the Field(None, ...) default is
valid; ensure downstream usage like query_input.content.mime_type or "audio/wav"
continues to work with a nullable mime_type.

In `@backend/app/services/llm/jobs.py`:
- Around line 209-221: The code contains leftover debug scaffolding after the
session.get(Job, job_id) lookup: remove the inline import of select, the
recent-jobs query (all_jobs), and the verbose "Recent jobs in DB" logger entry;
instead, keep a concise error log in the execute_job flow (the logger.error
following the session.get(Job, job_id) check) that only reports job_id and a
clear message that the job was not found, without querying or printing other DB
rows; ensure any removal preserves control flow around
logger.info(f"[execute_job] Attempting to fetch job | job_id={job_id}") and the
subsequent handling of the missing job.

Comment on lines 22 to 24
)
from app.services.llm.guardrails import call_guardrails
)

⚠️ Potential issue | 🔴 Critical

Stray ) on line 24 causes a parse error — CI is failing.

The from app.models.llm.request import (...) block closes correctly on line 22. Line 23 adds a standalone call_guardrails import, then line 24 has an orphaned ) that makes the file unparseable (confirmed by the CI failure: Cannot parse file: 24:0: )).

Additionally, call_guardrails (line 23) is stale: the dead pre-session block references it (line 175), while the active guardrails path uses run_guardrails_validation (already imported via lines 26–29).

🐛 Proposed fix
     TextInput,
     AudioInput,
 )
-from app.services.llm.guardrails import call_guardrails
-)
 from app.models.llm.response import TextOutput
🧰 Tools
🪛 GitHub Actions: Kaapi CI

[error] 24-24: Black formatting failed. Cannot parse file: 24:0: ).

🪛 Ruff (0.15.1)

[warning] 24-24: Expected a statement

(invalid-syntax)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/jobs.py` around lines 22 - 24, Remove the stray
closing parenthesis and the stale import: delete the lone ")" on line 24 and
replace the import of call_guardrails with the correct active guardrails symbol;
remove or change the import statement referencing call_guardrails and ensure the
file only imports run_guardrails_validation (the function actually used in the
active guardrails path) so that the dead pre-session reference to
call_guardrails is eliminated and the module parses cleanly.

Comment on lines 169 to 204
        if input_guardrails:
            if not isinstance(request.query.input, TextInput):
                logger.warning(
                    "[execute_job] Input guardrails skipped for non-text input type"
                )
            else:
                safe_input = call_guardrails(input_query, input_guardrails, job_id)

            logger.info(
                f"[execute_job] Input guardrail validation | success={safe_input['success']}."
            )

            if safe_input.get("bypassed"):
                logger.info("[execute_job] Guardrails bypassed (service unavailable)")

            elif safe_input["success"]:
                # Update the text value within the QueryInput structure
                request.query.input.content.value = safe_input["data"]["safe_text"]

                if safe_input["data"]["rephrase_needed"]:
                    callback_response = APIResponse.failure_response(
                        error=safe_input["data"]["safe_text"],
                        metadata=request.request_metadata,
                    )
                    return handle_job_error(
                        job_id, request.callback_url, callback_response
                    )
            else:
                # Update the text value with error message
                request.query.input.content.value = safe_input["error"]

                callback_response = APIResponse.failure_response(
                    error=safe_input["error"],
                    metadata=request.request_metadata,
                )
                return handle_job_error(job_id, request.callback_url, callback_response)

⚠️ Potential issue | 🟠 Major

Dead pre-session guardrails block — unreachable and references undefined input_query.

At line 169, input_guardrails is still [] (initialized at line 160); the actual population via list_validators_config happens at lines 253–259 inside the session. The if input_guardrails: check is therefore always False and this entire block is unreachable.

Beyond being dead code, it has two additional defects that would surface if it ever ran:

  • Line 175: call_guardrails(input_query, ...)input_query is never defined in this scope (NameError).
  • Lines 177–178: safe_input['success'] is accessed outside the TextInput branch, so safe_input would be undefined for non-text inputs.

The actual working implementation is at lines 261–300. Remove lines 169–204 entirely.
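The unreachability follows directly from the truthiness of an empty list; a minimal sketch of the ordering bug (hypothetical validator names):

```python
input_guardrails = []          # initialized before the session, as at line 160

if input_guardrails:           # an empty list is falsy: this branch never runs
    raise AssertionError("unreachable")

# Only later, inside the session, is the list actually populated:
input_guardrails = ["toxicity", "pii"]   # hypothetical validator names

print(bool([]), bool(input_guardrails))  # False True
```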

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/jobs.py` around lines 169 - 204, Remove the
unreachable pre-session guardrails block that begins with the "if
input_guardrails:" check (the code that calls call_guardrails(...) and
manipulates safe_input/request.query.input) because input_guardrails is only
populated later in the session; instead rely on the existing working guardrails
implementation already present later (the session-based block that uses
list_validators_config and call_guardrails). Delete the entire pre-session block
to eliminate dead code and the undefined symbol input_query and ensure only the
session-based guardrail handling is executed.

Comment on lines 411 to 412
            if resolved_input and isinstance(resolve_input, AudioInput):
                cleanup_temp_file(resolved_input)

⚠️ Potential issue | 🔴 Critical

Temp-file cleanup never executes — resolve_input is the imported function, not the input object.

resolve_input on line 411 refers to the resolve_input function imported at line 32, not the resolved string resolved_input or the request's input object. isinstance(<function>, AudioInput) is always False, so cleanup_temp_file is never called and every audio request leaks a temp file. The past review fix targeted this exact line but the wrong variable was used.

🐛 Proposed fix
-            if resolved_input and isinstance(resolve_input, AudioInput):
+            if resolved_input and isinstance(request.query.input, AudioInput):
                 cleanup_temp_file(resolved_input)
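The failure mode is easy to reproduce in isolation: isinstance against a function object is always False (a hypothetical sketch mirroring the name mix-up in jobs.py):

```python
class AudioInput:
    """Hypothetical stand-in for the request's audio input model."""

def resolve_input(query_input):
    """Hypothetical stand-in for the imported helper; returns a path string."""
    return "/tmp/resolved-audio.wav"

resolved_input = resolve_input(AudioInput())

# The bug: testing the *function* object, which is never an AudioInput,
# so the cleanup branch is dead code.
print(isinstance(resolve_input, AudioInput))    # False, always

# resolved_input is the resolved path string, so it is not an AudioInput
# either; the check has to target the request's input object instead.
print(isinstance(resolved_input, AudioInput))   # False for a plain string
```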
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/jobs.py` around lines 411 - 412, The isinstance
check tests the imported function name resolve_input instead of the request's
input object, so it is always False and the cleanup never runs; change the
condition to isinstance(request.query.input, AudioInput) while keeping
cleanup_temp_file(resolved_input) so audio temp files are actually cleaned up.
Locate the block in jobs.py where resolved_input is set and update the
conditional to reference the request's input object rather than the imported
function.

Comment on lines 466 to 475
            else:
                response.response.output.content.value = safe_output["error"]

                callback_response = APIResponse.failure_response(
                    error=safe_output["error"],
                    metadata=request.request_metadata,
                )
                return handle_job_error(
                    job_id, request.callback_url, callback_response
                )

⚠️ Potential issue | 🔴 Critical

Orphaned else: block is invalid Python syntax and references undefined safe_output.

Line 466 introduces a second else: at the same indentation (8 spaces) as line 421's else:, which already closes the if not isinstance(response.response.output, TextOutput): conditional. Python allows only one else per conditional, making this a syntax error confirmed by Ruff (lines 466–467: "Expected a statement").

Additionally, if this block somehow executed (e.g., in the not isinstance branch), safe_output (set at line 423 inside the else at line 421) would be undefined, causing a NameError. This block is a copy-paste artifact of the inner else: at line 455 and should be removed entirely.
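That a second else at the same indentation is a hard parse error, not merely a logic bug, can be confirmed with compile() (a standalone sketch):

```python
import textwrap

# A conditional admits exactly one `else`; a duplicate at the same
# indentation level fails to parse, which is what Ruff reports here.
source = textwrap.dedent("""\
    if flag:
        pass
    else:
        pass
    else:
        pass
""")

try:
    compile(source, "<snippet>", "exec")
    parsed = True
except SyntaxError:
    parsed = False

print(parsed)  # False
```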

🐛 Proposed fix — remove the orphaned block
                     else:
                         response.response.output.content.value = safe_output["error"]

                         callback_response = APIResponse.failure_response(
                             error=safe_output["error"],
                             metadata=request.request_metadata,
                         )
                         return handle_job_error(
                             job_id, request.callback_url, callback_response
                         )

-            else:
-                response.response.output.content.value = safe_output["error"]
-
-                callback_response = APIResponse.failure_response(
-                    error=safe_output["error"],
-                    metadata=request.request_metadata,
-                )
-                return handle_job_error(
-                    job_id, request.callback_url, callback_response
-                )

         callback_response = APIResponse.success_response(
🧰 Tools
🪛 Ruff (0.15.1)

[warning] 466-466: Expected a statement

(invalid-syntax)


[warning] 466-466: Expected a statement

(invalid-syntax)


[warning] 466-467: Expected a statement

(invalid-syntax)


[warning] 467-467: Unexpected indentation

(invalid-syntax)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/jobs.py` around lines 466 - 475, Remove the orphaned
redundant else block that assigns response.response.output.content.value =
safe_output["error"] and constructs callback_response using
APIResponse.failure_response, then calls handle_job_error; this block references
undefined safe_output and duplicates logic already handled in the inner else, so
delete that entire orphaned else section (the duplicated assignment and return
calling handle_job_error) to restore valid Python syntax and avoid NameError
while keeping the existing error handling in the inner else branch intact.

@Prajna1999 Prajna1999 moved this to In Progress in Kaapi-dev Feb 19, 2026
@Prajna1999 Prajna1999 moved this from In Progress to In Review in Kaapi-dev Feb 19, 2026
@Prajna1999 Prajna1999 merged commit 8c77e17 into main Feb 23, 2026
2 checks passed
@Prajna1999 Prajna1999 deleted the feature/unified-api-tts branch February 23, 2026 17:06
@github-project-automation github-project-automation bot moved this from In Review to Closed in Kaapi-dev Feb 23, 2026
@Prajna1999 Prajna1999 linked an issue Feb 23, 2026 that may be closed by this pull request

Labels

enhancement New feature or request

Projects

Status: Closed

Development

Successfully merging this pull request may close these issues.

CI: Add ffmpeg as a dependency before running tests on CI

4 participants