
TTS/STT: Gemini-TTS Model Integration with unified API #574

Merged
Prajna1999 merged 65 commits into main from feature/unified-api-tts
Feb 23, 2026

Conversation


@Prajna1999 Prajna1999 commented Feb 2, 2026

Summary

Target issue is #528
Integrate Gemini-pro-2.5-preview-tts models with the unified API

Checklist

Before submitting a pull request, please ensure that you mark these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested.
  • If you've fixed a bug or added code, ensure it is tested and has test cases.

Notes

  1. Currently tested only with Hindi and English as the output languages.
  2. The ad hoc main function used to test the TTS scripts will be deleted before the final push.

Summary by CodeRabbit

  • New Features

    • Added Text-to-Speech support (Google Gemini) with configurable model, voice, language, and output formats (WAV/MP3/OGG).
    • Audio conversion options (MP3, OGG) with format selection and MIME-aware outputs.
  • Improvements

    • Better audio input validation and accurate audio size calculations.
    • Consolidated input resolution and temporary-audio handling for text and audio inputs.
  • Chores

    • Added pydub dependency for audio processing.
  • Tests

    • Extensive STT and TTS test coverage added.

Prajna1999 and others added 28 commits January 20, 2026 21:57
…-stt-new' into feature/unified-api-stt-new

merge upstream

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (2)
backend/app/services/llm/providers/gai.py (1)

283-286: f-string without placeholders on line 285.

Static analysis (Ruff F541) flags this. Remove the f prefix.

Proposed fix
-            logger.warning(
-                f"[GoogleAIProvider._execute_tts] Response missing usage_metadata, using zeros"
-            )
+            logger.warning(
+                "[GoogleAIProvider._execute_tts] Response missing usage_metadata, using zeros"
+            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 283 - 286, The
logger.warning call inside GoogleAIProvider._execute_tts uses an unnecessary
f-string (no placeholders); remove the leading "f" so the message is a plain
string literal (update the logger.warning invocation that currently contains
"[GoogleAIProvider._execute_tts] Response missing usage_metadata, using zeros").
Ensure no other f-strings without placeholders remain in the _execute_tts
method.
backend/app/services/llm/mappers.py (1)

102-106: Use the already-validated model variable instead of re-fetching from dict.

Line 102 already extracts and validates model = kaapi_params.get("model"), but line 106 calls kaapi_params.get("model") again. Use the local variable for consistency and to avoid redundant dict lookups.

Proposed fix
     model = kaapi_params.get("model")
     if not model:
         return {}, ["Missing required 'model' parameter"]
 
-    google_params["model"] = kaapi_params.get("model")
+    google_params["model"] = model
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/mappers.py` around lines 102 - 106, The code
validates model with model = kaapi_params.get("model") but then redundantly
re-reads kaapi_params.get("model") when setting google_params["model"]; update
that assignment to use the already-validated local variable model instead of
re-fetching from kaapi_params to avoid redundant lookups and ensure consistency
(refer to the variables model, kaapi_params, and google_params in the mapper
function).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/models/llm/request.py`:
- Around line 69-76: The AudioContent model's mime_type is annotated as
non-nullable str but defaults to None; update the type annotation for the
mime_type field on the AudioContent class to allow None (e.g., make mime_type of
type str | None) so it matches the default and the downstream usage (see places
referencing query_input.content.mime_type or "audio/wav") and adjust the
Field(...) declaration to keep the same default None and description.

In `@backend/app/services/llm/providers/gai.py`:
- Line 222: The code uses an undefined symbol types.GenerateContentConfig
causing a NameError; replace the incorrect reference with the already-imported
GenerateContentConfig by changing the line that sets config (currently using
types.GenerateContentConfig) to use GenerateContentConfig(**config_kwargs),
ensuring the GenerateContentConfig symbol (imported at top) is used instead of
the nonexistent types module.
- Around line 239-275: The "Audio conversion successful" log is emitted even
when no conversion occurs; inside GoogleAIProvider._execute_tts, move or
duplicate the success log so it only runs after a real conversion, i.e., place
the logger.info call into the response_format == "mp3" branch after
convert_pcm_to_mp3 succeeds and into the response_format == "ogg" branch after
convert_pcm_to_ogg succeeds (use convert_pcm_to_mp3, convert_pcm_to_ogg,
actual_format and converted_bytes to build the message), and remove or guard the
existing unconditional success log so unsupported-format fallback (where
response_format is reset to "wav") does not log success.

In `@backend/app/utils.py`:
- Around line 426-444: The temp-file leak occurs in resolve_audio_base64 when
base64 decoding succeeds but writing to NamedTemporaryFile fails; fix by
tracking temp_path (initialize to None before the try), and in the outer except
for the write error check if temp_path is set and the file exists then remove it
(os.remove) before returning the error; ensure resolve_audio_base64 imports os
and sets temp_path only after tmp.name is assigned so cleanup is safe, or
alternatively use a try/finally around the NamedTemporaryFile write to always
unlink on error while still returning the error string.
- Around line 447-474: The function resolve_input is missing a type annotation
for its query_input parameter; update its signature to type-hint query_input as
the request module's QueryInput (or appropriate union of TextInput|AudioInput)
using a forward/string annotation or by importing QueryInput under a
TYPE_CHECKING guard to avoid circular imports, and keep the existing return type
tuple[str, str | None]; ensure references in the body (TextInput, AudioInput,
resolve_audio_base64) remain unchanged.

---

Duplicate comments:
In `@backend/app/crud/llm.py`:
- Around line 68-91: The code incorrectly uses getattr/hasattr on
completion_config.params which may be a dict; update logic in the
completion_type and model resolution to handle both dict and object shapes: for
completion_type, check if completion_config.params is a dict and use
completion_config.params.get("type", "text"), otherwise use
getattr(completion_config.params, "type", "text"); for model, similarly if
params is a dict use params.get("model", "") else use
getattr(completion_config.params, "model", completion_config.params.get("model",
"") if hasattr(completion_config.params, "get") else ""); adjust references to
completion_config, completion_type, model, and params accordingly so both
dict-based (NativeCompletionConfig) and object-based params work.
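The dict-vs-object handling described above boils down to a single access helper; a minimal sketch under that assumption (`params_get` is a hypothetical name, not a function in the repo):

```python
def params_get(params, key: str, default=""):
    """Read a key from completion params whether they arrive as a dict
    (e.g. NativeCompletionConfig) or as an object with attributes."""
    if isinstance(params, dict):
        return params.get(key, default)
    return getattr(params, key, default)
```

Both resolution sites (`completion_type` and `model`) could then call `params_get(completion_config.params, "type", "text")` and `params_get(completion_config.params, "model", "")` without branching on shape.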

In `@backend/app/models/llm/request.py`:
- Around line 4-8: There are duplicate imports of Field and SQLModel (two
sqlmodel import lines); remove the redundant import and consolidate all symbols
imported from sqlmodel into a single import statement so only one import
supplies Field, SQLModel, Index, and text; keep the other imports
(Discriminator, model_validator, HttpUrl, datetime, JSONB) as-is and ensure no
name collisions remain in request.py.

In `@backend/app/services/llm/providers/gai.py`:
- Around line 96-97: The uploaded Gemini file (assigned to gemini_file via
self.client.files.upload(file=resolved_input)) is never deleted, causing storage
accumulation; wrap the upload + generate_content flow in try/finally and in the
finally block call the provider client file-delete API (e.g.,
self.client.files.delete(...) using the gemini_file identifier or name) to
remove the uploaded artifact regardless of success/failure, ensuring any
exceptions from delete are caught/logged so cleanup does not mask the original
error.
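The upload/delete lifecycle described above is a plain try/finally pattern. A minimal sketch against a stub file API (`StubFiles` is hypothetical — the real client exposes upload/delete under `client.files`):

```python
class StubFiles:
    """Hypothetical stand-in for the provider's file API (client.files)."""
    def __init__(self):
        self.live = set()

    def upload(self, file: str) -> str:
        self.live.add(file)
        return file

    def delete(self, name: str) -> None:
        self.live.discard(name)


def transcribe_with_cleanup(files: StubFiles, resolved_input: str, generate):
    """Upload the audio, run generation, and always delete the uploaded
    artifact -- without letting a delete failure mask the original error."""
    gemini_file = files.upload(file=resolved_input)
    try:
        return generate(gemini_file)
    finally:
        try:
            files.delete(name=gemini_file)
        except Exception:
            pass  # real code should log this rather than swallow it silently
```

The inner try/except in the `finally` block is what keeps a failing delete from shadowing an exception raised by `generate`.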

---

Nitpick comments:
In `@backend/app/services/llm/mappers.py`:
- Around line 102-106: The code validates model with model =
kaapi_params.get("model") but then redundantly re-reads
kaapi_params.get("model") when setting google_params["model"]; update that
assignment to use the already-validated local variable model instead of
re-fetching from kaapi_params to avoid redundant lookups and ensure consistency
(refer to the variables model, kaapi_params, and google_params in the mapper
function).

In `@backend/app/services/llm/providers/gai.py`:
- Around line 283-286: The logger.warning call inside
GoogleAIProvider._execute_tts uses an unnecessary f-string (no placeholders);
remove the leading "f" so the message is a plain string literal (update the
logger.warning invocation that currently contains
"[GoogleAIProvider._execute_tts] Response missing usage_metadata, using zeros").
Ensure no other f-strings without placeholders remain in the _execute_tts
method.

Comment on lines 69 to 76
class AudioContent(SQLModel):
format: Literal["base64"] = "base64"
value: str = Field(..., min_length=1, description="Base64 encoded audio")
value: str = Field(..., description="Base64 encoded audio")
# keeping the mime_type liberal here, since does not affect transcription type
mime_type: str | None = Field(
mime_type: str = Field(
None,
description="MIME type of the audio (e.g., audio/wav, audio/mp3, audio/ogg)",
)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the actual code in request.py at the specified lines
cat -n backend/app/models/llm/request.py | sed -n '69,76p'

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 482


🏁 Script executed:

# Check how mime_type is used downstream, especially in utils.py
rg -n "mime_type" --type=py -C3

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 16516


🏁 Script executed:

# Specifically check the utils.py line 466 mentioned in the comment
if [ -f backend/app/utils.py ] || [ -f backend/utils.py ] || [ -f app/utils.py ]; then
  find . -name "utils.py" -path "*/backend/*" | head -1 | xargs sed -n '460,472p' -
fi

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 57


🏁 Script executed:

# Let's also check if there are any type checking or validation issues
# Check how AudioContent is instantiated or used
rg -n "AudioContent" --type=py -C2

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 3900


🏁 Script executed:

# Check SQLModel/Pydantic version and configuration
find . -name "pyproject.toml" -o -name "requirements.txt" | head -5 | xargs grep -l "sqlmodel\|pydantic" 2>/dev/null

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 99


🏁 Script executed:

# Check if there's any model validation configuration or model_config in the request.py file
cat -n backend/app/models/llm/request.py | head -100

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 3402


🏁 Script executed:

# Check if there are any validation errors or issues reported in tests related to AudioContent
rg -n "AudioContent" --type=py backend/app/tests/ -C2 | grep -A5 -B5 "error\|exception\|validate"

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 1053


🏁 Script executed:

# Check the pyproject.toml to see Pydantic/SQLModel versions
cat backend/pyproject.toml | grep -A5 -B5 "pydantic\|sqlmodel"

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 699


🏁 Script executed:

# Test if AudioContent can actually be instantiated with None for mime_type
# and check if there are any validation issues in the codebase
rg -n "AudioContent\(" --type=py | grep -v "mime_type=" | head -20

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 229


🏁 Script executed:

# Check for any Pydantic ConfigDict or model_config settings that might affect validation
rg -n "model_config\|ConfigDict" --type=py backend/app/models/llm/

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 57


🏁 Script executed:

# Let's check if there are any instances where AudioContent is created without mime_type explicitly
rg -n "AudioContent\(" --type=py -C1

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 1109


🏁 Script executed:

# Check if there's any instance where mime_type might be set to None after instantiation
rg -n "mime_type\s*=" --type=py -C2 | grep -v test | head -40

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 2447


Fix type annotation for mime_type field to match actual usage pattern.

mime_type: str = Field(None, ...) declares a non-nullable str type but defaults to None. Pydantic v2 will accept None at instantiation, but the type annotation is misleading. Downstream code at utils.py line 466 explicitly handles the None case with query_input.content.mime_type or "audio/wav", confirming None is an expected value.

Change to: mime_type: str | None = Field(None, ...)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/models/llm/request.py` around lines 69 - 76, The AudioContent
model's mime_type is annotated as non-nullable str but defaults to None; update
the type annotation for the mime_type field on the AudioContent class to allow
None (e.g., make mime_type of type str | None) so it matches the default and the
downstream usage (see places referencing query_input.content.mime_type or
"audio/wav") and adjust the Field(...) declaration to keep the same default None
and description.

if director_notes:
config_kwargs["system_instruction"] = director_notes

config = types.GenerateContentConfig(**config_kwargs)

⚠️ Potential issue | 🔴 Critical

types is undefined — this will crash at runtime with NameError.

Line 222 references types.GenerateContentConfig, but types is never imported. GenerateContentConfig is already imported directly on line 7. This is a guaranteed runtime crash for every TTS request.

Proposed fix
-        config = types.GenerateContentConfig(**config_kwargs)
+        config = GenerateContentConfig(**config_kwargs)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
config = types.GenerateContentConfig(**config_kwargs)
config = GenerateContentConfig(**config_kwargs)
🧰 Tools
🪛 Ruff (0.15.0)

[error] 222-222: Undefined name types

(F821)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` at line 222, The code uses an
undefined symbol types.GenerateContentConfig causing a NameError; replace the
incorrect reference with the already-imported GenerateContentConfig by changing
the line that sets config (currently using types.GenerateContentConfig) to use
GenerateContentConfig(**config_kwargs), ensuring the GenerateContentConfig
symbol (imported at top) is used instead of the nonexistent types module.

Comment on lines +239 to +275
# Post-process audio format conversion if needed
# Gemini TTS natively outputs 24kHz 16-bit PCM (WAV format)
actual_format = "wav" # Native Gemini TTS output format
encoded_content = base64.b64encode(raw_audio_bytes or b"").decode("ascii")

if response_format and response_format != "wav":
# Need to convert from WAV to requested format
logger.info(
f"[GoogleAIProvider._execute_tts] Converting audio from WAV to {response_format}"
)

if response_format == "mp3":
converted_bytes, convert_error = convert_pcm_to_mp3(raw_audio_bytes)
if convert_error:
return None, f"Failed to convert audio to MP3: {convert_error}"
encoded_content = base64.b64encode(converted_bytes or b"").decode(
"ascii"
)
actual_format = "mp3"

elif response_format == "ogg":
converted_bytes, convert_error = convert_pcm_to_ogg(raw_audio_bytes)
if convert_error:
return None, f"Failed to convert audio to OGG: {convert_error}"
encoded_content = base64.b64encode(converted_bytes or b"").decode(
"ascii"
)
actual_format = "ogg"
else:
logger.warning(
f"[GoogleAIProvider._execute_tts] Unsupported response_format '{response_format}', returning native WAV"
)
response_format = "wav"
logger.info(
f"[GoogleAIProvider._execute_tts] Audio conversion successful: {actual_format.upper()} ({len(raw_audio_bytes)} bytes)"
)
response_mime_type = f"audio/{response_format}"

⚠️ Potential issue | 🟡 Minor

"Audio conversion successful" log is misleading for the unsupported-format fallback path.

When response_format is unsupported (e.g., "flac"), line 268-271 logs a warning and resets to "wav", but the code then falls through to line 272-274 and logs "Audio conversion successful" even though no conversion occurred. Move the success log inside the mp3/ogg branches, or guard it.

Also, response_mime_type on line 275 now correctly uses response_format (which was reset to "wav" in the fallback), so the mime type issue from the prior review is resolved.

Proposed fix
             elif response_format == "ogg":
                 converted_bytes, convert_error = convert_pcm_to_ogg(raw_audio_bytes)
                 if convert_error:
                     return None, f"Failed to convert audio to OGG: {convert_error}"
                 encoded_content = base64.b64encode(converted_bytes or b"").decode(
                     "ascii"
                 )
                 actual_format = "ogg"
             else:
                 logger.warning(
                     f"[GoogleAIProvider._execute_tts] Unsupported response_format '{response_format}', returning native WAV"
                 )
                 response_format = "wav"
-            logger.info(
-                f"[GoogleAIProvider._execute_tts] Audio conversion successful: {actual_format.upper()} ({len(raw_audio_bytes)} bytes)"
-            )
+
+            if actual_format != "wav":
+                logger.info(
+                    f"[GoogleAIProvider._execute_tts] Audio conversion successful: {actual_format.upper()} ({len(raw_audio_bytes)} bytes)"
+                )
         response_mime_type = f"audio/{response_format}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 239 - 275, The "Audio
conversion successful" log is emitted even when no conversion occurs; inside
GoogleAIProvider._execute_tts, move or duplicate the success log so it only runs
after a real conversion, i.e., place the logger.info call into the
response_format == "mp3" branch after convert_pcm_to_mp3 succeeds and into the
response_format == "ogg" branch after convert_pcm_to_ogg succeeds (use
convert_pcm_to_mp3, convert_pcm_to_ogg, actual_format and converted_bytes to
build the message), and remove or guard the existing unconditional success log
so unsupported-format fallback (where response_format is reset to "wav") does
not log success.
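For background on the conversion branches above: the "native WAV" path wraps raw 24kHz 16-bit mono PCM in a WAV container, which the stdlib can produce directly. A minimal sketch (the repo's convert_pcm_to_mp3/convert_pcm_to_ogg helpers presumably do the format-specific equivalent via pydub):

```python
import io
import wave

def pcm_to_wav(pcm_bytes: bytes, rate: int = 24000) -> bytes:
    """Wrap raw 16-bit mono PCM -- Gemini TTS's native output -- in a WAV
    container using only the stdlib."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)     # mono
        wf.setsampwidth(2)     # 16-bit samples
        wf.setframerate(rate)  # 24 kHz native rate
        wf.writeframes(pcm_bytes)
    return buf.getvalue()
```

Since the WAV path involves no lossy conversion, it is also the branch where a "conversion successful" log would be most misleading.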

Comment on lines +426 to +444
def resolve_audio_base64(data: str, mime_type: str) -> tuple[str, str | None]:
"""Decode base64 audio and write to temp file. Returns (file_path, error)."""
try:
audio_bytes = base64.b64decode(data)
except Exception as e:
return "", f"Invalid base64 audio data: {str(e)}"

ext = get_file_extension(mime_type)
try:
with tempfile.NamedTemporaryFile(
suffix=ext, delete=False, prefix="audio_"
) as tmp:
tmp.write(audio_bytes)
temp_path = tmp.name

logger.info(f"[resolve_audio_base64] Wrote audio to temp file: {temp_path}")
return temp_path, None
except Exception as e:
return "", f"Failed to write audio to temp file: {str(e)}"

⚠️ Potential issue | 🟡 Minor

Temp files are never cleaned up on the error path inside resolve_audio_base64.

If base64.b64decode succeeds but the subsequent NamedTemporaryFile write fails partway through, the temp file (created with delete=False) may be left behind. The current except on line 443 returns an error string, but any partially-created file is leaked. This is a minor concern since NamedTemporaryFile creation and a single write are unlikely to fail partially, but worth noting.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/utils.py` around lines 426 - 444, The temp-file leak occurs in
resolve_audio_base64 when base64 decoding succeeds but writing to
NamedTemporaryFile fails; fix by tracking temp_path (initialize to None before
the try), and in the outer except for the write error check if temp_path is set
and the file exists then remove it (os.remove) before returning the error;
ensure resolve_audio_base64 imports os and sets temp_path only after tmp.name is
assigned so cleanup is safe, or alternatively use a try/finally around the
NamedTemporaryFile write to always unlink on error while still returning the
error string.

Comment on lines 447 to 474
def resolve_input(query_input) -> tuple[str, str | None]:
"""Resolve discriminated union input to content string.

Args:
query_input: The input from QueryParams (TextInput or AudioInput)

Returns:
(content_string, None) on success - for text returns content value, for audio returns temp file path
("", error_message) on failure
"""
# Import here to avoid circular imports
from app.models.llm.request import TextInput, AudioInput

try:
if isinstance(query_input, TextInput):
return query_input.content.value, None

elif isinstance(query_input, AudioInput):
# AudioInput content is base64-encoded audio
mime_type = query_input.content.mime_type or "audio/wav"
return resolve_audio_base64(query_input.content.value, mime_type)

else:
return "", f"Unknown input type: {type(query_input)}"

except Exception as e:
logger.error(f"[resolve_input] Failed to resolve input: {e}", exc_info=True)
return "", f"Failed to resolve input: {str(e)}"

🛠️ Refactor suggestion | 🟠 Major

Missing type hint on query_input parameter.

Per coding guidelines, all function parameters must have type hints. Since the local import already brings in TextInput and AudioInput, the annotation should use QueryInput (or the union type) from the request module. You can use a string annotation or TYPE_CHECKING to avoid circular imports at runtime.

Proposed fix
+from __future__ import annotations
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from app.models.llm.request import QueryInput
+
 ...
 
-def resolve_input(query_input) -> tuple[str, str | None]:
+def resolve_input(query_input: QueryInput) -> tuple[str, str | None]:

Alternatively, if from __future__ import annotations would be too broad a change for this file, use a string literal:

-def resolve_input(query_input) -> tuple[str, str | None]:
+def resolve_input(query_input: "TextInput | AudioInput") -> tuple[str, str | None]:

As per coding guidelines: "Always add type hints to all function parameters and return values in Python code".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/utils.py` around lines 447 - 474, The function resolve_input is
missing a type annotation for its query_input parameter; update its signature to
type-hint query_input as the request module's QueryInput (or appropriate union
of TextInput|AudioInput) using a forward/string annotation or by importing
QueryInput under a TYPE_CHECKING guard to avoid circular imports, and keep the
existing return type tuple[str, str | None]; ensure references in the body
(TextInput, AudioInput, resolve_audio_base64) remain unchanged.

job = session.get(Job, job_id)
if not job:
# Log all jobs to see what's in the database
from sqlmodel import select
Collaborator

@Prajna1999 this comment was not addressed

return load_description(Path(filename))


# Input resolver functions moved from app.services.llm.input_resolver
Collaborator

In my opinion, the common utils file is supposed to hold generic functions, and all the audio-related functions here seem domain-specific. But if you and others think this is fine, then I guess it's okay for now.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/services/llm/jobs.py (1)

446-453: ⚠️ Potential issue | 🟠 Major

error=request.query.input passes a Pydantic model where str is expected.

APIResponse.failure_response has error: str | list. Passing the QueryInput Pydantic object will either cause a Pydantic v2 coercion to its str() representation (producing a useless internal model dump as the error message) or a validation error. The analogous block in the input-guardrails path (dead code at line 189) correctly passes safe_input["data"]["safe_text"].

🐛 Proposed fix
-                        callback_response = APIResponse.failure_response(
-                            error=request.query.input,
-                            metadata=request.request_metadata,
-                        )
+                        callback_response = APIResponse.failure_response(
+                            error=safe_output["data"]["safe_text"],
+                            metadata=request.request_metadata,
+                        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/jobs.py` around lines 446 - 453, The failure branch
uses APIResponse.failure_response(error=request.query.input) which passes a
Pydantic QueryInput object where a string is expected; change it to pass the
safe text string (like safe_input["data"]["safe_text"] used in the
input-guardrails path) so error is a plain str or list of str, then call
handle_job_error(job_id, request.callback_url, callback_response) as before;
update the block that checks safe_output["data"]["rephrase_needed"] to build
callback_response with error set to the safe text rather than
request.query.input.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/services/llm/jobs.py`:
- Around line 169-204: Remove the unreachable pre-session guardrails block that
begins with the "if input_guardrails:" check (the code that calls
call_guardrails(...) and manipulates safe_input/request.query.input) because
input_guardrails is only populated later in the session; instead rely on the
existing working guardrails implementation already present later (the
session-based block that uses list_validators_config and call_guardrails).
Delete the entire pre-session block to eliminate dead code and the undefined
symbol input_query and ensure only the session-based guardrail handling is
executed.
- Around line 411-412: The isinstance check is using the imported function name
resolve_input instead of the resolved value; replace the check to test the
resolved_input variable (i.e., use isinstance(resolved_input, AudioInput)) and
call cleanup_temp_file(resolved_input) so audio temp files are actually cleaned
up. Locate the block around the resolve_input usage in jobs.py where
resolved_input is set and update the conditional to reference resolved_input and
pass that object to cleanup_temp_file.
- Around line 22-24: Remove the stray closing parenthesis and the stale import:
delete the lone ")" on line 24 and replace the import of call_guardrails with
the correct active guardrails symbol; remove or change the import statement
referencing call_guardrails and ensure the file only imports
run_guardrails_validation (the function actually used in the active guardrails
path) so that the dead pre-session reference to call_guardrails is eliminated
and the module parses cleanly.
- Around line 466-475: Remove the orphaned redundant else block that assigns
response.response.output.content.value = safe_output["error"] and constructs
callback_response using APIResponse.failure_response, then calls
handle_job_error; this block references undefined safe_output and duplicates
logic already handled in the inner else, so delete that entire orphaned else
section (the duplicated assignment and return calling handle_job_error) to
restore valid Python syntax and avoid NameError while keeping the existing error
handling in the inner else branch intact.

---

Outside diff comments:
In `@backend/app/services/llm/jobs.py`:
- Around line 446-453: The failure branch uses
APIResponse.failure_response(error=request.query.input) which passes a Pydantic
QueryInput object where a string is expected; change it to pass the safe text
string (like safe_input["data"]["safe_text"] used in the input-guardrails path)
so error is a plain str or list of str, then call handle_job_error(job_id,
request.callback_url, callback_response) as before; update the block that checks
safe_output["data"]["rephrase_needed"] to build callback_response with error set
to the safe text rather than request.query.input.

---

Duplicate comments:
In `@backend/app/models/llm/request.py`:
- Around line 4-8: Remove the duplicate imports of Field and SQLModel by
consolidating into a single import statement that includes all sqlmodel symbols
used (Field, SQLModel, Index, text) and keep the other imports (Discriminator,
model_validator, HttpUrl, datetime, JSONB) separate; update the import block so
only one "from sqlmodel import ..." line exists to eliminate the F811 duplicate
import for Field/SQLModel.
- Around line 73-76: The mime_type field's type annotation disagrees with its
None default — update the annotation for mime_type back to Optional[str] (and
import Optional from typing if missing) so the Field(None, ...) default is
valid; ensure downstream usage like query_input.content.mime_type or "audio/wav"
continues to work with a nullable mime_type.

In `@backend/app/services/llm/jobs.py`:
- Around line 209-221: The code contains leftover debug scaffolding after the
session.get(Job, job_id) lookup: remove the inline import of select, the
recent-jobs query (all_jobs), and the verbose "Recent jobs in DB" logger entry;
instead, keep a concise error log in the execute_job flow (the logger.error
following the session.get(Job, job_id) check) that only reports job_id and a
clear message that the job was not found, without querying or printing other DB
rows; ensure any removal preserves control flow around
logger.info(f"[execute_job] Attempting to fetch job | job_id={job_id}") and the
subsequent handling of the missing job.

Comment on lines 22 to 24
)
from app.services.llm.guardrails import call_guardrails
)

⚠️ Potential issue | 🔴 Critical

Stray ) on line 24 causes a parse error — CI is failing.

The from app.models.llm.request import (...) block closes correctly on line 22. Line 23 adds a standalone call_guardrails import, then line 24 has an orphaned ) that makes the file unparseable (confirmed by the CI failure: Cannot parse file: 24:0: )).

Additionally, call_guardrails (line 23) is stale: the dead pre-session block references it (line 175), while the active guardrails path uses run_guardrails_validation (already imported via lines 26–29).

🐛 Proposed fix
     TextInput,
     AudioInput,
 )
-from app.services.llm.guardrails import call_guardrails
-)
 from app.models.llm.response import TextOutput
🧰 Tools
🪛 GitHub Actions: Kaapi CI

[error] 24-24: Black formatting failed. Cannot parse file: 24:0: ).

🪛 Ruff (0.15.1)

[warning] 24-24: Expected a statement

(invalid-syntax)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/jobs.py` around lines 22 - 24, Remove the stray
closing parenthesis and the stale import: delete the lone ")" on line 24 and
replace the import of call_guardrails with the correct active guardrails symbol;
remove or change the import statement referencing call_guardrails and ensure the
file only imports run_guardrails_validation (the function actually used in the
active guardrails path) so that the dead pre-session reference to
call_guardrails is eliminated and the module parses cleanly.

Comment on lines 169 to 204
        if input_guardrails:
            if not isinstance(request.query.input, TextInput):
                logger.warning(
                    "[execute_job] Input guardrails skipped for non-text input type"
                )
            else:
                safe_input = call_guardrails(input_query, input_guardrails, job_id)

            logger.info(
                f"[execute_job] Input guardrail validation | success={safe_input['success']}."
            )

            if safe_input.get("bypassed"):
                logger.info("[execute_job] Guardrails bypassed (service unavailable)")

            elif safe_input["success"]:
                # Update the text value within the QueryInput structure
                request.query.input.content.value = safe_input["data"]["safe_text"]

                if safe_input["data"]["rephrase_needed"]:
                    callback_response = APIResponse.failure_response(
                        error=safe_input["data"]["safe_text"],
                        metadata=request.request_metadata,
                    )
                    return handle_job_error(
                        job_id, request.callback_url, callback_response
                    )
            else:
                # Update the text value with error message
                request.query.input.content.value = safe_input["error"]

                callback_response = APIResponse.failure_response(
                    error=safe_input["error"],
                    metadata=request.request_metadata,
                )
                return handle_job_error(job_id, request.callback_url, callback_response)

⚠️ Potential issue | 🟠 Major

Dead pre-session guardrails block — unreachable and references undefined input_query.

At line 169, input_guardrails is still [] (initialized at line 160); the actual population via list_validators_config happens at lines 253–259 inside the session. The if input_guardrails: check is therefore always False and this entire block is unreachable.

Beyond being dead code, it has two additional defects that would surface if it ever ran:

  • Line 175: call_guardrails(input_query, ...)input_query is never defined in this scope (NameError).
  • Lines 177–178: safe_input['success'] is accessed outside the TextInput branch, so safe_input would be undefined for non-text inputs.

The actual working implementation is at lines 261–300. Remove lines 169–204 entirely.
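The unreachability follows directly from the truthiness of an empty list; a minimal sketch of the ordering bug (hypothetical validator names):

```python
input_guardrails = []          # initialized before the session, as at line 160

if input_guardrails:           # an empty list is falsy: this branch never runs
    raise AssertionError("unreachable")

# Only later, inside the session, is the list actually populated:
input_guardrails = ["toxicity", "pii"]   # hypothetical validator names

print(bool([]), bool(input_guardrails))  # False True
```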

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/jobs.py` around lines 169 - 204, Remove the
unreachable pre-session guardrails block that begins with the "if
input_guardrails:" check (the code that calls call_guardrails(...) and
manipulates safe_input/request.query.input) because input_guardrails is only
populated later in the session; instead rely on the existing working guardrails
implementation already present later (the session-based block that uses
list_validators_config and call_guardrails). Delete the entire pre-session block
to eliminate dead code and the undefined symbol input_query and ensure only the
session-based guardrail handling is executed.

Comment on lines 411 to 412
            if resolved_input and isinstance(resolve_input, AudioInput):
                cleanup_temp_file(resolved_input)

⚠️ Potential issue | 🔴 Critical

Temp-file cleanup never executes — resolve_input is the imported function, not the input object.

resolve_input on line 411 refers to the resolve_input function imported at line 32, not the resolved string resolved_input or the request's input object. isinstance(<function>, AudioInput) is always False, so cleanup_temp_file is never called and every audio request leaks a temp file. The past review fix targeted this exact line but the wrong variable was used.

🐛 Proposed fix
-            if resolved_input and isinstance(resolve_input, AudioInput):
+            if resolved_input and isinstance(request.query.input, AudioInput):
                 cleanup_temp_file(resolved_input)
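The failure mode is easy to reproduce in isolation: isinstance against a function object is always False (a hypothetical sketch mirroring the name mix-up in jobs.py):

```python
class AudioInput:
    """Hypothetical stand-in for the request's audio input model."""

def resolve_input(query_input):
    """Hypothetical stand-in for the imported helper; returns a path string."""
    return "/tmp/resolved-audio.wav"

resolved_input = resolve_input(AudioInput())

# The bug: testing the *function* object, which is never an AudioInput,
# so the cleanup branch is dead code.
print(isinstance(resolve_input, AudioInput))    # False, always

# resolved_input is the resolved path string, so it is not an AudioInput
# either; the check has to target the request's input object instead.
print(isinstance(resolved_input, AudioInput))   # False for a plain string
```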
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/jobs.py` around lines 411 - 412, The isinstance
check tests the imported function name resolve_input instead of the request's
input object, so it is always False and the cleanup never runs; change the
condition to isinstance(request.query.input, AudioInput) while keeping
cleanup_temp_file(resolved_input) so audio temp files are actually cleaned up.
Locate the block in jobs.py where resolved_input is set and update the
conditional to reference the request's input object rather than the imported
function.

Comment on lines 466 to 475
            else:
                response.response.output.content.value = safe_output["error"]

                callback_response = APIResponse.failure_response(
                    error=safe_output["error"],
                    metadata=request.request_metadata,
                )
                return handle_job_error(
                    job_id, request.callback_url, callback_response
                )

⚠️ Potential issue | 🔴 Critical

Orphaned else: block is invalid Python syntax and references undefined safe_output.

Line 466 introduces a second else: at the same indentation (8 spaces) as line 421's else:, which already closes the if not isinstance(response.response.output, TextOutput): conditional. Python allows only one else per conditional, making this a syntax error confirmed by Ruff (lines 466–467: "Expected a statement").

Additionally, if this block somehow executed (e.g., in the not isinstance branch), safe_output (set at line 423 inside the else at line 421) would be undefined, causing a NameError. This block is a copy-paste artifact of the inner else: at line 455 and should be removed entirely.
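That a second else at the same indentation is a hard parse error, not merely a logic bug, can be confirmed with compile() (a standalone sketch):

```python
import textwrap

# A conditional admits exactly one `else`; a duplicate at the same
# indentation level fails to parse, which is what Ruff reports here.
source = textwrap.dedent("""\
    if flag:
        pass
    else:
        pass
    else:
        pass
""")

try:
    compile(source, "<snippet>", "exec")
    parsed = True
except SyntaxError:
    parsed = False

print(parsed)  # False
```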

🐛 Proposed fix — remove the orphaned block
                     else:
                         response.response.output.content.value = safe_output["error"]

                         callback_response = APIResponse.failure_response(
                             error=safe_output["error"],
                             metadata=request.request_metadata,
                         )
                         return handle_job_error(
                             job_id, request.callback_url, callback_response
                         )

-            else:
-                response.response.output.content.value = safe_output["error"]
-
-                callback_response = APIResponse.failure_response(
-                    error=safe_output["error"],
-                    metadata=request.request_metadata,
-                )
-                return handle_job_error(
-                    job_id, request.callback_url, callback_response
-                )

         callback_response = APIResponse.success_response(
🧰 Tools
🪛 Ruff (0.15.1)

[warning] 466-466: Expected a statement

(invalid-syntax)


[warning] 466-466: Expected a statement

(invalid-syntax)


[warning] 466-467: Expected a statement

(invalid-syntax)


[warning] 467-467: Unexpected indentation

(invalid-syntax)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/jobs.py` around lines 466 - 475, Remove the orphaned
redundant else block that assigns response.response.output.content.value =
safe_output["error"] and constructs callback_response using
APIResponse.failure_response, then calls handle_job_error; this block references
undefined safe_output and duplicates logic already handled in the inner else, so
delete that entire orphaned else section (the duplicated assignment and return
calling handle_job_error) to restore valid Python syntax and avoid NameError
while keeping the existing error handling in the inner else branch intact.

@Prajna1999 Prajna1999 moved this to In Progress in Kaapi-dev Feb 19, 2026
@Prajna1999 Prajna1999 moved this from In Progress to In Review in Kaapi-dev Feb 19, 2026
@Prajna1999 Prajna1999 merged commit 8c77e17 into main Feb 23, 2026
2 checks passed
@Prajna1999 Prajna1999 deleted the feature/unified-api-tts branch February 23, 2026 17:06
@github-project-automation github-project-automation bot moved this from In Review to Closed in Kaapi-dev Feb 23, 2026
@Prajna1999 Prajna1999 linked an issue Feb 23, 2026 that may be closed by this pull request

Labels

enhancement New feature or request

Projects

Status: Closed

Development

Successfully merging this pull request may close these issues.

CI: Add ffmpeg as a dependency before running tests on CI

4 participants