fix: preserve OpenAI TTS response format on disk #6090

Open
stablegenius49 wants to merge 1 commit into AstrBotDevs:master from stablegenius49:pr-factory/issue-6015-openai-tts-ext

@stablegenius49 stablegenius49 commented Mar 12, 2026

Fixes #6015

Modifications

  • stop hardcoding the streamed OpenAI TTS response to a .wav temp file

  • detect the real audio format from content-type or the first bytes and save with the matching extension

  • raise a clear runtime error when the TTS endpoint returns JSON / HTML / other non-audio payloads instead of letting the DingTalk ffmpeg conversion fail later with an opaque invalid-input error

  • add focused regression tests for mp3 content-type preservation and non-audio payload handling

  • This is NOT a breaking change.
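
The detection step described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `CONTENT_TYPE_EXTENSIONS`, `sniff_extension`, and `resolve_extension` are hypothetical names, the mapping is an assumed subset, and raising on an unrecognized payload is a choice made for this sketch.

```python
from __future__ import annotations

# Hypothetical content-type table; the mapping used in the PR is not shown here.
CONTENT_TYPE_EXTENSIONS = {
    "audio/mpeg": ".mp3",
    "audio/mp3": ".mp3",
    "audio/wav": ".wav",
    "audio/x-wav": ".wav",
    "audio/ogg": ".ogg",
    "audio/flac": ".flac",
}


def sniff_extension(head: bytes) -> str | None:
    """Guess an extension from well-known magic bytes, or return None."""
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return ".wav"
    if head[:3] == b"ID3":  # MP3 file that starts with an ID3v2 tag
        return ".mp3"
    if head[:4] == b"OggS":
        return ".ogg"
    if head[:4] == b"fLaC":
        return ".flac"
    # Bare MPEG audio frame: 11 sync bits set and a non-reserved layer field.
    if len(head) >= 2 and head[0] == 0xFF and (head[1] & 0xE0) == 0xE0 and (head[1] & 0x06):
        return ".mp3"
    return None


def resolve_extension(content_type: str | None, head: bytes) -> str:
    """Prefer the declared content-type, then fall back to the first bytes."""
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    if media_type in CONTENT_TYPE_EXTENSIONS:
        return CONTENT_TYPE_EXTENSIONS[media_type]
    sniffed = sniff_extension(head)
    if sniffed is not None:
        return sniffed
    # Assumption of this sketch: refuse to guess rather than mislabel the file.
    raise RuntimeError(f"unrecognized TTS payload (content-type={content_type!r})")
```

For example, `resolve_extension("audio/mpeg", b"")` yields `.mp3` from the header alone, while a response labelled `application/octet-stream` that begins with `RIFF....WAVE` resolves to `.wav` from its first bytes.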

Screenshots or Test Results

Verification Steps:

python3.11 -m pytest tests/test_openai_tts_api_source.py -q

Result:

2 passed in 0.91s

python3.11 -m ruff check astrbot/core/provider/sources/openai_tts_api_source.py tests/test_openai_tts_api_source.py

Result:

All checks passed!

Checklist

  • 😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
  • 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
  • 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.
  • 😮 My changes do not introduce malicious code.

Summary by Sourcery

Handle OpenAI TTS streamed audio responses more robustly by preserving their actual format on disk and failing fast on non-audio payloads.

Bug Fixes:

  • Preserve the true audio file extension for OpenAI TTS responses instead of always writing a .wav file.
  • Raise explicit runtime errors when the TTS endpoint returns non-audio payloads (e.g., JSON/HTML) or empty responses instead of letting later processing fail opaquely.

Tests:

  • Add regression tests to verify mp3 content-type preservation and clear error handling for non-audio TTS responses.
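
A fail-fast check of the kind summarized above might look like the sketch below. It is only an illustration under stated assumptions: `ensure_audio_payload` is a hypothetical helper, and the textual-payload heuristic (leading `{`, `[`, or `<`) is an assumption of this sketch, applied only when the response is not declared as `audio/*`.

```python
from __future__ import annotations


def ensure_audio_payload(content_type: str | None, body: bytes) -> None:
    """Raise RuntimeError for empty or clearly non-audio TTS responses."""
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    if not body:
        raise RuntimeError("TTS endpoint returned an empty response")
    if media_type.startswith("audio/"):
        # Declared as audio: trust the header (raw PCM may start with any byte).
        return
    declared_text = media_type.startswith("text/") or media_type in {
        "application/json",
        "application/xml",
    }
    looks_textual = body.lstrip()[:1] in (b"{", b"[", b"<")
    if declared_text or looks_textual:
        # Include a short preview so the real upstream error is visible.
        preview = body[:200].decode("utf-8", errors="replace")
        raise RuntimeError(
            f"TTS endpoint returned a non-audio payload "
            f"(content-type={media_type or 'unknown'}): {preview}"
        )
```

Raising here surfaces the endpoint's own error text (for example a JSON error body) at the point where the response is received, instead of a later conversion failure on a file that was never audio.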

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@auto-assign auto-assign bot requested review from Soulter and advent259141 March 12, 2026 02:51
@dosubot dosubot bot added the size:L and area:provider labels Mar 12, 2026

@sourcery-ai sourcery-ai bot (Contributor) left a comment

Hey - I've left some high-level feedback:

  • The new implementation of get_audio buffers the entire streamed response into memory before writing to disk, which may be problematic for large TTS outputs; consider only buffering a small prefix for format detection and streaming the rest directly to the file.
  • When _resolve_audio_extension encounters an audio/* content-type that is not in extension_map and whose header sniffing does not match a known format, it silently falls back to .wav; it might be safer to raise or at least include the original content_type in an error to avoid mislabeling unknown audio formats.

Labels

area:provider (The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner.)
size:L (This PR changes 100-499 lines, ignoring generated files.)

Development

Successfully merging this pull request may close these issues.

[Bug] ffmpeg is installed, but sending voice with a TTS model still reports "ffmpeg not found"; restarting the adapter and restarting AstrBot has no effect