Fix/cron agent tool call not triggered#6112
Fix/cron agent tool call not triggered#6112NayukiChiba wants to merge 1 commit intoAstrBotDevs:masterfrom
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! 此拉取请求旨在解决定时任务代理在触发后无法有效向用户发送消息的关键问题。通过精确调整代理的系统提示和运行时指令,明确强制其使用 Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
此拉取请求有效地解决了一个关键问题,即由 cron 触发的代理未能使用必要的工具发送消息,从而导致静默失败。所做的更改是全面的:
- 提示词工程:针对 cron 代理的系统提示词和用户提示词都得到了显著加强,明确且强制性地指示 LLM 使用
send_message_to_user工具,并阐明了直接的文本输出对用户不可见。 - 历史记录管理:一项关键的逻辑变更现在可以防止在没有调用工具的失败运行污染对话历史。只有在实际使用了工具的情况下,历史记录才会被持久化,这打破了负反馈循环,并提高了代理未来的性能。
- 错误处理:通过优雅地处理从不支持模型列表端点的提供商处列出模型时的 API 错误,
anthropic_source的稳健性得到了增强。 - 测试:增加了广泛的单元测试来验证新行为,包括提示词构建、有条件的历史持久化和日志记录,确保了修复的质量和正确性。
实现清晰,逻辑严谨,并且修复措施精确地针对已识别的根本原因。在审查过程中未发现任何问题。
There was a problem hiding this comment.
Hey - 我发现了 1 个问题,并给出了一些高层面的反馈:
_woke_main_agent中的tool_was_called = any(msg.role == "tool" for msg in runner.run_context.messages)检查假设工具总是以"tool"角色的消息形式呈现;如果 runner 或 provider 使用不同方式表示工具调用(例如通过tool_calls字段或其他角色),这个启发式可能会误判运行结果并跳过历史持久化,因此建议改为基于 runner 的结构化工具调用元数据来判断。- 若干新增测试对提示词和日志消息中的特定英文子串(
"MUST"、"NOT visible"、warning 文本)做断言,这会使它们对无伤大雅的文案修改非常脆弱;更稳健的方式是针对结构性质做断言(例如在func_tool.tools中存在send_message_to_user、布尔标志,或关键短语组),而不是对精确措辞做断言。
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- `_woke_main_agent` 中的 `tool_was_called = any(msg.role == "tool" for msg in runner.run_context.messages)` 检查假设工具总是以 `"tool"` 角色的消息形式呈现;如果 runner 或 provider 使用不同方式表示工具调用(例如通过 `tool_calls` 字段或其他角色),这个启发式可能会误判运行结果并跳过历史持久化,因此建议改为基于 runner 的结构化工具调用元数据来判断。
- 若干新增测试对提示词和日志消息中的特定英文子串(`"MUST"`、`"NOT visible"`、warning 文本)做断言,这会使它们对无伤大雅的文案修改非常脆弱;更稳健的方式是针对结构性质做断言(例如在 `func_tool.tools` 中存在 `send_message_to_user`、布尔标志,或关键短语组),而不是对精确措辞做断言。
## Individual Comments
### Comment 1
<location path="tests/unit/test_cron_manager.py" line_range="739-741" />
<code_context>
+ extras={"cron_job": {"id": "j4", "name": "test"}, "cron_payload": {}},
+ )
+
+ # 必须有相关警告
+ warning_calls = [str(c) for c in mock_logger.warning.call_args_list]
+ assert any("tool" in w.lower() or "NOT" in w for w in warning_calls), (
+ "未找到关于工具未被调用的 warning 日志"
+ )
</code_context>
<issue_to_address>
**suggestion (testing):** 收紧 logger warning 的断言,使测试对无关日志调用更加稳健
当前断言可能会因为不相关的 warning 恰好包含 “tool” 或 “NOT” 而通过,即便关键的 warning 消息已经退化。更好的方式是对该场景发出的特定 warning 进行断言,例如:
```python
mock_logger.warning.assert_called_once()
msg, *rest = mock_logger.warning.call_args[0]
assert "did not call any tools" in msg
assert "NOT delivered" in msg
```
这样可以让测试更精确,并且对无关日志变更更具韧性。
</issue_to_address>帮我变得更有用!请在每条评论上点 👍 或 👎,我会根据你的反馈改进评审质量。
Original comment in English
Hey - I've found 1 issue, and left some high level feedback:
- The
tool_was_called = any(msg.role == "tool" for msg in runner.run_context.messages)check in_woke_main_agentassumes tools are always surfaced as messages with role"tool"; if the runner or provider represents tool calls differently (e.g., viatool_callsfields or other roles), this heuristic may misclassify runs and skip history persistence, so consider basing this on the runner's structured tool-call metadata instead. - Several new tests assert against specific English substrings in prompts and log messages (
"MUST","NOT visible", warning text), which makes them brittle to harmless copy edits; it would be more robust to assert on structural properties (e.g., presence ofsend_message_to_userinfunc_tool.tools, boolean flags, or keyphrase groups) rather than exact phrasing.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `tool_was_called = any(msg.role == "tool" for msg in runner.run_context.messages)` check in `_woke_main_agent` assumes tools are always surfaced as messages with role `"tool"`; if the runner or provider represents tool calls differently (e.g., via `tool_calls` fields or other roles), this heuristic may misclassify runs and skip history persistence, so consider basing this on the runner's structured tool-call metadata instead.
- Several new tests assert against specific English substrings in prompts and log messages (`"MUST"`, `"NOT visible"`, warning text), which makes them brittle to harmless copy edits; it would be more robust to assert on structural properties (e.g., presence of `send_message_to_user` in `func_tool.tools`, boolean flags, or keyphrase groups) rather than exact phrasing.
## Individual Comments
### Comment 1
<location path="tests/unit/test_cron_manager.py" line_range="739-741" />
<code_context>
+ extras={"cron_job": {"id": "j4", "name": "test"}, "cron_payload": {}},
+ )
+
+ # 必须有相关警告
+ warning_calls = [str(c) for c in mock_logger.warning.call_args_list]
+ assert any("tool" in w.lower() or "NOT" in w for w in warning_calls), (
+ "未找到关于工具未被调用的 warning 日志"
+ )
</code_context>
<issue_to_address>
**suggestion (testing):** Tighten the logger warning assertion to make the test more robust against unrelated log calls
The current assertion could pass due to unrelated warnings that happen to contain “tool” or “NOT”, and would still pass if the key warning message were degraded. Instead, assert on the specific warning emitted by this scenario, e.g.:
```python
mock_logger.warning.assert_called_once()
msg, *rest = mock_logger.warning.call_args[0]
assert "did not call any tools" in msg
assert "NOT delivered" in msg
```
This makes the test more precise and resilient to unrelated log changes.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| # 必须有相关警告 | ||
| warning_calls = [str(c) for c in mock_logger.warning.call_args_list] | ||
| assert any("tool" in w.lower() or "NOT" in w for w in warning_calls), ( |
There was a problem hiding this comment.
suggestion (testing): 收紧 logger warning 的断言,使测试对无关日志调用更加稳健
当前断言可能会因为不相关的 warning 恰好包含 “tool” 或 “NOT” 而通过,即便关键的 warning 消息已经退化。更好的方式是对该场景发出的特定 warning 进行断言,例如:
mock_logger.warning.assert_called_once()
msg, *rest = mock_logger.warning.call_args[0]
assert "did not call any tools" in msg
assert "NOT delivered" in msg这样可以让测试更精确,并且对无关日志变更更具韧性。
Original comment in English
suggestion (testing): Tighten the logger warning assertion to make the test more robust against unrelated log calls
The current assertion could pass due to unrelated warnings that happen to contain “tool” or “NOT”, and would still pass if the key warning message were degraded. Instead, assert on the specific warning emitted by this scenario, e.g.:
mock_logger.warning.assert_called_once()
msg, *rest = mock_logger.warning.call_args[0]
assert "did not call any tools" in msg
assert "NOT delivered" in msgThis makes the test more precise and resilient to unrelated log changes.
|
Related Documentation 1 document(s) may need updating based on files changed in this PR: AstrBotTeam's Space pr4697的改动View Suggested Changes@@ -327,6 +327,20 @@
- 任务管理:通过工具或 Web UI 创建、修改、删除定时任务。主代理可为每个任务添加描述,便于后续管理。
- 触发机制:到达设定时间后,CronJobManager 调用 MainAgent 执行任务循环,自动完成预设操作。
+**定时任务工具调用修复(PR #6112)**
+
+[PR #6112](https://github.com/AstrBotDevs/AstrBot/pull/6112) 修复了定时代理不调用 `send_message_to_user` 工具的关键问题([Issue #6103](https://github.com/AstrBotDevs/AstrBot/issues/6103))。修复前,定时任务触发后,Agent 会生成纯文本输出(`finish_reason: stop`),但消息从未实际发送给用户,导致用户无法收到定时推送。
+
+**根本原因:**
+1. **误导性提示词**:`req.prompt` 末尾要求 LLM「总结并输出执行结果」,导致 LLM 直接生成文本即认为任务完成,不调用工具
+2. **系统提示语义模糊**:`PROACTIVE_AGENT_CRON_WOKE_SYSTEM_PROMPT` 规则 5 写的是「if needed」,LLM 不知道文本对用户完全不可见;此外,存在换行符缺失导致格式混乱
+3. **历史污染反馈循环**:失败执行(未调用工具)也写入对话历史,LLM 从历史中学到「文本输出 = 完成」,形成恶性循环
+
+**修复方案:**
+- **req.prompt 增强**:明确说明「你的文本响应对用户不可见——只有工具调用才会生效。你必须调用 `send_message_to_user` 工具才能将消息传递给用户。」
+- **系统提示强化**:规则 5 现在强制要求:「重要:你的文本输出对用户不可见。向用户传递消息的唯一方式是调用 `send_message_to_user` 工具。你必须调用此工具发送任何消息——不要只生成文本。」同时修复了换行符缺失问题。
+- **历史持久化逻辑改进**:现在仅在 Agent 实际调用了工具时(`tool_was_called` 检查)才将对话历史持久化,防止失败执行污染上下文,打破负反馈循环。
+
#### 使用场景示例
用户可通过对话指令或 UI 创建定时任务,例如“每天早上8点发送今日早报”。主代理会注册一个 ActiveAgentCronJob,定时触发后自动执行相关脚本并通过 send_message_to_user 工具推送内容。
@@ -344,6 +358,39 @@
- 支持多平台:Telegram、Slack、Discord 等
- 支持多媒体类型:文本、图片、语音消息(record)、视频、文件等
+
+#### 提示词强制要求(PR #6112)
+在定时任务场景中,系统通过以下机制确保 Agent 正确调用 `send_message_to_user` 工具:
+
+**用户提示(req.prompt):**
+```
+A scheduled task has been triggered.
+Proceed according to your system instructions.
+You MUST call the `send_message_to_user` tool to deliver any message to the user.
+Your direct text response is NOT visible to the user — only tool calls take effect.
+Use the same language as the previous conversation.
+```
+
+**系统提示(PROACTIVE_AGENT_CRON_WOKE_SYSTEM_PROMPT 规则 5):**
+```
+IMPORTANT: Your text output is NOT visible to the user. The ONLY way to deliver a message
+to the user is by calling the `send_message_to_user` tool. You MUST call this tool to send
+any message — do NOT just generate text.
+```
+
+这些提示词明确告知 LLM:文本响应对用户完全不可见,必须通过工具调用才能传递消息。
+
+#### 历史持久化逻辑
+为防止失败执行污染上下文,定时任务的对话历史持久化逻辑已改进(`_woke_main_agent` 方法):
+
+- **仅在工具被调用时持久化**:系统通过检查 `runner.run_context.messages` 中是否存在 `role == "tool"` 的消息来判断工具是否被调用
+- **跳过失败执行**:如果 Agent 完成执行但未调用任何工具,系统会记录警告日志并跳过历史持久化:
+ ```
+ Cron job agent did not call any tools.
+ The message was likely NOT delivered to the user.
+ Skipping history persistence to avoid misleading future context.
+ ```
+- **打破负反馈循环**:防止 LLM 从失败执行的历史中学到错误行为(即"生成文本即任务完成")
#### 使用示例
在定时任务触发后,主动调用 `send_message_to_user` 工具,将生成的日报、图片或其他内容直接发送给用户。Note: You must be authenticated to accept/decline updates. |
原因: - req.prompt 末尾要求 LLM 总结输出,导致 LLM 直接生成文本而不调用工具 - 系统提示规则模糊,未说明文本输出对用户不可见,存在换行符缺失 - 失败执行也写入对话历史,形成 LLM 误以为文本即完成的反馈循环 修复: - req.prompt 明确要求必须调用工具,说明直接文本不可见 - 更新 PROACTIVE_AGENT_CRON_WOKE_SYSTEM_PROMPT,强制工具调用,修复换行 - 仅在 tool_was_called 时写入历史,否则记录 warning 并跳过 closes AstrBotDevs#6103
d102685 to
40bd282
Compare
问题描述
修复 #6103
定时任务触发后,Agent 不调用
send_message_to_user工具,finish_reason为stop(纯文本输出),消息从未实际发送给用户。根本原因
req.prompt误导:末尾指令要求 LLM「总结并输出执行结果」,LLM 直接生成文本即认为任务完成,不调用工具。PROACTIVE_AGENT_CRON_WOKE_SYSTEM_PROMPT规则 5 写的是「if needed」,LLM 不知道文本对用户完全不可见;存在换行符缺失导致格式混乱。修复方案
req.prompt:明确说明文本不可见,必须调用工具才能传达消息给用户。PROACTIVE_AGENT_CRON_WOKE_SYSTEM_PROMPT:强制要求工具调用,修复换行符缺失。_woke_main_agent:仅在tool_was_called时写入历史,否则记录 warning 并跳过,打破反馈循环。Summary by Sourcery
确保由 cron 触发的代理通过工具调用可靠地向用户发送主动消息,并在未发生投递时避免污染会话历史。
Bug Fixes:
send_message_to_user工具,而不是发出不可见的纯文本响应,从而确保计划任务产生的消息实际能够发送给用户。/models端点的 Anthropic 兼容提供商的模型列出失败问题,改为返回空列表而不是抛出异常。Enhancements:
send_message_to_user工具是唯一的投递机制,并修复模板中的格式问题以提高可读性。Tests:
send_message_to_user等场景。Original summary in English
Summary by Sourcery
Ensure cron-triggered agents reliably deliver proactive messages via tool calls and avoid polluting history when no delivery occurred.
Bug Fixes:
send_message_to_usertool instead of emitting invisible plain-text responses so scheduled messages are actually delivered to users.Enhancements:
send_message_to_usertool is the only delivery mechanism, and fix formatting issues in the template for better readability.Tests:
send_message_to_userin the toolset.