feat: (lark-doc) Doc Search advanced boolean and intitle search syntax#210
MakeLarkGreatAgain wants to merge 1 commit into main
…or AI agents Change-Id: I647ffad4579c503711a7ea220c390dca760cd6de
📝 Walkthrough
Documentation update to the search query functionality, clarifying advanced Boolean syntax support.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes
🚥 Pre-merge checks: ✅ 3 passed
Greptile Summary
This PR updates the AI agent skill documentation for
Key concerns:
Confidence Score: 2/5
Not safe to merge as-is: the documentation makes unverified API claims and removes a correct client-side exact-match guard, which could cause agents to silently return incorrect results. Two P1 issues: (1) the advanced Boolean and skills/lark-doc/references/lark-doc-search.md: both the
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[User Query Intent] --> B{Intent Type?}
B -->|"Keyword in title (intitle:)"| C["Build: --query intitle:X"]
B -->|"Exact title equals X"| D["Build: --query intitle:\"X\""]
B -->|"Boolean / exclusion"| E["Build: --query 'A OR B' / 'A -B'"]
B -->|"Phrase match"| F["Build: --query \"exact phrase\""]
C --> G[POST /search/v2/doc_wiki/search]
D --> G
E --> G
F --> G
G --> H{API honours advanced syntax?}
H -->|"✅ Confirmed (unverified in PR)"| I[Server-side filtered results]
H -->|"❌ Operators ignored"| J[Unfiltered / unexpected results]
I --> K{Exact title equality needed?}
K -->|"No — keyword-in-title sufficient"| L[Return results to agent]
K -->|"Yes — intitle: only contains, not equals"| M["⚠️ Client-side exact-match filter\n(removed by new decision rule)"]
M --> L
J --> N["Agent misinterprets results\n(token explosion / hallucination)"]
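The query-building branch of the flowchart can be sketched as a small helper. This is only an illustration: the function name `build_query` and the intent labels are hypothetical, and the operators it emits are the ones the PR proposes, which the review has not confirmed the API honours.

```python
def build_query(intent: str, *terms: str) -> str:
    """Build a --query string for one of the intents in the flowchart.

    Hypothetical helper: the intent labels are made up for this sketch,
    and the emitted operators are unverified against the real API.
    """
    if intent == "title_keyword":   # keyword in title
        return f"intitle:{terms[0]}"
    if intent == "title_exact":     # narrows candidates only; contains, not equals
        return f'intitle:"{terms[0]}"'
    if intent == "any_of":          # Boolean OR
        return " OR ".join(terms)
    if intent == "exclude":         # first term, minus the second
        return f"{terms[0]} -{terms[1]}"
    if intent == "phrase":          # phrase match
        return f'"{terms[0]}"'
    raise ValueError(f"unknown intent: {intent}")
```

For example, `build_query("exclude", "方案", "草稿")` yields `方案 -草稿`, matching the exclusion example in the parameter table.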
Reviews (1): Last reviewed commit: "docs(lark-doc): document advanced boolea..."
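## Decision rules

Old rule:
- Query semantics: interpret queries as keyword search by default. When the user says "title is `X`", "title contains `X`", or "search for `X` documents", return the matching OpenAPI results directly; apply client-side secondary filtering only when the user explicitly asks for "title exactly equals `X`". Before exact matching, strip the highlight tags from `title_highlighted`.

New rule:
- Query semantics: always prefer the advanced syntax of --query (such as intitle:, "", -) to push filtering logic down to the server. When the user asks for "title exactly equals X", use --query "intitle:\"X\"" directly; running a fuzzy search first and then filtering on the client is strictly prohibited. Client-side filtering is allowed only for complex local comparison scenarios that the server-side syntax cannot cover, and the highlight tags in title_highlighted must be stripped before comparing.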
intitle: ≠ exact title match — logic regression
The new decision rule mandates --query "intitle:\"X\"" for the case where the user says "标题精确等于 X" (exact title match) and explicitly prohibits client-side re-filtering. However, intitle:X is a contains predicate, not an equals predicate: it returns every document whose title contains X as a substring or keyword. A document titled "2026 规划汇总补充" would still match intitle:"2026规划".
The previous rule correctly acknowledged this gap and allowed client-side filtering (after stripping <h>/<hb> tags from title_highlighted) specifically for exact-equality scenarios. Removing that safety valve without a true server-side exact-title-equals operator means agents will silently over-report matches and the "total count hallucination" problem the PR aims to solve is not actually fixed — it is just relocated to a different type of false positive.
Suggested approach: keep the explicit client-side guard for exact-title-equals while still encouraging intitle: for the broader "keyword in title" intent:
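- Query semantics: always prefer the advanced syntax of --query (such as intitle:, "", -) to push filtering logic down to the server.
  When the user asks for "title contains X", use intitle:X.
  When the user asks for "title exactly equals X", use intitle:"X" to narrow the candidate set, but still run an equality comparison on the client against
  title_highlighted (with highlight tags stripped), because intitle: has contains semantics, not equals semantics.
  Do not apply client-side fuzzy secondary filtering in "title contains keyword" scenarios that intitle: already covers.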
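A minimal sketch of the client-side guard described in this suggestion, assuming each search result is a dict with a `title_highlighted` field and that the highlight markup is limited to the `<h>`/`<hb>` tags mentioned above. The function names and the result shape are illustrative, not taken from the CLI.

```python
import re

# Assumption: highlight markup consists only of <h>...</h> and <hb>...</hb>.
_HIGHLIGHT_TAG = re.compile(r"</?(?:h|hb)>")


def strip_highlight(title_highlighted: str) -> str:
    """Remove highlight tags so the plain title can be compared."""
    return _HIGHLIGHT_TAG.sub("", title_highlighted)


def exact_title_matches(results: list[dict], wanted: str) -> list[dict]:
    """Keep only results whose plain title equals `wanted` exactly."""
    wanted = wanted.strip()
    return [
        r for r in results
        if strip_highlight(r.get("title_highlighted", "")).strip() == wanted
    ]
```

With candidates titled `<h>2026规划</h>` and `<h>2026规划</h>汇总补充`, only the first survives a filter for `2026规划`, which is the over-reporting case the comment warns about.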
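| Parameter | Required | Description |
|------|------|------|
| `--query <text>` (old) | No | Search keywords. Keyword search by default, not exact title matching; omitted or an empty string means an empty search |
| `--query <text>` (new) | No | Search keywords. **Supports advanced Boolean syntax** to improve search precision:<br>1. Use a space for AND (e.g. `方案 设计`).<br>2. Use `OR` for logical OR (e.g. `方案 OR 草稿`).<br>3. Use `-` for exclusion (e.g. `方案 -草稿`).<br>4. Use double quotes `""` for exact phrase matching.<br>5. Use `intitle:` to restrict the keyword to the title (e.g. `intitle:总结` or `intitle:"季度 总结"`). Omitted or an empty string means an empty search |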
Advanced Boolean syntax not confirmed supported by the underlying API
The updated description asserts that the Lark Search v2 endpoint (POST /open-apis/search/v2/doc_wiki/search) supports AND (space), OR, - exclusion, "" phrase match, and intitle: as first-class query operators. However, neither the PR description nor any referenced documentation confirms that the upstream OpenAPI actually honours these operators.
If the API treats the query as a plain keyword string, an agent following these instructions would construct queries like "飞书 -纪要" or "intitle:\"2026规划\"" and receive unexpected results — or silently get results that ignore the operators entirely, producing the same token-explosion and hallucination problems the PR aims to prevent.
Before documenting these as supported features, please verify against the official Feishu/Lark OpenAPI documentation (or an actual API call) that each operator is genuinely honoured server-side, and ideally link to the relevant section of the API reference. If only some operators are confirmed, document only those.
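One way to check the outcome of "an actual API call" is to inspect the returned titles for results an operator should have excluded. The helper below is a hypothetical sketch: it works on a plain list of titles (highlight tags already removed) and makes no assumptions about how the search itself is invoked. Titles are only a partial signal for the `-` operator, since an excluded term may also appear in a document body.

```python
def violations(titles, *, must_contain=None, must_not_contain=None):
    """Return the titles that contradict the operator being probed.

    must_contain:     term an `intitle:` query should force into every title.
    must_not_contain: term a `-term` query should keep out of every title.
    """
    bad = []
    for title in titles:
        if must_contain is not None and must_contain not in title:
            bad.append(title)
        elif must_not_contain is not None and must_not_contain in title:
            bad.append(title)
    return bad
```

A non-empty return value for a query such as `飞书 -纪要` would suggest the operator was treated as plain text; an empty one is consistent with the operator being honoured but does not prove it.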
"" example for phrase match looks like an empty string
Item 4 in the new param description reads:
- Use double quotes `""` for exact phrase matching.
Displaying "" in a code span reads as an empty double-quoted string. This is likely to confuse an AI agent (or a human reader) who expects a concrete usage example. Consider replacing "" with a real-world example consistent with the surrounding examples:
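Current:
| `--query <text>` | No | Search keywords. **Supports advanced Boolean syntax** to improve search precision:<br>1. Use a space for AND (e.g. `方案 设计`).<br>2. Use `OR` for logical OR (e.g. `方案 OR 草稿`).<br>3. Use `-` for exclusion (e.g. `方案 -草稿`).<br>4. Use double quotes `""` for exact phrase matching.<br>5. Use `intitle:` to restrict the keyword to the title (e.g. `intitle:总结` or `intitle:"季度 总结"`). Omitted or an empty string means an empty search |
Suggested:
| `--query <text>` | No | Search keywords. **Supports advanced Boolean syntax** to improve search precision:<br>1. Use a space for AND (e.g. `方案 设计`).<br>2. Use `OR` for logical OR (e.g. `方案 OR 草稿`).<br>3. Use `-` for exclusion (e.g. `方案 -草稿`).<br>4. Use double quotes for exact phrase matching (e.g. `"季度总结"`).<br>5. Use `intitle:` to restrict the keyword to the title (e.g. `intitle:总结` or `intitle:"季度 总结"`). Omitted or an empty string means an empty search |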
Actionable comments posted: 1
🧹 Nitpick comments (1)
skills/lark-doc/references/lark-doc-search.md (1)
62-62: Consider adding concrete examples for the client-side filtering exception.
The decision rule states that client-side filtering is allowed only when the server-side syntax cannot cover a complex local comparison scenario.
This exception clause provides necessary flexibility but lacks concrete examples. Without specific scenarios, different developers or the AI Agent might interpret "complex local matching scenarios" inconsistently, potentially leading to unnecessary client-side filtering when server-side syntax would suffice.
📝 Suggested clarification with examples
Consider appending specific examples after the exception clause, such as:
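- Query semantics: always prefer the advanced syntax of --query (such as intitle:, "", -) to push filtering logic down to the server. When the user asks for "title exactly equals X", use --query "intitle:\"X\"" directly; running a fuzzy search first and then filtering on the client is strictly prohibited. Client-side filtering is allowed only for complex local comparison scenarios that the server-side syntax cannot cover, and the highlight tags in title_highlighted must be stripped before comparing.
+ Query semantics: always prefer the advanced syntax of --query (such as intitle:, "", -) to push filtering logic down to the server. When the user asks for "title exactly equals X", use --query "intitle:\"X\"" directly; running a fuzzy search first and then filtering on the client is strictly prohibited. Client-side filtering is allowed only for complex local comparison scenarios that the server-side syntax cannot cover (for example: multi-field combined conditions, regular-expression matching, custom scoring and ranking), and the highlight tags in title_highlighted must be stripped before comparing.
🤖 Prompt for AI Agents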
Verify each finding against the current code and only fix it if needed. In `@skills/lark-doc/references/lark-doc-search.md` at line 62, Add concrete examples illustrating the allowed "client-side filtering" exception right after the sentence that permits client filtering when server syntax cannot cover complex local matching; specifically, append 2–3 short scenarios (e.g., fuzzy unicode normalization across diacritics, language-specific tokenization differences, or multi-field proximity matches that the server lacks) and show the minimal client-side steps (strip HTML tags from title_highlighted, then apply the local comparison) and when to prefer --query with intitle:"X" instead; reference the existing terms --query, intitle:, and title_highlighted so readers can locate the rule and understand exact inputs and the required pre-filtering step.
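One of the scenarios named in this prompt, Unicode normalization across diacritics, might look like the sketch below. It assumes the highlight markup is limited to `<h>`/`<hb>` tags; the function names are hypothetical and the choice of NFKD plus case folding is one reasonable normalization, not something the PR specifies.

```python
import re
import unicodedata

# Assumption: highlight markup consists only of <h>...</h> and <hb>...</hb>.
_TAG = re.compile(r"</?(?:h|hb)>")


def normalize_title(text: str) -> str:
    """Strip highlight tags, then fold width, diacritics and case."""
    plain = _TAG.sub("", text)
    decomposed = unicodedata.normalize("NFKD", plain)
    # Drop combining marks left behind by decomposition (e.g. accents).
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return no_marks.casefold().strip()


def same_title(title_highlighted: str, wanted: str) -> bool:
    """Compare a highlighted title with the wanted title after normalization."""
    return normalize_title(title_highlighted) == normalize_title(wanted)
```

Under these rules `<h>Résumé</h> 2026` equals `resume 2026`, and a full-width `２０２６` equals `2026`. This kind of comparison is not something a plain keyword query can express, so it is a candidate for the client-side exception.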
🧩 Analysis chain
🌐 Web query:
Does the Feishu (Lark) Search v2 API endpoint /open-apis/search/v2/doc_wiki/search support Boolean operators like OR, negation with hyphen (-), exact phrase matching with double quotes, and intitle: syntax in the query parameter?
💡 Result:
No, the Feishu (Lark) Search v2 API endpoint /open-apis/search/v2/doc_wiki/search does not support Boolean operators like OR, negation with hyphen (-), exact phrase matching with double quotes, or intitle: syntax in the query parameter. The query is a simple string field limited to 50 characters, used for keyword matching with filters like doc_filter and wiki_filter. Official documentation describes basic full-match, prefix-match, and word-segmentation matching for Chinese/English in related open search features, but no advanced query syntax for this endpoint. UI advanced search guides exist separately but do not apply to the API.
Citations:
- 1: https://open.larksuite.com/document/uAjLw4CM/ukTMukTMukTM/search-v2/doc_wiki/search
- 2: https://open.feishu.cn/document/uAjLw4CM/ukTMukTMukTM/search-v2/doc_wiki/search
- 3: https://open.feishu.cn/document/uAjLw4CM/ukTMukTMukTM/search-v2/common-template-intergration-handbook
- 4: https://open.larkoffice.com/document/server-docs/search-v2/suite-search/create
- 5: https://www.searchium.ai
Remove unsupported Boolean operators from the --query parameter documentation.
The Feishu Search v2 API endpoint /open-apis/search/v2/doc_wiki/search does not support the documented Boolean operators (AND via spaces, OR, -, "", intitle:). According to the official API specification, the query parameter accepts only a simple text string (limited to 50 characters) for basic keyword matching.
The current documentation will mislead users into attempting search syntax that the API does not support, resulting in unexpected or broken search results. Since the code in shortcuts/doc/docs_search.go passes queries directly to the API without validation, there is no client-side fallback to handle invalid syntax.
Update line 49 to accurately describe the API's actual search capabilities—likely basic keyword matching without advanced Boolean syntax support.
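Since the comment notes that queries reach the API without validation, a client-side check could at least warn before sending. The sketch below is hypothetical: `check_query` is not part of the CLI, the 50-character figure is the one reported in this review (counted here in Python characters, which is an assumption), and the patterns only approximate the operators listed in the PR.

```python
import re

# Limit as reported in the review; counting in characters is an assumption.
MAX_QUERY_CHARS = 50

# Rough patterns for the operators the PR documents.
_OPERATOR_PATTERNS = {
    "OR": re.compile(r"(?<!\S)OR(?!\S)"),
    "exclusion (-term)": re.compile(r"(?<!\S)-\S"),
    "quoted phrase": re.compile(r'"[^"]*"'),
    "intitle:": re.compile(r"(?<!\S)intitle:"),
}


def check_query(query: str) -> list[str]:
    """Return human-readable warnings for a --query value; empty if none."""
    warnings = []
    if len(query) > MAX_QUERY_CHARS:
        warnings.append(
            f"query is {len(query)} characters; reported limit is {MAX_QUERY_CHARS}"
        )
    for name, pattern in _OPERATOR_PATTERNS.items():
        if pattern.search(query):
            warnings.append(f"query contains {name}, which may be treated as plain text")
    return warnings
```

A plain query such as `方案 设计` produces no warnings, while `intitle:"季度 总结"` is flagged twice, once for `intitle:` and once for the quoted phrase.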
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@skills/lark-doc/references/lark-doc-search.md` at line 49, Update the
`--query <text>` documentation to remove the unsupported Boolean/operator
examples and state that the Feishu Search v2 endpoint
`/open-apis/search/v2/doc_wiki/search` accepts only a simple text string (basic
keyword matching, max ~50 characters) — i.e., replace the current list of
Boolean operators (AND via spaces, OR, -, "", intitle:) with a concise note
about plain keyword matching; also note that `shortcuts/doc/docs_search.go`
passes the query directly to the API so no client-side parsing/validation is
performed.
🚀 PR Preview Install Guide
🧰 CLI update
npm i -g https://pkg.pr.new/larksuite/cli/@larksuite/cli@80c1917a6d8432c947ebe1bbe8c098405890e6e4
🧩 Skill update
npx skills add larksuite/cli#docs/enhance-doc-search-syntax -y -g
Summary
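Currently `lark-cli docs +search` exposes only basic keyword search, so AI agents are unaware of the advanced logical syntax supported by the underlying search engine. As a result, when handling complex intents such as "exact title match" or "exclude specific content", an agent can only pull back large amounts of redundant data and filter it locally, which easily leads to token explosion and hallucinated total counts. This PR updates the Skill description file to formally expose the advanced Boolean and intitle: syntax to AI agents.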
Changes
Test Plan
Summary by CodeRabbit
Updated `--query` parameter documentation to detail advanced Boolean search syntax support, including AND operators (via spaces), OR, term exclusion (-), exact phrase matching (double quotes), and title-specific filtering with `intitle:`.