
feat: (lark-doc) Doc Search advanced boolean and intitle search syntax #210

Open
MakeLarkGreatAgain wants to merge 1 commit into main from docs/enhance-doc-search-syntax

MakeLarkGreatAgain (Collaborator) commented Apr 2, 2026

Summary

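Currently, lark-cli docs +search only exposes basic keyword search, so AI agents cannot see the advanced logical syntax that the underlying search engine supports. As a result, when handling complex intents such as "exact title match" or "exclude certain content", the agent has to pull back large amounts of redundant data and filter it locally, which easily leads to token explosion and hallucinated result counts. This PR updates the skill description file to formally expose the advanced Boolean and intitle: syntax to AI agents.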

Changes

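  • Parameter description: in skills/lark-doc/references/lark-doc-search.md, documented advanced Boolean syntax for the --query parameter (AND, OR, - for exclusion, and "" for exact matching).
  • Decision rules: rewrote the query-semantics rule under ## 决策规则 (decision rules) so that the agent must prefer intitle: to push filtering down to the server, and must not re-compare results locally whenever server-side filtering is available.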

Test Plan

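  • Format check: verified that the Markdown syntax and tables render correctly.
  • Intent translation tests (simulated agent behavior):
    • Scenario 1 (exclusion): for the input “找飞书项目文档不要纪要” (“find Feishu project docs, but not meeting minutes”), the agent generated lark-cli docs +search --query "飞书 -纪要".
    • Scenario 2 (logical OR): for the input “查找A团队介绍或B团队介绍的文档” (“find the Team A introduction or Team B introduction docs”), the agent generated lark-cli docs +search --query "\"A团队介绍\" OR \"B团队介绍\"".
    • Scenario 3 (exact title match): for the input “查找标题明确为 2026规划 的文档” (“find docs whose title is exactly 2026规划”), the agent generated lark-cli docs +search --query "intitle:\"2026规划\"".
  • Compatibility check: manually ran the generated commands above in a terminal and confirmed that the Feishu OpenAPI returned the expected results without errors.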
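Scenarios 2 and 3 pass queries that themselves contain double quotes, so on a command line the inner quotes must be escaped (`\"`) or the whole query wrapped in single quotes; otherwise the shell consumes them and the phrase operators never reach the CLI. A minimal sketch using Python's standard `shlex` module, which follows POSIX shell quoting rules, shows what each form actually delivers as the `--query` value. `query_arg` is a hypothetical helper for illustration, not part of lark-cli:

```python
import shlex


def query_arg(command_line: str) -> str:
    """Return the value the shell would pass for --query (POSIX quoting rules)."""
    args = shlex.split(command_line)
    return args[args.index("--query") + 1]


# Unescaped inner quotes: the shell strips them, so the phrase operators are lost.
broken = query_arg('lark-cli docs +search --query ""A团队介绍" OR "B团队介绍""')
# Escaped inner quotes, or single quotes around the whole query, keep them intact.
escaped = query_arg(r'lark-cli docs +search --query "\"A团队介绍\" OR \"B团队介绍\""')
single = query_arg("""lark-cli docs +search --query '"A团队介绍" OR "B团队介绍"'""")

print(broken)   # A团队介绍 OR B团队介绍
print(escaped)  # "A团队介绍" OR "B团队介绍"
print(single == escaped)  # True

# When an agent builds the command programmatically, shlex.join quotes each argument.
print(shlex.join(["lark-cli", "docs", "+search", "--query", 'intitle:"2026规划"']))
```

Passing the query as a single element of an argument list (for example via `subprocess.run([...])`) sidesteps shell quoting entirely, which is usually the safer choice for an agent.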

Summary by CodeRabbit

  • Documentation
    • Enhanced --query parameter documentation to detail advanced Boolean search syntax support, including AND operators (via spaces), OR, term exclusion (-), exact phrase matching (double quotes), and title-specific filtering with intitle:.
    • Refined search query semantics to prioritize service-side advanced filtering over client-side fuzzy matching, with conditional client-side filtering only when service syntax is insufficient.

…or AI agents

Change-Id: I647ffad4579c503711a7ea220c390dca760cd6de
MakeLarkGreatAgain added the domain/doc (Docs domain) and size/S (Low-risk docs, CI, test, or chore only changes) labels Apr 2, 2026
github-actions bot added the domain/ccm (PR touches the ccm domain) and size/M (Single-domain feat or fix with limited business impact) labels and removed the domain/doc (Docs domain) label Apr 2, 2026

coderabbitai bot commented Apr 2, 2026

📝 Walkthrough


Documentation update to search query functionality clarifying advanced Boolean syntax support (AND, OR, -, double quotes, intitle:) and refining decision rules for prioritizing service-side filtering with advanced query syntax over client-side filtering.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Search Documentation<br>skills/lark-doc/references/lark-doc-search.md | Enhanced --query parameter documentation with explicit Boolean syntax examples and tightened query semantics rules to require service-side filtering via advanced syntax before fallback to client-side filtering with highlight tag stripping. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

🐰 A search path refined with Boolean grace,
Service-side filters find their rightful place,
No fuzzy wandering, no tags left behind,
Documentation now crystal-clear in mind!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main changes: exposing advanced Boolean and intitle search syntax in lark-doc documentation. |
| Description check | ✅ Passed | The description covers all required template sections: Summary explains motivation, Changes lists main updates, Test Plan details verification scenarios, and Related Issues is noted. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


github-actions bot removed the size/S (Low-risk docs, CI, test, or chore only changes) label Apr 2, 2026

greptile-apps bot commented Apr 2, 2026

Greptile Summary

This PR updates the AI agent skill documentation for lark-cli docs +search to expose advanced Boolean query operators (AND via space, OR, - exclusion, "" phrase match, intitle:) and restructures the decision rules to mandate server-side filtering over client-side post-processing.

Key concerns:

  • Unconfirmed API support (P1): The advanced Boolean operators and intitle: prefix are documented as guaranteed features of POST /open-apis/search/v2/doc_wiki/search, but no API reference or test evidence is cited to confirm that the Feishu/Lark backend actually honours these operators. If the API ignores them, agents will build queries that silently misbehave.
  • Semantic regression in decision rules (P1): The new rule uses intitle:"X" as a replacement for exact title equality matching and explicitly forbids client-side re-filtering. However, intitle: is a contains predicate, not an equals predicate. The previous rule's client-side exact-equality guard was correct for true exact-title scenarios; removing it creates a new category of false positives.
  • Misleading empty-string example (P2): Item 4 of the --query description illustrates phrase matching with `""`, which renders as an empty string rather than a useful example phrase.

Confidence Score: 2/5

Not safe to merge as-is — the documentation makes unverified API claims and removes a correct client-side exact-match guard, which could cause agents to silently return incorrect results.

Two P1 issues: (1) the advanced Boolean and intitle: operators are asserted as API-supported features with no citation or verification, and (2) the decision rule change conflates intitle: (contains) with exact title equality (equals) while explicitly prohibiting the client-side filtering that was previously the correct fallback. Together these introduce a new class of false positives while claiming to eliminate them.

skills/lark-doc/references/lark-doc-search.md: both the --query parameter description and the 决策规则 (decision rules) section need attention.

Important Files Changed

| Filename | Overview |
|---|---|
| skills/lark-doc/references/lark-doc-search.md | Agent skill documentation updated to expose advanced Boolean and intitle: query syntax, but the API support for these operators is unconfirmed and the decision rule incorrectly equates intitle: (contains) with exact title matching (equals), removing a necessary client-side guard. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User Query Intent] --> B{Intent Type?}
    B -->|"Keyword in title (intitle:)"| C["Build: --query intitle:X"]
    B -->|"Exact title equals X"| D["Build: --query intitle:#quot;X#quot;"]
    B -->|"Boolean / exclusion"| E["Build: --query 'A OR B' / 'A -B'"]
    B -->|"Phrase match"| F["Build: --query #quot;exact phrase#quot;"]

    C --> G["POST /search/v2/doc_wiki/search"]
    D --> G
    E --> G
    F --> G

    G --> H{API honours advanced syntax?}
    H -->|"✅ Confirmed (unverified in PR)"| I[Server-side filtered results]
    H -->|"❌ Operators ignored"| J[Unfiltered / unexpected results]

    I --> K{Exact title equality needed?}
    K -->|"No — keyword-in-title sufficient"| L[Return results to agent]
    K -->|"Yes — intitle: only contains, not equals"| M["⚠️ Client-side exact-match filter<br/>(removed by new decision rule)"]
    M --> L

    J --> N["Agent misinterprets results<br/>(token explosion / hallucination)"]

Reviews (1): Last reviewed commit: "docs(lark-doc): document advanced boolea..."

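## 决策规则 (Decision rules)

- Query semantics: interpret queries as keyword search by default. When the user says “title is `X`”, “the title contains `X`”, or “search for `X` docs”, return the OpenAPI hits directly; do a second client-side filtering pass only when the user explicitly asks for “title exactly equals `X`”. Before exact matching, strip the highlight tags from `title_highlighted`.
- Query semantics: you must prefer the advanced --query syntax (such as intitle:, "", -) to push filtering down to the server. When the user asks for “title exactly equals X”, use --query "intitle:\"X\"" directly; never run a fuzzy search first and then filter again on the client. Client-side filtering is allowed only for complex local comparisons that the server syntax cannot express, and the highlight tags in title_highlighted must be stripped before comparing.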

P1 intitle: ≠ exact title match — logic regression

The new decision rule mandates --query "intitle:\"X\"" for the case where the user says "标题精确等于 X" (exact title match) and explicitly prohibits client-side re-filtering. However, intitle:X is a contains predicate, not an equals predicate: it returns every document whose title contains X as a substring or keyword. A document titled "2026 规划汇总补充" would still match intitle:"2026规划".

The previous rule correctly acknowledged this gap and allowed client-side filtering (after stripping <h>/<hb> tags from title_highlighted) specifically for exact-equality scenarios. Removing that safety valve without a true server-side exact-title-equals operator means agents will silently over-report matches and the "total count hallucination" problem the PR aims to solve is not actually fixed — it is just relocated to a different type of false positive.

Suggested approach: keep the explicit client-side guard for exact-title-equals while still encouraging intitle: for the broader "keyword in title" intent:

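- Query semantics: you must prefer the advanced --query syntax (such as intitle:, "", -) to push filtering down to the server.
  When the user asks for "title contains X", use intitle:X.
  When the user asks for "title exactly equals X", use intitle:"X" to narrow the candidate set, but still compare title_highlighted (after stripping the highlight tags) for equality on the client, because intitle: has contains semantics, not equality semantics.
  Never apply a second client-side fuzzy filter in "title contains keyword" cases that intitle: already covers.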
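The two-step approach in this suggestion (narrow the candidates with intitle:"X", then check equality locally) could look roughly like the sketch below. Assumptions: each hit is a dict carrying a `title_highlighted` string, and highlights use the `<h>`/`<hb>` tags mentioned in this review; the helper names and sample titles are hypothetical.

```python
import re

# Assumed highlight markup in title_highlighted: <h>...</h> and <hb>...</hb>.
_HIGHLIGHT_TAG = re.compile(r"</?hb?>")


def strip_highlight(title_highlighted: str) -> str:
    """Remove highlight tags so the title can be compared as plain text."""
    return _HIGHLIGHT_TAG.sub("", title_highlighted)


def exact_title_matches(hits: list[dict], wanted: str) -> list[dict]:
    """Keep only hits whose de-highlighted title equals `wanted` exactly.

    intitle:"X" only narrows the candidates (contains semantics), so the
    equality check has to happen here.
    """
    return [h for h in hits if strip_highlight(h["title_highlighted"]) == wanted]


# Hypothetical candidates returned for --query 'intitle:"2026规划"'.
candidates = [
    {"title_highlighted": "<h>2026规划</h>"},
    {"title_highlighted": "<hb>2026规划</hb>汇总补充"},
]
print(len(exact_title_matches(candidates, "2026规划")))  # 1
```

The server still does the expensive narrowing, so the local pass only touches a small candidate set while correcting the over-match that a contains predicate produces.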

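| Parameter | Required | Description |
|------|------|------|
| `--query <text>` | No | Search keywords. Keyword search by default, not exact title matching; omitting it or passing an empty string performs an empty search |
| `--query <text>` | No | Search keywords. **Supports advanced Boolean syntax** for more precise searches:<br>1. A space means AND (e.g. `方案 设计`).<br>2. `OR` means logical OR (e.g. `方案 OR 草稿`).<br>3. `-` excludes a term (e.g. `方案 -草稿`).<br>4. Double quotes `""` mean exact phrase matching.<br>5. `intitle:` restricts a keyword to the title (e.g. `intitle:总结` or `intitle:"季度 总结"`). Omitting it or passing an empty string performs an empty search |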
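To make the five documented forms concrete, here is a small hypothetical classifier that splits a query string into the parts this row describes (implicit-AND terms, `OR`, `-` exclusions, quoted phrases, `intitle:` terms). It only models the syntax as written in the documentation; it says nothing about how the Lark server actually parses `--query`, which is exactly the open question raised in review.

```python
import re
from dataclasses import dataclass, field

# Optional "intitle:" or "-" prefix, then either a "quoted phrase" or a bare word.
_TOKEN = re.compile(r'(intitle:|-)?("([^"]*)"|\S+)')


@dataclass
class ParsedQuery:
    terms: list[str] = field(default_factory=list)     # bare words, implicitly ANDed
    phrases: list[str] = field(default_factory=list)   # "exact phrase"
    excluded: list[str] = field(default_factory=list)  # -term
    in_title: list[str] = field(default_factory=list)  # intitle:term / intitle:"phrase"
    has_or: bool = False                               # a bare OR keyword was seen


def classify(query: str) -> ParsedQuery:
    """Split a query into the five forms listed in the --query row."""
    parsed = ParsedQuery()
    for m in _TOKEN.finditer(query):
        prefix, raw, quoted = m.group(1), m.group(2), m.group(3)
        text = quoted if quoted is not None else raw
        if prefix == "intitle:":
            parsed.in_title.append(text)
        elif prefix == "-":
            parsed.excluded.append(text)
        elif raw == "OR":
            parsed.has_or = True
        elif quoted is not None:
            parsed.phrases.append(text)
        else:
            parsed.terms.append(text)
    return parsed


print(classify('intitle:"季度 总结" 方案 -草稿'))
```

Precedence is deliberately not modeled: `方案 设计 OR 草稿` comes back as three terms plus `has_or=True`, because the documented syntax does not say how `OR` groups with the implicit AND.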

P1 Advanced Boolean syntax not confirmed supported by the underlying API

The updated description asserts that the Lark Search v2 endpoint (POST /open-apis/search/v2/doc_wiki/search) supports AND (space), OR, - exclusion, "" phrase match, and intitle: as first-class query operators. However, neither the PR description nor any referenced documentation confirms that the upstream OpenAPI actually honours these operators.

If the API treats the query as a plain keyword string, an agent following these instructions would construct queries like "飞书 -纪要" or "intitle:\"2026规划\"" and receive unexpected results — or silently get results that ignore the operators entirely, producing the same token-explosion and hallucination problems the PR aims to prevent.

Before documenting these as supported features, please verify against the official Feishu/Lark OpenAPI documentation (or an actual API call) that each operator is genuinely honoured server-side, and ideally link to the relevant section of the API reference. If only some operators are confirmed, document only those.
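One way to gather the evidence requested here is a small probe that runs an operator and checks the returned titles against its documented meaning. This is a sketch under stated assumptions: `search` is a hypothetical callable wrapping the real API call (for example by invoking lark-cli docs +search) and returning de-highlighted titles. Only titles are inspected, so a passing probe is consistent with the operator being honoured but does not prove it, and an empty result is treated as inconclusive.

```python
from typing import Callable

# Hypothetical wrapper around the real search call: takes a --query string and
# returns the plain (de-highlighted) titles of the hits.
SearchFn = Callable[[str], list[str]]


def probe_intitle(search: SearchFn, term: str) -> bool:
    """intitle:term looks honoured only if every returned title contains term."""
    titles = search(f"intitle:{term}")
    return bool(titles) and all(term in t for t in titles)


def probe_exclusion(search: SearchFn, base: str, excluded: str) -> bool:
    """'base -excluded' looks honoured only if no returned title contains excluded.

    The excluded term could still hide in a document body, so a pass is
    necessary but not sufficient; an empty result returns False (inconclusive).
    """
    titles = search(f"{base} -{excluded}")
    return bool(titles) and not any(excluded in t for t in titles)


# Stand-in backend that ignores operators, as a plain keyword search would.
CORPUS = ["飞书项目周报", "飞书项目会议纪要", "季度总结"]


def keyword_only_search(query: str) -> list[str]:
    words = [w.lstrip("-").removeprefix("intitle:") for w in query.split()]
    return [t for t in CORPUS if any(w in t for w in words)]


print(probe_exclusion(keyword_only_search, "飞书", "纪要"))  # False: operator ignored
```

Running the same probes against the live endpoint, with terms known to return hits, would give a concrete yes/no per operator that could be linked from the skill file.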


P2 "" example for phrase match looks like an empty string

Item 4 in the new param description reads:

  4. Double quotes "" mean exact phrase matching.

Displaying "" in a code span reads as an empty double-quoted string. This is likely to confuse an AI agent (or a human reader) who expects a concrete usage example. Consider replacing "" with a real-world example consistent with the surrounding examples:

Suggested change
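| `--query <text>` | No | Search keywords. **Supports advanced Boolean syntax** for more precise searches:<br>1. A space means AND (e.g. `方案 设计`).<br>2. `OR` means logical OR (e.g. `方案 OR 草稿`).<br>3. `-` excludes a term (e.g. `方案 -草稿`).<br>4. Double quotes `""` mean exact phrase matching.<br>5. `intitle:` restricts a keyword to the title (e.g. `intitle:总结` or `intitle:"季度 总结"`). Omitting it or passing an empty string performs an empty search |
| `--query <text>` | No | Search keywords. **Supports advanced Boolean syntax** for more precise searches:<br>1. A space means AND (e.g. `方案 设计`).<br>2. `OR` means logical OR (e.g. `方案 OR 草稿`).<br>3. `-` excludes a term (e.g. `方案 -草稿`).<br>4. Double quotes mean exact phrase matching (e.g. `"季度总结"`).<br>5. `intitle:` restricts a keyword to the title (e.g. `intitle:总结` or `intitle:"季度 总结"`). Omitting it or passing an empty string performs an empty search |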

coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
skills/lark-doc/references/lark-doc-search.md (1)

62-62: Consider adding concrete examples for the client-side filtering exception.

The decision rule states: "只有在遇到服务端语法无法覆盖的复杂本地比对场景时,才允许在客户端过滤" (only allow client-side filtering when server syntax cannot cover complex local matching scenarios).

This exception clause provides necessary flexibility but lacks concrete examples. Without specific scenarios, different developers or the AI Agent might interpret "complex local matching scenarios" inconsistently, potentially leading to unnecessary client-side filtering when server-side syntax would suffice.

📝 Suggested clarification with examples

Consider appending specific examples after the exception clause, such as:

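-Query semantics: you must prefer the advanced --query syntax (such as intitle:, "", -) to push filtering down to the server. When the user asks for "title exactly equals X", use --query "intitle:\"X\"" directly; never run a fuzzy search first and then filter again on the client. Client-side filtering is allowed only for complex local comparisons that the server syntax cannot express, and the highlight tags in title_highlighted must be stripped before comparing.
+Query semantics: you must prefer the advanced --query syntax (such as intitle:, "", -) to push filtering down to the server. When the user asks for "title exactly equals X", use --query "intitle:\"X\"" directly; never run a fuzzy search first and then filter again on the client. Client-side filtering is allowed only for complex local comparisons that the server syntax cannot express (for example: combined multi-field conditions, regular-expression matching, custom scoring and ranking), and the highlight tags in title_highlighted must be stripped before comparing.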
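As an illustration of a local comparison that query syntax cannot express, consider titles that differ only in full-width versus half-width characters or letter case. The sketch below is hedged: NFKC normalization plus `str.casefold()` is one common choice, not something the skill file prescribes, and the `<h>`/`<hb>` highlight markup in `title_highlighted` is an assumption taken from this review thread. `normalize_title` and `same_title` are hypothetical helpers.

```python
import re
import unicodedata

# Assumed highlight markup in title_highlighted: <h>...</h> or <hb>...</hb>.
_HIGHLIGHT_TAG = re.compile(r"</?hb?>")


def normalize_title(title_highlighted: str) -> str:
    """Strip highlight tags first, then fold width and case differences."""
    plain = _HIGHLIGHT_TAG.sub("", title_highlighted)
    return unicodedata.normalize("NFKC", plain).casefold().strip()


def same_title(title_highlighted: str, wanted: str) -> bool:
    """Equality after normalization, e.g. full-width digits vs ASCII digits."""
    return normalize_title(title_highlighted) == normalize_title(wanted)


print(same_title("<h>２０２６</h>规划 Q1", "2026规划 q1"))  # True
```

Stripping the tags before normalizing matters: otherwise a highlight boundary inside a word would make otherwise identical titles compare unequal.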
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/lark-doc/references/lark-doc-search.md` at line 62, Add concrete
examples illustrating the allowed "client-side filtering" exception right after
the sentence that permits client filtering when server syntax cannot cover
complex local matching; specifically, append 2–3 short scenarios (e.g., fuzzy
unicode normalization across diacritics, language-specific tokenization
differences, or multi-field proximity matches that the server lacks) and show
the minimal client-side steps (strip HTML tags from title_highlighted, then
apply the local comparison) and when to prefer --query with intitle:"X" instead;
reference the existing terms --query, intitle:, and title_highlighted so readers
can locate the rule and understand exact inputs and the required pre-filtering
step.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b910cee2-9ac6-47c0-9b7a-a73187c77dc7

📥 Commits

Reviewing files that changed from the base of the PR and between 79f43dc and 80c1917.

📒 Files selected for processing (1)
  • skills/lark-doc/references/lark-doc-search.md


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

Does the Feishu (Lark) Search v2 API endpoint /open-apis/search/v2/doc_wiki/search support Boolean operators like OR, negation with hyphen (-), exact phrase matching with double quotes, and intitle: syntax in the query parameter?

💡 Result:

No, the Feishu (Lark) Search v2 API endpoint /open-apis/search/v2/doc_wiki/search does not support Boolean operators like OR, negation with hyphen (-), exact phrase matching with double quotes, or intitle: syntax in the query parameter. The query is a simple string field limited to 50 characters, used for keyword matching with filters like doc_filter and wiki_filter. Official documentation describes basic full-match, prefix-match, and word-segmentation matching for Chinese/English in related open search features, but no advanced query syntax for this endpoint. UI advanced search guides exist separately but do not apply to the API.

Citations:


Remove unsupported Boolean operators from the --query parameter documentation.

The Feishu Search v2 API endpoint /open-apis/search/v2/doc_wiki/search does not support the documented Boolean operators (AND via spaces, OR, -, "", intitle:). According to the official API specification, the query parameter accepts only a simple text string (limited to 50 characters) for basic keyword matching.

The current documentation will mislead users into attempting search syntax that the API does not support, resulting in unexpected or broken search results. Since the code in shortcuts/doc/docs_search.go passes queries directly to the API without validation, there is no client-side fallback to handle invalid syntax.

Update line 49 to accurately describe the API's actual search capabilities—likely basic keyword matching without advanced Boolean syntax support.
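If the 50-character limit reported in this comment holds (it comes from the bot's web lookup and is not verified here, and it is not stated whether the limit counts characters or bytes), a client-side check before the request would fail fast instead of sending an over-long query. `check_query` and `MAX_QUERY_CHARS` are hypothetical; per the review, `shortcuts/doc/docs_search.go` currently passes the query through without validation. This sketch counts Unicode code points.

```python
# Limit as reported in the review; the unit (code points vs bytes) is unconfirmed.
MAX_QUERY_CHARS = 50


def check_query(query: str, limit: int = MAX_QUERY_CHARS) -> str:
    """Return the query unchanged, or raise if it exceeds the assumed limit.

    len() on a Python str counts Unicode code points, so each CJK character
    counts as one, even though it takes three bytes in UTF-8.
    """
    if len(query) > limit:
        raise ValueError(f"query has {len(query)} characters; limit is {limit}")
    return query


print(check_query("飞书 -纪要"))
```

If the API turns out to count bytes instead, the check would need `len(query.encode("utf-8"))`, which is three times larger for typical Chinese text.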

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/lark-doc/references/lark-doc-search.md` at line 49, Update the
`--query <text>` documentation to remove the unsupported Boolean/operator
examples and state that the Feishu Search v2 endpoint
`/open-apis/search/v2/doc_wiki/search` accepts only a simple text string (basic
keyword matching, max ~50 characters) — i.e., replace the current list of
Boolean operators (AND via spaces, OR, -, "", intitle:) with a concise note
about plain keyword matching; also note that `shortcuts/doc/docs_search.go`
passes the query directly to the API so no client-side parsing/validation is
performed.


github-actions bot commented Apr 2, 2026

🚀 PR Preview Install Guide

🧰 CLI update

npm i -g https://pkg.pr.new/larksuite/cli/@larksuite/cli@80c1917a6d8432c947ebe1bbe8c098405890e6e4

🧩 Skill update

npx skills add larksuite/cli#docs/enhance-doc-search-syntax -y -g


CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.