Code Review
This pull request introduces an experimental Docs JSON API for TiDB documentation, including scripts for building a searchable JSON index and a lightweight HTTP server. The implementation includes logic for parsing markdown files to extract metadata such as features, topics, and headings. Feedback focuses on improving the robustness of the feature extraction regex, prioritizing front matter for document summaries, ensuring proper URL encoding in documentation, and adopting more reliable methods for YAML parsing and file modification timestamps.
const extractFeatures = (content, frontMatterData) => {
  const features = new Set();
  const varRegex = /\b[a-z]+(?:_[a-z0-9]+){2,}\b/g;
The regex \b[a-z]+(?:_[a-z0-9]+){2,}\b requires at least two underscores to match a token. This will miss common TiDB system variables or configuration items that have only one underscore (e.g., tidb_config, tidb_port). Changing {2,} to + will ensure these are correctly indexed as features.
Suggested change:
- const varRegex = /\b[a-z]+(?:_[a-z0-9]+){2,}\b/g;
+ const varRegex = /\b[a-z]+(?:_[a-z0-9]+)+\b/g;
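The difference between the two patterns can be checked directly. The snippet below is a small illustration, not code from the PR; the sample sentence is made up for the comparison.

```javascript
// Original pattern: needs at least two "_segment" groups after the first word.
const strictRegex = /\b[a-z]+(?:_[a-z0-9]+){2,}\b/g;
// Suggested pattern: accepts one or more "_segment" groups.
const relaxedRegex = /\b[a-z]+(?:_[a-z0-9]+)+\b/g;

// Sample text invented for this illustration.
const sample = "Set tidb_port and tidb_max_dist_task_nodes before restart.";

const strictMatches = sample.match(strictRegex) ?? [];
const relaxedMatches = sample.match(relaxedRegex) ?? [];

console.log(strictMatches); // [ 'tidb_max_dist_task_nodes' ]
console.log(relaxedMatches); // [ 'tidb_port', 'tidb_max_dist_task_nodes' ]
```

One trade-off to keep in mind: the relaxed pattern matches any lowercase snake_case token with a single underscore, so it may also pick up identifiers that are not TiDB variables.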
id: relativePath.replace(/\.md$/, ""),
path: relativePath,
title,
summary,
- `GET /docs`
- `GET /docs?feature=tidb_max_dist_task_nodes`
- `GET /docs?topic=tidb-cloud`
- `GET /docs?q=resource control`
The query parameter q=resource control contains an unencoded space. In a URL, spaces should be encoded as %20 or + to be technically accurate.
Suggested change:
- `GET /docs?q=resource control`
+ `GET /docs?q=resource%20control`
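For client code that builds these URLs, the standard JavaScript APIs produce both encodings mentioned in the comment. This is a minimal sketch; only the `/docs` path and the `q` parameter come from the PR.

```javascript
// URLSearchParams serializes a space as "+".
const params = new URLSearchParams({ q: "resource control" });
const withPlus = `/docs?${params.toString()}`;

// encodeURIComponent serializes a space as "%20".
const withPercent = `/docs?q=${encodeURIComponent("resource control")}`;

// Both forms decode back to the same query value.
const decodedPlus = new URLSearchParams("q=resource+control").get("q");
const decodedPercent = new URLSearchParams("q=resource%20control").get("q");

console.log(withPlus); // /docs?q=resource+control
console.log(withPercent); // /docs?q=resource%20control
console.log(decodedPlus === decodedPercent); // true
```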
References
- Ensure technical accuracy in documentation, such as proper URL encoding.
const parseSimpleYaml = (raw = "") => {
  const result = {};
  let currentArrayKey = null;

  raw.split(/\r?\n/).forEach((line) => {
    if (!line.trim() || line.trim().startsWith("#")) {
      return;
    }

    const kvMatch = line.match(/^([A-Za-z0-9_-]+):\s*(.*)$/);
    if (kvMatch) {
      const key = kvMatch[1];
      const value = kvMatch[2];
      if (!value.trim()) {
        result[key] = [];
        currentArrayKey = key;
      } else {
        result[key] = parseScalar(value);
        currentArrayKey = null;
      }
      return;
    }

    const listMatch = line.match(/^\s*-\s*(.*)$/);
    if (listMatch && currentArrayKey) {
      result[currentArrayKey].push(parseScalar(listMatch[1]));
      return;
    }

    currentArrayKey = null;
  });

  return result;
};
The custom parseSimpleYaml function is very basic and does not support common YAML features like multi-line strings (|, >) or nested objects, which are occasionally used in TiDB documentation front matter. Since the project already has dependencies for markdown processing, using a standard YAML parser would be more reliable.
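To make the limitation concrete, the snippet below runs the quoted parser against front matter that uses a block scalar and a nested object. `parseScalar` is not shown in the review excerpt, so the version here is a stand-in assumption (trim plus quote stripping), and the sample front matter is invented for the illustration.

```javascript
// Stand-in for the PR's parseScalar (an assumption): trims whitespace and
// strips one pair of matching surrounding quotes.
const parseScalar = (value) => value.trim().replace(/^(["'])(.*)\1$/, "$2");

// Same logic as the parser quoted in the review.
const parseSimpleYaml = (raw = "") => {
  const result = {};
  let currentArrayKey = null;
  raw.split(/\r?\n/).forEach((line) => {
    if (!line.trim() || line.trim().startsWith("#")) {
      return;
    }
    const kvMatch = line.match(/^([A-Za-z0-9_-]+):\s*(.*)$/);
    if (kvMatch) {
      const [, key, value] = kvMatch;
      if (!value.trim()) {
        result[key] = [];
        currentArrayKey = key;
      } else {
        result[key] = parseScalar(value);
        currentArrayKey = null;
      }
      return;
    }
    const listMatch = line.match(/^\s*-\s*(.*)$/);
    if (listMatch && currentArrayKey) {
      result[currentArrayKey].push(parseScalar(listMatch[1]));
      return;
    }
    currentArrayKey = null;
  });
  return result;
};

// Invented front matter with a block scalar, a list, and a nested object.
const frontMatter = [
  "title: Resource Control",
  "summary: |",
  "  First line.",
  "  Second line.",
  "aliases:",
  "  - /docs/a",
  "meta:",
  "  owner: docs-team",
].join("\n");

const parsed = parseSimpleYaml(frontMatter);
console.log(parsed.summary); // "|" (the block scalar body is dropped)
console.log(parsed.aliases); // [ '/docs/a' ] (simple lists work)
console.log(parsed.meta); // [] (the nested object is lost)
```

Under these assumptions, the multi-line summary collapses to the literal `|` and the nested `meta` object becomes an empty array, which is the failure mode the comment describes.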
headings,
frontMatter,
frontMatterRaw,
updatedAt: docStat.mtime.toISOString(),
Introduce DOCS_API_SOURCE_DIR and source-directory resolution so the docs API can prefer ../docs-staging if present. Update documentation to describe source priority. Propagate the resolved source dir into build-docs-api-index and docs-api-server (including startup/log messages). Add support for template variable replacement in markdown using variables.json (pattern {{{ .path }}}) with a safe parser and warning on JSON parse failure. Improve file collection to skip dot-directories, ignore api/docs-json-api.md, and normalize paths. Minor refactors: pass resolved sourceDir into buildDocsIndex, add resolveDefaultSourceDir helper, and include variables when parsing markdown.
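The template variable step described in this commit could look roughly like the sketch below. The helper names (`parseVariables`, `lookupPath`, `replaceTemplateVariables`), the exact placeholder regex, and the choice to leave unknown placeholders untouched are all assumptions for illustration; only the `{{{ .path }}}` pattern, `variables.json`, and the warn-on-parse-failure behavior come from the commit message.

```javascript
// Hypothetical helper: parse the text of variables.json, warning and falling
// back to an empty object when the JSON is invalid.
const parseVariables = (jsonText) => {
  try {
    const parsed = JSON.parse(jsonText);
    return parsed && typeof parsed === "object" ? parsed : {};
  } catch (error) {
    console.warn(`Ignoring variables.json: ${error.message}`);
    return {};
  }
};

// Hypothetical helper: resolve a dotted path such as "nested.name".
const lookupPath = (variables, path) =>
  path.split(".").reduce(
    (node, key) => (node != null && typeof node === "object" ? node[key] : undefined),
    variables
  );

// Replace {{{ .path }}} placeholders; unknown paths are left as written.
const replaceTemplateVariables = (markdown, variables) =>
  markdown.replace(/\{\{\{\s*\.([A-Za-z0-9_.-]+)\s*\}\}\}/g, (placeholder, path) => {
    const value = lookupPath(variables, path);
    return typeof value === "string" || typeof value === "number"
      ? String(value)
      : placeholder;
  });

// Example values are invented placeholders, not real variables.json content.
const variables = parseVariables(
  '{"starter": "EXAMPLE_STARTER", "nested": {"name": "EXAMPLE_NESTED"}}'
);
const rendered = replaceTemplateVariables(
  "Use {{{ .starter }}}, {{{.nested.name}}}, and {{{ .missing }}}.",
  variables
);
console.log(rendered);
// Use EXAMPLE_STARTER, EXAMPLE_NESTED, and {{{ .missing }}}.
```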
Expose on-demand markdown content and improve search/indexing. Adds markdownToSearchText and stores _searchText for full-text matching, plus loadTemplateVariables and loadDocContentByPath to safely load rendered markdown. Server changes add includeContent query support for /docs, a /docs/content endpoint for single-doc content retrieval, and a toPublicDoc serializer; also introduces isTruthy helper and updates docs API schema and docs-json-api.md to document performance behavior and new query options. These changes keep list responses lightweight while enabling full-content fetches when needed.
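As a rough sketch of how the list endpoint might stay lightweight, the snippet below strips the internal `_searchText` field and attaches content only on request. The names `isTruthy`, `toPublicDoc`, `_searchText`, and `includeContent` come from the commit message; the function bodies, the accepted truthy strings, and the `loadContent` callback are assumptions.

```javascript
// Assumed set of truthy query-string values; the PR may accept a different set.
const isTruthy = (value) =>
  ["1", "true", "yes", "on"].includes(String(value ?? "").trim().toLowerCase());

// Drops the internal _searchText field and, when requested, adds the content
// returned by the caller-supplied loadContent(path) function (hypothetical).
const toPublicDoc = (doc, { includeContent = false, loadContent } = {}) => {
  const { _searchText, ...publicFields } = doc;
  return includeContent
    ? { ...publicFields, content: loadContent(doc.path) }
    : publicFields;
};

// Invented index entry for illustration.
const doc = { id: "example", path: "example.md", title: "Example", _searchText: "example body" };

const light = toPublicDoc(doc);
const full = toPublicDoc(doc, {
  includeContent: isTruthy("TRUE"),
  loadContent: (path) => `# content of ${path}`,
});

console.log(light); // { id: 'example', path: 'example.md', title: 'Example' }
console.log(full.content); // # content of example.md
```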
Introduce an experimental MCP server to expose TiDB docs over STDIO. Adds scripts/docs-mcp-server.js implementing MCP handlers (search_docs, get_doc_content, list_topics, list_features, reload_docs_index) and resource URIs (docs://schema, docs://index/meta, docs://doc/<path>). Adds user-facing docs at api/docs-mcp-server.md with usage and example .mcp.json, and registers the new npm script "docs-mcp:serve" in package.json. The server builds and serves the docs index, loads template variables, and supports reloading the index at runtime.
Add HTTP transport, authentication and multi-source support to the Docs MCP server. Docs and README updated with HTTP usage, /mcp and /healthz endpoints, bearer token auth and x-docs-source header. package.json adds a new npm script (docs-mcp:serve:http). scripts/docs-mcp-server.js refactored: import http, bump server version, introduce TRANSPORT/HTTP_HOST/HTTP_PORT/AUTH_TOKEN/SOURCE_MAP, per-source state caching (index + template variables), source refresh, and modular handlers; implement JSON-RPC processing over both stdio and HTTP with proper error handling. scripts/docs-api-lib.js excludes the docs MCP markdown from indexing. Overall this enables running the MCP server over HTTP, supports multiple doc sources, and preserves the original stdio mode.
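The request checks implied by this commit (optional bearer token, per-request source selection) might be sketched as follows. The function names, the source-map shape, and the choice to return `null` for unknown sources are assumptions; the `x-docs-source` header and bearer token auth come from the commit message. Node.js lowercases incoming header names, which the lookups below rely on.

```javascript
// Returns true when no token is configured (auth is optional) or when the
// Authorization header is exactly "Bearer <token>".
const isAuthorized = (headers, authToken) => {
  if (!authToken) {
    return true;
  }
  return headers.authorization === `Bearer ${authToken}`;
};

// Picks the docs source directory from the x-docs-source header. Falls back to
// defaultSource when the header is absent and returns null for unknown names
// so the caller can reject the request.
const resolveSourceDir = (headers, sourceMap, defaultSource) => {
  const name = headers["x-docs-source"] ?? defaultSource;
  return Object.prototype.hasOwnProperty.call(sourceMap, name) ? sourceMap[name] : null;
};

// Invented configuration for illustration.
const sourceMap = { staging: "../docs-staging", prod: "." };

console.log(isAuthorized({ authorization: "Bearer secret" }, "secret")); // true
console.log(isAuthorized({}, "secret")); // false
console.log(resolveSourceDir({ "x-docs-source": "staging" }, sourceMap, "prod")); // ../docs-staging
console.log(resolveSourceDir({ "x-docs-source": "unknown" }, sourceMap, "prod")); // null
```

A production server would likely compare tokens with a constant-time function such as `crypto.timingSafeEqual` rather than `===`; the plain comparison here keeps the sketch short.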
Add a new integration guide (ai/integrations/tidb-docs-mcp-server.md) that documents the TiDB Docs MCP Server (stdio/http transports, auth, source isolation, tools, resources, examples, validation and troubleshooting). Update the TOC (TOC-ai.md) to include the new entry and overhaul api/docs-mcp-server.md to match the expanded title/content and provide clearer start, auth, source-map, HTTP examples, validation steps, and design notes.
What is changed, added or deleted? (Required)
This PR adds two complementary access layers for TiDB Docs:

- Docs JSON API for traditional clients (`/docs`, `/docs/content`, `/features`, `/topics`, `/schema`), with lightweight list responses by default and full markdown on demand.
- Docs MCP Server for AI clients (Claude/Cursor/VS Code), supporting both STDIO and HTTP transports (`/mcp`).

Both layers share the same indexing pipeline with:

- structured metadata extraction (topic/feature/headings/front matter),
- full-text search support,
- template variable resolution (for example `{{{ .starter }}}` via `variables.json`),
- source isolation for staging/prod (MCP `x-docs-source` + source map),
- optional auth for HTTP MCP (Bearer token).
Which TiDB version(s) do your changes apply to? (Required)
Tips for choosing the affected version(s):
By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.
For details, see tips for choosing the affected versions.
What is the related PR or file link(s)?
Do your changes match any of the following descriptions?