script: add docs API by hfxsd · Pull Request #22690 · pingcap/docs

hfxsd · 2026-04-03T04:06:03Z

What is changed, added or deleted? (Required)

This PR adds two complementary access layers for TiDB Docs:

- Docs JSON API for traditional clients (/docs, /docs/content, /features, /topics, /schema), with lightweight list responses by default and full markdown on demand.
- Docs MCP Server for AI clients (Claude/Cursor/VS Code), supporting both STDIO and HTTP transports (/mcp).

Both layers share the same indexing pipeline with:

- structured metadata extraction (topic/feature/headings/front matter),
- full-text search support,
- template variable resolution (for example {{{ .starter }}} via variables.json),
- source isolation for staging/prod (MCP x-docs-source + source map),
- optional auth for HTTP MCP (Bearer token).

Which TiDB version(s) do your changes apply to? (Required)

Tips for choosing the affected version(s):

By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.

For details, see tips for choosing the affected versions.

What is the related PR or file link(s)?

This PR is translated from:
Other reference link(s):

Do your changes match any of the following descriptions?

Delete files
Change aliases
Need modification after applied to another branch
Might cause conflicts after applied to another branch

gemini-code-assist

Code Review

This pull request introduces an experimental Docs JSON API for TiDB documentation, including scripts for building a searchable JSON index and a lightweight HTTP server. The implementation includes logic for parsing markdown files to extract metadata such as features, topics, and headings. Feedback focuses on improving the robustness of the feature extraction regex, prioritizing front matter for document summaries, ensuring proper URL encoding in documentation, and adopting more reliable methods for YAML parsing and file modification timestamps.

gemini-code-assist · 2026-04-03T04:07:53Z

scripts/docs-api-lib.js

+
+const extractFeatures = (content, frontMatterData) => {
+  const features = new Set();
+  const varRegex = /\b[a-z]+(?:_[a-z0-9]+){2,}\b/g;


The regex \b[a-z]+(?:_[a-z0-9]+){2,}\b requires at least two underscores to match a token. This will miss common TiDB system variables or configuration items that have only one underscore (e.g., tidb_config, tidb_port). Changing {2,} to + will ensure these are correctly indexed as features.

Suggested change

-  const varRegex = /\b[a-z]+(?:_[a-z0-9]+){2,}\b/g;
+  const varRegex = /\b[a-z]+(?:_[a-z0-9]+)+\b/g;
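
To make the difference concrete, the following sketch runs both patterns against one sample sentence. The sample text and the variable names `strictRegex` and `relaxedRegex` are illustrative and do not come from the PR.

```javascript
// Compare the original pattern ({2,}) with the suggested one (+).
const strictRegex = /\b[a-z]+(?:_[a-z0-9]+){2,}\b/g;
const relaxedRegex = /\b[a-z]+(?:_[a-z0-9]+)+\b/g;

// Illustrative sample text, not taken from the docs.
const sample = "Set tidb_max_dist_task_nodes and max_connections before restart.";

// With the g flag, match() returns a plain array of matched strings (or null).
const strictMatches = sample.match(strictRegex) || [];
const relaxedMatches = sample.match(relaxedRegex) || [];

console.log(strictMatches);  // [ 'tidb_max_dist_task_nodes' ]
console.log(relaxedMatches); // [ 'tidb_max_dist_task_nodes', 'max_connections' ]
```

One trade-off to keep in mind: the relaxed pattern matches any lowercase snake_case token, so ordinary identifiers with a single underscore would probably be indexed as features too.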

gemini-code-assist · 2026-04-03T04:07:53Z

scripts/docs-api-lib.js

+    id: relativePath.replace(/\.md$/, ""),
+    path: relativePath,
+    title,
+    summary,


The summary field in the document record should prioritize the explicit summary defined in the front matter over the one extracted from the body text.

summary: safeString(frontMatter.summary) || summary,

gemini-code-assist · 2026-04-03T04:07:53Z

api/docs-json-api.md

+- `GET /docs`
+- `GET /docs?feature=tidb_max_dist_task_nodes`
+- `GET /docs?topic=tidb-cloud`
+- `GET /docs?q=resource control`


The query parameter q=resource control contains an unencoded space. In a URL, spaces should be encoded as %20 or + to be technically accurate.

Suggested change

-- `GET /docs?q=resource control`
+- `GET /docs?q=resource%20control`

References

Ensure technical accuracy in documentation, such as proper URL encoding.
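
For client code that builds these URLs, a minimal sketch of both encodings mentioned in the comment follows. It uses only standard JavaScript APIs; the variable names are illustrative.

```javascript
// URLSearchParams serializes spaces as "+" (form encoding).
const params = new URLSearchParams({ q: "resource control" });
const withPlus = `/docs?${params.toString()}`;

// encodeURIComponent produces the %20 form instead.
const withPercent = `/docs?q=${encodeURIComponent("resource control")}`;

console.log(withPlus);    // /docs?q=resource+control
console.log(withPercent); // /docs?q=resource%20control
```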

gemini-code-assist · 2026-04-03T04:07:53Z

scripts/docs-api-lib.js

+const parseSimpleYaml = (raw = "") => {
+  const result = {};
+  let currentArrayKey = null;
+
+  raw.split(/\r?\n/).forEach((line) => {
+    if (!line.trim() || line.trim().startsWith("#")) {
+      return;
+    }
+
+    const kvMatch = line.match(/^([A-Za-z0-9_-]+):\s*(.*)$/);
+    if (kvMatch) {
+      const key = kvMatch[1];
+      const value = kvMatch[2];
+      if (!value.trim()) {
+        result[key] = [];
+        currentArrayKey = key;
+      } else {
+        result[key] = parseScalar(value);
+        currentArrayKey = null;
+      }
+      return;
+    }
+
+    const listMatch = line.match(/^\s*-\s*(.*)$/);
+    if (listMatch && currentArrayKey) {
+      result[currentArrayKey].push(parseScalar(listMatch[1]));
+      return;
+    }
+
+    currentArrayKey = null;
+  });
+
+  return result;
+};


The custom parseSimpleYaml function is very basic and does not support common YAML features like multi-line strings (|, >) or nested objects, which are occasionally used in TiDB documentation front matter. Since the project already has dependencies for markdown processing, using a standard YAML parser would be more reliable.
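
The limitation can be reproduced with a short sketch. The parser below is copied from the diff above; `parseScalar` is not shown in the diff, so a hypothetical stand-in that only trims and unquotes is assumed here, and the front matter sample is invented for illustration.

```javascript
// Hypothetical stand-in for the PR's parseScalar helper (not shown in the diff).
const parseScalar = (value) => value.trim().replace(/^["']|["']$/g, "");

// parseSimpleYaml as quoted in the diff above.
const parseSimpleYaml = (raw = "") => {
  const result = {};
  let currentArrayKey = null;

  raw.split(/\r?\n/).forEach((line) => {
    if (!line.trim() || line.trim().startsWith("#")) {
      return;
    }
    const kvMatch = line.match(/^([A-Za-z0-9_-]+):\s*(.*)$/);
    if (kvMatch) {
      const key = kvMatch[1];
      const value = kvMatch[2];
      if (!value.trim()) {
        result[key] = [];
        currentArrayKey = key;
      } else {
        result[key] = parseScalar(value);
        currentArrayKey = null;
      }
      return;
    }
    const listMatch = line.match(/^\s*-\s*(.*)$/);
    if (listMatch && currentArrayKey) {
      result[currentArrayKey].push(parseScalar(listMatch[1]));
      return;
    }
    currentArrayKey = null;
  });

  return result;
};

// Invented front matter with a folded string and a nested object.
const frontMatter = [
  "title: Example",
  "summary: >",
  "  A folded summary",
  "  across two lines.",
  "meta:",
  "  owner: docs",
].join("\n");

const parsed = parseSimpleYaml(frontMatter);
console.log(parsed.summary); // ">"  (the folded text is dropped)
console.log(parsed.meta);    // []   (the nested key is dropped)
```

Under these assumptions, the indented continuation lines match neither the key-value pattern nor the list pattern, so they are silently ignored.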

gemini-code-assist · 2026-04-03T04:07:53Z

scripts/docs-api-lib.js

+    headings,
+    frontMatter,
+    frontMatterRaw,
+    updatedAt: docStat.mtime.toISOString(),


Using fs.statSync(absPath).mtime for updatedAt can be unreliable in CI/CD environments or fresh clones, because it reflects when the file was last written to the local disk (for example, at checkout time) rather than when its content was last modified in Git. Consider using git log -1 --format=%aI -- <file> for a more accurate timestamp.
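
One possible shape for this suggestion is sketched below: ask Git for the last commit's author date and fall back to mtime when Git cannot answer. The helper name `resolveUpdatedAt` is hypothetical and is not part of the PR.

```javascript
const { execFileSync } = require("child_process");
const fs = require("fs");
const path = require("path");

// Hypothetical helper: prefer the last commit's author date, fall back to mtime.
const resolveUpdatedAt = (absPath) => {
  try {
    const out = execFileSync(
      "git",
      ["log", "-1", "--format=%aI", "--", path.basename(absPath)],
      {
        cwd: path.dirname(absPath),
        encoding: "utf8",
        stdio: ["ignore", "pipe", "ignore"], // silence Git's stderr
      }
    ).trim();
    if (out) {
      // %aI is a strict ISO 8601 date with a UTC offset; normalize it to UTC.
      return new Date(out).toISOString();
    }
  } catch (err) {
    // Git is missing, or the directory is not a repository: use the fallback.
  }
  return fs.statSync(absPath).mtime.toISOString();
};
```

Spawning one Git process per file may be slow for a large docs tree, so a real build would probably batch or cache these lookups. Shallow clones can also report misleading dates, because only part of the history is available.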

Introduce DOCS_API_SOURCE_DIR and source-directory resolution so the docs API can prefer ../docs-staging if present. Update documentation to describe source priority. Propagate the resolved source dir into build-docs-api-index and docs-api-server (including startup/log messages). Add support for template variable replacement in markdown using variables.json (pattern {{{ .path }}}) with a safe parser and warning on JSON parse failure. Improve file collection to skip dot-directories, ignore api/docs-json-api.md, and normalize paths. Minor refactors: pass resolved sourceDir into buildDocsIndex, add resolveDefaultSourceDir helper, and include variables when parsing markdown.
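
As a rough illustration of the `{{{ .path }}}` replacement described in this commit, the sketch below resolves dotted paths against a variables object. The function name, the regular expression, and the sample variable values are all assumptions; the actual implementation in scripts/docs-api-lib.js may differ.

```javascript
// Hypothetical resolver for {{{ .path }}} placeholders.
const resolveTemplateVariables = (markdown, variables) =>
  markdown.replace(/\{\{\{\s*\.([A-Za-z0-9_.-]+)\s*\}\}\}/g, (match, dottedPath) => {
    // Walk the dotted path one key at a time, stopping at non-objects.
    const value = dottedPath
      .split(".")
      .reduce(
        (node, key) => (node && typeof node === "object" ? node[key] : undefined),
        variables
      );
    // Leave unknown placeholders untouched instead of emitting "undefined".
    return typeof value === "string" ? value : match;
  });

// Assumed shape and values; the real variables.json may differ.
const variables = { starter: "Starter Plan", product: { name: "TiDB" } };

const rendered = resolveTemplateVariables(
  "Use {{{ .starter }}} with {{{ .product.name }}}; keep {{{ .missing }}}.",
  variables
);
console.log(rendered); // Use Starter Plan with TiDB; keep {{{ .missing }}}.
```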

ti-chi-bot · 2026-04-03T06:18:57Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from hfxsd. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Expose on-demand markdown content and improve search/indexing. Adds markdownToSearchText and stores _searchText for full-text matching, plus loadTemplateVariables and loadDocContentByPath to safely load rendered markdown. Server changes add includeContent query support for /docs, a /docs/content endpoint for single-doc content retrieval, and a toPublicDoc serializer; also introduces isTruthy helper and updates docs API schema and docs-json-api.md to document performance behavior and new query options. These changes keep list responses lightweight while enabling full-content fetches when needed.
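
The lightweight-by-default behavior could look roughly like the sketch below. The helper names `isTruthy` and `toPublicDoc` come from the commit message, but their bodies, the accepted truthy strings, and the sample record are assumptions.

```javascript
// Assumed behavior: treat common "on" spellings in a query string as true.
const isTruthy = (value) =>
  ["1", "true", "yes", "on"].includes(String(value ?? "").toLowerCase());

// Strip internal fields and attach content only when requested.
const toPublicDoc = (doc, { includeContent = false } = {}) => {
  const { _searchText, content, ...publicFields } = doc;
  return includeContent ? { ...publicFields, content } : publicFields;
};

// Illustrative record; field values are invented.
const doc = {
  id: "example-doc",
  title: "Example",
  content: "# Example\n\nBody text.",
  _searchText: "example body text",
};

const query = new URLSearchParams("q=example&includeContent=true");
const light = toPublicDoc(doc);
const full = toPublicDoc(doc, { includeContent: isTruthy(query.get("includeContent")) });

console.log(Object.keys(light)); // [ 'id', 'title' ]
console.log(Object.keys(full));  // [ 'id', 'title', 'content' ]
```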

Introduce an experimental MCP server to expose TiDB docs over STDIO. Adds scripts/docs-mcp-server.js implementing MCP handlers (search_docs, get_doc_content, list_topics, list_features, reload_docs_index) and resource URIs (docs://schema, docs://index/meta, docs://doc/<path>). Adds user-facing docs at api/docs-mcp-server.md with usage and example .mcp.json, and registers the new npm script "docs-mcp:serve" in package.json. The server builds and serves the docs index, loads template variables, and supports reloading the index at runtime.
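
For orientation, a client call to one of these tools over STDIO might be framed as in the sketch below. It follows the general JSON-RPC 2.0 shape that MCP uses for `tools/call`. The `query` argument name is an assumption, because the tool's input schema is not shown in this PR description.

```javascript
// Build a JSON-RPC 2.0 request for the MCP "tools/call" method.
const buildToolCall = (id, name, args) => ({
  jsonrpc: "2.0",
  id,
  method: "tools/call",
  params: { name, arguments: args },
});

// "query" is an assumed argument name for search_docs.
const request = buildToolCall(1, "search_docs", { query: "resource control" });

// Over stdio, each message is sent as a single line of JSON.
const wireMessage = JSON.stringify(request) + "\n";
console.log(wireMessage);
```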

Add HTTP transport, authentication and multi-source support to the Docs MCP server. Docs and README updated with HTTP usage, /mcp and /healthz endpoints, bearer token auth and x-docs-source header. package.json adds a new npm script (docs-mcp:serve:http). scripts/docs-mcp-server.js refactored: import http, bump server version, introduce TRANSPORT/HTTP_HOST/HTTP_PORT/AUTH_TOKEN/SOURCE_MAP, per-source state caching (index + template variables), source refresh, and modular handlers; implement JSON-RPC processing over both stdio and HTTP with proper error handling. scripts/docs-api-lib.js excludes the docs MCP markdown from indexing. Overall this enables running the MCP server over HTTP, supports multiple doc sources, and preserves the original stdio mode.
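
A minimal sketch of the optional bearer-token check and the `x-docs-source` lookup is shown below. The helper names, the source-map shape, and the `prod` path are assumptions and are not taken from scripts/docs-mcp-server.js. Node's http module lowercases incoming header names, which is why the lookups use lowercase keys.

```javascript
// Hypothetical auth check: when no token is configured, auth is skipped.
const isAuthorized = (headers, authToken) => {
  if (!authToken) {
    return true;
  }
  return headers["authorization"] === `Bearer ${authToken}`;
};

// Hypothetical source lookup: only names listed in the map are served.
const resolveSourceDir = (headers, sourceMap, defaultSource) => {
  const requested = headers["x-docs-source"] || defaultSource;
  return Object.prototype.hasOwnProperty.call(sourceMap, requested)
    ? sourceMap[requested]
    : null;
};

// Assumed configuration values for illustration.
const sourceMap = { staging: "../docs-staging", prod: "../docs" };
const headers = { authorization: "Bearer s3cret", "x-docs-source": "staging" };

console.log(isAuthorized(headers, "s3cret"));             // true
console.log(resolveSourceDir(headers, sourceMap, "prod")); // ../docs-staging
console.log(resolveSourceDir({}, sourceMap, "prod"));      // ../docs
```

For a production deployment, a constant-time comparison such as `crypto.timingSafeEqual` would likely be a safer choice than `===` for the token check.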

Add a new integration guide (ai/integrations/tidb-docs-mcp-server.md) that documents the TiDB Docs MCP Server (stdio/http transports, auth, source isolation, tools, resources, examples, validation and troubleshooting). Update the TOC (TOC-ai.md) to include the new entry and overhaul api/docs-mcp-server.md to match the expanded title/content and provide clearer start, auth, source-map, HTTP examples, validation steps, and design notes.

ti-chi-bot · 2026-04-03T08:46:17Z

@hfxsd: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-verify | `1b0ced1` | link | true | `/test pull-verify` |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

delete unnecessary changes

be9fb8b

hfxsd self-assigned this Apr 3, 2026

ti-chi-bot bot added the contribution This PR is from a community contributor. label Apr 3, 2026

hfxsd requested a review from Oreoxmt April 3, 2026 04:06

ti-chi-bot bot added missing-translation-status This PR does not have translation status info. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 3, 2026

gemini-code-assist bot reviewed Apr 3, 2026

hfxsd added 5 commits April 3, 2026 14:51

Update docs-mcp-server.md

42da92f

hfxsd added translation/no-need No need to translate this PR. and removed missing-translation-status This PR does not have translation status info. labels Apr 3, 2026

hfxsd wants to merge 7 commits into pingcap:master from hfxsd:docs-api
