
script: add docs API #22690

Open
hfxsd wants to merge 7 commits into pingcap:master from hfxsd:docs-api

Conversation

Collaborator

@hfxsd hfxsd commented Apr 3, 2026

What is changed, added or deleted? (Required)

This PR adds two complementary access layers for TiDB Docs:

  • Docs JSON API for traditional clients (/docs, /docs/content, /features, /topics, /schema), with lightweight list responses by default and full markdown on demand.
  • Docs MCP Server for AI clients (Claude/Cursor/VS Code), supporting both STDIO and HTTP transports (/mcp).

Both layers share the same indexing pipeline with:

  • structured metadata extraction (topic/feature/headings/front matter),
  • full-text search support,
  • template variable resolution (for example {{{ .starter }}} via variables.json),
  • source isolation for staging/prod (MCP x-docs-source + source map),
  • optional auth for HTTP MCP (Bearer token).
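
To make the template variable step concrete, here is a minimal sketch of how a {{{ .path }}} placeholder could be resolved against the parsed contents of variables.json. The helper names (`lookup`, `resolveTemplateVariables`) and the sample values are assumptions for illustration; the PR's actual implementation is not shown in this thread.

```javascript
// Hypothetical sketch: replaces {{{ .a.b }}} placeholders with values looked up
// in a variables object (for example, the parsed contents of variables.json).
const PLACEHOLDER = /\{\{\{\s*\.([A-Za-z0-9_.-]+)\s*\}\}\}/g;

// Walk a dotted path such as "tidb.version" through nested objects.
const lookup = (variables, dottedPath) =>
  dottedPath.split(".").reduce(
    (node, key) =>
      node !== null && typeof node === "object" && Object.prototype.hasOwnProperty.call(node, key)
        ? node[key]
        : undefined,
    variables
  );

const resolveTemplateVariables = (markdown, variables = {}) =>
  markdown.replace(PLACEHOLDER, (placeholder, dottedPath) => {
    const value = lookup(variables, dottedPath);
    // Leave unknown or non-scalar placeholders untouched rather than guessing.
    return typeof value === "string" || typeof value === "number" ? String(value) : placeholder;
  });

// Illustrative values only; the real variables.json is not shown in this thread.
const variables = { starter: "Starter", tidb: { version: "vX.Y" } };
const rendered = resolveTemplateVariables(
  "Use {{{ .starter }}} on {{{ .tidb.version }}}; {{{ .missing }}} stays.",
  variables
);
console.log(rendered);
```

Leaving unresolved placeholders in place keeps a missing key visible in the output instead of silently producing an empty string.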

Which TiDB version(s) do your changes apply to? (Required)

Tips for choosing the affected version(s):

By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.

For details, see tips for choosing the affected versions.

  • master (the latest development version)
  • v9.0 (TiDB 9.0 versions)
  • v8.5 (TiDB 8.5 versions)
  • v8.1 (TiDB 8.1 versions)
  • v7.5 (TiDB 7.5 versions)
  • v7.1 (TiDB 7.1 versions)
  • v6.5 (TiDB 6.5 versions)
  • v6.1 (TiDB 6.1 versions)
  • v5.4 (TiDB 5.4 versions)

What is the related PR or file link(s)?

  • This PR is translated from:
  • Other reference link(s):

Do your changes match any of the following descriptions?

  • Delete files
  • Change aliases
  • Need modification after applied to another branch
  • Might cause conflicts after applied to another branch

@hfxsd hfxsd self-assigned this Apr 3, 2026
@ti-chi-bot ti-chi-bot bot added the contribution This PR is from a community contributor. label Apr 3, 2026
@hfxsd hfxsd requested a review from Oreoxmt April 3, 2026 04:06
@ti-chi-bot ti-chi-bot bot added missing-translation-status This PR does not have translation status info. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 3, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces an experimental Docs JSON API for TiDB documentation, including scripts for building a searchable JSON index and a lightweight HTTP server. The implementation includes logic for parsing markdown files to extract metadata such as features, topics, and headings. Feedback focuses on improving the robustness of the feature extraction regex, prioritizing front matter for document summaries, ensuring proper URL encoding in documentation, and adopting more reliable methods for YAML parsing and file modification timestamps.


const extractFeatures = (content, frontMatterData) => {
  const features = new Set();
  const varRegex = /\b[a-z]+(?:_[a-z0-9]+){2,}\b/g;

medium

The regex \b[a-z]+(?:_[a-z0-9]+){2,}\b requires at least two underscores to match a token. This will miss common TiDB system variables or configuration items that have only one underscore (e.g., tidb_config, tidb_port). Changing {2,} to + will ensure these are correctly indexed as features.

Suggested change
- const varRegex = /\b[a-z]+(?:_[a-z0-9]+){2,}\b/g;
+ const varRegex = /\b[a-z]+(?:_[a-z0-9]+)+\b/g;
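
The difference between the two patterns can be checked on a short sample string. This is a quick sketch, not part of the PR; the sample sentence is made up for illustration. It also shows the trade-off of the looser pattern, which picks up any snake_case token and not only system variables.

```javascript
// Current pattern: requires two or more underscore-separated segments after the first word.
const strictRegex = /\b[a-z]+(?:_[a-z0-9]+){2,}\b/g;
// Suggested pattern: one or more underscore-separated segments.
const looseRegex = /\b[a-z]+(?:_[a-z0-9]+)+\b/g;

// Made-up sample text for illustration.
const sample = "Set tidb_config or tidb_max_dist_task_nodes, not a snake_case word.";

// String.prototype.match with a global regex returns every match, or null if none.
const findTokens = (regex, text) => text.match(regex) || [];

const strictMatches = findTokens(strictRegex, sample);
const looseMatches = findTokens(looseRegex, sample);

console.log(strictMatches); // only the token with two or more underscores
console.log(looseMatches); // also the single-underscore tokens
```

If the extra matches turn out to be noisy, one option would be to filter the looser matches against a known prefix list, but that is a design decision for the PR author.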

id: relativePath.replace(/\.md$/, ""),
path: relativePath,
title,
summary,

medium

The summary field in the document record should prioritize the explicit summary defined in the front matter over the one extracted from the body text.

    summary: safeString(frontMatter.summary) || summary,
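
As a self-contained illustration of that fallback order, the sketch below prefers a non-empty front matter summary and otherwise uses the text extracted from the body. The `safeString` shown here is a hypothetical stand-in (the PR's own helper is not visible in this thread), and `pickSummary` is a name introduced only for this example.

```javascript
// Hypothetical stand-in: the PR's own safeString is not shown in this thread.
const safeString = (value) => (typeof value === "string" ? value.trim() : "");

// Prefer the front matter summary; fall back to the text extracted from the body.
const pickSummary = (frontMatter, extractedSummary) =>
  safeString(frontMatter && frontMatter.summary) || extractedSummary;

const record = {
  title: "Example",
  summary: pickSummary({ summary: "  Written by the author.  " }, "First paragraph of the body."),
};

console.log(record.summary);
```

Because an empty or whitespace-only string is falsy after trimming, a blank front matter summary still falls back to the extracted text.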

- `GET /docs`
- `GET /docs?feature=tidb_max_dist_task_nodes`
- `GET /docs?topic=tidb-cloud`
- `GET /docs?q=resource control`

low

The query parameter q=resource control contains an unencoded space. In a URL, spaces should be encoded as %20 or + to be technically accurate.

Suggested change
- `GET /docs?q=resource control`
- `GET /docs?q=resource%20control`
References
  1. Ensure technical accuracy in documentation, such as proper URL encoding. (link)
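
For clients that build these URLs in code, the two encodings mentioned above can be produced with standard JavaScript built-ins. This is a small sketch; the base URL used for decoding is a placeholder, not the server's real address.

```javascript
const query = "resource control";

// Percent-encoding: the space becomes %20.
const percentEncoded = `/docs?q=${encodeURIComponent(query)}`;

// Form encoding via URLSearchParams: the space becomes +.
const formEncoded = `/docs?${new URLSearchParams({ q: query }).toString()}`;

// Both forms decode to the same value when parsed as a query string.
// "http://localhost" is only a placeholder base for parsing the relative URL.
const decode = (relativeUrl) => new URL(relativeUrl, "http://localhost").searchParams.get("q");

console.log(percentEncoded);
console.log(formEncoded);
console.log(decode(percentEncoded) === decode(formEncoded));
```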

Comment on lines +34 to +67
const parseSimpleYaml = (raw = "") => {
  const result = {};
  let currentArrayKey = null;

  raw.split(/\r?\n/).forEach((line) => {
    if (!line.trim() || line.trim().startsWith("#")) {
      return;
    }

    const kvMatch = line.match(/^([A-Za-z0-9_-]+):\s*(.*)$/);
    if (kvMatch) {
      const key = kvMatch[1];
      const value = kvMatch[2];
      if (!value.trim()) {
        result[key] = [];
        currentArrayKey = key;
      } else {
        result[key] = parseScalar(value);
        currentArrayKey = null;
      }
      return;
    }

    const listMatch = line.match(/^\s*-\s*(.*)$/);
    if (listMatch && currentArrayKey) {
      result[currentArrayKey].push(parseScalar(listMatch[1]));
      return;
    }

    currentArrayKey = null;
  });

  return result;
};

low

The custom parseSimpleYaml function is very basic and does not support common YAML features like multi-line strings (|, >) or nested objects, which are occasionally used in TiDB documentation front matter. Since the project already has dependencies for markdown processing, using a standard YAML parser would be more reliable.
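
The limitation can be reproduced with a condensed copy of the function from the diff. In this sketch, `parseScalar` is a hypothetical stand-in (the PR's version is not shown in this thread) and the front matter is made up for illustration. Under those assumptions, the block scalar collapses to the literal `|` and the nested object becomes an empty array, while the plain list parses as expected.

```javascript
// Hypothetical stand-in for the PR's parseScalar, which is not shown in this thread.
const parseScalar = (value) => value.trim().replace(/^["']|["']$/g, "");

// Condensed copy of parseSimpleYaml from the diff under review.
const parseSimpleYaml = (raw = "") => {
  const result = {};
  let currentArrayKey = null;
  raw.split(/\r?\n/).forEach((line) => {
    if (!line.trim() || line.trim().startsWith("#")) return;
    const kvMatch = line.match(/^([A-Za-z0-9_-]+):\s*(.*)$/);
    if (kvMatch) {
      const [, key, value] = kvMatch;
      if (!value.trim()) {
        result[key] = [];
        currentArrayKey = key;
      } else {
        result[key] = parseScalar(value);
        currentArrayKey = null;
      }
      return;
    }
    const listMatch = line.match(/^\s*-\s*(.*)$/);
    if (listMatch && currentArrayKey) {
      result[currentArrayKey].push(parseScalar(listMatch[1]));
      return;
    }
    currentArrayKey = null;
  });
  return result;
};

// Made-up front matter exercising a block scalar, a nested object, and a list.
const frontMatter = [
  "title: Example",
  "summary: |",
  "  First line.",
  "  Second line.",
  "meta:",
  "  owner: docs",
  "aliases:",
  "  - /a",
  "  - /b",
].join("\n");

const parsed = parseSimpleYaml(frontMatter);
console.log(parsed);
```

A full YAML parser would return the two-line summary and a `meta` object here, which is the gap the review comment points at.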

headings,
frontMatter,
frontMatterRaw,
updatedAt: docStat.mtime.toISOString(),

low

Using fs.statSync(absPath).mtime for updatedAt can be unreliable in CI/CD environments or fresh clones, because it reflects when the file was last written to the local disk (for example, at checkout) rather than when its content last changed in Git. Consider using git log -1 --format=%aI -- <file> for a more accurate timestamp.
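
One way to apply this suggestion is sketched below: ask Git for the last commit's author date and fall back to mtime when Git is unavailable or the file is untracked. The function name `resolveUpdatedAt` is an assumption for illustration and does not appear in the PR.

```javascript
const { execFileSync } = require("child_process");
const fs = require("fs");
const path = require("path");

// Returns the author date of the last commit touching the file as an ISO string,
// or the filesystem mtime when Git cannot provide one.
const resolveUpdatedAt = (absPath) => {
  try {
    const output = execFileSync("git", ["log", "-1", "--format=%aI", "--", absPath], {
      cwd: path.dirname(absPath),
      encoding: "utf8",
      stdio: ["ignore", "pipe", "ignore"], // discard git's stderr
    }).trim();
    if (output) {
      // %aI is strict ISO 8601 with a UTC offset; normalize it to UTC.
      return new Date(output).toISOString();
    }
  } catch (error) {
    // git is missing, the directory is not a repository, or the command failed:
    // fall through to the mtime fallback below.
  }
  return fs.statSync(absPath).mtime.toISOString();
};
```

Two caveats apply: spawning one git process per file adds noticeable cost on a large docs tree, and in a shallow clone the reported date may be that of the fetched commit rather than the commit that actually changed the file.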

Introduce DOCS_API_SOURCE_DIR and source-directory resolution so the docs API can prefer ../docs-staging if present. Update documentation to describe source priority. Propagate the resolved source dir into build-docs-api-index and docs-api-server (including startup/log messages). Add support for template variable replacement in markdown using variables.json (pattern {{{ .path }}}) with a safe parser and warning on JSON parse failure. Improve file collection to skip dot-directories, ignore api/docs-json-api.md, and normalize paths. Minor refactors: pass resolved sourceDir into buildDocsIndex, add resolveDefaultSourceDir helper, and include variables when parsing markdown.
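
A minimal sketch of the source-directory resolution described in this commit follows. The priority order (explicit DOCS_API_SOURCE_DIR, then ../docs-staging if it exists, then the repository root) is an assumption inferred from the commit message, and `pickSourceDir` is a hypothetical name rather than the PR's resolveDefaultSourceDir.

```javascript
const fs = require("fs");
const path = require("path");

// Assumed priority: explicit env var, then a sibling docs-staging checkout, then the repo root.
// `exists` is injectable so the decision logic can be exercised without touching the disk.
const pickSourceDir = ({
  env = process.env,
  repoRoot = process.cwd(),
  exists = fs.existsSync,
} = {}) => {
  if (env.DOCS_API_SOURCE_DIR) {
    return path.resolve(repoRoot, env.DOCS_API_SOURCE_DIR);
  }
  const staging = path.resolve(repoRoot, "..", "docs-staging");
  if (exists(staging)) {
    return staging;
  }
  return path.resolve(repoRoot);
};

console.log(`Docs source directory: ${pickSourceDir()}`);
```

Logging the resolved directory at startup, as the commit message mentions, makes it easier to spot a server that is unintentionally reading the staging checkout.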
@ti-chi-bot

ti-chi-bot bot commented Apr 3, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from hfxsd. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

hfxsd added 5 commits April 3, 2026 14:51
Expose on-demand markdown content and improve search/indexing. Adds markdownToSearchText and stores _searchText for full-text matching, plus loadTemplateVariables and loadDocContentByPath to safely load rendered markdown. Server changes add includeContent query support for /docs, a /docs/content endpoint for single-doc content retrieval, and a toPublicDoc serializer; also introduces isTruthy helper and updates docs API schema and docs-json-api.md to document performance behavior and new query options. These changes keep list responses lightweight while enabling full-content fetches when needed.
Introduce an experimental MCP server to expose TiDB docs over STDIO. Adds scripts/docs-mcp-server.js implementing MCP handlers (search_docs, get_doc_content, list_topics, list_features, reload_docs_index) and resource URIs (docs://schema, docs://index/meta, docs://doc/<path>). Adds user-facing docs at api/docs-mcp-server.md with usage and example .mcp.json, and registers the new npm script "docs-mcp:serve" in package.json. The server builds and serves the docs index, loads template variables, and supports reloading the index at runtime.
Add HTTP transport, authentication and multi-source support to the Docs MCP server. Docs and README updated with HTTP usage, /mcp and /healthz endpoints, bearer token auth and x-docs-source header. package.json adds a new npm script (docs-mcp:serve:http). scripts/docs-mcp-server.js refactored: import http, bump server version, introduce TRANSPORT/HTTP_HOST/HTTP_PORT/AUTH_TOKEN/SOURCE_MAP, per-source state caching (index + template variables), source refresh, and modular handlers; implement JSON-RPC processing over both stdio and HTTP with proper error handling. scripts/docs-api-lib.js excludes the docs MCP markdown from indexing. Overall this enables running the MCP server over HTTP, supports multiple doc sources, and preserves the original stdio mode.
Add a new integration guide (ai/integrations/tidb-docs-mcp-server.md) that documents the TiDB Docs MCP Server (stdio/http transports, auth, source isolation, tools, resources, examples, validation and troubleshooting). Update the TOC (TOC-ai.md) to include the new entry and overhaul api/docs-mcp-server.md to match the expanded title/content and provide clearer start, auth, source-map, HTTP examples, validation steps, and design notes.
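
The optional Bearer token check and the x-docs-source lookup described in these commits could look roughly like the sketch below. The function names, the source-map shape, and the sample values are assumptions for illustration; scripts/docs-mcp-server.js itself is not shown in this thread.

```javascript
const crypto = require("crypto");

// Auth is optional: with no token configured, every request is allowed.
// Node lower-cases incoming header names, so "authorization" is the key to read.
const isAuthorized = (headers, expectedToken) => {
  if (!expectedToken) return true;
  const match = /^Bearer\s+(.+)$/i.exec(headers["authorization"] || "");
  if (!match) return false;
  const given = Buffer.from(match[1]);
  const expected = Buffer.from(expectedToken);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return given.length === expected.length && crypto.timingSafeEqual(given, expected);
};

// Map the x-docs-source header to a configured directory; unknown keys are rejected
// (null) instead of being treated as filesystem paths.
const resolveSource = (headers, sourceMap, defaultKey) => {
  const key = headers["x-docs-source"] || defaultKey;
  return Object.prototype.hasOwnProperty.call(sourceMap, key) ? sourceMap[key] : null;
};

// Illustrative configuration only.
const sourceMap = { prod: ".", staging: "../docs-staging" };
const headers = { authorization: "Bearer example-token", "x-docs-source": "staging" };

console.log(isAuthorized(headers, "example-token"));
console.log(resolveSource(headers, sourceMap, "prod"));
```

Resolving the header through a fixed map, rather than using its value as a path, is what keeps staging and production sources isolated from arbitrary client input.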
@hfxsd hfxsd added translation/no-need No need to translate this PR. and removed missing-translation-status This PR does not have translation status info. labels Apr 3, 2026
@ti-chi-bot

ti-chi-bot bot commented Apr 3, 2026

@hfxsd: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-verify | 1b0ced1 | link | true | /test pull-verify |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


Labels

contribution This PR is from a community contributor. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. translation/no-need No need to translate this PR.


1 participant