Skip to content

feat(docs): normalize <lark-table> to GFM pipe tables in docs +fetch#260

Open
liuhedev wants to merge 2 commits intolarksuite:mainfrom
liuhedev:feat/docs-fetch-normalize-lark-table
Open

feat(docs): normalize <lark-table> to GFM pipe tables in docs +fetch#260
liuhedev wants to merge 2 commits intolarksuite:mainfrom
liuhedev:feat/docs-fetch-normalize-lark-table

Conversation

@liuhedev
Copy link
Copy Markdown

@liuhedev liuhedev commented Apr 3, 2026

Summary

docs +fetch outputs markdown containing custom <lark-table> tags that are not renderable in any standard markdown viewer (Obsidian, GitHub, VS Code, etc.). This PR adds automatic conversion to GFM pipe tables.

Changes

  • shortcuts/doc/lark_table.go: New NormalizeLarkTables() function that converts <lark-table>/<lark-tr>/<lark-td> blocks to standard GFM pipe tables
  • shortcuts/doc/docs_fetch.go: Calls NormalizeLarkTables() on the markdown output during pretty-print
  • shortcuts/doc/lark_table_test.go: Unit tests covering basic tables, no-header mode, code-block fallback, and pipe escaping

Behavior

Before After
<lark-table><lark-tr><lark-td>Name</lark-td>... | Name | Age |

Edge cases handled

  • header-row="false": Skips separator row
  • Fenced code blocks in cells: Falls back to original <lark-table> (GFM pipe tables cannot contain multi-line code)
  • Pipe characters: Escaped as \|
  • Nested <grid>/<column>/<text> tags: Flattened and stripped

Motivation

When using docs +fetch to export Lark documents for local use (e.g., in Obsidian vaults), tables are the most common broken element. This change makes the output immediately usable without post-processing.

Related issue: #259

Summary by CodeRabbit

  • New Features

    • Documentation now converts custom Lark-table markup into GitHub-flavored Markdown tables with proper headers (optional), column alignment, cell normalization (nested content flattened,
      normalized, pipe characters escaped), and preserves original markup if fenced code blocks are present.
  • Tests

    • Added comprehensive unit tests covering header handling, code-block preservation, special-character escaping, uneven columns, and nested-content flattening.

…output

When docs +fetch returns markdown containing <lark-table> custom tags,
they are not renderable in standard markdown viewers. This change adds
automatic conversion of <lark-table> blocks to GFM (GitHub Flavored
Markdown) pipe tables during the pretty-print output phase.

Features:
- Converts <lark-table>/<lark-tr>/<lark-td> to standard | pipe | tables
- Handles header-row attribute (defaults to true)
- Escapes pipe characters in cell content
- Falls back to original <lark-table> when cells contain fenced code blocks
- Strips nested <grid>/<column>/<text> tags during conversion

Includes unit tests for basic table, no-header, code-block fallback,
and pipe escaping scenarios.
@github-actions github-actions bot added domain/ccm PR touches the ccm domain size/M Single-domain feat or fix with limited business impact labels Apr 3, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 3, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 06a93abf-bbb3-43a7-9442-c7a923961d33

📥 Commits

Reviewing files that changed from the base of the PR and between 139f466 and f370afc.

📒 Files selected for processing (2)
  • shortcuts/doc/lark_table.go
  • shortcuts/doc/lark_table_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • shortcuts/doc/lark_table_test.go

📝 Walkthrough

Walkthrough

Adds a new doc.NormalizeLarkTables(md string) string to convert <lark-table> blocks into GitHub-flavored Markdown tables and calls it from the markdown post-processing step in shortcuts/doc/docs_fetch.go before writing output. No other control flow changes.

Changes

Cohort / File(s) Summary
Lark Table Normalization Logic
shortcuts/doc/lark_table.go
New exported NormalizeLarkTables() that finds <lark-table>...</lark-table> blocks, parses rows/cells, flattens <grid>/<column>, strips <text> wrappers, converts <br/> to newlines, escapes `
Lark Table Tests
shortcuts/doc/lark_table_test.go
New unit tests covering basic conversion, header-row="false" behavior, fenced-code preservation, pipe escaping, column padding, grid/column flattening, <text> stripping, <br/> normalization, and no-op when no <lark-table> present.
Document Processing Integration
shortcuts/doc/docs_fetch.go
Minor integration: post-processes result["markdown"] with NormalizeLarkTables() (assigns md = NormalizeLarkTables(md)) before writing output; no other logic changes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I nibbled at tags of Lark’s little net,
Turned grids and columns to pipes with a fret,
Br and text faded, rows aligned with cheer,
No code-blocks harmed — I left those clear,
Hop! Markdown’s tidy now, ready to vet.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 35.71% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: adding automatic normalization of custom lark-table markup to GitHub Flavored Markdown pipe tables in the docs fetch command.
Description check ✅ Passed The description covers all template sections with comprehensive detail: Summary explains the problem, Changes lists all modified files with specifics, Test Plan confirms unit tests are included, and Related Issues links issue #259.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Apr 3, 2026

Greptile Summary

This PR adds NormalizeLarkTables() to convert <lark-table> markup to GFM pipe tables during docs +fetch output, with fallback to the original markup when cells contain fenced code blocks.

  • P1 — headerAttrRE matched against full block content: headerAttrRE.MatchString(block) on line 34 of lark_table.go scans the entire <lark-table>…</lark-table> block, so a cell whose text contains the literal string header-row=\"false\" would incorrectly suppress the header-separator row. The check should be restricted to the opening tag only.

Confidence Score: 4/5

Safe to merge after fixing the header-attribute detection scope bug.

One P1 logic bug: headerAttrRE is checked against the full table block (including all cell content) rather than just the opening tag, meaning a cell containing the literal text header-row="false" would incorrectly suppress the separator row. All other findings from prior review rounds have been addressed or are P2.

shortcuts/doc/lark_table.go line 34 — header attribute detection scope.

Important Files Changed

Filename Overview
shortcuts/doc/lark_table.go New file implementing Lark-to-GFM table conversion; one P1 logic bug where headerAttrRE is matched against the full table block (including cell content) instead of only the opening tag, which can incorrectly suppress the header-separator row.
shortcuts/doc/docs_fetch.go One-line integration of NormalizeLarkTables into the pretty-print path; change is minimal and correct.
shortcuts/doc/lark_table_test.go Good coverage of the main scenarios (basic, no-header, code-block fallback, pipe escaping, uneven columns, grid flattening, text tags, br normalization); missing a test for the header-attribute-in-cell-content edge case.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["docs +fetch executes"] --> B["CallMCPTool fetch-doc"]
    B --> C["result markdown string"]
    C --> D["NormalizeLarkTables(md)"]
    D --> E{"larkTableRE\nfind match?"}
    E -- No --> F["return md unchanged"]
    E -- Yes --> G["Extract inner content\nCheck header-row attr on block"]
    G --> H{"Any cell\ncontains ```?"}
    H -- Yes --> I["return original\n<lark-table> block"]
    H -- No --> J["cleanCell each td\n(flattenGrid, stripTextTags,\nstrip lark tags, br→\n)"]
    J --> K["Pad rows to maxCols"]
    K --> L{"headerRow == true?"}
    L -- Yes --> M["Emit header row\n+ separator ---"]
    L -- No --> N["Skip separator"]
    M --> O["Emit data rows"]
    N --> O
    O --> P["Join lines with \n"]
    P --> Q["fmt.Fprintln output"]
Loading

Reviews (2): Last reviewed commit: "chore(docs): address review feedback for..." | Re-trigger Greptile

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
shortcuts/doc/lark_table.go (1)

64-71: Fenced code block detection occurs after cleanCell processes content.

The check for fenced code blocks (strings.Contains(cell, "```")) happens after cells have been processed by cleanCell. Looking at cleanCell, it only strips <lark-*>, <text>, <grid>, and <br> tags, so the triple backticks should survive. However, this sequencing is subtle—if cleanCell is ever modified to strip more content, this check could silently break.

Consider moving the fenced code check to operate on the raw cell content before cleaning, or add a comment clarifying the dependency.

💡 Alternative: check raw content before cleaning
 		for _, rm := range rowMatches {
 			cellMatches := larkTdRE.FindAllStringSubmatch(rm[1], -1)
 			var cells []string
 			for _, cm := range cellMatches {
+				// Check for fenced code blocks in raw content before cleaning
+				if strings.Contains(cm[1], "```") {
+					return block
+				}
 				cells = append(cells, cleanCell(cm[1]))
 			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shortcuts/doc/lark_table.go` around lines 64 - 71, The fenced-code detection
currently checks for "```" after cells are processed by cleanCell; to avoid
future brittle behavior move the fenced-code check to operate on the raw cell
content before calling cleanCell (i.e., while iterating rows and the original
cell string from rows), returning block immediately if any raw cell contains
"```", or if you prefer to keep the current order add a clear comment next to
the loop documenting the dependency that cleanCell must not strip backticks and
that the fenced-code check must run on raw input; reference the rows iteration
and cleanCell/cells/block identifiers when making this change.
shortcuts/doc/lark_table_test.go (2)

57-63: Consider adding tests for additional edge cases.

The current tests cover core scenarios well, but the following documented behaviors lack test coverage:

  • Nested <grid>/<column> flattening
  • <text> tag stripping
  • Uneven column counts (row padding)
  • <br> tag conversion to newlines and back

These could be added in a follow-up to strengthen regression protection.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shortcuts/doc/lark_table_test.go` around lines 57 - 63, Add unit tests to
shortc uts/doc/lark_table_test.go that exercise missing edge cases for the
NormalizeLarkTables function: add separate Test* cases validating nested
<grid>/<column> flattening (e.g., nested columns collapse to single table),
stripping of <text> tags inside cells, behavior when rows have uneven column
counts (ensure padding or empty cells are produced), and conversion of <br> tags
to newlines and back; implement each test by constructing representative input
strings, calling NormalizeLarkTables, and asserting the expected normalized
output to protect these documented behaviors.

27-35: Consider adding a positive assertion for the data row.

The test verifies the separator row is absent, but doesn't confirm the data row | A | B | is present. Adding this check would ensure the function outputs content correctly in no-header mode.

💡 Suggested addition
 	if strings.Contains(result, "| --- |") {
 		t.Errorf("should not have separator when header-row=false, got:\n%s", result)
 	}
+	if !strings.Contains(result, "| A | B |") {
+		t.Errorf("expected data row, got:\n%s", result)
+	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shortcuts/doc/lark_table_test.go` around lines 27 - 35, The test
TestNormalizeLarkTables_NoHeaderRow currently only asserts the absence of the
separator row; add a positive assertion that the converted table contains the
expected data row (e.g., check that NormalizeLarkTables(input) includes the
string "| A | B |"). Update the test to call result :=
NormalizeLarkTables(input) (already present) and add a strings.Contains check
that calls t.Errorf with a clear message if the data row is missing; reference
the test name and NormalizeLarkTables to locate where to add this assertion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@shortcuts/doc/lark_table.go`:
- Around line 149-152: Remove the unused fmt import and the init() shim that
references fmt.Sprint: delete the fmt import from the file's import block and
remove the init() function (the lines containing func init() { _ = fmt.Sprint })
so there is no unused-import workaround; ensure no other code in this file
depends on fmt before committing.

---

Nitpick comments:
In `@shortcuts/doc/lark_table_test.go`:
- Around line 57-63: Add unit tests to shortc uts/doc/lark_table_test.go that
exercise missing edge cases for the NormalizeLarkTables function: add separate
Test* cases validating nested <grid>/<column> flattening (e.g., nested columns
collapse to single table), stripping of <text> tags inside cells, behavior when
rows have uneven column counts (ensure padding or empty cells are produced), and
conversion of <br> tags to newlines and back; implement each test by
constructing representative input strings, calling NormalizeLarkTables, and
asserting the expected normalized output to protect these documented behaviors.
- Around line 27-35: The test TestNormalizeLarkTables_NoHeaderRow currently only
asserts the absence of the separator row; add a positive assertion that the
converted table contains the expected data row (e.g., check that
NormalizeLarkTables(input) includes the string "| A | B |"). Update the test to
call result := NormalizeLarkTables(input) (already present) and add a
strings.Contains check that calls t.Errorf with a clear message if the data row
is missing; reference the test name and NormalizeLarkTables to locate where to
add this assertion.

In `@shortcuts/doc/lark_table.go`:
- Around line 64-71: The fenced-code detection currently checks for "```" after
cells are processed by cleanCell; to avoid future brittle behavior move the
fenced-code check to operate on the raw cell content before calling cleanCell
(i.e., while iterating rows and the original cell string from rows), returning
block immediately if any raw cell contains "```", or if you prefer to keep the
current order add a clear comment next to the loop documenting the dependency
that cleanCell must not strip backticks and that the fenced-code check must run
on raw input; reference the rows iteration and cleanCell/cells/block identifiers
when making this change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3c3bc9e4-589f-46ab-9fa7-3ba542fc1078

📥 Commits

Reviewing files that changed from the base of the PR and between 0c77c95 and 139f466.

📒 Files selected for processing (3)
  • shortcuts/doc/docs_fetch.go
  • shortcuts/doc/lark_table.go
  • shortcuts/doc/lark_table_test.go

- Refactor NormalizeLarkTables to check for fenced code blocks on raw content
- Remove unused fmt import and init() shim
- Make headerAttrRE case-insensitive
- Add comprehensive unit tests for edge cases:
    - Uneven column counts (row padding)
    - Grid/column layout flattening
    - Text tag stripping
    - <br/> tag normalization
- Add positive assertion to TestNormalizeLarkTables_NoHeaderRow
- Improve docstring coverage for exported functions
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Apr 3, 2026

CLA assistant check
All committers have signed the CLA.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

domain/ccm PR touches the ccm domain size/M Single-domain feat or fix with limited business impact

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants