feat(docs): normalize <lark-table> to GFM pipe tables in docs +fetch#260
feat(docs): normalize <lark-table> to GFM pipe tables in docs +fetch#260liuhedev wants to merge 2 commits intolarksuite:mainfrom
Conversation
…output When docs +fetch returns markdown containing <lark-table> custom tags, they are not renderable in standard markdown viewers. This change adds automatic conversion of <lark-table> blocks to GFM (GitHub Flavored Markdown) pipe tables during the pretty-print output phase. Features: - Converts <lark-table>/<lark-tr>/<lark-td> to standard | pipe | tables - Handles header-row attribute (defaults to true) - Escapes pipe characters in cell content - Falls back to original <lark-table> when cells contain fenced code blocks - Strips nested <grid>/<column>/<text> tags during conversion Includes unit tests for basic table, no-header, code-block fallback, and pipe escaping scenarios.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds a new Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Greptile SummaryThis PR adds
Confidence Score: 4/5Safe to merge after fixing the header-attribute detection scope bug. One P1 logic bug:
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["docs +fetch executes"] --> B["CallMCPTool fetch-doc"]
B --> C["result markdown string"]
C --> D["NormalizeLarkTables(md)"]
D --> E{"larkTableRE\nfind match?"}
E -- No --> F["return md unchanged"]
E -- Yes --> G["Extract inner content\nCheck header-row attr on block"]
G --> H{"Any cell\ncontains ```?"}
H -- Yes --> I["return original\n<lark-table> block"]
H -- No --> J["cleanCell each td\n(flattenGrid, stripTextTags,\nstrip lark tags, br→\n)"]
J --> K["Pad rows to maxCols"]
K --> L{"headerRow == true?"}
L -- Yes --> M["Emit header row\n+ separator ---"]
L -- No --> N["Skip separator"]
M --> O["Emit data rows"]
N --> O
O --> P["Join lines with \n"]
P --> Q["fmt.Fprintln output"]
Reviews (2): Last reviewed commit: "chore(docs): address review feedback for..." | Re-trigger Greptile |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
shortcuts/doc/lark_table.go (1)
64-71: Fenced code block detection occurs aftercleanCellprocesses content.The check for fenced code blocks (
strings.Contains(cell, "```")) happens after cells have been processed bycleanCell. Looking atcleanCell, it only strips<lark-*>,<text>,<grid>, and<br>tags, so the triple backticks should survive. However, this sequencing is subtle—ifcleanCellis ever modified to strip more content, this check could silently break.Consider moving the fenced code check to operate on the raw cell content before cleaning, or add a comment clarifying the dependency.
💡 Alternative: check raw content before cleaning
for _, rm := range rowMatches { cellMatches := larkTdRE.FindAllStringSubmatch(rm[1], -1) var cells []string for _, cm := range cellMatches { + // Check for fenced code blocks in raw content before cleaning + if strings.Contains(cm[1], "```") { + return block + } cells = append(cells, cleanCell(cm[1])) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@shortcuts/doc/lark_table.go` around lines 64 - 71, The fenced-code detection currently checks for "```" after cells are processed by cleanCell; to avoid future brittle behavior move the fenced-code check to operate on the raw cell content before calling cleanCell (i.e., while iterating rows and the original cell string from rows), returning block immediately if any raw cell contains "```", or if you prefer to keep the current order add a clear comment next to the loop documenting the dependency that cleanCell must not strip backticks and that the fenced-code check must run on raw input; reference the rows iteration and cleanCell/cells/block identifiers when making this change.shortcuts/doc/lark_table_test.go (2)
57-63: Consider adding tests for additional edge cases.The current tests cover core scenarios well, but the following documented behaviors lack test coverage:
- Nested
<grid>/<column>flattening<text>tag stripping- Uneven column counts (row padding)
<br>tag conversion to newlines and backThese could be added in a follow-up to strengthen regression protection.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@shortcuts/doc/lark_table_test.go` around lines 57 - 63, Add unit tests to shortc uts/doc/lark_table_test.go that exercise missing edge cases for the NormalizeLarkTables function: add separate Test* cases validating nested <grid>/<column> flattening (e.g., nested columns collapse to single table), stripping of <text> tags inside cells, behavior when rows have uneven column counts (ensure padding or empty cells are produced), and conversion of <br> tags to newlines and back; implement each test by constructing representative input strings, calling NormalizeLarkTables, and asserting the expected normalized output to protect these documented behaviors.
27-35: Consider adding a positive assertion for the data row.The test verifies the separator row is absent, but doesn't confirm the data row
| A | B |is present. Adding this check would ensure the function outputs content correctly in no-header mode.💡 Suggested addition
if strings.Contains(result, "| --- |") { t.Errorf("should not have separator when header-row=false, got:\n%s", result) } + if !strings.Contains(result, "| A | B |") { + t.Errorf("expected data row, got:\n%s", result) + } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@shortcuts/doc/lark_table_test.go` around lines 27 - 35, The test TestNormalizeLarkTables_NoHeaderRow currently only asserts the absence of the separator row; add a positive assertion that the converted table contains the expected data row (e.g., check that NormalizeLarkTables(input) includes the string "| A | B |"). Update the test to call result := NormalizeLarkTables(input) (already present) and add a strings.Contains check that calls t.Errorf with a clear message if the data row is missing; reference the test name and NormalizeLarkTables to locate where to add this assertion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@shortcuts/doc/lark_table.go`:
- Around line 149-152: Remove the unused fmt import and the init() shim that
references fmt.Sprint: delete the fmt import from the file's import block and
remove the init() function (the lines containing func init() { _ = fmt.Sprint })
so there is no unused-import workaround; ensure no other code in this file
depends on fmt before committing.
---
Nitpick comments:
In `@shortcuts/doc/lark_table_test.go`:
- Around line 57-63: Add unit tests to shortc uts/doc/lark_table_test.go that
exercise missing edge cases for the NormalizeLarkTables function: add separate
Test* cases validating nested <grid>/<column> flattening (e.g., nested columns
collapse to single table), stripping of <text> tags inside cells, behavior when
rows have uneven column counts (ensure padding or empty cells are produced), and
conversion of <br> tags to newlines and back; implement each test by
constructing representative input strings, calling NormalizeLarkTables, and
asserting the expected normalized output to protect these documented behaviors.
- Around line 27-35: The test TestNormalizeLarkTables_NoHeaderRow currently only
asserts the absence of the separator row; add a positive assertion that the
converted table contains the expected data row (e.g., check that
NormalizeLarkTables(input) includes the string "| A | B |"). Update the test to
call result := NormalizeLarkTables(input) (already present) and add a
strings.Contains check that calls t.Errorf with a clear message if the data row
is missing; reference the test name and NormalizeLarkTables to locate where to
add this assertion.
In `@shortcuts/doc/lark_table.go`:
- Around line 64-71: The fenced-code detection currently checks for "```" after
cells are processed by cleanCell; to avoid future brittle behavior move the
fenced-code check to operate on the raw cell content before calling cleanCell
(i.e., while iterating rows and the original cell string from rows), returning
block immediately if any raw cell contains "```", or if you prefer to keep the
current order add a clear comment next to the loop documenting the dependency
that cleanCell must not strip backticks and that the fenced-code check must run
on raw input; reference the rows iteration and cleanCell/cells/block identifiers
when making this change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3c3bc9e4-589f-46ab-9fa7-3ba542fc1078
📒 Files selected for processing (3)
shortcuts/doc/docs_fetch.goshortcuts/doc/lark_table.goshortcuts/doc/lark_table_test.go
- Refactor NormalizeLarkTables to check for fenced code blocks on raw content
- Remove unused fmt import and init() shim
- Make headerAttrRE case-insensitive
- Add comprehensive unit tests for edge cases:
- Uneven column counts (row padding)
- Grid/column layout flattening
- Text tag stripping
- <br/> tag normalization
- Add positive assertion to TestNormalizeLarkTables_NoHeaderRow
- Improve docstring coverage for exported functions
Summary
docs +fetchoutputs markdown containing custom<lark-table>tags that are not renderable in any standard markdown viewer (Obsidian, GitHub, VS Code, etc.). This PR adds automatic conversion to GFM pipe tables.Changes
shortcuts/doc/lark_table.go: NewNormalizeLarkTables()function that converts<lark-table>/<lark-tr>/<lark-td>blocks to standard GFM pipe tablesshortcuts/doc/docs_fetch.go: CallsNormalizeLarkTables()on the markdown output during pretty-printshortcuts/doc/lark_table_test.go: Unit tests covering basic tables, no-header mode, code-block fallback, and pipe escapingBehavior
<lark-table><lark-tr><lark-td>Name</lark-td>...| Name | Age |Edge cases handled
header-row="false": Skips separator row<lark-table>(GFM pipe tables cannot contain multi-line code)\|<grid>/<column>/<text>tags: Flattened and strippedMotivation
When using
docs +fetchto export Lark documents for local use (e.g., in Obsidian vaults), tables are the most common broken element. This change makes the output immediately usable without post-processing.Related issue: #259
Summary by CodeRabbit
New Features
normalized, pipe characters escaped), and preserves original markup if fenced code blocks are present.
Tests