Skip to content

fix(doc): post-process docs +fetch output to improve round-trip fidelity#214

Open
herbertliu wants to merge 6 commits intomainfrom
feat/docs-export-import-fix
Open

fix(doc): post-process docs +fetch output to improve round-trip fidelity#214
herbertliu wants to merge 6 commits intomainfrom
feat/docs-export-import-fix

Conversation

@herbertliu
Copy link
Copy Markdown
Collaborator

@herbertliu herbertliu commented Apr 2, 2026

Summary

fetch-doc (MCP) exports Markdown with several spec violations — missing blank lines between blocks, trailing spaces in bold markers, blockquote lines collapsing, etc. — that cause ~80% of document structure to be lost on re-import via create-doc. This PR adds a fixExportedMarkdown() post-processing pipeline to docs +fetch that corrects these issues at the CLI layer.

Changes

  • shortcuts/doc/docs_fetch.go: Call fixExportedMarkdown() on the markdown field returned by fetch-doc before outputting.
  • shortcuts/doc/markdown_fix.go (new): Post-processing pipeline with five fix functions:
    • fixBoldSpacing: removes trailing spaces before **/* closing delimiters; strips redundant ** inside ATX headings
    • fixSetextAmbiguity: inserts a blank line before --- that immediately follows a non-blank line, preventing Setext H2 misparse
    • fixBlockquoteHardBreaks: inserts an empty > separator between consecutive blockquote lines so each becomes its own paragraph on re-import
    • fixTopLevelSoftbreaks: inserts blank lines between adjacent top-level content blocks, and between content lines inside <callout>, <quote-container>, and <lark-td> containers; code fences are left untouched
    • fixCalloutEmoji: replaces named emoji aliases (e.g. emoji="warning") with Unicode emoji that create-doc recognises
    • fixCodeBlockTrailingBlanks: removes spurious blank lines that create-doc/fetch-doc inserts before each closing ``` fence
  • shortcuts/doc/markdown_fix_test.go (new): Table-driven unit tests for all fix functions.

Test Plan

  • Unit tests pass (go test ./shortcuts/doc/...)
  • Round-trip verified on three documents of increasing complexity:
    • Harness Engineering (427 lines): diff 166 lines → 1 line (≈99.8% fidelity)

Related Issues

  • None

Summary by CodeRabbit

  • Bug Fixes

    • Improved markdown export processing with fixes for bold/italic spacing, blockquote line breaks, callout emoji conversion, and code block formatting to ensure proper re-import.
  • Tests

    • Added comprehensive test coverage for markdown transformation logic.
  • Chores

    • Updated ignore rules for temporary test files.

herbertliu and others added 6 commits April 2, 2026 15:06
Port fixExportedMarkdown() from internal cli to improve round-trip
fidelity when fetched Markdown is re-imported via +update / +create.

Five fixes applied to MCP fetch-doc raw output:
1. fixBoldSpacing: remove trailing spaces before ** / *, strip redundant
   ** from ATX headings (CommonMark compliance)
2. fixSetextAmbiguity: insert blank line before "---" following a
   paragraph to prevent Setext H2 misparse
3. fixBlockquoteHardBreaks: insert bare ">" between consecutive "> "
   lines so create-doc preserves each line as its own paragraph
4. fixTopLevelSoftbreaks: insert blank line between adjacent top-level
   content lines (Lark exports blocks with single \n; parsers collapse
   them into one paragraph without the blank line)
5. Same blank-line insertion inside <lark-td> cells for table content

Result: round-trip diff 166 lines → 1 line (image token regeneration,
unavoidable by design). Accuracy ≈ 99.8%.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cover all five fix functions with table-driven tests:
- fixBoldSpacing: trailing space removal and heading bold stripping
- fixSetextAmbiguity: blank line before --- to prevent Setext H2
- fixBlockquoteHardBreaks: bare > inserted between consecutive blockquote lines
- fixTopLevelSoftbreaks: blank line between top-level blocks, lark-td cells
- fixExportedMarkdown: end-to-end integration across all fixes

New code patch coverage for markdown_fix.go: 87-100% per function.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ut emoji aliases

Three improvements to fixExportedMarkdown post-processing:

1. fixTopLevelSoftbreaks: promote <callout> and <quote-container> from
   opaque blocks to content containers, same as <lark-td>. Adjacent
   content lines inside these containers now get a blank line inserted
   between them so create-doc preserves paragraph structure instead of
   collapsing all lines into one.

2. fixCalloutEmoji: new function that replaces named emoji aliases
   (e.g. emoji="warning") with actual Unicode emoji characters
   (e.g. emoji="⚠️"). fetch-doc sometimes emits named aliases that
   create-doc does not recognize, causing it to fall back to an
   unrelated emoji.

3. Tests updated to reflect new callout/quote-container behaviour and
   cover fixCalloutEmoji with table-driven cases.

Round-trip diff on test doc (839 lines): 24 lines → 19 lines.
Remaining diff: whiteboard token loss (create-doc design limitation,
4 whiteboards × ~2 lines each) + 1 emoji mapped by create-doc server.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… tracking

fetch-doc / create-doc adds a spurious blank line before each closing
code fence, e.g.:

  ```
  last line
             <- blank added by server
  ```

Add fixCodeBlockTrailingBlanks() to remove these blanks after fetch.

Also fix fixTopLevelSoftbreaks() to use the same simple toggle rule:
any ```lang line opens a code block; only a plain ``` closes it.
Previously the function toggled on every ``` line, causing lines inside
a nested fence (e.g. ```go inside ```markdown) to be treated as
top-level content and receive spurious blank-line insertions.

Round-trip diff for the Lark CLI dev spec: 200 lines → 156 lines.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions bot added domain/ccm PR touches the ccm domain size/L Large or sensitive change across domains or core paths labels Apr 2, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 2, 2026

📝 Walkthrough

Walkthrough

A markdown post-processing module is added to normalize Lark-exported content through sequential transformations: spacing fixes, setext disambiguation, blockquote handling, softbreak insertion, emoji conversion, and code block cleanup. The docs fetch handler now applies these transformations to markdown results.

Changes

Cohort / File(s) Summary
Git Configuration
.gitignore
Added ignore rule for temporary test files in /.tmp/ directory.
Markdown Post-processing Logic
shortcuts/doc/markdown_fix.go
New module with fixExportedMarkdown() orchestrator and helper functions (fixBoldSpacing, fixSetextAmbiguity, fixBlockquoteHardBreaks, fixTopLevelSoftbreaks, fixCalloutEmoji, fixCodeBlockTrailingBlanks) to sequentially transform Lark-exported Markdown for normalization.
Integration & Testing
shortcuts/doc/docs_fetch.go, shortcuts/doc/markdown_fix_test.go
Conditional post-processing of markdown field in MCP tool results via fixExportedMarkdown(); comprehensive unit and end-to-end test coverage validating all transformation behaviors including edge cases for blockquotes, code fences, content containers, and callout emoji replacement.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Hop, hop! Through markdown we now go,
Setext lines and softbreaks flow,
Bold strays fixed with gentle care,
Exported Lark, now formatted fair!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 37.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: adding post-processing to improve round-trip fidelity of exported markdown in the docs fetch command.
Description check ✅ Passed The pull request description is comprehensive and well-structured, covering all required template sections with detailed explanations of changes and test plan.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/docs-export-import-fix

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
shortcuts/doc/markdown_fix.go (1)

148-167: Duplicate doc comment block should be removed.

The fixCodeBlockTrailingBlanks comment is duplicated verbatim, which adds noise and risks drift.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shortcuts/doc/markdown_fix.go` around lines 148 - 167, The file contains a
duplicated doc comment for fixCodeBlockTrailingBlanks; remove the redundant copy
so only one explanatory comment remains immediately above the
fixCodeBlockTrailingBlanks declaration, keeping the original wording, formatting
and examples intact and ensuring surrounding blank lines/paragraph breaks stay
consistent with Go doc comment style for that function.
shortcuts/doc/markdown_fix_test.go (1)

90-125: Add fenced-code regression cases for early fixers.

Please add explicit tests asserting that fixBoldSpacing, fixSetextAmbiguity, and fixBlockquoteHardBreaks do not modify content inside fenced code blocks.

Example tests to add
+func TestFixersDoNotModifyInsideFences(t *testing.T) {
+	input := "```md\n**x **\n> a\n> b\nline\n---\n```"
+
+	if got := fixBoldSpacing(input); got != input {
+		t.Fatalf("fixBoldSpacing modified fenced code: %q", got)
+	}
+	if got := fixSetextAmbiguity(input); got != input {
+		t.Fatalf("fixSetextAmbiguity modified fenced code: %q", got)
+	}
+	if got := fixBlockquoteHardBreaks(input); got != input {
+		t.Fatalf("fixBlockquoteHardBreaks modified fenced code: %q", got)
+	}
+}

Also applies to: 127-173, 175-203

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shortcuts/doc/markdown_fix_test.go` around lines 90 - 125, Add regression
tests in shortcuts/doc/markdown_fix_test.go to ensure the early-fixer functions
do not alter fenced code blocks: create inputs containing a fenced code block
(triple backticks) with markdown that would otherwise trigger fixes (e.g., bold
spacing, setext-like lines, blockquote lines) and call fixBoldSpacing,
fixSetextAmbiguity, and fixBlockquoteHardBreaks with those inputs, asserting
each returns the original input unchanged; reference the existing test structure
and functions fixBoldSpacing, fixSetextAmbiguity, and fixBlockquoteHardBreaks
when adding these assertions so they run alongside the other TestFix... cases.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@shortcuts/doc/markdown_fix.go`:
- Around line 30-33: The three transforms (fixBoldSpacing, fixSetextAmbiguity,
fixBlockquoteHardBreaks) currently run over entire markdown and mutate content
inside fenced code blocks; update each function so it scans lines, tracks a
boolean inCode toggled on/off when encountering fenced-code delimiters (e.g. a
trimmed line == "```" or other fence variants), and only apply the transform
logic to chunks outside fenced blocks while appending fenced-block lines
verbatim (use a flush mechanism to process accumulated non-code chunk with
existing transform and then clear it); ensure the fence detection correctly
toggles inCode and preserves fence lines and inner code unchanged to maintain
round-trip fidelity.

---

Nitpick comments:
In `@shortcuts/doc/markdown_fix_test.go`:
- Around line 90-125: Add regression tests in shortcuts/doc/markdown_fix_test.go
to ensure the early-fixer functions do not alter fenced code blocks: create
inputs containing a fenced code block (triple backticks) with markdown that
would otherwise trigger fixes (e.g., bold spacing, setext-like lines, blockquote
lines) and call fixBoldSpacing, fixSetextAmbiguity, and fixBlockquoteHardBreaks
with those inputs, asserting each returns the original input unchanged;
reference the existing test structure and functions fixBoldSpacing,
fixSetextAmbiguity, and fixBlockquoteHardBreaks when adding these assertions so
they run alongside the other TestFix... cases.

In `@shortcuts/doc/markdown_fix.go`:
- Around line 148-167: The file contains a duplicated doc comment for
fixCodeBlockTrailingBlanks; remove the redundant copy so only one explanatory
comment remains immediately above the fixCodeBlockTrailingBlanks declaration,
keeping the original wording, formatting and examples intact and ensuring
surrounding blank lines/paragraph breaks stay consistent with Go doc comment
style for that function.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 90a5d8be-dcb4-4422-9fbc-1e021a9cab75

📥 Commits

Reviewing files that changed from the base of the PR and between 0f96bdf and 57b0303.

📒 Files selected for processing (4)
  • .gitignore
  • shortcuts/doc/docs_fetch.go
  • shortcuts/doc/markdown_fix.go
  • shortcuts/doc/markdown_fix_test.go

Comment on lines +30 to +33
md = fixBoldSpacing(md)
md = fixSetextAmbiguity(md)
md = fixBlockquoteHardBreaks(md)
md = fixTopLevelSoftbreaks(md)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: fenced code content can be corrupted by early transforms.

fixBoldSpacing, fixSetextAmbiguity, and fixBlockquoteHardBreaks run on full markdown and currently modify text inside fenced code blocks. That changes literal code content (e.g. > a\n> b, **x **, line\n---) and breaks round-trip fidelity.

Suggested fix (apply these transforms only outside fenced blocks)
 func fixExportedMarkdown(md string) string {
-	md = fixBoldSpacing(md)
-	md = fixSetextAmbiguity(md)
-	md = fixBlockquoteHardBreaks(md)
+	md = applyOutsideCodeFences(md, fixBoldSpacing)
+	md = applyOutsideCodeFences(md, fixSetextAmbiguity)
+	md = applyOutsideCodeFences(md, fixBlockquoteHardBreaks)
 	md = fixTopLevelSoftbreaks(md)
 	md = fixCalloutEmoji(md)
 	md = fixCodeBlockTrailingBlanks(md)
@@
 	return md
 }
+
+func applyOutsideCodeFences(md string, fn func(string) string) string {
+	lines := strings.Split(md, "\n")
+	var out []string
+	var chunk []string
+	inCode := false
+
+	flush := func() {
+		if len(chunk) == 0 {
+			return
+		}
+		out = append(out, strings.Split(fn(strings.Join(chunk, "\n")), "\n")...)
+		chunk = chunk[:0]
+	}
+
+	for _, line := range lines {
+		trimmed := strings.TrimSpace(line)
+		if strings.HasPrefix(trimmed, "```") {
+			if !inCode {
+				flush()
+				inCode = true
+			} else if trimmed == "```" {
+				inCode = false
+			}
+			out = append(out, line)
+			continue
+		}
+		if inCode {
+			out = append(out, line)
+		} else {
+			chunk = append(chunk, line)
+		}
+	}
+	flush()
+	return strings.Join(out, "\n")
+}

Also applies to: 50-60, 77-94

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shortcuts/doc/markdown_fix.go` around lines 30 - 33, The three transforms
(fixBoldSpacing, fixSetextAmbiguity, fixBlockquoteHardBreaks) currently run over
entire markdown and mutate content inside fenced code blocks; update each
function so it scans lines, tracks a boolean inCode toggled on/off when
encountering fenced-code delimiters (e.g. a trimmed line == "```" or other fence
variants), and only apply the transform logic to chunks outside fenced blocks
while appending fenced-block lines verbatim (use a flush mechanism to process
accumulated non-code chunk with existing transform and then clear it); ensure
the fence detection correctly toggles inCode and preserves fence lines and inner
code unchanged to maintain round-trip fidelity.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 2, 2026

🚀 PR Preview Install Guide

🧰 CLI update

npm i -g https://pkg.pr.new/larksuite/cli/@larksuite/cli@57b030392205285cb255d8aaa4af7800064508da

🧩 Skill update

npx skills add larksuite/cli#feat/docs-export-import-fix -y -g

@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Apr 2, 2026

Greptile Summary

This PR adds a fixExportedMarkdown() post-processing pipeline to docs +fetch that corrects several CommonMark spec violations (missing blank lines, trailing spaces in bold markers, Setext-heading misparses, blockquote collapsing, spurious code-fence blanks, named emoji aliases) that cause significant document-structure loss on round-trip through create-doc. The implementation is well-structured with table-driven unit tests, and the approach is sound for the stated Lark-export patterns.

  • The docstring for fixCodeBlockTrailingBlanks is duplicated verbatim (the same block comment appears twice, lines 148–157 and 159–168).
  • The fixExportedMarkdown top-level comment enumerates only five fix functions (stopping at fixCalloutEmoji) but the function body also calls a sixth, fixCodeBlockTrailingBlanks.

Confidence Score: 5/5

Safe to merge — all findings are P2 style/doc issues; no runtime defects on the changed path.

All four comments are P2 (duplicate docstring, missing entry in top-level comment, an edge-case regex behaviour unlikely to be triggered by Lark exports, and a missing blank between adjacent container tags). None affect correctness for the documented use cases, and the round-trip improvement is verified on real documents. No P0/P1 issues found.

shortcuts/doc/markdown_fix.go — duplicate docstring and headingBoldRe edge case worth a follow-up test.

Important Files Changed

Filename Overview
shortcuts/doc/markdown_fix.go New post-processing pipeline with 6 fix functions; one duplicate docstring block and one edge-case regex issue with headingBoldRe on headings with multiple disjoint bold spans.
shortcuts/doc/markdown_fix_test.go Comprehensive table-driven unit tests covering all six fix functions; the headingBoldRe multi-bold-span edge case is not tested.
shortcuts/doc/docs_fetch.go Minimal, clean integration: applies fixExportedMarkdown() to the markdown field before output; no issues.
.gitignore Adds .tmp/ to gitignore for temporary test files; straightforward.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[fetch-doc MCP response] --> B{markdown field present?}
    B -- yes --> C[fixExportedMarkdown]
    B -- no --> G[OutFormat unchanged]
    C --> C1[fixBoldSpacing\nremove trailing spaces in ** / *\nstrip redundant ** in ATX headings]
    C1 --> C2[fixSetextAmbiguity\ninsert blank before bare --- lines]
    C2 --> C3[fixBlockquoteHardBreaks\ninsert blank > between blockquote lines]
    C3 --> C4[fixTopLevelSoftbreaks\ninsert blank between top-level blocks\nand inside callout / quote-container / lark-td]
    C4 --> C5[fixCalloutEmoji\nreplace named aliases with Unicode emoji]
    C5 --> C6[fixCodeBlockTrailingBlanks\nremove spurious blank before closing fence]
    C6 --> C7[collapse 3+ newlines to 2\ntrim trailing newlines]
    C7 --> G[OutFormat\nwrite title + markdown to output]
Loading

Reviews (1): Last reviewed commit: "chore: ignore .tmp/ test directory" | Re-trigger Greptile

Comment on lines +148 to +168
// fixCodeBlockTrailingBlanks removes blank lines that appear immediately before
// a closing ``` fence inside a fenced code block. fetch-doc / create-doc
// sometimes inserts an extra blank line at the end of code block content:
//
// ```
// last line
// ← spurious blank added by server
// ```
//
// Removing it keeps the round-trip diff clean.
// fixCodeBlockTrailingBlanks removes blank lines that appear immediately before
// a closing ``` fence inside a fenced code block. fetch-doc / create-doc
// sometimes inserts an extra blank line at the end of code block content:
//
// ```
// last line
// ← spurious blank added by server
// ```
//
// Removing it keeps the round-trip diff clean.
// Nested fences (e.g. ```go inside ```markdown) are handled by tracking
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Duplicated function docstring

The doc comment for fixCodeBlockTrailingBlanks appears verbatim twice: once at lines 148–157 and again at lines 159–168 (only the final sentence about nested fences is new). The first block should be removed, leaving only the deduplicated comment.

Comment on lines +11 to +28
// fixExportedMarkdown applies post-processing to Lark-exported Markdown to
// improve round-trip fidelity on re-import:
//
// 1. fixBoldSpacing: removes trailing whitespace before closing ** / *,
// and strips redundant ** from ATX headings.
//
// 2. fixSetextAmbiguity: inserts a blank line before any "---" that immediately
// follows a non-empty line, preventing it from being parsed as a Setext H2.
//
// 3. fixBlockquoteHardBreaks: inserts a blank blockquote line (">") between
// consecutive blockquote content lines so create-doc preserves line breaks.
//
// 4. fixTopLevelSoftbreaks: inserts a blank line between adjacent non-empty
// lines at the top level and inside content containers (callout,
// quote-container, lark-td). Code fences are left untouched.
//
// 5. fixCalloutEmoji: replaces named emoji aliases (e.g. emoji="warning") with
// actual Unicode emoji characters that create-doc understands.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Top-level docstring omits fixCodeBlockTrailingBlanks

The numbered list stops at "5. fixCalloutEmoji", but fixExportedMarkdown also calls fixCodeBlockTrailingBlanks (line 35). The comment should include a sixth entry so the doc accurately reflects all six passes applied.

var (
boldTrailingSpaceRe = regexp.MustCompile(`(\*\*\S[^*]*?)\s+(\*\*)`)
italicTrailingSpaceRe = regexp.MustCompile(`(\*\S[^*]*?)\s+(\*)`)
headingBoldRe = regexp.MustCompile(`(?m)^(#{1,6})\s+\*\*(.+?)\*\*\s*$`)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 headingBoldRe can mangle headings with multiple disjoint bold spans

headingBoldRe uses a non-greedy .+?, but Go's RE2 engine still finds the leftmost-longest overall match. For a heading like # **foo** and **bar**, the engine anchors at # **, then the non-greedy (.+?) expands until \*\*\s*$ can match — which only succeeds when (.+?) captures foo** and **bar (the final ** in the line acting as the closing delimiter). The replacement then emits # foo** and **bar, leaving stray ** markers that render as literal text.

In practice Lark's fetch-doc wraps entire heading content in one bold pair, so this edge case is unlikely to be triggered. It may be worth adding a guard or a test to make this explicit.

// Reproducer (currently handled incorrectly):
// "# **foo** and **bar**" → "# foo** and **bar"

Comment on lines +286 to +294

// Container opening/closing tags are structural — skip them.
isContainerTag := false
for _, cc := range contentContainers {
if strings.HasPrefix(trimmed, cc[0]) || strings.HasPrefix(trimmed, "</"+cc[0][1:]) {
isContainerTag = true
break
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Adjacent top-level container blocks get no separating blank line

isContainerTag is true for both <callout> opening and </callout> closing tags, so the blank-line insertion is skipped for those lines. As a result, two consecutive callout (or <quote-container>) blocks at the top level will have no blank line between the </callout> and the following <callout>:

</callout>
<callout emoji="⚠️">   ← no blank line inserted here
...

This may not matter if the downstream importer tolerates the missing separator, but it is an inconsistency with the stated goal of inserting blank lines between adjacent top-level blocks.

@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

/log/

# Temporary test files
.tmp/
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this ".gitignore" change necessary for this PR? If ".tmp/" is not produced by these changes, I would prefer to leave it out to keep the PR focused.

// Removing it keeps the round-trip diff clean.
// Nested fences (e.g. ```go inside ```markdown) are handled by tracking
// nesting depth so only the outermost closing fence is affected.
func fixCodeBlockTrailingBlanks(md string) string {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pass looks inherently lossy: it removes any blank line immediately before a closing fence, but an empty trailing line can be real code content. That means some code blocks can no longer round-trip exactly.

// Structural table tags (<lark-table>, <lark-tr>, <lark-td> and their closing
// counterparts) never trigger blank-line insertion themselves. Fenced code
// blocks (``` ... ```) are left completely untouched.
func fixTopLevelSoftbreaks(md string) string {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic inserts blank lines between any adjacent non-empty top-level lines, so it also rewrites list syntax. - a\n- b becomes a loose list, and - a\n continuation becomes a second paragraph inside the item. That seems risky for round-trip fidelity. I’d suggest either narrowing the heuristic or adding explicit list/list-continuation tests before merging.

headingBoldRe = regexp.MustCompile(`(?m)^(#{1,6})\s+\*\*(.+?)\*\*\s*$`)
)

func fixBoldSpacing(md string) string {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still rewrites inline code spans. Example: `**hello **` becomes `**hello**`. Since the PR goal is round-trip fidelity, should these transforms also skip inline code, not just fenced code?

//
// 5. fixCalloutEmoji: replaces named emoji aliases (e.g. emoji="warning") with
// actual Unicode emoji characters that create-doc understands.
func fixExportedMarkdown(md string) string {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if we make fixBoldSpacing / fixSetextAmbiguity / fixBlockquoteHardBreaks fence-aware, fenced code is still not safe here: fixCalloutEmoji and the final \n\n\n collapse both run globally and can rewrite literal code samples. For example, <callout emoji="warning"> inside a fenced block will be changed, and intentional multiple blank lines in code will be collapsed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

domain/ccm PR touches the ccm domain size/L Large or sensitive change across domains or core paths

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants