Description
The GetDocumentMetadata() function currently only processes section properties (w:sectPr) at the document body's top level. Section breaks that occur inside tables or text boxes are not detected.
Current Behavior
The CollectSectionData method in WmlToHtmlConverter.cs (lines 850-910) only iterates over top-level elements:
var blockElements = body.Elements()
    .Where(e => e.Name == W.p || e.Name == W.tbl || e.Name == W.sectPr)
    .ToList();
This means:
- ✅ sectPr inside paragraph properties (w:p/w:pPr/w:sectPr) is handled
- ✅ Document-level sectPr at end of body (w:body/w:sectPr) is handled
- ❌ sectPr inside tables (w:tbl/.../w:sectPr) is NOT detected
- ❌ sectPr inside text boxes/shapes is NOT detected
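The gap can be reproduced on a hand-built body using only System.Xml.Linq. This is a minimal sketch, not the actual CollectSectionData code: SectPrScanDemo, BuildBody, CountTopLevel and CountAll are hypothetical names, and CountTopLevel only approximates a top-level-only scan.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class SectPrScanDemo
{
    // WordprocessingML main namespace (the "w:" prefix).
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical test body: one sectPr in a top-level paragraph,
    // one in a paragraph nested inside a table cell, one at the end of the body.
    public static XElement BuildBody() =>
        new XElement(W + "body",
            new XElement(W + "p",
                new XElement(W + "pPr", new XElement(W + "sectPr"))),
            new XElement(W + "tbl",
                new XElement(W + "tr",
                    new XElement(W + "tc",
                        new XElement(W + "p",
                            new XElement(W + "pPr", new XElement(W + "sectPr")))))),
            new XElement(W + "sectPr"));

    // Approximates a top-level-only scan: w:body/w:sectPr plus w:body/w:p/w:pPr/w:sectPr.
    public static int CountTopLevel(XElement body) =>
        body.Elements(W + "sectPr").Count()
        + body.Elements(W + "p").Elements(W + "pPr").Elements(W + "sectPr").Count();

    // Scans the whole subtree, so nested sectPr elements are included.
    public static int CountAll(XElement body) =>
        body.Descendants(W + "sectPr").Count();
}
```

On this body the top-level scan sees two sectPr elements while the descendant scan sees three. The difference is the section break nested in the table.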
Expected Behavior
The metadata extraction should scan the entire document tree for sectPr elements, not just top-level ones.
Impact
- Documents with section breaks inside tables may report incorrect section counts
- Paragraph/table indices per section may be inaccurate for complex documents
- This is an edge case - most documents don't have section breaks inside tables
Suggested Implementation
- Use body.Descendants(W.sectPr) or a recursive scan to find all sectPr elements
- Determine the proper ordering of sections based on document position
- Associate content (paragraphs/tables) with their containing sections
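One possible shape for the steps above, as a rough sketch under stated assumptions rather than a drop-in patch. SectionInfo and SectionScanSketch are hypothetical names. The sketch assumes that nested paragraphs and tables count toward the enclosing section, and that a sectPr stored inside w:sectPrChange (tracked-change history) should be skipped. It relies on XElement.Descendants() returning elements in document order, so each sectPr closes the section that contains everything counted since the previous one.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical result type (C# 9 record): counts of content ending at each sectPr.
public sealed record SectionInfo(int Index, int ParagraphCount, int TableCount);

public static class SectionScanSketch
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static List<SectionInfo> Collect(XElement body)
    {
        var sections = new List<SectionInfo>();
        int paragraphs = 0, tables = 0;

        // Descendants() walks the tree in document order, including nested content.
        foreach (var e in body.Descendants())
        {
            if (e.Name == W + "p")
            {
                paragraphs++;
            }
            else if (e.Name == W + "tbl")
            {
                tables++;
            }
            else if (e.Name == W + "sectPr" && e.Parent?.Name != W + "sectPrChange")
            {
                // A sectPr ends the current section; content seen so far belongs to it.
                sections.Add(new SectionInfo(sections.Count, paragraphs, tables));
                paragraphs = 0;
                tables = 0;
            }
        }

        // Content after the last sectPr is dropped here; a well-formed body ends with one.
        return sections;
    }
}
```

Whether nested paragraphs should be counted, and how text box content (which may appear twice under mc:AlternateContent) should be handled, would need to be decided against the existing metadata semantics.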
Related
Labels
enhancement, metadata-api