DOCX
Parsing
Parse, inspect, and round-trip .docx files
The library provides two levels of parsing API:
- parseDocument: High-level round-trip API that returns DocumentOptions
- parseDocx: Low-level access to the raw XML parts of a .docx file
Both functions accept any DataType input: Uint8Array, ArrayBuffer, DataView, number[], base64 string, and more.
Round-Trip (parseDocument)
The parseDocument function parses a .docx file into DocumentOptions that can be passed to generateDocument(), enabling a full round trip (export → parse → re-export):
import { parseDocument, generateDocument } from "@office-open/docx";
import { readFileSync } from "node:fs";
// Parse an existing .docx file
const opts = parseDocument(readFileSync("input.docx"));
// Modify sections if needed, then re-export
const buffer = await generateDocument(opts);
The returned options have the same structure as the options that generateDocument() accepts as input:
{
  "title": "My Document",
  "creator": "Author",
  "sections": [
    {
      "children": [
        { "paragraph": "Hello World" },
        { "paragraph": { "children": [{ "text": "Bold", "bold": true }] } },
        { "table": { "rows": [...] } }
      ],
      "properties": { "page": { "margin": { ... } } }
    }
  ]
}
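As a minimal sketch of the "modify, then re-export" step, the snippet below appends a paragraph to every section of an options object shaped like the JSON above. The SketchOptions, SketchSection, and SketchChild types and the appendParagraph helper are hypothetical and exist only for this illustration; the library's real DocumentOptions type is richer and may differ.

```typescript
// Hypothetical, simplified shapes mirroring the JSON above; the real
// DocumentOptions type exported by the library may differ.
type SketchChild = Record<string, unknown>;

interface SketchSection {
  children: SketchChild[];
  properties?: Record<string, unknown>;
}

interface SketchOptions {
  title?: string;
  creator?: string;
  sections: SketchSection[];
}

// Return a copy of the options with one extra paragraph at the end of
// each section. The input object is left untouched.
function appendParagraph(opts: SketchOptions, text: string): SketchOptions {
  return {
    ...opts,
    sections: opts.sections.map((section) => ({
      ...section,
      children: [...section.children, { paragraph: text }],
    })),
  };
}

// Stand-in for the result of parseDocument(); built by hand here.
const parsed: SketchOptions = {
  title: "My Document",
  sections: [{ children: [{ paragraph: "Hello World" }] }],
};

const updated = appendParagraph(parsed, "Added after parsing");
console.log(updated.sections[0].children.length); // 2
```

Returning a copy rather than mutating the parsed object keeps the original available for comparison; the updated object would then be the value handed to generateDocument().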
Parsed Fields
| Field | Source | Description |
|---|---|---|
| sections | word/document.xml | Document sections with children |
| title, creator, keywords, etc. | docProps/core.xml | Core properties |
| view, zoom, defaultTabStop | word/settings.xml | Document settings |
| evenAndOddHeaderAndFooters | word/settings.xml | Header/footer mode |
| features.trackRevisions | word/settings.xml | Track changes |
| features.updateFields | word/settings.xml | Auto-update fields |
| compatabilityModeVersion | word/settings.xml | Compatibility mode |
| background | word/settings.xml | Document background |
| docVars | word/settings.xml | Document variables |
| customProperties | docProps/custom.xml | Custom properties |
| styles | word/styles.xml | Style definitions (paragraph, character, table, default) |
| numbering | word/numbering.xml | Numbering/list definitions |
| comments | word/comments.xml | Comment content (id, author, children) |
| footnotes | word/footnotes.xml | Footnote content keyed by id |
| endnotes | word/endnotes.xml | Endnote content keyed by id |
Supported Elements
The parser handles these paragraph children in round-trip:
| JSON Key | Type | Description |
|---|---|---|
| commentRangeStart | number | Start of a comment range |
| commentRangeEnd | number | End of a comment range |
| commentReference | number | Reference to a comment definition |
| footnoteReference | number | Footnote reference (styled run) |
| endnoteReference | number | Endnote reference (styled run) |
| pageBreak | true | Page break |
| columnBreak | true | Column break |
| symbolRun | object | Symbol run (Wingdings, etc.) |
| math | object | Math equation (OMML) |
| chart | object | Chart (column, bar, line, pie) |
| smartArt | object | SmartArt diagram |
| hyperlink | object | Hyperlink with link, anchor, tooltip, children |
| bookmarkStart | object | Bookmark start with id, name |
| bookmarkEnd | number | Bookmark end with id |
| image | object | Inline image with type, data, transformation |
| object | object | Embedded OLE object (w:object) |
Section-level elements (paragraphs, tables, TOC, SDT, textbox, altChunk) are all fully supported.
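To illustrate how the keys in the table above can be used, here is a small sketch that tallies the kinds of children in a paragraph. It assumes each child is a plain object carrying exactly one of the listed keys, and that anything else (such as a text run) counts as "other". The ParagraphChild type and the kindOf and countKinds helpers are hypothetical names, not part of the library.

```typescript
// Keys copied from the table above. The child shape is an assumption.
const KNOWN_KEYS = [
  "commentRangeStart", "commentRangeEnd", "commentReference",
  "footnoteReference", "endnoteReference", "pageBreak", "columnBreak",
  "symbolRun", "math", "chart", "smartArt", "hyperlink",
  "bookmarkStart", "bookmarkEnd", "image", "object",
] as const;

type KnownKey = (typeof KNOWN_KEYS)[number];
type ParagraphChild = Record<string, unknown>;

// Return the first table key present on the child, or "other"
// (for example, a plain text run).
function kindOf(child: ParagraphChild): KnownKey | "other" {
  for (const key of KNOWN_KEYS) {
    if (key in child) return key;
  }
  return "other";
}

// Tally how many children of each kind a paragraph contains.
function countKinds(children: ParagraphChild[]): Record<string, number> {
  const tally: Record<string, number> = {};
  for (const child of children) {
    const kind = kindOf(child);
    tally[kind] = (tally[kind] ?? 0) + 1;
  }
  return tally;
}

// Hand-written sample standing in for a parsed paragraph's children.
const sampleChildren: ParagraphChild[] = [
  { text: "See note" },
  { footnoteReference: 1 },
  { pageBreak: true },
  { text: "Next page" },
];

const kindCounts = countKinds(sampleChildren);
console.log(kindCounts.other); // 2
console.log(kindCounts.footnoteReference); // 1
```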
Low-Level Parsing (parseDocx)
The parseDocx function reads a .docx file and provides access to its raw XML parts:
import { parseDocx } from "@office-open/docx";
import { readFileSync } from "node:fs";
const doc = parseDocx(readFileSync("input.docx"));
// Access document body
console.log(doc.body);
// Access styles (if present)
console.log(doc.styles);
// Access numbering (if present)
console.log(doc.numbering);
// Access settings (if present)
console.log(doc.settings);
DocxDocument API
The returned DocxDocument object contains:
| Property | Type | Description |
|---|---|---|
| doc | ParsedArchive | Full parsed archive (all parts) |
| body | Element | Document body element (w:body) |
| styles | Element \| undefined | Styles element |
| numbering | Element \| undefined | Numbering definitions |
| settings | Element \| undefined | Document settings |
| fontTable | Element \| undefined | Font table |
| partRefs | DocxPartRefs | References to headers, footers, notes, hyperlinks |
Accessing Parts
const doc = parseDocx(data);
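// Look up individual parts by their path inside the archive
const body = doc.doc.get("word/document.xml");
const styles = doc.doc.get("word/styles.xml");
// Resolve a header by relationship ID; the ID may be absent, so guard the lookup
const headerPath = doc.partRefs.headers.get("rId1");
const header = headerPath ? doc.doc.get(headerPath) : undefined;
// Part references map relationship IDs to paths
for (const [rId, path] of doc.partRefs.headers) {
  const part = doc.doc.get(path);
  console.log(rId, part);
}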
Working with XML Elements
The parsed elements use the @office-open/xml library's Element type:
import { attr } from "@office-open/xml";
// Access element attributes
const tagName = attr(doc.body, "tagName");
// Iterate child elements
for (const child of doc.body.elements ?? []) {
  console.log(child.name);
}
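Walking the element tree can be sketched as follows for plain-text extraction. The SketchNode interface is an assumption modeled on the name and elements fields used above, plus an xml-js-style text node (type "text" with a text field); the real Element type from @office-open/xml may represent text differently. The runText and paragraphTexts helpers are hypothetical, and only top-level w:p elements are visited, so paragraphs nested in tables are skipped.

```typescript
// Assumed node shape; only `name` and `elements` are confirmed by the
// examples above. The text-node representation is a guess.
interface SketchNode {
  type?: string;
  name?: string;
  text?: string | number | boolean;
  elements?: SketchNode[];
}

// Concatenate the text held by every w:t descendant of a node.
function runText(node: SketchNode): string {
  if (node.name === "w:t") {
    return (node.elements ?? [])
      .filter((n) => n.type === "text")
      .map((n) => String(n.text ?? ""))
      .join("");
  }
  return (node.elements ?? []).map(runText).join("");
}

// Return one string per top-level paragraph (w:p) in the body.
function paragraphTexts(body: SketchNode): string[] {
  return (body.elements ?? [])
    .filter((n) => n.name === "w:p")
    .map(runText);
}

// Hand-built stand-in for doc.body.
const sampleBody: SketchNode = {
  name: "w:body",
  elements: [
    {
      name: "w:p",
      elements: [
        { name: "w:r", elements: [{ name: "w:t", elements: [{ type: "text", text: "Hello " }] }] },
        { name: "w:r", elements: [{ name: "w:t", elements: [{ type: "text", text: "World" }] }] },
      ],
    },
    { name: "w:sectPr" },
  ],
};

console.log(paragraphTexts(sampleBody)); // ["Hello World"]
```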
Use Cases
- Round-trip: Parse a .docx, modify sections, and re-export
- Extract text: Walk the body elements to extract paragraph text
- Merge documents: Parse multiple .docx files and combine their content
- Inspect formatting: Read style and numbering definitions
- Transform: Modify parsed elements and rebuild a document
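The merge use case could look roughly like the sketch below, which keeps the first document's metadata and appends the sections of the others. MergeableOptions, MergeSection, and mergeOptions are hypothetical names used only here, and the shape is a deliberately reduced stand-in for DocumentOptions. A real merge would likely also need to reconcile styles, numbering, comments, and notes, which this sketch does not attempt.

```typescript
// Hypothetical minimal shapes; the real DocumentOptions has many more fields.
interface MergeSection {
  children: unknown[];
}

interface MergeableOptions {
  title?: string;
  sections: MergeSection[];
}

// Keep the first document's metadata and append the sections of the rest,
// preserving their order.
function mergeOptions(
  first: MergeableOptions,
  ...rest: MergeableOptions[]
): MergeableOptions {
  const sections = [first, ...rest].reduce<MergeSection[]>(
    (all, doc) => all.concat(doc.sections),
    [],
  );
  return { ...first, sections };
}

// Stand-ins for two parseDocument() results.
const docA: MergeableOptions = {
  title: "A",
  sections: [{ children: [{ paragraph: "From A" }] }],
};
const docB: MergeableOptions = {
  title: "B",
  sections: [{ children: [{ paragraph: "From B" }] }],
};

const merged = mergeOptions(docA, docB);
console.log(merged.title, merged.sections.length); // A 2
```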