Parsing
Parse and inspect existing .docx files with parseDocx
The `parseDocx` function takes the contents of an existing .docx file as a `Uint8Array` and returns a `DocxDocument` that gives access to its parsed document parts.
Basic Usage
```ts
import { parseDocx } from "@office-open/docx";
import { readFileSync } from "node:fs";

const data = new Uint8Array(readFileSync("input.docx"));
const doc = parseDocx(data);

// Access document body
console.log(doc.body);

// Access styles (if present)
console.log(doc.styles);

// Access numbering (if present)
console.log(doc.numbering);

// Access settings (if present)
console.log(doc.settings);
```
DocxDocument API
The returned `DocxDocument` object has the following properties:

| Property | Type | Description |
|---|---|---|
| `doc` | `ParsedDocument` | Full parsed document (all parts) |
| `body` | `Element` | Document body element (`w:body`) |
| `styles` | `Element \| undefined` | Styles element |
| `numbering` | `Element \| undefined` | Numbering definitions |
| `settings` | `Element \| undefined` | Document settings |
| `fontTable` | `Element \| undefined` | Font table |
| `partRefs` | `DocxPartRefs` | References to headers, footers, and notes |
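Since `styles`, `numbering`, `settings`, and `fontTable` may be `undefined`, code that consumes a parsed document should check them before use. The following is a minimal sketch that relies only on the property names listed in the table; `OptionalParts` and `presentParts` are hypothetical helpers, not part of `@office-open/docx`.

```ts
// Hypothetical structural type covering only the optional properties from the table.
// `unknown` keeps the sketch independent of the library's exported types.
interface OptionalParts {
  styles?: unknown;
  numbering?: unknown;
  settings?: unknown;
  fontTable?: unknown;
}

// Hypothetical helper (not part of @office-open/docx):
// returns the names of the optional parts that are defined.
function presentParts(doc: OptionalParts): string[] {
  const keys = ["styles", "numbering", "settings", "fontTable"] as const;
  return keys.filter((key) => doc[key] !== undefined);
}
```

With a parsed document, `presentParts(parseDocx(data))` would list which optional parts the file contains. Typing the parameter structurally means any object with these optional fields is accepted.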
Accessing Parts
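```ts
const doc = parseDocx(data);

// Look up a single part by its path inside the package
const documentPart = doc.doc.get("word/document.xml");
const stylesPart = doc.doc.get("word/styles.xml");

// Part references map relationship IDs to part paths;
// the lookup returns undefined if the ID is not a header reference
const headerPath = doc.partRefs.headers.get("rId1");
const firstHeader = headerPath !== undefined ? doc.doc.get(headerPath) : undefined;

// Iterate over every header reference
for (const [rId, path] of doc.partRefs.headers) {
  const header = doc.doc.get(path);
  console.log(rId, header);
}
```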
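Because `partRefs.headers` is iterated as `[rId, path]` pairs and queried with `.get(rId)`, it behaves like a `Map` from relationship ID to part path. The sketch below assumes exactly that (a `Map<string, string>`) plus a part container with a `get(path)` method, and resolves every reference to its parsed part in one pass. `PartLookup` and `resolveRefs` are hypothetical names used for illustration only.

```ts
// Assumed shape of the part container: anything with a get(path) method.
interface PartLookup<T> {
  get(path: string): T | undefined;
}

// Hypothetical helper: resolves each relationship ID in a reference map
// (such as partRefs.headers) to its parsed part, skipping IDs whose
// target path has no matching part.
function resolveRefs<T>(refs: Map<string, string>, parts: PartLookup<T>): Map<string, T> {
  const resolved = new Map<string, T>();
  for (const [rId, path] of refs) {
    const part = parts.get(path);
    if (part !== undefined) {
      resolved.set(rId, part);
    }
  }
  return resolved;
}
```

Under these assumptions, `resolveRefs(doc.partRefs.headers, doc.doc)` would yield a map from header relationship IDs to header parts. Skipping missing targets rather than throwing keeps a single broken relationship from aborting the whole lookup.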
Working with XML Elements
The parsed elements use the `Element` type from the `@office-open/xml` library:

```ts
import { attr } from "@office-open/xml";

// Access element attributes
const tagName = attr(doc.body, "tagName");

// Iterate child elements
for (const child of doc.body.elements ?? []) {
  console.log(child.name);
}
```
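Iterating `elements` only visits direct children. To find elements at any depth (for example every paragraph, including those inside tables), the same `name` and `elements` fields can be walked recursively. This is a minimal sketch that assumes the `Element` type exposes at least these two fields and that names are qualified, as in `w:body`; `XmlNode` and `findAll` are hypothetical.

```ts
// Assumed minimal element shape, based on the `name` and `elements`
// fields used above; the real Element type may carry more fields.
interface XmlNode {
  name?: string;
  elements?: XmlNode[];
}

// Hypothetical helper: collects every descendant (not the root itself)
// whose name equals `name`, in document order (pre-order traversal).
function findAll(root: XmlNode, name: string, out: XmlNode[] = []): XmlNode[] {
  for (const child of root.elements ?? []) {
    if (child.name === name) {
      out.push(child);
    }
    findAll(child, name, out);
  }
  return out;
}
```

If the assumed shape holds, `findAll(doc.body, "w:p")` would return all paragraphs in the body. Recursion is adequate here because document trees are shallow; a very deeply nested input would call for an explicit stack.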
Use Cases
- Extract text: walk the body elements to extract paragraph text
- Merge documents: parse multiple .docx files and combine their content
- Inspect formatting: read style and numbering definitions
- Transform: modify parsed elements and rebuild a document
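The text-extraction use case can be sketched as a walk over the body that emits one string per `w:p` paragraph, concatenating the content of its `w:t` elements (in WordprocessingML, run text lives in `w:t` inside `w:r`). The node shape is an assumption: text nodes are taken to look like `{ type: "text", text: "..." }` (an xml-js-style convention), which should be checked against the real `Element` type from `@office-open/xml`. `XmlNode`, `textOf`, and `paragraphTexts` are hypothetical names.

```ts
// Assumed node shape: elements have `name`/`elements`; text nodes have
// `type: "text"` and a `text` value. Verify against the real Element type.
interface XmlNode {
  type?: string;
  name?: string;
  text?: string | number | boolean;
  elements?: XmlNode[];
}

// Concatenates the text of every <w:t> descendant of `node`.
function textOf(node: XmlNode): string {
  let result = "";
  for (const child of node.elements ?? []) {
    if (child.name === "w:t") {
      for (const t of child.elements ?? []) {
        if (t.type === "text" && t.text !== undefined) {
          result += String(t.text);
        }
      }
    } else {
      result += textOf(child);
    }
  }
  return result;
}

// Returns one string per <w:p>, descending into tables and other containers.
// Content nested inside a paragraph is folded into that paragraph's string.
function paragraphTexts(body: XmlNode): string[] {
  const out: string[] = [];
  const walk = (node: XmlNode): void => {
    for (const child of node.elements ?? []) {
      if (child.name === "w:p") {
        out.push(textOf(child));
      } else {
        walk(child);
      }
    }
  };
  walk(body);
  return out;
}
```

This sketch ignores tabs (`w:tab`) and breaks (`w:br`); mapping those to `"\t"` and `"\n"` inside `textOf` would be a natural extension.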