DOCX

Parsing

Parse, inspect, and round-trip .docx files

The library provides two levels of parsing API:

  • parseDocument — High-level round-trip API that returns DocumentOptions
  • parseDocx — Low-level access to the raw XML parts of a .docx file

Both functions accept any DataType input: Uint8Array, ArrayBuffer, DataView, number[], base64 string, and more.
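
To illustrate what accepting several binary input kinds can involve, here is a minimal sketch that normalizes them to a Uint8Array. The BinaryInput type and the toBytes helper are hypothetical names introduced for this example, and the sketch assumes that string inputs are base64-encoded; it is not the library's own conversion code.

```typescript
// Hypothetical union covering the input kinds listed above.
// This is NOT the library's DataType definition.
type BinaryInput = Uint8Array | ArrayBuffer | DataView | number[] | string;

// Hypothetical helper: convert any of the inputs above to a Uint8Array.
function toBytes(input: BinaryInput): Uint8Array {
  if (typeof input === "string") {
    // Assumption: strings are base64-encoded binary data.
    const binary = atob(input);
    return Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  }
  if (Array.isArray(input)) return Uint8Array.from(input);
  if (input instanceof Uint8Array) return input;
  if (input instanceof DataView) {
    // Respect the view's window into the underlying buffer.
    return new Uint8Array(input.buffer, input.byteOffset, input.byteLength);
  }
  return new Uint8Array(input); // plain ArrayBuffer
}

console.log(toBytes([80, 75, 3, 4]).length);
```

Normalizing once at the boundary keeps the rest of a pipeline working with a single byte representation.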

Round-Trip (parseDocument)

The parseDocument function parses a .docx file into DocumentOptions that can be passed to generateDocument(), enabling full round-trip (export → parse → re-export):

import { parseDocument, generateDocument } from "@office-open/docx";
import { readFileSync } from "node:fs";

// Parse an existing .docx file
const opts = parseDocument(readFileSync("input.docx"));

// Modify sections if needed, then re-export
const buffer = await generateDocument(opts);

The returned options have the same structure that generateDocument() accepts as input:

{
  "title": "My Document",
  "creator": "Author",
  "sections": [
    {
      "children": [
        { "paragraph": "Hello World" },
        { "paragraph": { "children": [{ "text": "Bold", "bold": true }] } },
        { "table": { "rows": [...] } }
      ],
      "properties": { "page": { "margin": { ... } } }
    }
  ]
}
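
As a rough illustration of working with this structure between parsing and re-exporting, the sketch below walks the sections, collects paragraph text, and appends a paragraph. The Sketch* types are assumptions modeled only on the sample above, not the library's exported DocumentOptions type, and paragraphTexts is a hypothetical helper.

```typescript
// Local types modeled on the sample output above. They are assumptions for
// this sketch, not the library's exported DocumentOptions type.
type SketchRun = { text?: string; bold?: boolean };
type SketchParagraph = string | { children?: SketchRun[] };
type SketchBlock = { paragraph?: SketchParagraph; table?: unknown };
type SketchSection = { children: SketchBlock[] };
type SketchOptions = { title?: string; sections: SketchSection[] };

// Hypothetical helper: collect the plain text of every paragraph.
function paragraphTexts(opts: SketchOptions): string[] {
  const out: string[] = [];
  for (const section of opts.sections) {
    for (const block of section.children) {
      const p = block.paragraph;
      if (p === undefined) continue; // tables and other blocks are skipped
      out.push(
        typeof p === "string"
          ? p
          : (p.children ?? []).map((run) => run.text ?? "").join(""),
      );
    }
  }
  return out;
}

// Stand-in for the result of parseDocument().
const parsed: SketchOptions = {
  title: "My Document",
  sections: [
    {
      children: [
        { paragraph: "Hello World" },
        { paragraph: { children: [{ text: "Bold", bold: true }] } },
        { table: {} },
      ],
    },
  ],
};

// Append a paragraph to the first section before re-exporting.
parsed.sections[0].children.push({ paragraph: "Added after parsing" });

console.log(paragraphTexts(parsed));
```

In real use, the object passed to the helper would be the value returned by parseDocument(), and the modified object would go back into generateDocument().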

Parsed Fields

Field                           Source                Description
sections                        word/document.xml     Document sections with children
title, creator, keywords, etc.  docProps/core.xml     Core properties
view, zoom, defaultTabStop      word/settings.xml     Document settings
evenAndOddHeaderAndFooters      word/settings.xml     Header/footer mode
features.trackRevisions         word/settings.xml     Track changes
features.updateFields           word/settings.xml     Auto-update fields
compatabilityModeVersion        word/settings.xml     Compatibility mode
background                      word/settings.xml     Document background
docVars                         word/settings.xml     Document variables
customProperties                docProps/custom.xml   Custom properties
styles                          word/styles.xml       Style definitions (paragraph, character, table, default)
numbering                       word/numbering.xml    Numbering/list definitions
comments                        word/comments.xml     Comment content (id, author, children)
footnotes                       word/footnotes.xml    Footnote content keyed by id
endnotes                        word/endnotes.xml     Endnote content keyed by id

Supported Elements

The parser handles these paragraph children in round-trip:

JSON Key            Type    Description
commentRangeStart   number  Start of a comment range
commentRangeEnd     number  End of a comment range
commentReference    number  Reference to a comment definition
footnoteReference   number  Footnote reference (styled run)
endnoteReference    number  Endnote reference (styled run)
pageBreak           true    Page break
columnBreak         true    Column break
symbolRun           object  Symbol run (Wingdings, etc.)
math                object  Math equation (OMML)
chart               object  Chart (column, bar, line, pie)
smartArt            object  SmartArt diagram
hyperlink           object  Hyperlink with link, anchor, tooltip, children
bookmarkStart       object  Bookmark start with id, name
bookmarkEnd         number  Bookmark end with id
image               object  Inline image with type, data, transformation
object              object  Embedded OLE object (w:object)

Section-level elements (paragraphs, tables, TOC, SDT, textbox, altChunk) are all fully supported.
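
When editing parsed paragraph children, it can help to check that comment ranges stay paired before re-exporting. The sketch below is one possible check, assuming each child is a plain object carrying the keys from the table above; RangeChild and unclosedCommentRanges are hypothetical names, not part of the library.

```typescript
// Assumed child shape: only the keys needed for this check are modeled.
type RangeChild = {
  commentRangeStart?: number;
  commentRangeEnd?: number;
  text?: string;
};

// Hypothetical helper: return comment ids that are opened by a
// commentRangeStart but never closed by a matching commentRangeEnd.
function unclosedCommentRanges(children: RangeChild[]): number[] {
  const open = new Set<number>();
  for (const child of children) {
    if (child.commentRangeStart !== undefined) open.add(child.commentRangeStart);
    if (child.commentRangeEnd !== undefined) open.delete(child.commentRangeEnd);
  }
  return Array.from(open);
}

const children: RangeChild[] = [
  { commentRangeStart: 0 },
  { text: "commented text" },
  { commentRangeEnd: 0 },
  { commentRangeStart: 1 }, // the end marker for id 1 was deleted by an edit
];

console.log(unclosedCommentRanges(children)); // [ 1 ]
```

A comment range may span several paragraphs, so in practice the children of all paragraphs would be concatenated in document order before running the check.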

Low-Level Parsing (parseDocx)

The parseDocx function reads a .docx file and provides access to its raw XML parts:

import { parseDocx } from "@office-open/docx";
import { readFileSync } from "node:fs";

const doc = parseDocx(readFileSync("input.docx"));

// Access document body
console.log(doc.body);

// Access styles (if present)
console.log(doc.styles);

// Access numbering (if present)
console.log(doc.numbering);

// Access settings (if present)
console.log(doc.settings);

DocxDocument API

The returned DocxDocument object contains:

Property   Type                 Description
doc        ParsedArchive        Full parsed archive (all parts)
body       Element              Document body element (w:body)
styles     Element | undefined  Styles element
numbering  Element | undefined  Numbering definitions
settings   Element | undefined  Document settings
fontTable  Element | undefined  Font table
partRefs   DocxPartRefs         References to headers, footers, notes, hyperlinks

Accessing Parts

const doc = parseDocx(data);

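// Look up individual parts by their path inside the archive
const body = doc.doc.get("word/document.xml");
const styles = doc.doc.get("word/styles.xml");

// Part references map relationship IDs to paths.
// "rId1" is only an example ID and may not exist in every file.
const headerPath = doc.partRefs.headers.get("rId1");
const header = headerPath ? doc.doc.get(headerPath) : undefined;

// Iterate over every header reference
for (const [rId, path] of doc.partRefs.headers) {
  const part = doc.doc.get(path);
  console.log(rId, path, part !== undefined);
}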

Working with XML Elements

The parsed elements use the @office-open/xml library's Element type:

import { attr } from "@office-open/xml";

// Access element attributes
const tagName = attr(doc.body, "tagName");

// Iterate child elements
for (const child of doc.body.elements ?? []) {
  console.log(child.name);
}
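
Building on the iteration above, text extraction can be sketched as a recursive walk over the element tree. The XmlNode interface below is an assumption: name and elements appear in the snippet above, while the text field for text nodes is a guess at the shape and should be checked against the @office-open/xml Element type. collectText and topLevelParagraphs are hypothetical helpers.

```typescript
// Assumed minimal node shape. `name` and `elements` are used in the snippet
// above; the `text` field on text nodes is an assumption of this sketch.
interface XmlNode {
  name?: string;
  text?: string;
  elements?: XmlNode[];
}

// Hypothetical helper: concatenate the contents of all w:t elements under a node.
function collectText(node: XmlNode): string {
  if (node.name === "w:t") {
    return (node.elements ?? []).map((child) => child.text ?? "").join("");
  }
  return (node.elements ?? []).map(collectText).join("");
}

// Hypothetical helper: one string per top-level w:p in the body.
// Paragraphs nested inside tables are not visited.
function topLevelParagraphs(body: XmlNode): string[] {
  return (body.elements ?? [])
    .filter((el) => el.name === "w:p")
    .map(collectText);
}

// Hand-built stand-in for doc.body.
const sampleBody: XmlNode = {
  name: "w:body",
  elements: [
    {
      name: "w:p",
      elements: [
        { name: "w:r", elements: [{ name: "w:t", elements: [{ text: "Hello " }] }] },
        { name: "w:r", elements: [{ name: "w:t", elements: [{ text: "World" }] }] },
      ],
    },
    { name: "w:sectPr" },
  ],
};

console.log(topLevelParagraphs(sampleBody)); // [ 'Hello World' ]
```

Restricting the walk to w:t skips field instructions and other non-display content, at the cost of ignoring tabs and breaks.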

Use Cases

  • Round-trip — Parse a .docx, modify sections, and re-export
  • Extract text — Walk the body elements to extract paragraph text
  • Merge documents — Parse multiple .docx files and combine their content
  • Inspect formatting — Read style and numbering definitions
  • Transform — Modify parsed elements and rebuild a document
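
The merge use case can be sketched as concatenating the sections of several parsed documents. The MergeOptions type and mergeDocuments helper are hypothetical and model only the sections array; the sketch keeps the metadata of the first document and does not attempt to reconcile styles, numbering, or comment and footnote ids, which may collide in real files.

```typescript
// Assumed shapes for this sketch; only `sections` is modeled in detail.
type MergeSection = { children: unknown[]; properties?: unknown };
type MergeOptions = { title?: string; creator?: string; sections: MergeSection[] };

// Hypothetical helper: keep the first document's metadata and append the
// sections of the remaining documents in order. Inputs are not mutated.
function mergeDocuments(first: MergeOptions, ...rest: MergeOptions[]): MergeOptions {
  const sections = rest.reduce(
    (acc, doc) => acc.concat(doc.sections),
    first.sections.slice(),
  );
  return { ...first, sections };
}

// Stand-ins for two parseDocument() results.
const docA: MergeOptions = {
  title: "Part A",
  sections: [{ children: [{ paragraph: "From A" }] }],
};
const docB: MergeOptions = {
  title: "Part B",
  sections: [{ children: [{ paragraph: "From B" }] }],
};

const merged = mergeDocuments(docA, docB);
console.log(merged.title, merged.sections.length); // Part A 2
```

Because each source keeps its own section, page settings such as margins stay attached to the content they came from.
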