Parser
Parse and modify existing OOXML documents with ParsedDocument
The parser reads .docx and .pptx files into a ParsedDocument, which exposes the parts of the package (both XML and binary) through a key-value interface keyed by part path.
Parse a Document
import { readFileSync } from "node:fs";
import { parseDocument } from "@office-open/core";
const data = readFileSync("document.docx");
const doc = parseDocument(data);
ParsedDocument API
Read Parts
// Read an XML part as an Element tree
const styles = doc.get("word/styles.xml");
// Read binary data (images, media)
const image = doc.getRaw("word/media/image1.png");
// Check if a part exists
doc.has("word/document.xml"); // true
// List all paths
doc.keys(); // ["[Content_Types].xml", "word/document.xml", ...]
doc.keys("word/media/"); // ["word/media/image1.png"]
Write Parts
import { Element } from "@office-open/xml";
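// Write an XML part (modifiedStylesElement is an Element you have built or edited)
doc.set("word/styles.xml", modifiedStylesElement);
// Write binary data (imageBuffer holds the raw bytes of the image)
doc.setRaw("word/media/image2.png", imageBuffer);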
Remove Parts
doc.remove("word/settings.xml"); // returns true if the part was removed
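The read, write, and remove calls above behave like operations on a map from part path to content. The sketch below models that idea with a plain Map so the prefix filtering of keys() and the boolean result of remove() are easy to see. InMemoryParts is a hypothetical stand-in written for illustration; it is not the library's ParsedDocument and makes no claim about how the library stores parts internally.

```typescript
// Hypothetical stand-in for illustration only; not the library's ParsedDocument.
class InMemoryParts {
  private parts = new Map<string, Uint8Array>();

  setRaw(path: string, data: Uint8Array): void {
    this.parts.set(path, data);
  }

  getRaw(path: string): Uint8Array | undefined {
    return this.parts.get(path);
  }

  has(path: string): boolean {
    return this.parts.has(path);
  }

  // With no prefix, return every path; otherwise only paths that start with it.
  keys(prefix = ""): string[] {
    return Array.from(this.parts.keys()).filter((p) => p.startsWith(prefix));
  }

  // Follows the "true if removed" convention: false when the path was absent.
  remove(path: string): boolean {
    return this.parts.delete(path);
  }
}

const store = new InMemoryParts();
store.setRaw("[Content_Types].xml", new Uint8Array([1]));
store.setRaw("word/document.xml", new Uint8Array([2]));
store.setRaw("word/media/image1.png", new Uint8Array([3]));

const mediaPaths = store.keys("word/media/");
const removedOnce = store.remove("word/document.xml");
const removedTwice = store.remove("word/document.xml");
```

Treating a trailing slash as part of the prefix, as in "word/media/", keeps a folder filter from also matching sibling paths such as "word/media2/...".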
Save
import { writeFileSync } from "node:fs";
// Serialize back to a ZIP buffer
const modified = doc.save();
writeFileSync("modified.docx", modified);
Complete Example
import { readFileSync, writeFileSync } from "node:fs";
import { parseDocument } from "@office-open/core";
const doc = parseDocument(readFileSync("input.docx"));
// Read the main document body
const documentXml = doc.get("word/document.xml");
// List all media files
const mediaFiles = doc.keys("word/media/");
console.log("Media files:", mediaFiles);
// Save a copy
writeFileSync("output.docx", doc.save());
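Once the media paths are listed, a common next step is to summarize them. The following is a small sketch that counts paths by file extension. countByExtension and the sample paths are hypothetical and work on plain strings, so they do not depend on any library API; in practice the input would be the array returned by doc.keys("word/media/").

```typescript
// Hypothetical helper: counts paths by lowercase file extension.
function countByExtension(paths: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const path of paths) {
    // Look at the file name only, so dots in folder names are ignored.
    const fileName = path.slice(path.lastIndexOf("/") + 1);
    const dot = fileName.lastIndexOf(".");
    const ext = dot > 0 ? fileName.slice(dot + 1).toLowerCase() : "(none)";
    counts[ext] = (counts[ext] ?? 0) + 1;
  }
  return counts;
}

// Sample paths standing in for the result of doc.keys("word/media/").
const sampleMedia = [
  "word/media/image1.png",
  "word/media/image2.PNG",
  "word/media/chart.emf",
];
const mediaCounts = countByExtension(sampleMedia);
console.log(mediaCounts);
```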
Use with @office-open/xml
Combine ParsedDocument with @office-open/xml query utilities to inspect and modify XML content:
import { readFileSync } from "node:fs";
import { parseDocument } from "@office-open/core";
import { findChild, children, childText } from "@office-open/xml";
const doc = parseDocument(readFileSync("input.docx"));
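// Root element of the main part (w:document)
const root = doc.get("word/document.xml");
// Paragraphs live under w:body; this assumes findChild and children match direct children by name
const body = findChild(root, "w:body");
const paragraphs = children(body, "w:p");
// Text sits in w:t inside a run (w:r), so look up the first run before reading its text
const firstRun = findChild(paragraphs[0], "w:r");
const text = childText(firstRun, "w:t");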
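In WordprocessingML, the text of a paragraph is usually split across several runs (w:r), each holding its own w:t element, so a single w:t lookup yields at most one fragment. The sketch below shows one way to join the fragments. XmlNode is a hypothetical minimal node shape used only for this example; the real Element type from @office-open/xml may differ, and paragraphText is not part of the library.

```typescript
// Hypothetical minimal node shape for illustration; the real Element type
// from @office-open/xml may be structured differently.
interface XmlNode {
  name: string;
  text?: string;
  children?: XmlNode[];
}

// Concatenate the text of every w:t found directly inside the runs (w:r)
// of a single paragraph, in document order.
function paragraphText(paragraph: XmlNode): string {
  let out = "";
  for (const run of paragraph.children ?? []) {
    if (run.name !== "w:r") continue; // skip w:pPr and other non-run children
    for (const node of run.children ?? []) {
      if (node.name === "w:t") out += node.text ?? "";
    }
  }
  return out;
}

// A hand-built paragraph with two runs, standing in for parsed XML.
const sampleParagraph: XmlNode = {
  name: "w:p",
  children: [
    { name: "w:pPr" },
    { name: "w:r", children: [{ name: "w:t", text: "Hello, " }] },
    { name: "w:r", children: [{ name: "w:rPr" }, { name: "w:t", text: "world" }] },
  ],
};
const sampleText = paragraphText(sampleParagraph);
```

This sketch only visits runs that are direct children of the paragraph; runs nested in hyperlinks or other wrappers would need a recursive walk.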