XML
Parsing
Parse XML strings into Element trees with configurable options
Basic Parsing
import { parse } from "@office-open/xml";
const root = parse(`<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<w:r><w:t>Hello</w:t></w:r>
</w:p>`);
// root.name === "w:p"
// root.elements[0].name === "w:r"
The parse function converts an XML string into an Element tree following the xml-js format.
parse(xml, options?)
const root = parse(xmlString, {
  trim: true,
  ignoreComment: true,
});
Options
| Option | Type | Default | Description |
|---|---|---|---|
| trim | boolean | false | Trim whitespace in text nodes |
| ignoreDeclaration | boolean | false | Skip XML declaration (<?xml ...?>) |
| ignoreComment | boolean | false | Skip XML comments (<!-- -->) |
| ignoreCdata | boolean | false | Skip CDATA sections |
| ignoreDoctype | boolean | false | Skip DOCTYPE declarations |
| ignoreText | boolean | false | Skip text nodes |
| nativeTypeAttributes | boolean | false | Convert attribute values to native types |
Element Structure
Each parsed node is an Element object:
interface Element {
  type: "element";
  name: string; // Qualified name, including any prefix (e.g. "w:p")
  attributes?: Record<string, string>; // Includes xmlns declarations
  elements?: Element[]; // Child nodes
}
Text nodes have a different structure:
// Text element
{ type: "text", text: "Hello World" }
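As a minimal sketch of walking this structure, the helper below concatenates all text beneath a node. The names XmlNode, ElementNode, TextNode, and collectText are hypothetical and not part of @office-open/xml. The sketch assumes only the two node shapes shown above and builds its tree by hand instead of calling parse.

```typescript
// Hypothetical local types mirroring the two node shapes described above.
interface TextNode {
  type: "text";
  text: string;
}

interface ElementNode {
  type: "element";
  name: string;
  attributes?: Record<string, string>;
  elements?: XmlNode[];
}

type XmlNode = ElementNode | TextNode;

// Depth-first concatenation of every text node under `node`.
function collectText(node: XmlNode): string {
  if (node.type === "text") {
    return node.text;
  }
  return (node.elements ?? []).map((child) => collectText(child)).join("");
}

// Hand-built tree equivalent to <w:p><w:r><w:t>Hello</w:t></w:r></w:p>.
const paragraph: XmlNode = {
  type: "element",
  name: "w:p",
  elements: [
    {
      type: "element",
      name: "w:r",
      elements: [
        {
          type: "element",
          name: "w:t",
          elements: [{ type: "text", text: "Hello" }],
        },
      ],
    },
  ],
};

console.log(collectText(paragraph)); // prints "Hello"
```

Discriminating on the type field lets TypeScript narrow each node, so the text property is only read where it exists.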
Namespace Handling
Namespace declarations are preserved as regular attributes:
const root = parse(`<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<w:r><w:t>Hello</w:t></w:r>
</w:p>`);
// root.attributes["xmlns:w"] === "http://schemas.openxmlformats.org/..."
Element names keep their namespace prefix, so match on the qualified name, such as "w:p" or "w:r".
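Since declarations are plain attributes and names carry their prefix, both lookups reduce to string comparisons. The following is a minimal sketch with hypothetical helpers (QNode, declaredNamespace, findAll) that are not part of @office-open/xml. It uses a hand-built tree, and declaredNamespace only inspects the node it is given; it does not search ancestors.

```typescript
// Hypothetical minimal node shape; only the fields used here.
interface QNode {
  type: string;
  name?: string;
  text?: string;
  attributes?: Record<string, string>;
  elements?: QNode[];
}

// Look up the URI declared for `prefix` on this node ("" means the default namespace).
function declaredNamespace(node: QNode, prefix: string): string | undefined {
  const key = prefix === "" ? "xmlns" : `xmlns:${prefix}`;
  return node.attributes ? node.attributes[key] : undefined;
}

// Collect the node itself and every descendant whose qualified name matches.
function findAll(node: QNode, qualifiedName: string): QNode[] {
  const matches: QNode[] = [];
  if (node.type === "element" && node.name === qualifiedName) {
    matches.push(node);
  }
  for (const child of node.elements ?? []) {
    matches.push(...findAll(child, qualifiedName));
  }
  return matches;
}

const W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

// Hand-built tree: a w:p with two w:r children, one of which holds text.
const tree: QNode = {
  type: "element",
  name: "w:p",
  attributes: { "xmlns:w": W_NS },
  elements: [
    {
      type: "element",
      name: "w:r",
      elements: [
        {
          type: "element",
          name: "w:t",
          elements: [{ type: "text", text: "Hello" }],
        },
      ],
    },
    { type: "element", name: "w:r" },
  ],
};

console.log(findAll(tree, "w:r").length); // prints 2
console.log(declaredNamespace(tree, "w") === W_NS); // prints true
```

Matching on the prefixed name is only reliable when the document uses the prefix you expect; comparing the declared URI is the stricter check.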
xml2js / xml2json
For xml-js compatibility, the xml-js function names are also exported:
import { xml2js, xml2json } from "@office-open/xml";
const element = xml2js(xmlString);
const jsonString = xml2json(xmlString);
Reading from ZIP Archives
Combine with @office-open/core to parse XML from OOXML files:
import { readFileSync } from "node:fs";
import { unzipToMap, readXmlFromZip } from "@office-open/core";
const zip = unzipToMap(readFileSync("document.docx"));
const document = readXmlFromZip(zip, "word/document.xml");
// document is already an Element tree
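Once the tree is loaded, pulling paragraph text out of it is a plain traversal. The sketch below is illustrative only: DocNode, textOfRuns, and paragraphTexts are hypothetical names, the tree is hand-built instead of read from a file, and it assumes the root node is the w:document element with a w:body child, as in a typical word/document.xml.

```typescript
// Hypothetical node shape covering both element and text nodes.
interface DocNode {
  type: string;
  name?: string;
  text?: string;
  elements?: DocNode[];
}

// Concatenate the text held by every w:t descendant of `node`.
function textOfRuns(node: DocNode): string {
  const children = node.elements ?? [];
  if (node.type === "element" && node.name === "w:t") {
    return children
      .filter((child) => child.type === "text")
      .map((child) => child.text ?? "")
      .join("");
  }
  return children.map((child) => textOfRuns(child)).join("");
}

// Return one string per w:p found directly under w:body.
function paragraphTexts(documentRoot: DocNode): string[] {
  const body = (documentRoot.elements ?? []).filter((child) => child.name === "w:body")[0];
  if (!body) {
    return [];
  }
  return (body.elements ?? [])
    .filter((child) => child.name === "w:p")
    .map((p) => textOfRuns(p));
}

// Small builders to keep the hand-built tree readable.
const el = (name: string, elements: DocNode[] = []): DocNode => ({ type: "element", name, elements });
const txt = (text: string): DocNode => ({ type: "text", text });

const sampleDocument = el("w:document", [
  el("w:body", [
    el("w:p", [el("w:r", [el("w:t", [txt("Hello")])])]),
    el("w:p", [
      el("w:r", [el("w:t", [txt("Open")])]),
      el("w:r", [el("w:t", [txt(" XML")])]),
    ]),
  ]),
]);

console.log(paragraphTexts(sampleDocument)); // prints [ 'Hello', 'Open XML' ]
```

Only w:p elements that are direct children of w:body are visited here, so paragraphs nested in tables would need a deeper search.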