XML

Parsing

Parse an XML string into an Element tree, with configurable options.

Basic parsing

import { xml2js } from "@office-open/xml";

const root = xml2js(`<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:r><w:t>Hello</w:t></w:r>
</w:p>`);

// root.name === "w:p"
// root.elements[0].name === "w:r"

The xml2js function converts an XML string into an Element tree that follows the xml-js format.

xml2js(xml, options?)

const root = xml2js(xmlString, {
  trim: true,
  ignoreComment: true,
});

Options

Option	Type	Default	Description
`trim`	`boolean`	`false`	Trim whitespace from text nodes
`ignoreDeclaration`	`boolean`	`false`	Skip the XML declaration (`<?xml ...?>`)
`ignoreComment`	`boolean`	`false`	Skip XML comments (`<!-- -->`)
`ignoreCdata`	`boolean`	`false`	Skip CDATA sections
`ignoreDoctype`	`boolean`	`false`	Skip the DOCTYPE declaration
`ignoreText`	`boolean`	`false`	Skip text nodes
`nativeTypeAttributes`	`boolean`	`false`	Convert attribute values to native types
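
The table above can be restated as a typed options object. The sketch below is illustrative only: `ParseOptions`, `DEFAULT_PARSE_OPTIONS`, and `resolveOptions` are hypothetical local names, not exports of @office-open/xml, and the code only shows how the documented defaults combine with caller-supplied values.

```typescript
// Hypothetical local type mirroring the options table; the library's own
// exported type name is not shown on this page.
interface ParseOptions {
  trim?: boolean;
  ignoreDeclaration?: boolean;
  ignoreComment?: boolean;
  ignoreCdata?: boolean;
  ignoreDoctype?: boolean;
  ignoreText?: boolean;
  nativeTypeAttributes?: boolean;
}

// Defaults as listed in the table: every option is off unless set.
const DEFAULT_PARSE_OPTIONS: Required<ParseOptions> = {
  trim: false,
  ignoreDeclaration: false,
  ignoreComment: false,
  ignoreCdata: false,
  ignoreDoctype: false,
  ignoreText: false,
  nativeTypeAttributes: false,
};

// Merge caller-supplied options over the defaults.
function resolveOptions(options: ParseOptions = {}): Required<ParseOptions> {
  return { ...DEFAULT_PARSE_OPTIONS, ...options };
}

// Same options as the xml2js(xmlString, { ... }) call shown earlier.
const resolved = resolveOptions({ trim: true, ignoreComment: true });
```

Any option left out of the call keeps its `false` default, so `resolved.ignoreDoctype` stays `false` here.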

Element structure

Each parsed node is an Element object; the snippet below abridges the full interface. Elements typically use name, attributes, and elements:

interface Element {
  type?: string;
  name?: string;
  attributes?: Attributes;
  elements?: Element[];
  text?: string | number | boolean;
  // ... plus cdata, comment, declaration, etc.
}

Text content is stored in the element's text field rather than in a separate node:

// Parsing <w:t>Hello World</w:t>
{ type: "element", name: "w:t", text: "Hello World" }
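
Because text lives directly on the element, gathering the text of a subtree is a short recursive walk. The following is a minimal sketch that works on a hand-built tree rather than on real parser output; `XmlNode` is a hypothetical, simplified local mirror of the Element shape above, and `collectText` is an illustrative helper, not part of the library.

```typescript
// Simplified local mirror of the Element shape shown above (an assumption;
// the library's real interface has more fields).
interface XmlNode {
  type?: string;
  name?: string;
  attributes?: Record<string, string | number | boolean | undefined>;
  elements?: XmlNode[];
  text?: string | number | boolean;
}

// Concatenate the text of a node and all of its descendants, depth-first.
function collectText(node: XmlNode): string {
  const own = node.text === undefined ? "" : String(node.text);
  const nested = (node.elements ?? []).map(collectText).join("");
  return own + nested;
}

// Hand-built tree equivalent to <w:p><w:r><w:t>Hello World</w:t></w:r></w:p>
const paragraph: XmlNode = {
  type: "element",
  name: "w:p",
  elements: [
    {
      type: "element",
      name: "w:r",
      elements: [{ type: "element", name: "w:t", text: "Hello World" }],
    },
  ],
};

const paragraphText = collectText(paragraph); // "Hello World"
```

The `String(...)` conversion is there because the text field may also hold a number or a boolean.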

Namespace handling

Namespaces are preserved as ordinary attributes:

const root = xml2js(`<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:r><w:t>Hello</w:t></w:r>
</w:p>`);

// root.attributes["xmlns:w"] === "http://schemas.openxmlformats.org/..."

All element names include their prefix, so use "w:p", "w:r", and so on when querying.
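
Querying by prefixed name can be sketched as below. This is an illustration under stated assumptions: `XmlNode` is a hypothetical, simplified local mirror of the Element shape, `findAll` and `namespaceOf` are illustrative helpers rather than library functions, and the tree is built by hand instead of coming from xml2js.

```typescript
// Simplified local mirror of the Element shape (an assumption; the
// library's real interface has more fields).
interface XmlNode {
  type?: string;
  name?: string;
  attributes?: Record<string, string | number | boolean | undefined>;
  elements?: XmlNode[];
  text?: string | number | boolean;
}

// Return every node in the subtree whose prefixed name matches exactly.
function findAll(node: XmlNode, name: string): XmlNode[] {
  const matches: XmlNode[] = node.name === name ? [node] : [];
  for (const child of node.elements ?? []) {
    matches.push(...findAll(child, name));
  }
  return matches;
}

// Look up the URI bound to a prefix on this node's own attributes only;
// declarations on ancestor nodes are not consulted.
function namespaceOf(node: XmlNode, prefix: string): string | undefined {
  const value = node.attributes?.[`xmlns:${prefix}`];
  return typeof value === "string" ? value : undefined;
}

const WORDPROCESSINGML_NS =
  "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

// Hand-built tree equivalent to the <w:p> example above.
const root: XmlNode = {
  type: "element",
  name: "w:p",
  attributes: { "xmlns:w": WORDPROCESSINGML_NS },
  elements: [
    {
      type: "element",
      name: "w:r",
      elements: [{ type: "element", name: "w:t", text: "Hello" }],
    },
  ],
};

const textNodes = findAll(root, "w:t");  // one match
const unprefixed = findAll(root, "t");   // no match: the prefix is part of the name
const wordNamespace = namespaceOf(root, "w");
```

Matching on the literal prefixed name is simple, but it assumes the document uses the conventional prefix. A document that binds the same namespace URI to a different prefix would need the lookup adjusted.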

xml2js / xml2json

Aliases are provided for xml-js compatibility:

import { xml2js, xml2json } from "@office-open/xml";

const element = xml2js(xmlString);
const jsonString = xml2json(xmlString);

Reading from a ZIP archive

Combine with @office-open/core to parse XML from OOXML files:

import { readFileSync } from "node:fs";
import { parseArchive } from "@office-open/core";

const archive = parseArchive(readFileSync("document.docx"));
const document = archive.get("word/document.xml");
// document is already an Element tree
