CORE
Archive
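Read and write OOXML ZIP archives and parse relationship files.
OOXML files (.docx, .pptx) are ZIP archives that contain XML parts. The archive module provides low-level utilities for reading and writing these archives.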
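Because an OOXML file is a ZIP archive, a cheap sanity check before unpacking is to look for the ZIP local file header signature in the first four bytes. The sketch below is illustrative only: `looksLikeZip` is a hypothetical helper, not a function of @office-open/core.

```typescript
// Hypothetical helper, not part of @office-open/core.
// A ZIP local file header starts with the bytes 0x50 0x4B 0x03 0x04 ("PK\x03\x04").
const ZIP_SIGNATURE = [0x50, 0x4b, 0x03, 0x04];

function looksLikeZip(data: Uint8Array): boolean {
  if (data.length < ZIP_SIGNATURE.length) return false;
  return ZIP_SIGNATURE.every((byte, i) => data[i] === byte);
}
```

A check like this only rules out obviously wrong input (for example a plain XML file); it does not prove that the archive is a valid OOXML package.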
Reading archives
unzipToMap
Decompresses an OOXML file into a Map<string, Uint8Array> keyed by part path:
import { readFileSync } from "node:fs";
import { unzipToMap } from "@office-open/core";
const zip = unzipToMap(readFileSync("document.docx"));
Reader functions
import { readTextFromZip, readXmlFromZip, readBinaryFromZip } from "@office-open/core";
// Read a part as text
const contentTypes = readTextFromZip(zip, "[Content_Types].xml");
// Read a part and parse it as XML
const documentXml = readXmlFromZip(zip, "word/document.xml");
// Read binary data (images, etc.)
const imageData = readBinaryFromZip(zip, "word/media/image1.png");
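Since the unzipped archive is a plain Map of byte arrays, a text read amounts to a lookup followed by a decode. The following is a minimal sketch under that assumption; `decodePart` is a hypothetical stand-in, and how the library's own readers handle a missing part is not documented here.

```typescript
// Hypothetical stand-in for a text read, not the library's implementation.
function decodePart(zip: Map<string, Uint8Array>, path: string): string | undefined {
  const bytes = zip.get(path);
  if (bytes === undefined) return undefined; // part not present in the archive
  // OOXML parts are normally UTF-8; TextDecoder strips a leading BOM by default.
  return new TextDecoder("utf-8").decode(bytes);
}
```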
readAllXmlParts
Parses every XML part in the archive and skips binary files:
import { readAllXmlParts } from "@office-open/core";
const parts = readAllXmlParts(zip);
// { "[Content_Types].xml": Element, "word/document.xml": Element, ... }
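To make the "skip binary files" step concrete, here is one way the XML parts could be selected by name. This is a sketch only: the rule readAllXmlParts actually applies is not stated above, and `xmlPartNames` together with its extension test is an assumption.

```typescript
// Hypothetical filter: treats names ending in ".xml" or ".rels" as XML parts.
// The rule the library really uses to skip binary files is assumed, not known.
function xmlPartNames(zip: Map<string, Uint8Array>): string[] {
  return Array.from(zip.keys())
    .filter((name) => /\.(xml|rels)$/i.test(name))
    .sort();
}
```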
listFiles
Lists files by path prefix:
import { listFiles } from "@office-open/core";
const mediaFiles = listFiles(zip, "word/media/");
// ["word/media/image1.png", "word/media/image2.jpg"]
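Prefix listing over the archive map can be pictured as a filter on its keys. The sketch below assumes plain string-prefix matching; `namesWithPrefix` is a hypothetical name, and the library may order or match results differently.

```typescript
// Hypothetical equivalent of a prefix listing over the archive map.
function namesWithPrefix(zip: Map<string, Uint8Array>, prefix: string): string[] {
  // An empty prefix matches every entry.
  return Array.from(zip.keys()).filter((name) => name.startsWith(prefix));
}
```

With plain prefix matching the trailing slash matters: "word/media" would also match a sibling folder such as "word/media2/", while "word/media/" would not.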
Writing archives
zipToBuffer
Creates a ZIP buffer from a map of files:
import { zipToBuffer } from "@office-open/core";
// xmlString (a string) and imageBuffer (a Uint8Array) are assumed to be defined elsewhere
const files = new Map<string, Uint8Array | string>();
files.set("word/document.xml", xmlString);
files.set("word/media/image.png", imageBuffer);
const zipBuffer = zipToBuffer(files);
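The file map accepts both strings and byte arrays as values. If a uniform Map<string, Uint8Array> is needed before writing, the strings can be encoded up front. This is a minimal sketch: `toBytes` is a hypothetical helper, and the assumption that string values are meant to be stored as UTF-8 is not confirmed by the text above.

```typescript
// Hypothetical normalizer: encodes string values as UTF-8 and leaves byte arrays untouched.
function toBytes(files: Map<string, Uint8Array | string>): Map<string, Uint8Array> {
  const encoder = new TextEncoder();
  const out = new Map<string, Uint8Array>();
  files.forEach((value, name) => {
    out.set(name, typeof value === "string" ? encoder.encode(value) : value);
  });
  return out;
}
```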
Relationship files
OOXML uses .rels files to define the relationships between parts.
parseRels
Parses a relationships file:
import { parseRels } from "@office-open/core";
const rels = parseRels(zip, "word/_rels/document.xml.rels");
// [{ id: "rId1", target: "styles.xml", type: "..." }, ...]
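In the Open Packaging Conventions, the relationships of a part live in a `_rels` folder next to it, in a file named after the part with `.rels` appended. That is why the path above is word/_rels/document.xml.rels. The sketch below derives that path; `relsPathFor` is a hypothetical helper, not a library function.

```typescript
// Hypothetical helper: computes the .rels path that belongs to a part.
// Pass "" to get the package-level relationships file.
function relsPathFor(partPath: string): string {
  const slash = partPath.lastIndexOf("/");
  const dir = slash === -1 ? "" : partPath.slice(0, slash + 1);
  const file = partPath.slice(slash + 1);
  return `${dir}_rels/${file}.rels`;
}
```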
Relationship interface
interface Relationship {
  id: string;
  target: string;
  type: string;
  targetMode?: string;
}
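A relationship target is usually relative to the folder of the part that owns the .rels file, so "styles.xml" in the relationships of word/document.xml refers to word/styles.xml. The sketch below turns a target into an archive path. `resolveTarget` is a hypothetical helper; it assumes external targets should be returned unchanged and that a leading "/" means a path from the package root.

```typescript
interface Relationship {
  id: string;
  target: string;
  type: string;
  targetMode?: string;
}

// Hypothetical helper: resolves a relationship target to a path in the archive map.
function resolveTarget(sourcePart: string, rel: Relationship): string {
  if (rel.targetMode === "External") return rel.target; // e.g. a hyperlink URL
  if (rel.target.startsWith("/")) return rel.target.slice(1); // package-absolute target
  const segments = sourcePart.split("/").slice(0, -1); // folder of the source part
  for (const segment of rel.target.split("/")) {
    if (segment === "..") segments.pop();
    else if (segment !== "." && segment !== "") segments.push(segment);
  }
  return segments.join("/");
}
```

For example, resolving "../media/image1.png" from ppt/slides/slide1.xml yields ppt/media/image1.png, which can then be passed to a reader such as readBinaryFromZip.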
Complete example
import { readFileSync, writeFileSync } from "node:fs";
import { unzipToMap, zipToBuffer, readXmlFromZip, listFiles } from "@office-open/core";
// Read the archive
const zip = unzipToMap(readFileSync("input.docx"));
// List all parts
const allFiles = listFiles(zip, "");
console.log("Files:", allFiles);
// Read an XML part
const document = readXmlFromZip(zip, "word/document.xml");
// Modify the map
zip.delete("word/settings.xml");
// Write the modified archive
writeFileSync("output.docx", zipToBuffer(zip));
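The example removes only the map entry for word/settings.xml. Assuming the map is written out as-is, any relationship that still targets the removed part would be left dangling. The sketch below lists such relationships for one source folder. `danglingRels` is a hypothetical helper, and it handles only simple targets (no ".." segments).

```typescript
interface Relationship {
  id: string;
  target: string;
  type: string;
  targetMode?: string;
}

// Hypothetical check: returns the relationships whose internal target is missing
// from the archive. `sourceDir` is the folder of the owning part, e.g. "word/".
function danglingRels(
  zip: Map<string, Uint8Array>,
  sourceDir: string,
  rels: Relationship[],
): Relationship[] {
  return rels.filter((rel) => {
    if (rel.targetMode === "External") return false; // not stored in the archive
    const path = rel.target.startsWith("/") ? rel.target.slice(1) : sourceDir + rel.target;
    return !zip.has(path);
  });
}
```

In practice the input would come from parseRels(zip, "word/_rels/document.xml.rels"), and the result shows which entries need to be removed from the .rels part before writing.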
Utility functions
| Function | Description |
|---|---|
| uint8ToBase64(data) | Converts a Uint8Array to a base64 string |
| getImageType(fileName) | Determines the image type from the file extension |
| elementToXml(el) | Serializes an Element to an XML string |
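As an illustration of extension-based detection, the sketch below maps a file name to a normalized image type. It is not the implementation of getImageType: `imageTypeFromName`, the list of known types, and the return values are all assumptions made for this example.

```typescript
// Hypothetical extension lookup; the values getImageType really returns are not documented here.
function imageTypeFromName(fileName: string): string | undefined {
  const dot = fileName.lastIndexOf(".");
  if (dot === -1) return undefined; // no extension at all
  const ext = fileName.slice(dot + 1).toLowerCase();
  // Fold common aliases onto one canonical name.
  const normalized = ext === "jpg" ? "jpeg" : ext === "tif" ? "tiff" : ext;
  const known = ["png", "jpeg", "gif", "bmp", "tiff", "svg"];
  return known.indexOf(normalized) !== -1 ? normalized : undefined;
}
```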