reader

package
v2.3.0 Latest
Warning

This package is not in the latest version of its module.

Published: Feb 28, 2026 License: MIT Imports: 15 Imported by: 0

Documentation

Overview

Package reader provides low-level helpers for loading DOCX archives into raw OOXML parts that can later be mapped to domain models.

Constants

This section is empty.

Variables

This section is empty.

Functions

func ReconstructDocument

func ReconstructDocument(parsed *ParsedPackage) (domain.Document, error)

ReconstructDocument converts a ParsedPackage into a domain.Document. It performs a minimal hydration pass focused on paragraph content and spacing, so that consumers can round-trip spacing metadata.
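
To make the idea of a paragraph-and-spacing pass concrete, here is a minimal, standard-library-only sketch of pulling paragraph text and w:spacing attributes out of WordprocessingML. It is not this package's implementation: paragraphSpacing, extractSpacing, and sampleDocument are hypothetical names, and the sketch matches elements by local name only.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// paragraphSpacing is a hypothetical stand-in for the spacing metadata a
// domain model might carry; it is not a type from this package.
type paragraphSpacing struct {
	Text   string
	Before string // w:spacing/@w:before, in twentieths of a point
	After  string // w:spacing/@w:after, in twentieths of a point
}

// sampleDocument is a hand-written fragment used only for illustration.
const sampleDocument = `<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p><w:pPr><w:spacing w:before="120" w:after="240"/></w:pPr><w:r><w:t>Hello</w:t></w:r></w:p>
    <w:p><w:r><w:t>World</w:t></w:r></w:p>
  </w:body>
</w:document>`

// extractSpacing walks the token stream and records, for each paragraph,
// its text and any before/after spacing. Namespaces are ignored for brevity.
func extractSpacing(doc string) ([]paragraphSpacing, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	var (
		out    []paragraphSpacing
		cur    *paragraphSpacing
		inText bool
	)
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			switch t.Name.Local {
			case "p":
				cur = &paragraphSpacing{}
			case "spacing":
				if cur == nil {
					continue
				}
				for _, a := range t.Attr {
					switch a.Name.Local {
					case "before":
						cur.Before = a.Value
					case "after":
						cur.After = a.Value
					}
				}
			case "t":
				inText = true
			}
		case xml.CharData:
			if inText && cur != nil {
				cur.Text += string(t)
			}
		case xml.EndElement:
			switch t.Name.Local {
			case "t":
				inText = false
			case "p":
				if cur != nil {
					out = append(out, *cur)
					cur = nil
				}
			}
		}
	}
}

func main() {
	paragraphs, err := extractSpacing(sampleDocument)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range paragraphs {
		fmt.Printf("%q before=%q after=%q\n", p.Text, p.Before, p.After)
	}
}
```

Keeping the spacing values as the original attribute strings, rather than parsing them into numbers, is one simple way to write them back unchanged.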

Types

type Element

type Element struct {
	Name     xml.Name
	Attr     []xml.Attr
	Text     string
	Children []*Element
}

Element represents a generic XML element with nested children.
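
As an illustration of how a generic tree like this can be populated, the sketch below builds Element values from an encoding/xml token stream using a stack of open elements. The Element struct is copied from the declaration above; buildTree and parseString are hypothetical helpers and may differ from what the package does internally.

```go
package main

import (
	"encoding/xml"
	"errors"
	"fmt"
	"io"
	"strings"
)

// Element mirrors the struct documented above.
type Element struct {
	Name     xml.Name
	Attr     []xml.Attr
	Text     string
	Children []*Element
}

// buildTree is a hypothetical helper that turns an XML stream into a tree.
// The top of the stack is always the innermost element that is still open.
func buildTree(r io.Reader) (*Element, error) {
	dec := xml.NewDecoder(r)
	var root *Element
	var stack []*Element
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			// Copy the attributes because the decoder may reuse its buffers.
			el := &Element{Name: t.Name, Attr: append([]xml.Attr(nil), t.Attr...)}
			if len(stack) == 0 {
				root = el
			} else {
				parent := stack[len(stack)-1]
				parent.Children = append(parent.Children, el)
			}
			stack = append(stack, el)
		case xml.CharData:
			if len(stack) > 0 {
				stack[len(stack)-1].Text += string(t)
			}
		case xml.EndElement:
			if len(stack) > 0 {
				stack = stack[:len(stack)-1]
			}
		}
	}
	if root == nil {
		return nil, errors.New("no root element found")
	}
	return root, nil
}

// parseString is a small convenience wrapper around buildTree.
func parseString(s string) (*Element, error) {
	return buildTree(strings.NewReader(s))
}

func main() {
	root, err := parseString(`<a x="1"><b>hi</b><c/></a>`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(root.Name.Local, len(root.Children), root.Children[0].Text)
}
```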

type MediaPart

type MediaPart struct {
	Path        string
	Name        string
	ContentType string
	Data        []byte
}

MediaPart represents a binary asset bundled inside the DOCX archive.
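
The following sketch shows one plausible way to fill in a MediaPart from an archive path. It assumes, without confirmation from the source, that Name is the base file name and that ContentType comes from an extension lookup such as the Default entries of [Content_Types].xml. newMediaPart and the defaults map are hypothetical.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// MediaPart mirrors the struct documented above.
type MediaPart struct {
	Path        string
	Name        string
	ContentType string
	Data        []byte
}

// newMediaPart is a hypothetical constructor. defaults maps a lower-case
// file extension (without the dot) to a content type; unknown extensions
// yield an empty ContentType.
func newMediaPart(archivePath string, data []byte, defaults map[string]string) *MediaPart {
	ext := strings.TrimPrefix(strings.ToLower(path.Ext(archivePath)), ".")
	return &MediaPart{
		Path:        archivePath,
		Name:        path.Base(archivePath), // assumption: Name is the base file name
		ContentType: defaults[ext],
		Data:        data,
	}
}

func main() {
	defaults := map[string]string{"png": "image/png", "jpeg": "image/jpeg"}
	part := newMediaPart("word/media/image1.png", []byte{0x89, 'P', 'N', 'G'}, defaults)
	fmt.Println(part.Path, part.Name, part.ContentType, len(part.Data))
}
```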

type Package

type Package struct {
	// ContentTypes mirrors [Content_Types].xml.
	ContentTypes *xmlstructs.ContentTypes

	// RawParts keeps every part in the archive keyed by its canonical name.
	RawParts map[string][]byte

	// Core Word parts
	MainDocument          []byte
	DocumentRelationships []byte
	RootRelationships     []byte
	Styles                []byte
	Numbering             []byte
	FontTable             []byte
	Settings              []byte
	WebSettings           []byte
	ThemeParts            map[string][]byte
	CoreProperties        []byte
	AppProperties         []byte
	CustomProperties      []byte

	// Header/Footer content indexed by file name (e.g. "word/header1.xml").
	Headers map[string][]byte
	Footers map[string][]byte

	// Media assets keyed by archive path (e.g. "word/media/image1.png").
	Media map[string]*MediaPart

	// AdditionalParts captures any payload we do not process yet.
	AdditionalParts map[string][]byte

	// PackageSize is the total size of the original DOCX archive in bytes.
	PackageSize int64
	// contains filtered or unexported fields
}

Package represents the low-level parts that make up a DOCX archive. It focuses on raw OOXML payloads so higher layers can hydrate domain models without worrying about ZIP details.
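
To illustrate how parts might be routed into the maps above, here is a small sketch that classifies archive entries by path, following the example names in the field comments ("word/header1.xml", "word/media/image1.png"). classifyPart and its string labels are hypothetical, and the package's actual routing rules may differ (for example, it may consult relationships rather than file names).

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// classifyPart is a hypothetical helper that decides which bucket a part
// name would fall into, based purely on its archive path.
func classifyPart(name string) string {
	dir, base := path.Split(name)
	isXML := strings.HasSuffix(base, ".xml")
	switch {
	case strings.HasPrefix(name, "word/media/"):
		return "media"
	case dir == "word/" && isXML && strings.HasPrefix(base, "header"):
		return "header"
	case dir == "word/" && isXML && strings.HasPrefix(base, "footer"):
		return "footer"
	default:
		return "additional"
	}
}

func main() {
	names := []string{
		"word/header1.xml",
		"word/footer1.xml",
		"word/media/image1.png",
		"customXml/item1.xml",
	}
	for _, name := range names {
		fmt.Printf("%s -> %s\n", name, classifyPart(name))
	}
}
```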

func LoadPackage

func LoadPackage(r io.ReaderAt, size int64) (*Package, error)

LoadPackage loads a DOCX archive from an io.ReaderAt and the archive's size in bytes.
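
The io.ReaderAt and size pair is the same shape that the standard library's zip.NewReader expects, since a ZIP central directory sits at the end of the file. The sketch below reads every entry of an in-memory archive into a map keyed by entry name. It uses only archive/zip; readParts, readPartsFromBytes, and buildSampleArchive are hypothetical helpers, not functions of this package.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
)

// readParts is a hypothetical helper that reads every file entry of a ZIP
// archive into memory, keyed by its name inside the archive.
func readParts(r io.ReaderAt, size int64) (map[string][]byte, error) {
	zr, err := zip.NewReader(r, size)
	if err != nil {
		return nil, err
	}
	parts := make(map[string][]byte, len(zr.File))
	for _, f := range zr.File {
		if f.FileInfo().IsDir() {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		data, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			return nil, err
		}
		parts[f.Name] = data
	}
	return parts, nil
}

// readPartsFromBytes adapts a byte slice: bytes.Reader implements io.ReaderAt.
func readPartsFromBytes(data []byte) (map[string][]byte, error) {
	return readParts(bytes.NewReader(data), int64(len(data)))
}

// buildSampleArchive writes a tiny ZIP with DOCX-like entry names.
func buildSampleArchive() ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	entries := [][2]string{
		{"[Content_Types].xml", "<Types/>"},
		{"word/document.xml", "<w:document/>"},
	}
	for _, e := range entries {
		w, err := zw.Create(e[0])
		if err != nil {
			return nil, err
		}
		if _, err := w.Write([]byte(e[1])); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := buildSampleArchive()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	parts, err := readPartsFromBytes(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(parts), string(parts["word/document.xml"]))
}
```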

func LoadPackageFromBytes

func LoadPackageFromBytes(data []byte) (*Package, error)

LoadPackageFromBytes reads a DOCX archive from an in-memory byte slice.

func LoadPackageFromPath

func LoadPackageFromPath(path string) (*Package, error)

LoadPackageFromPath reads a DOCX archive from disk and returns its raw parts.

func LoadPackageFromStream

func LoadPackageFromStream(r io.Reader) (*Package, error)

LoadPackageFromStream reads a DOCX archive from an io.Reader by buffering its content.
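
Buffering is needed because a plain io.Reader is forward-only, while ZIP parsing requires random access. A minimal sketch of that buffering step, assuming the whole stream fits in memory, might look like this. bufferStream and chunkReader are hypothetical names used only for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// bufferStream is a hypothetical helper that drains r into memory and
// returns a random-access view plus the total size in bytes.
func bufferStream(r io.Reader) (io.ReaderAt, int64, error) {
	data, err := io.ReadAll(r)
	if err != nil {
		return nil, 0, err
	}
	return bytes.NewReader(data), int64(len(data)), nil
}

// chunkReader is a forward-only reader that hands out at most three bytes
// per call, standing in for a network or pipe stream.
type chunkReader struct{ data []byte }

func (c *chunkReader) Read(p []byte) (int, error) {
	if len(p) == 0 {
		return 0, nil
	}
	if len(c.data) == 0 {
		return 0, io.EOF
	}
	n := len(c.data)
	if n > 3 {
		n = 3
	}
	if n > len(p) {
		n = len(p)
	}
	copy(p, c.data[:n])
	c.data = c.data[n:]
	return n, nil
}

func main() {
	ra, size, err := bufferStream(&chunkReader{data: []byte("PK-demo-bytes")})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	buf := make([]byte, 4)
	if _, err := ra.ReadAt(buf, 3); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(size, string(buf))
}
```

The trade-off of this approach is memory: the entire archive is held in RAM, so very large inputs may be better served by a temporary file.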

type ParsedPackage

type ParsedPackage struct {
	Package *Package

	DocumentTree *Element
	StylesTree   *Element
	HeaderTrees  map[string]*Element
	FooterTrees  map[string]*Element

	RootRelationships     *xmlstructs.Relationships
	DocumentRelationships *xmlstructs.Relationships

	CorePropertiesTree *Element
	AppPropertiesTree  *Element
	CustomProperties   []byte

	ThemeParts  map[string][]byte
	Numbering   []byte
	FontTable   []byte
	Settings    []byte
	WebSettings []byte
}

ParsedPackage holds strongly typed OOXML structures extracted from a Package.

func ParsePackage

func ParsePackage(pkg *Package) (*ParsedPackage, error)

ParsePackage converts the raw byte-oriented Package into typed OOXML structures.
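
As an example of what converting raw bytes into typed structures can look like, the sketch below unmarshals a relationships part with encoding/xml. The relationships and relationship types here are hypothetical stand-ins; the real xmlstructs.Relationships type may have different fields.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// relationship and relationships are hypothetical typed views of a .rels
// part; they are not the xmlstructs types used by this package.
type relationship struct {
	ID     string `xml:"Id,attr"`
	Type   string `xml:"Type,attr"`
	Target string `xml:"Target,attr"`
}

type relationships struct {
	XMLName xml.Name       `xml:"Relationships"`
	Items   []relationship `xml:"Relationship"`
}

// sampleRels is a hand-written root relationships part for illustration.
const sampleRels = `<?xml version="1.0" encoding="UTF-8"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
  <Relationship Id="rId2" Type="http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties" Target="docProps/core.xml"/>
</Relationships>`

// parseRelationships decodes the raw bytes of a relationships part.
func parseRelationships(raw []byte) (*relationships, error) {
	var rels relationships
	if err := xml.Unmarshal(raw, &rels); err != nil {
		return nil, err
	}
	return &rels, nil
}

func main() {
	rels, err := parseRelationships([]byte(sampleRels))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range rels.Items {
		fmt.Printf("%s -> %s\n", r.ID, r.Target)
	}
}
```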
