epub

v1.0.1
Published: Sep 2, 2025 | License: MIT | Imports: 8 | Imported by: 3

README

go-epub

A Go library for reading and parsing EPUB files.

Features

  • Read EPUB file metadata (title, author, description, etc.)
  • Extract chapters and their content
  • Access any file within an EPUB archive
  • io.Reader interface support for reading content
  • Simple and intuitive API
  • Get cover image from EPUB
  • Support for both EPUB 2.0 and 3.0 files
  • Context support for cancellation and timeouts
  • Option pattern for flexible configuration

Installation

go get github.com/mszlu521/go-epub

Documentation

You can view the documentation online at pkg.go.dev or by using the go doc command:

go doc github.com/mszlu521/go-epub

For documentation on specific functions:

go doc epub.Open
go doc epub.Epub.GetChapters

Usage

Basic Example

package main

import (
	"fmt"
	"io"
	"log"
	"os"

	"github.com/mszlu521/go-epub/epub"
)

func main() {
	// Open an EPUB file
	e, err := epub.Open("book.epub")
	if err != nil {
		log.Fatal(err)
	}
	defer e.Close()

	// Get book metadata
	fmt.Println("Title:", e.GetTitle())
	fmt.Println("Author:", e.GetAuthor())
	fmt.Println("Description:", e.GetDescription())

	// Get chapters
	chapters, err := e.GetChapters()
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("Found %d chapters\n", len(chapters))

	// Read content of the first chapter using io.Reader
	reader, err := e.GetChapterReader(0)
	if err != nil {
		log.Fatal(err)
	}

	// Copy the content to stdout
	_, err = io.Copy(os.Stdout, reader)
	if err != nil {
		log.Fatal(err)
	}
}

Reading Files from EPUB

// Get a reader for any file in the EPUB
reader, err := e.GetFileReader("META-INF/container.xml")
if err != nil {
    log.Fatal(err)
}
defer reader.Close()

content, err := io.ReadAll(reader)
if err != nil {
    log.Fatal(err)
}

fmt.Println(string(content))

Working with Chapters

// Get all chapters
chapters, err := e.GetChapters()
if err != nil {
    log.Fatal(err)
}

// Print information about each chapter
for _, chapter := range chapters {
    fmt.Printf("Chapter %d: %s\n", chapter.Order, chapter.Title)
    
    // Get chapter content as string
    content, err := e.GetChapterContent(chapter.Order - 1)
    if err != nil {
        log.Printf("Error reading chapter %d: %v", chapter.Order, err)
        continue
    }
    
    fmt.Printf("Content length: %d bytes\n", len(content))
}

Using Context and Options

import (
    "context"
    "log"
    "time"

    "github.com/mszlu521/go-epub/epub"
)

// Create a context with timeout
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()

// Get chapters with context and options
chapters, err := e.GetChapters(
    epub.WithContext(ctx),
    epub.WithChapterFilter(func(chapter epub.Chapter) bool {
        // Only include chapters whose content is longer than 100 bytes
        return len(chapter.Content) > 100
    }),
    epub.WithMaxContentLength(1024*1024), // 1MB limit
)

if err != nil {
    log.Fatal(err)
}

Getting the Cover Image

// Try to get the cover image
cover, err := e.GetCover()
if err != nil {
    log.Fatal(err)
}

if cover != nil {
    defer cover.Close()
    // Process cover image (e.g. copy to file)
    out, err := os.Create("cover.jpg")
    if err != nil {
        log.Fatal(err)
    }
    defer out.Close()
    
    _, err = io.Copy(out, cover)
    if err != nil {
        log.Fatal(err)
    }
    
    fmt.Println("Cover image saved to cover.jpg")
} else {
    fmt.Println("No cover image found")
}

API Reference

epub.Epub

The main struct representing an EPUB file.

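Functions
  • Open(path string) (*Epub, error) - Open and parse an EPUB file
  • New(r *zip.Reader) (*Epub, error) - Create EPUB from a zip.Reader
  • NewReader(r io.Reader) (*Epub, error) - Create EPUB from an io.Reader (added in v1.0.1; reads the entire input into memory)

Methods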
  • GetTitle() string - Get the book title
  • GetAuthor() string - Get the book author
  • GetDescription() string - Get the book description
  • GetMetadata() Metadata - Get complete book metadata
  • GetItems() []Item - Get all items in the manifest
  • GetChapters(...Option) ([]Chapter, error) - Get all chapters with options
  • GetChapterContent(chapterIndex int, ...Option) (string, error) - Get content of a specific chapter as string
  • GetChapterReader(chapterIndex int, ...Option) (io.Reader, error) - Get content of a specific chapter as io.Reader
  • GetFileReader(path string) (io.ReadCloser, error) - Get a reader for any file in the EPUB
  • GetCover() (io.ReadCloser, error) - Get the cover image of the EPUB
  • Close() error - Close the EPUB file

epub.Chapter

Represents a book chapter.

Fields:

  • Title string - Chapter title
  • Content string - Chapter content
  • Order int - Chapter order

epub.Document

Represents a parsed document from the EPUB file.

Fields:

  • Title string - Document title
  • Content string - Document content
  • MediaType string - Document media type
  • ID string - Document ID

epub.Metadata

Represents the metadata of an EPUB.

Fields:

  • Title string - The title of the book
  • Creator string - The creator/author of the book
  • Subject string - The subject of the book
  • Description string - A description of the book
  • Publisher string - The publisher of the book
  • Contributor string - Additional contributors
  • Date string - Publication date
  • Type string - The type of the book
  • Format string - The format of the book
  • Identifier string - Unique identifier for the book
  • Language string - Language of the book
  • Rights string - Copyright information

epub.Item

Represents an item in the manifest.

Fields:

  • ID string - Unique identifier for the item
  • Href string - Path to the item within the EPUB
  • MediaType string - MIME type of the item

Options
  • WithContext(ctx context.Context) Option - Set context for cancellation and timeout
  • WithChapterFilter(filter func(chapter Chapter) bool) Option - Filter chapters with a custom function
  • WithMaxContentLength(maxLen int64) Option - Set maximum content length to process
  • WithCover() Option - Include the cover image in the parsed content
  • WithMetadata() Option - Include full metadata in the parsed content

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

MIT

Documentation

Overview

Package epub provides functionality for reading and parsing EPUB files.

It allows you to open EPUB files and extract their metadata, chapters, and other content. The package provides both high-level functions for common operations and low-level access to the internal structure of EPUB files.

Basic usage:

package main

import (
	"fmt"
	"io"
	"log"
	"os"

	"github.com/mszlu521/go-epub/epub"
)

func main() {
	// Open an EPUB file
	e, err := epub.Open("book.epub")
	if err != nil {
		log.Fatal(err)
	}
	defer e.Close()

	// Get book metadata
	fmt.Println("Title:", e.GetTitle())
	fmt.Println("Author:", e.GetAuthor())
	fmt.Println("Description:", e.GetDescription())

	// Get chapters
	chapters, err := e.GetChapters()
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("Found %d chapters\n", len(chapters))

	// Read content of the first chapter using io.Reader
	reader, err := e.GetChapterReader(0)
	if err != nil {
		log.Fatal(err)
	}

	// Copy the content to stdout
	_, err = io.Copy(os.Stdout, reader)
	if err != nil {
		log.Fatal(err)
	}
}

Types

type Chapter

type Chapter struct {
	Title   string
	Content string
	Order   int
}

Chapter represents a book chapter

A Chapter contains the title, content, and order of a chapter in the EPUB file. Chapters are extracted based on the spine order defined in the EPUB package document.

type Container

type Container struct {
	Rootfiles []Rootfile `xml:"rootfiles>rootfile"`
}

Container represents the container.xml file structure
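
As a rough illustration of how these struct tags map onto a real META-INF/container.xml, the sketch below decodes a sample document with encoding/xml. The Container and Rootfile definitions are copied from this page (Rootfile is listed further down); the sample XML, the OEBPS/content.opf path, and the rootFilePath helper are assumptions made for the example, not part of the package.

```go
package main

import (
	"encoding/xml"
	"errors"
	"fmt"
	"log"
)

// Rootfile and Container mirror the struct definitions documented on this page.
type Rootfile struct {
	FullPath  string `xml:"full-path,attr"`
	MediaType string `xml:"media-type,attr"`
}

type Container struct {
	Rootfiles []Rootfile `xml:"rootfiles>rootfile"`
}

// sampleContainer is an illustrative container.xml; the OPF path is made up.
const sampleContainer = `<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>`

// rootFilePath (hypothetical helper) returns the full-path of the first rootfile.
func rootFilePath(data []byte) (string, error) {
	var c Container
	if err := xml.Unmarshal(data, &c); err != nil {
		return "", fmt.Errorf("parse container.xml: %w", err)
	}
	if len(c.Rootfiles) == 0 {
		return "", errors.New("container.xml lists no rootfile")
	}
	return c.Rootfiles[0].FullPath, nil
}

func main() {
	p, err := rootFilePath([]byte(sampleContainer))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("package document:", p)
}
```

The path found this way is presumably what ends up in the RootFile field of Epub, although this page does not spell that out.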

type Epub

type Epub struct {
	File     *zip.Reader
	RootFile string
	Metadata Metadata
	Manifest []Item
	Spine    []ItemRef
	TOC      *NCX
	// contains filtered or unexported fields
}

Epub represents an EPUB file

The Epub struct contains the parsed contents of an EPUB file, including its metadata, manifest, spine, and table of contents. It also maintains a reference to the underlying zip.Reader for accessing the raw file contents.

func New

func New(r *zip.Reader) (*Epub, error)

New creates and parses an EPUB from a zip.Reader

The New function takes a zip.Reader and returns a pointer to an Epub struct that represents the parsed contents of the EPUB. This is useful when you already have a zip.Reader and want to parse it as an EPUB file.

Example:

// Open the EPUB file; *os.File implements io.ReaderAt, which zip.NewReader requires
reader, err := os.Open("book.epub")
if err != nil {
	log.Fatal(err)
}
defer reader.Close()

// Get file info to create a sized reader
stat, err := reader.Stat()
if err != nil {
	log.Fatal(err)
}

// Create a zip reader
zipReader, err := zip.NewReader(reader, stat.Size())
if err != nil {
	log.Fatal(err)
}

// Parse as EPUB
e, err := epub.New(zipReader)
if err != nil {
	log.Fatal(err)
}
defer e.Close()

func NewReader (added in v1.0.1)

func NewReader(r io.Reader) (*Epub, error)

NewReader creates and parses an EPUB from an io.Reader

The NewReader function takes an io.Reader and returns a pointer to an Epub struct that represents the parsed contents of the EPUB. This is useful when you have an io.Reader and want to parse it as an EPUB file. Note that this function reads the entire content into memory to provide random access.

Example:

// When you have an io.Reader containing EPUB data
reader, err := os.Open("book.epub")
if err != nil {
	log.Fatal(err)
}
defer reader.Close()

// Parse as EPUB
e, err := epub.NewReader(reader)
if err != nil {
	log.Fatal(err)
}
defer e.Close()

func Open

func Open(path string) (*Epub, error)

Open opens and parses an EPUB file from a file path

The Open function takes a path to an EPUB file and returns a pointer to an Epub struct that represents the parsed contents of the file. It handles all the necessary parsing of the EPUB structure, including the container file, package document, and table of contents.

It is the caller's responsibility to call Close on the returned Epub when finished with it to free up resources.

Example:

e, err := epub.Open("book.epub")
if err != nil {
	log.Fatal(err)
}
defer e.Close()

title := e.GetTitle()

func (*Epub) Close

func (e *Epub) Close() error

Close closes the EPUB file

This method closes the underlying EPUB file and releases any associated resources. It should be called when finished working with the EPUB to prevent resource leaks.

Example:

e, err := epub.Open("book.epub")
if err != nil {
	log.Fatal(err)
}
defer e.Close() // Ensures the file is closed when done

func (*Epub) GetAuthor

func (e *Epub) GetAuthor() string

GetAuthor returns the book author

This method returns the creator/author of the EPUB book as defined in its metadata. If no author is defined in the EPUB metadata, an empty string is returned.

func (*Epub) GetChapterContent

func (e *Epub) GetChapterContent(chapterIndex int, opts ...Option) (string, error)

GetChapterContent returns the content of a specific chapter

This method returns the content of a chapter at the specified index as a string. The index is zero-based, so the first chapter is at index 0.

If the chapter index is out of range or an error occurs while retrieving the chapter content, an error is returned.

Example:

content, err := e.GetChapterContent(0)
if err != nil {
	log.Fatal(err)
}
fmt.Println(content)

func (*Epub) GetChapterReader

func (e *Epub) GetChapterReader(chapterIndex int, opts ...Option) (io.Reader, error)

GetChapterReader returns an io.Reader for a specific chapter

This method returns an io.Reader for the content of a chapter at the specified index. The index is zero-based, so the first chapter is at index 0.

This is useful when you want to stream the chapter content rather than load it entirely into memory. The returned reader can be used with standard Go io operations.

If the chapter index is out of range or an error occurs while retrieving the chapter content, an error is returned.

Example:

reader, err := e.GetChapterReader(0)
if err != nil {
	log.Fatal(err)
}

// Copy the content to stdout
_, err = io.Copy(os.Stdout, reader)
if err != nil {
	log.Fatal(err)
}

func (*Epub) GetChapters

func (e *Epub) GetChapters(opts ...Option) ([]Chapter, error)

GetChapters returns all chapter content

This method extracts all chapters from the EPUB file based on the spine order defined in the package document. It only processes items with HTML media types and attempts to extract chapter titles from the table of contents.

The method returns a slice of Chapter structs containing the title, content, and order of each chapter. If there are no chapters or an error occurs during processing, an empty slice and an error may be returned.

Example:

chapters, err := e.GetChapters()
if err != nil {
	log.Fatal(err)
}

for _, chapter := range chapters {
	fmt.Printf("Chapter %d: %s\n", chapter.Order, chapter.Title)
}
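
To make the spine-order rule more concrete, here is a minimal sketch of how a reading order might be derived from a manifest and a spine. It is an assumption-laden illustration rather than the package's code: Item and ItemRef are trimmed copies of the documented types, and isHTML and chapterPaths are hypothetical helpers. Resolving each href against the directory of the package document follows the EPUB convention that manifest paths are relative to that document.

```go
package main

import (
	"fmt"
	"path"
)

// Item and ItemRef are trimmed copies of the documented types (XML tags omitted).
type Item struct {
	ID        string
	Href      string
	MediaType string
}

type ItemRef struct {
	IDRef string
}

// isHTML reports whether a manifest media type denotes an (X)HTML document.
func isHTML(mediaType string) bool {
	return mediaType == "application/xhtml+xml" || mediaType == "text/html"
}

// chapterPaths (hypothetical) walks the spine in order and returns the
// archive path of every referenced HTML item, skipping unknown IDs and
// non-HTML resources.
func chapterPaths(rootFile string, manifest []Item, spine []ItemRef) []string {
	byID := make(map[string]Item, len(manifest))
	for _, it := range manifest {
		byID[it.ID] = it
	}
	base := path.Dir(rootFile) // hrefs are relative to the package document
	var paths []string
	for _, ref := range spine {
		it, ok := byID[ref.IDRef]
		if !ok || !isHTML(it.MediaType) {
			continue
		}
		paths = append(paths, path.Join(base, it.Href))
	}
	return paths
}

func main() {
	manifest := []Item{
		{ID: "ch1", Href: "text/ch1.xhtml", MediaType: "application/xhtml+xml"},
		{ID: "css", Href: "style.css", MediaType: "text/css"},
		{ID: "ch2", Href: "text/ch2.xhtml", MediaType: "application/xhtml+xml"},
	}
	spine := []ItemRef{{IDRef: "ch1"}, {IDRef: "ch2"}}
	for i, p := range chapterPaths("OEBPS/content.opf", manifest, spine) {
		fmt.Printf("%d: %s\n", i, p)
	}
}
```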

func (*Epub) GetCover

func (e *Epub) GetCover() (io.ReadCloser, error)

GetCover returns a reader for the cover image of the EPUB, if one exists

This method attempts to locate and return a reader for the cover image of the EPUB. Not all EPUBs have a cover image, and the location of the cover can vary between EPUB versions. If a cover image is found, an io.ReadCloser is returned which the caller must close. If no cover is found, nil is returned with no error.

Example:

cover, err := e.GetCover()
if err != nil {
	log.Fatal(err)
}

if cover != nil {
	defer cover.Close()
	// Process cover image
} else {
	fmt.Println("No cover image found")
}
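
The remark that the cover location varies between EPUB versions can be illustrated with a small lookup sketch. EPUB 3 marks the cover with a manifest item whose properties include "cover-image", while many EPUB 2 files use a meta element with name="cover" whose content is a manifest item ID. Whether GetCover follows exactly this order is not stated on this page, so treat the code as an assumption. manifestItem (with its Properties field) and findCover are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// manifestItem is a hypothetical extension of the documented Item type
// that also carries the EPUB 3 "properties" attribute.
type manifestItem struct {
	ID         string
	Href       string
	MediaType  string
	Properties string
}

// findCover (hypothetical) returns the manifest item most likely to hold
// the cover image. metaCoverID is the content value of an EPUB 2 style
// <meta name="cover" content="..."/> element, or "" when absent.
func findCover(items []manifestItem, metaCoverID string) (manifestItem, bool) {
	// EPUB 3: a manifest item whose properties include "cover-image".
	for _, it := range items {
		for _, p := range strings.Fields(it.Properties) {
			if p == "cover-image" {
				return it, true
			}
		}
	}
	// EPUB 2 convention: the meta element names the manifest item ID.
	if metaCoverID != "" {
		for _, it := range items {
			if it.ID == metaCoverID {
				return it, true
			}
		}
	}
	return manifestItem{}, false
}

func main() {
	items := []manifestItem{
		{ID: "ch1", Href: "ch1.xhtml", MediaType: "application/xhtml+xml"},
		{ID: "img1", Href: "images/front.jpg", MediaType: "image/jpeg", Properties: "cover-image"},
	}
	if it, ok := findCover(items, ""); ok {
		fmt.Println("cover:", it.Href, it.MediaType)
	} else {
		fmt.Println("no cover declared")
	}
}
```

Checking the media type of the item found is worthwhile before choosing a file extension, since covers are not always JPEG.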

func (*Epub) GetDescription

func (e *Epub) GetDescription() string

GetDescription returns the book description

This method returns the description of the EPUB book as defined in its metadata. If no description is defined in the EPUB metadata, an empty string is returned.

func (*Epub) GetFileReader

func (e *Epub) GetFileReader(path string) (io.ReadCloser, error)

GetFileReader returns an io.ReadCloser for a file in the EPUB by path

This method returns an io.ReadCloser for any file within the EPUB archive, identified by its path. The path should be relative to the root of the EPUB.

This is useful for accessing specific files within the EPUB, such as CSS files, images, or other resources. The caller is responsible for closing the returned ReadCloser when finished with it.

If the specified file is not found in the EPUB, an error is returned.

Example:

reader, err := e.GetFileReader("META-INF/container.xml")
if err != nil {
	log.Fatal(err)
}
defer reader.Close()

content, err := io.ReadAll(reader)
if err != nil {
	log.Fatal(err)
}
fmt.Println(string(content))

func (*Epub) GetItems

func (e *Epub) GetItems() []Item

GetItems returns all items in the EPUB manifest

This method returns the complete list of items declared in the EPUB manifest. Each item contains its ID, href (path), and media type. This can be useful for examining all resources included in the EPUB file.

Example:

items := e.GetItems()
for _, item := range items {
	fmt.Printf("ID: %s, Href: %s, MediaType: %s\n", item.ID, item.Href, item.MediaType)
}

func (*Epub) GetMetadata

func (e *Epub) GetMetadata() Metadata

GetMetadata returns the complete metadata of the book

This method returns the complete metadata struct of the EPUB book, which includes all available metadata fields like title, author, subject, description, publisher, etc.

Example:

metadata := e.GetMetadata()
fmt.Println("Title:", metadata.Title)
fmt.Println("Author:", metadata.Creator)
fmt.Println("Publisher:", metadata.Publisher)

func (*Epub) GetTitle

func (e *Epub) GetTitle() string

GetTitle returns the book title

This method returns the title of the EPUB book as defined in its metadata. If no title is defined in the EPUB metadata, an empty string is returned.

type Item

type Item struct {
	ID        string `xml:"id,attr"`
	Href      string `xml:"href,attr"`
	MediaType string `xml:"media-type,attr"`
}

Item represents an item in the manifest

type ItemRef

type ItemRef struct {
	IDRef  string `xml:"idref,attr"`
	Linear string `xml:"linear,attr"`
}

ItemRef represents an item reference in the spine

type Metadata

type Metadata struct {
	Title       string `xml:"title"`
	Creator     string `xml:"creator"`
	Subject     string `xml:"subject"`
	Description string `xml:"description"`
	Publisher   string `xml:"publisher"`
	Contributor string `xml:"contributor"`
	Date        string `xml:"date"`
	Type        string `xml:"type"`
	Format      string `xml:"format"`
	Identifier  string `xml:"identifier"`
	Language    string `xml:"language"`
	Rights      string `xml:"rights"`
}

Metadata represents the metadata of an EPUB

type NCX

type NCX struct {
	Title  string     `xml:"docTitle>text"`
	NavMap []NavPoint `xml:"navMap>navPoint"`
}

NCX represents the NCX file structure (table of contents)

type NavPoint

type NavPoint struct {
	ID        string     `xml:"id,attr"`
	PlayOrder string     `xml:"playOrder,attr"`
	Label     string     `xml:"navLabel>text"`
	Content   string     `xml:"content"`
	Src       string     `xml:"content,attr"`
	NavPoints []NavPoint `xml:"navPoint"`
}

NavPoint represents a navigation point (chapter)
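
Because NavPoints nests recursively, walking a table of contents usually means a depth-first traversal. The sketch below flattens a hand-built tree into an indented list. The NavPoint type copies the documented field names without XML tags, and tocEntry and flatten are hypothetical names introduced for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// NavPoint copies the documented fields; XML tags are omitted because
// this sketch builds the tree by hand instead of parsing an NCX file.
type NavPoint struct {
	ID        string
	PlayOrder string
	Label     string
	Content   string
	Src       string
	NavPoints []NavPoint
}

// tocEntry (hypothetical) is one flattened table-of-contents row.
type tocEntry struct {
	Depth int
	Label string
	Src   string
}

// flatten walks the tree depth-first, so each parent precedes its children.
func flatten(points []NavPoint, depth int) []tocEntry {
	var out []tocEntry
	for _, p := range points {
		out = append(out, tocEntry{Depth: depth, Label: p.Label, Src: p.Src})
		out = append(out, flatten(p.NavPoints, depth+1)...)
	}
	return out
}

func main() {
	toc := []NavPoint{
		{Label: "Part I", Src: "part1.xhtml", NavPoints: []NavPoint{
			{Label: "Chapter 1", Src: "ch1.xhtml"},
			{Label: "Chapter 2", Src: "ch2.xhtml"},
		}},
		{Label: "Appendix", Src: "appendix.xhtml"},
	}
	for _, e := range flatten(toc, 0) {
		fmt.Printf("%s%s (%s)\n", strings.Repeat("  ", e.Depth), e.Label, e.Src)
	}
}
```

With a parsed EPUB, the same traversal could presumably start from the NavMap field of the TOC value, after checking that TOC is not nil.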

type Option

type Option func(*epubOptions)

Option defines a functional option for configuring EPUB parsing
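
For readers unfamiliar with functional options, the following self-contained sketch shows the general shape of the pattern. The package's epubOptions struct is unexported and its fields are not documented here, so the options struct, its fields, and newOptions are hypothetical stand-ins; only the Option signature style and the With... names echo this page.

```go
package main

import (
	"context"
	"fmt"
)

// Chapter mirrors the documented struct.
type Chapter struct {
	Title   string
	Content string
	Order   int
}

// options is a hypothetical stand-in for the unexported epubOptions.
type options struct {
	ctx    context.Context
	filter func(Chapter) bool
	maxLen int64
}

// Option mutates an options value, as in the documented type.
type Option func(*options)

func WithContext(ctx context.Context) Option {
	return func(o *options) { o.ctx = ctx }
}

func WithChapterFilter(filter func(Chapter) bool) Option {
	return func(o *options) { o.filter = filter }
}

func WithMaxContentLength(maxLen int64) Option {
	return func(o *options) { o.maxLen = maxLen }
}

// newOptions starts from defaults and applies each option in order,
// so a later option overrides an earlier one that sets the same field.
func newOptions(opts ...Option) *options {
	o := &options{ctx: context.Background()}
	for _, opt := range opts {
		opt(o)
	}
	return o
}

func main() {
	o := newOptions(
		WithMaxContentLength(1<<20),
		WithChapterFilter(func(c Chapter) bool { return c.Title != "" }),
	)
	fmt.Println("max length:", o.maxLen)
	fmt.Println("keeps titled chapter:", o.filter(Chapter{Title: "Intro"}))
}
```

The appeal of this design is that methods such as GetChapters can gain new settings later without changing their signatures.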

func WithChapterFilter

func WithChapterFilter(filter func(chapter Chapter) bool) Option

WithChapterFilter sets a filter function for chapters

func WithContext

func WithContext(ctx context.Context) Option

WithContext sets the context for the EPUB parsing operation

func WithCover

func WithCover() Option

WithCover includes the cover image in the parsed content

func WithMaxContentLength

func WithMaxContentLength(maxLen int64) Option

WithMaxContentLength sets the maximum content length to process

func WithMetadata

func WithMetadata() Option

WithMetadata includes full metadata in the parsed content

type Package

type Package struct {
	Metadata Metadata  `xml:"metadata"`
	Manifest []Item    `xml:"manifest>item"`
	Spine    []ItemRef `xml:"spine>itemref"`
}

Package represents the package document structure
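
To show how these tags line up with an actual package document (the .opf file), the sketch below decodes a small sample with encoding/xml. Package, Item, and ItemRef are copied from this page, Metadata is cut down to three of its documented fields for brevity, and the sample XML and parsePackage helper are assumptions made for the example. Field tags without a namespace match on the local element name, which is why `xml:"title"` picks up `<dc:title>`.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// Metadata is a subset of the documented struct, kept short for the sketch.
type Metadata struct {
	Title    string `xml:"title"`
	Creator  string `xml:"creator"`
	Language string `xml:"language"`
}

type Item struct {
	ID        string `xml:"id,attr"`
	Href      string `xml:"href,attr"`
	MediaType string `xml:"media-type,attr"`
}

type ItemRef struct {
	IDRef  string `xml:"idref,attr"`
	Linear string `xml:"linear,attr"`
}

type Package struct {
	Metadata Metadata  `xml:"metadata"`
	Manifest []Item    `xml:"manifest>item"`
	Spine    []ItemRef `xml:"spine>itemref"`
}

// sampleOPF is an illustrative package document, not taken from a real book.
const sampleOPF = `<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Sample Book</dc:title>
    <dc:creator>Jane Doe</dc:creator>
    <dc:language>en</dc:language>
  </metadata>
  <manifest>
    <item id="ch1" href="ch1.xhtml" media-type="application/xhtml+xml"/>
    <item id="css" href="style.css" media-type="text/css"/>
    <item id="ch2" href="ch2.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
    <itemref idref="ch2" linear="no"/>
  </spine>
</package>`

// parsePackage (hypothetical helper) decodes an OPF document.
func parsePackage(data []byte) (*Package, error) {
	var p Package
	if err := xml.Unmarshal(data, &p); err != nil {
		return nil, fmt.Errorf("parse package document: %w", err)
	}
	return &p, nil
}

func main() {
	pkg, err := parsePackage([]byte(sampleOPF))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("Title:", pkg.Metadata.Title)
	fmt.Println("Creator:", pkg.Metadata.Creator)
	fmt.Printf("%d manifest items, %d spine entries\n", len(pkg.Manifest), len(pkg.Spine))
}
```

Since each Metadata field is a plain string, a book that declares several creators or subjects would keep only one value per field when decoded this way.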

type Rootfile

type Rootfile struct {
	FullPath  string `xml:"full-path,attr"`
	MediaType string `xml:"media-type,attr"`
}

Rootfile represents root file information
