# llm-attachment-validator

A document intelligence pipeline that classifies email attachments from `.eml` files as relevant or irrelevant, using only the email's HTML body for context.

This project demonstrates a production-style LLM pipeline for contextual document understanding and evaluation. The system leverages the Anthropic Claude API for contextual reasoning and includes an evaluation module for benchmarking performance against ground-truth data.
## Features

- **Intelligent Classification**: Analyzes email HTML bodies to determine whether each attachment is contextually relevant.
- **Pure Contextual Reasoning**: Classification relies exclusively on the email body, simulating real-world scenarios where attachment metadata is unavailable.
- **Structured Output**: Generates clear, machine-readable JSON results for every email processed.
- **Built-in Evaluation**: Compares predictions against ground-truth data using standard classification metrics: accuracy, precision, recall, and F1 score.
- **Extensible Design**: Modular structure allows experimentation with prompts, models, and classification logic.
## How It Works

The validation process follows strict rules to ensure the focus remains on contextual understanding:
1. **Input**: An `.eml` file is provided.

2. **Extraction**: The system extracts:
   - The full HTML body
   - A list of attachment filenames

   The system intentionally ignores:
   - Attachment content
   - MIME types
   - Email headers

3. **Reasoning (Claude API)**: The HTML body and attachment list are sent to the Claude model with a carefully engineered prompt. The model must infer relevance based only on the text and structure of the email.

4. **Decision**: Each attachment is classified as:
   - `relevant`: materially referenced or important to the email topic
   - `irrelevant`: logos, signature images, decorative elements

5. **Output**: A JSON file is created listing attachments under `relevant` and `irrelevant`.
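The extraction step can be sketched with Python's standard-library email parser. This is an illustrative sketch, not the project's actual code; the function name `extract_context` is an assumption:

```python
# Sketch of the extraction step: pull the HTML body and attachment
# filenames from an .eml file, ignoring headers, MIME types, and
# attachment content. (extract_context is a hypothetical name.)
from email import policy
from email.parser import BytesParser

def extract_context(eml_path):
    with open(eml_path, "rb") as f:
        msg = BytesParser(policy=policy.default).parse(f)

    # Full HTML body only -- headers and attachment bytes are never used
    html_part = msg.get_body(preferencelist=("html",))
    html = html_part.get_content() if html_part else ""

    # Attachment filenames only -- content is intentionally not read
    filenames = [
        part.get_filename()
        for part in msg.iter_attachments()
        if part.get_filename()
    ]
    return html, filenames
```

With the default policy, `get_body(preferencelist=("html",))` walks multipart trees for you, so the same code handles both simple and `multipart/alternative` messages.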
## Project Structure

```
llm-attachment-validator/
│
├── examples/                  # Input .eml files
│   ├── example_00001.eml
│   ├── example_00002.eml
│   └── ...
│
├── ground_truth/              # Ground-truth JSON files
│   ├── attachments_00001.json
│   ├── attachments_00002.json
│   └── ...
│
├── output/                    # Generated classification results
│
├── classify_attachments.py    # Runs attachment classification
├── evaluate.py                # Evaluates predictions vs. ground truth
├── requirements.txt           # Python dependencies
└── README.md                  # Project documentation
```
## Requirements

- Python 3.9+
- An Anthropic API key
## Installation

Clone the repository:

```bash
git clone https://github.com/NikolaPantel/llm-attachment-validator.git
cd llm-attachment-validator
```

Create and activate a virtual environment:

```bash
python -m venv venv

# macOS / Linux
source venv/bin/activate

# Windows
venv\Scripts\activate
```

Install dependencies:

```bash
pip install -r requirements.txt
```

`requirements.txt`:

```
anthropic
beautifulsoup4
tqdm
scikit-learn
```
## Usage

Set your API key:

```bash
# macOS / Linux
export ANTHROPIC_API_KEY="your-api-key-here"

# Windows
set ANTHROPIC_API_KEY=your-api-key-here
```

Place `.eml` files in `examples/`, then run:

```bash
python classify_attachments.py
```

Results are saved in `output/`.
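Internally, the per-email classification call might look like the following sketch. The prompt wording, helper names, and model name are assumptions for illustration, not the project's exact code:

```python
# Sketch of one classification call to the Claude API.
# parse_result/classify are hypothetical names; the model string is an
# assumed placeholder -- substitute any current Claude model.
import json

def parse_result(text):
    # Models sometimes wrap JSON in Markdown fences; strip them defensively.
    text = text.strip().removeprefix("```json").removeprefix("```").removesuffix("```")
    return json.loads(text)

def classify(html_body, attachment_names):
    import anthropic  # imported lazily; listed in requirements.txt
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    prompt = (
        "Using only this email's HTML body, sort the attachments into "
        '"relevant" and "irrelevant". Reply with JSON only, shaped as '
        '{"relevant": [...], "irrelevant": [...]}.\n\n'
        f"HTML body:\n{html_body}\n\nAttachments: {attachment_names}"
    )
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumed model name
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return parse_result(response.content[0].text)
```

Constraining the reply to a fixed JSON shape is what makes the per-email output files directly comparable to the ground-truth files during evaluation.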
Example output (`output/attachments_00001.json`):

```json
{
  "relevant": [
    "quarterly_report_q1_2026.pdf"
  ],
  "irrelevant": [
    "company_logo.png",
    "email_signature_banner.jpg"
  ]
}
```

## Evaluation

Place ground-truth files in `ground_truth/`, then run:

```bash
python evaluate.py
```

Metrics include:

- Accuracy
- Precision
- Recall
- F1 Score
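The project's `evaluate.py` uses scikit-learn, but the scoring reduces to the standard formulas below, treating "relevant" as the positive class and each attachment as one binary prediction. This stdlib-only sketch is illustrative; the function name is an assumption:

```python
# Standard binary classification metrics over per-attachment labels,
# where True means "relevant" (the positive class).
# classification_metrics is a hypothetical name for illustration.
def classification_metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Precision here answers "of the attachments predicted relevant, how many truly were?", while recall answers "of the truly relevant attachments, how many were found?"; F1 is their harmonic mean.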
## Why This Project

This project was built to demonstrate:
- LLM pipeline design
- Prompt engineering
- Email parsing
- Structured output generation
- Evaluation frameworks
- Production-style project organization
## License

MIT License © 2026 Nikola Pantelic

## Author

Nikola Pantel

- GitHub: https://github.com/NikolaPantel
- Project Link: https://github.com/NikolaPantel/llm-attachment-validator