Automatically detects and blurs sensitive info (IDs, addresses, etc.) in images before sharing

Image PII Redactor

A privacy-first, local-first tool for detecting and redacting Personally Identifiable Information (PII) from images. Built for security professionals, compliance teams, and anyone who needs to share images while protecting sensitive data.

🚀 Quick Start

Local Installation (Recommended)

git clone https://github.com/youssof20/image-pii-redactor.git
cd image-pii-redactor
pip install -r requirements.txt
python -m spacy download en_core_web_sm

Note: A PyTesseract warning at startup is expected and harmless. The app uses EasyOCR as its primary OCR engine; Tesseract is needed only as a fallback.

Run the Web App

Option 1: Use the launcher (Recommended):

python launch_app.py

Option 2: Use the batch file (Windows):

launch_app.bat

Option 3: Manual launch:

python -m streamlit run app/app.py --server.port=8501

Run from CLI

python app/cli.py --input path/to/image.jpg --output-dir ./output

Docker (Optional)

docker-compose up

✨ Features

  • Local-First: All processing runs on your device - no data leaves your machine
  • Fast Processing: End-to-end processing in ≤5 seconds for A4/photo documents
  • Multiple Redaction Modes: Block, blur, or token replacement
  • Interactive Preview: Review and modify detections before redaction
  • Comprehensive PII Detection: Emails, phones, cards, IDs, addresses, dates, names
  • Export Options: Redacted images + detailed redaction logs

🔒 Privacy & Ethics

This tool is designed with privacy as the core principle:

  • No Cloud Calls: By default, all processing happens locally on your machine
  • No Data Collection: We don't collect, store, or transmit your images or data
  • Open Source: Full transparency into how your data is processed
  • Synthetic Samples: All examples use synthetic data - never real personal information

Use Responsibly:

  • Only process images you own or have explicit permission to redact
  • Respect privacy laws and regulations in your jurisdiction
  • Consider the sensitivity of data even after redaction

📋 Requirements

  • Python 3.8+
  • 4GB RAM minimum (8GB recommended)
  • Camera/webcam for mobile document capture

🛠️ Usage

Web Interface

  1. Upload an image (JPEG/PNG)
  2. Review detected PII entities
  3. Toggle entities on/off for redaction
  4. Choose redaction mode per entity
  5. Apply redaction and download results

Command Line

# Single image
python app/cli.py --input document.jpg --output-dir ./redacted

# Batch processing
python app/cli.py --input ./documents/ --output-dir ./redacted --batch

# Custom redaction mode and confidence threshold
python app/cli.py --input document.jpg --mode block --confidence 0.8

Programmatic Usage

from app.redact import PIIRedactor

redactor = PIIRedactor()
result = redactor.process_image("document.jpg")
redactor.save_redacted_image(result, "output.png")
redactor.save_redaction_log(result, "redact-log.json")
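
The same three calls can be wrapped in a loop to process a folder from Python. This is a minimal sketch, not part of the project: the helper name redact_folder is hypothetical, it accepts any object exposing the three methods shown above, and the per-image log name is an assumption made so that logs do not overwrite each other. The output name follows the redacted_<filename>.png convention, using the file stem as the filename.

```python
from pathlib import Path

# Input formats accepted by the web interface (JPEG/PNG).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def redact_folder(redactor, input_dir, output_dir):
    """Redact every JPEG/PNG in input_dir; return the written image paths."""
    in_path = Path(input_dir)
    out_path = Path(output_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    written = []
    for image in sorted(in_path.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip anything that is not a supported image
        result = redactor.process_image(str(image))
        target = out_path / f"redacted_{image.stem}.png"
        redactor.save_redacted_image(result, str(target))
        # Assumption: one log per image, named after the source file.
        log_path = out_path / f"{image.stem}-redact-log.json"
        redactor.save_redaction_log(result, str(log_path))
        written.append(target)
    return written
```

With the package installed, a call would look like redact_folder(PIIRedactor(), "./documents", "./redacted").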

📁 Output Files

  • Redacted Image: redacted_<filename>.png
  • Redaction Log: redact-log.json with detection details and coordinates

🔧 Configuration

Redaction Modes

  • Block: Solid color fill
  • Blur: Gaussian blur preserving layout
  • Token: Text replacement with masked characters
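
The three modes can be illustrated without an imaging library. The sketch below is not the project's implementation: it treats an image as a list of rows of (R, G, B) tuples, uses a simple box blur as a stand-in for Gaussian blur, and the names block_region, blur_region, and mask_token are hypothetical.

```python
def block_region(pixels, box, fill=(0, 0, 0)):
    """Return a copy of pixels with box filled by a solid color.

    box is (left, top, right, bottom); right and bottom are exclusive.
    """
    left, top, right, bottom = box
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    out = [list(row) for row in pixels]
    for y in range(max(top, 0), min(bottom, height)):
        for x in range(max(left, 0), min(right, width)):
            out[y][x] = fill
    return out


def blur_region(pixels, box, radius=1):
    """Return a copy of pixels with box replaced by a box blur (neighbour mean)."""
    left, top, right, bottom = box
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    out = [list(row) for row in pixels]
    for y in range(max(top, 0), min(bottom, height)):
        for x in range(max(left, 0), min(right, width)):
            # Average over the original pixels so the blur does not feed on itself.
            neighbours = [
                pixels[ny][nx]
                for ny in range(max(y - radius, 0), min(y + radius + 1, height))
                for nx in range(max(x - radius, 0), min(x + radius + 1, width))
            ]
            out[y][x] = tuple(
                sum(p[c] for p in neighbours) // len(neighbours) for c in range(3)
            )
    return out


def mask_token(text, mask_char="*", keep_last=0):
    """Mask letters and digits, keeping separators and optionally the last few."""
    alnum_positions = [i for i, ch in enumerate(text) if ch.isalnum()]
    visible = set(alnum_positions[-keep_last:]) if keep_last > 0 else set()
    return "".join(
        mask_char if ch.isalnum() and i not in visible else ch
        for i, ch in enumerate(text)
    )
```

Block mode destroys the underlying pixels entirely, while a light blur can leave text partly legible, so a larger radius (or block mode) is the safer choice for small text.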

Detection Methods

  • Regex Rules: Fast, high-confidence detection for structured data
  • spaCy NER: Statistical named-entity recognition for names, locations, and complex patterns
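
To make the regex approach concrete, here is a minimal sketch of rule-based detection on OCR text. The patterns, the Luhn checksum filter for card numbers, and the names PATTERNS, luhn_valid, and find_pii are illustrative assumptions; the project's actual rules are not shown in this README and may differ.

```python
import re

# Illustrative patterns only, not the project's rule set.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"(?:\+?\d{1,3}[ -]?)?\(?\d{3}\)?[ -]?\d{3}[ -]?\d{4}"),
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
}


def luhn_valid(number):
    """Return True if the digits in number pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pii(text):
    """Return (label, matched_text, start, end) tuples sorted by position."""
    hits = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            # The checksum filters out digit runs that merely look like cards.
            if label == "CARD" and not luhn_valid(m.group()):
                continue
            hits.append((label, m.group(), m.start(), m.end()))
    return sorted(hits, key=lambda h: h[2])
```

The character offsets returned here would still need to be mapped back to OCR bounding boxes before any pixels can be redacted; that step depends on the OCR engine's output format and is not covered by this sketch.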

🧪 Testing

pytest tests/

📊 Performance

  • Single A4/Photo: ≤5 seconds on a modern laptop
  • Batch Processing: Linear scaling with image count
  • Memory Usage: ~500MB peak for typical documents

🚧 Limitations

  • Image Quality: Requires clear, readable text for optimal OCR
  • Language Support: Primary support for English text
  • Complex Layouts: May struggle with heavily formatted documents
  • Mobile Optimization: TFLite conversion available but requires additional setup

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

📄 License

MIT License - see LICENSE file for details.

⚠️ Disclaimer

This tool is provided "as is" without warranties. Users are responsible for ensuring compliance with applicable privacy laws and regulations. The authors are not liable for any misuse or privacy violations.
