A privacy-first, local-first tool for detecting and redacting Personally Identifiable Information (PII) from images. Built for security professionals, compliance teams, and anyone who needs to share images while protecting sensitive data.
git clone https://github.com/youssof20/image-pii-redactor.git
cd image-pii-redactor
pip install -r requirements.txt
python -m spacy download en_core_web_sm
Note: If you see a PyTesseract warning, it's normal: the app uses EasyOCR as the primary OCR engine, and Tesseract is only needed as a fallback.
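The note describes an engine preference: EasyOCR first, Tesseract only as a fallback. Below is a minimal sketch of how such a check could work using only the standard library. `module_available` and `pick_ocr_engine` are hypothetical helpers, not functions from this repository, and the project's actual selection logic may differ.

```python
import importlib.util
import shutil
from typing import Callable, Optional


def module_available(name: str) -> bool:
    """Return True if a top-level package can be imported, without importing it."""
    return importlib.util.find_spec(name) is not None


def pick_ocr_engine(
    has_module: Callable[[str], bool] = module_available,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Prefer EasyOCR; fall back to Tesseract only when EasyOCR is missing."""
    if has_module("easyocr"):
        return "easyocr"
    # pytesseract is only a wrapper: the tesseract executable must also be on PATH.
    if has_module("pytesseract") and which("tesseract"):
        return "tesseract"
    return None


if __name__ == "__main__":
    print("OCR engine:", pick_ocr_engine() or "none found")
    # spaCy models are installed as regular packages, so the same check applies.
    print("en_core_web_sm installed:", module_available("en_core_web_sm"))
```

Injecting `has_module` and `which` keeps the fallback order testable without installing either OCR engine.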
Option 1: Use the launcher (Recommended):
python launch_app.py
Option 2: Use the batch file (Windows):
launch_app.bat
Option 3: Manual launch:
python -m streamlit run app/app.py --server.port=8501
Command-line interface:
python app/cli.py --input path/to/image.jpg --output-dir ./output
Docker:
docker-compose up
Key features:
- Local-First: All processing runs on your device; no data leaves your machine
- Fast Processing: End-to-end processing in ≤5 seconds for an A4 page or photo on a modern laptop
- Multiple Redaction Modes: Block, blur, or token replacement
- Interactive Preview: Review and modify detections before redaction
- Comprehensive PII Detection: Emails, phones, cards, IDs, addresses, dates, names
- Export Options: Redacted images + detailed redaction logs
This tool is designed with privacy as the core principle:
- No Cloud Calls: By default, all processing happens locally on your machine
- No Data Collection: We don't collect, store, or transmit your images or data
- Open Source: Full transparency into how your data is processed
- Synthetic Samples: All examples use synthetic data - never real personal information
Use Responsibly:
- Only process images you own or have explicit permission to redact
- Respect privacy laws and regulations in your jurisdiction
- Consider the sensitivity of data even after redaction
System requirements:
- Python 3.8+
- 4GB RAM minimum (8GB recommended)
- Camera/webcam for mobile document capture
Web interface workflow:
- Upload an image (JPEG/PNG)
- Review detected PII entities
- Toggle entities on/off for redaction
- Choose redaction mode per entity
- Apply redaction and download results
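The review steps above (toggle each entity, choose a mode per entity, then apply) imply a small piece of per-detection state. The sketch below shows one way to model it; the `Detection` dataclass, its field names, and `selected_for_redaction` are hypothetical and not taken from the app's source.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels

VALID_MODES = {"block", "blur", "token"}


@dataclass
class Detection:
    label: str             # e.g. "EMAIL", "PHONE", "PERSON" (hypothetical labels)
    text: str
    box: Box
    confidence: float
    enabled: bool = True   # toggled on/off during review
    mode: str = "block"    # chosen per entity: "block", "blur" or "token"


def selected_for_redaction(detections: List[Detection]) -> List[Detection]:
    """Reject unknown modes, then keep only the entities the user left enabled."""
    chosen = []
    for d in detections:
        if d.mode not in VALID_MODES:
            raise ValueError(f"unknown redaction mode: {d.mode!r}")
        if d.enabled:
            chosen.append(d)
    return chosen
```

For example, disabling a low-confidence date and switching an email to blur leaves only the email, with `mode="blur"`, for the redaction step.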
Command-line examples:
# Single image
python app/cli.py --input document.jpg --output-dir ./redacted
# Batch processing
python app/cli.py --input ./documents/ --output-dir ./redacted --batch
# Custom redaction modes
python app/cli.py --input document.jpg --mode block --confidence 0.8
Python API:
from app.redact import PIIRedactor
redactor = PIIRedactor()
result = redactor.process_image("document.jpg")
redactor.save_redacted_image(result, "output.png")
redactor.save_redaction_log(result, "redact-log.json")
Output files:
- Redacted Image: redacted_<filename>.png
- Redaction Log: redact-log.json with detection details and coordinates
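As an illustration of the output naming and batch input handling, here is a standard-library sketch. It assumes `<filename>` means the input's stem without its extension, and the log schema (`source`, `redactions`, `label`, `box`, `mode`) is hypothetical; the real `redact-log.json` fields may differ. The helper names are likewise hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List

# Assumption: batch mode accepts the JPEG/PNG formats the web interface lists.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def redacted_name(source: Path) -> str:
    """Build redacted_<filename>.png, assuming <filename> is the stem."""
    return f"redacted_{source.stem}.png"


def collect_inputs(path: Path) -> List[Path]:
    """Return a single file as-is, or every JPEG/PNG in a directory, sorted."""
    if path.is_dir():
        return sorted(p for p in path.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)
    return [path]


def log_json(source: Path, detections: List[Dict]) -> str:
    """Serialize one image's redactions with coordinates (hypothetical schema).

    The detected text is deliberately dropped so the log does not
    reintroduce the PII that was just redacted.
    """
    entries = [
        {"label": d["label"], "box": list(d["box"]), "mode": d["mode"]}
        for d in detections
    ]
    return json.dumps({"source": source.name, "redactions": entries}, indent=2)
```

Leaving the matched text out of the log is a design choice worth keeping in any real implementation: a log that echoes the original values would defeat the redaction.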
Redaction modes:
- Block: Solid color fill
- Blur: Gaussian blur preserving layout
- Token: Text replacement with masked characters
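The three modes can be sketched without any imaging library by treating a grayscale image as a list of pixel rows. This is an illustration under that simplifying assumption, not the tool's implementation, which works on real image files. Blur is a separable Gaussian confined to the box. Token mode is shown at the text level, masking letters and digits while keeping separators. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Grid = List[List[int]]           # grayscale pixels, grid[y][x] in 0..255
Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def block(grid: Grid, box: Box, fill: int = 0) -> None:
    """Block mode: overwrite the box with a solid value, in place."""
    left, top, right, bottom = box
    for y in range(top, bottom):
        for x in range(left, right):
            grid[y][x] = fill


def gaussian_kernel(radius: int, sigma: float) -> List[float]:
    """1-D Gaussian weights for offsets -radius..radius, normalized to sum to 1."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(grid: Grid, box: Box, radius: int = 2, sigma: float = 1.0) -> None:
    """Blur mode: separable Gaussian blur restricted to the box, in place.

    Samples are clamped to the box edges, so pixels outside the box are
    never read or written and the surrounding layout stays intact.
    """
    left, top, right, bottom = box
    w, h = right - left, bottom - top
    if w <= 0 or h <= 0:
        return
    kernel = gaussian_kernel(radius, sigma)

    def clamp(v: int, hi: int) -> int:
        return min(max(v, 0), hi)

    # Horizontal pass over the box contents.
    tmp = [
        [sum(k * grid[top + y][left + clamp(x + i - radius, w - 1)] for i, k in enumerate(kernel))
         for x in range(w)]
        for y in range(h)
    ]
    # Vertical pass, written back into the original grid.
    for y in range(h):
        for x in range(w):
            value = sum(k * tmp[clamp(y + i - radius, h - 1)][x] for i, k in enumerate(kernel))
            grid[top + y][left + x] = round(value)


def token(text: str, keep_last: int = 0, mask: str = "*") -> str:
    """Token mode: mask letters and digits, keeping separators so the layout survives."""
    positions = [i for i, c in enumerate(text) if c.isalnum()]
    keep = set(positions[-keep_last:]) if keep_last > 0 else set()
    return "".join(mask if c.isalnum() and i not in keep else c for i, c in enumerate(text))
```

For example, `token("4111 1111 1111 1111", keep_last=4)` yields `"**** **** **** 1111"`. Whether the real tool keeps any trailing characters is not stated, so `keep_last` defaults to masking everything.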
Detection methods:
- Regex Rules: Fast, high-confidence detection for structured data such as emails, phone numbers, and card numbers
- spaCy NER: Model-based named-entity recognition for names, locations, and other context-dependent entities
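To show how regex rules and a confidence threshold can fit together (mirroring the `--confidence 0.8` flag in the CLI example), here is a minimal sketch. The patterns, the per-rule confidence scores, and the Luhn-based rescoring of card numbers are assumptions for illustration; the project's real rule set is not shown here, and the spaCy NER stage is omitted.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Match:
    label: str
    text: str
    start: int
    end: int
    confidence: float


# Hypothetical rules: (label, compiled pattern, base confidence).
RULES = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.[A-Za-z]{2,}\b"), 0.95),
    ("PHONE", re.compile(r"(?<!\d)(?:\+?\d{1,3}[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}(?!\d)"), 0.85),
    ("CARD", re.compile(r"(?<!\d)(?:\d[ -]?){12,18}\d(?!\d)"), 0.6),
]


def luhn_ok(number: str) -> bool:
    """Luhn checksum over the digits of a candidate card number."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect(text: str, min_confidence: float = 0.8) -> List[Match]:
    """Run every rule over OCR text and keep matches at or above the threshold."""
    found = []
    for label, pattern, base in RULES:
        for m in pattern.finditer(text):
            conf = base
            if label == "CARD":
                # A valid checksum makes a real card number far more likely.
                conf = 0.98 if luhn_ok(m.group()) else 0.3
            if conf >= min_confidence:
                found.append(Match(label, m.group(), m.start(), m.end(), conf))
    return sorted(found, key=lambda x: x.start)
```

Regexes alone cannot tell a card number from any other long digit run, which is why the sketch rescores card candidates with a checksum instead of trusting the pattern.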
Running tests:
pytest tests/
Performance:
- Single A4/Photo: ≤5 seconds on a modern laptop
- Batch Processing: Linear scaling with image count
- Memory Usage: ~500MB peak for typical documents
Known limitations:
- Image Quality: Requires clear, readable text for optimal OCR
- Language Support: Primary support for English text
- Complex Layouts: May struggle with heavily formatted documents
- Mobile Optimization: TFLite conversion available but requires additional setup
Contributing:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
MIT License - see LICENSE file for details.
This tool is provided "as is" without warranties. Users are responsible for ensuring compliance with applicable privacy laws and regulations. The authors are not liable for any misuse or privacy violations.
