Paperless-ngx
Overview
Paperless-ngx is an open-source document management system that turns physical paper documents into a searchable, organized digital archive. It is the community-maintained successor to the original Paperless and Paperless-ng projects and is built to be self-hosted.
In an age where important documents arrive daily through mail, email, and digital downloads, managing paper becomes increasingly challenging. Paperless-ngx solves this problem by providing a complete workflow for scanning, processing, organizing, and retrieving documents. Its intelligent processing pipeline uses optical character recognition (OCR) to make every document fully searchable, while machine learning algorithms automatically categorize and tag incoming documents.
Whether you’re an individual looking to digitize personal records or an organization managing thousands of documents, Paperless-ngx scales to meet your needs while keeping your sensitive information completely under your control on your own hardware.
Key Features
Intelligent Document Processing
Paperless-ngx processes documents through a sophisticated pipeline:
- OCR Processing: Extracts text from scanned documents and images using Tesseract
- Language Support: Runs OCR in the language or languages you configure (see PAPERLESS_OCR_LANGUAGE)
- Multi-format Support: Processes PDF, images, Office documents, and more
- PDF/A Conversion: Converts documents to archival-quality PDF format
- Duplicate Detection: Identifies and prevents duplicate document imports
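The duplicate check in the last bullet can be pictured as a content-hash lookup. The following is a minimal sketch, assuming duplicates are recognized by a checksum of the file's bytes. The hash algorithm (SHA-256), the function names, and the in-memory `seen` set are illustrative assumptions, not Paperless-ngx's actual code.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_duplicate(path: Path, seen: set[str]) -> bool:
    """Return True if identical content was seen before; otherwise record it."""
    checksum = file_checksum(path)
    if checksum in seen:
        return True
    seen.add(checksum)
    return False
```

Because the comparison is on content rather than filename, the same scan saved under two different names would still be caught, while two different files sharing a name would not.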
Smart Organization
Organize documents with flexible, powerful tools:
- Correspondents: Track document senders and recipients
- Document Types: Categorize by type (invoice, receipt, contract, etc.)
- Tags: Apply multiple tags for cross-cutting organization
- Storage Paths: Control physical file organization on disk
- Custom Fields: Add your own metadata fields to documents
- Date Detection: Automatically extract dates from document content
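A storage path is essentially a filename template filled in from document metadata. The sketch below illustrates the idea with Python's `str.format`. The placeholder names (`created_year`, `correspondent`, `title`) resemble those used in Paperless-ngx filename formats, but the `render_storage_path` helper and its sanitizing rule are hypothetical, and the real template syntax may differ between versions.

```python
import re
from pathlib import PurePosixPath


def render_storage_path(template: str, metadata: dict[str, str]) -> PurePosixPath:
    """Fill a path template from metadata, replacing characters unsafe in filenames."""
    # Keep word characters, dots, hyphens, and spaces; replace everything else.
    safe = {key: re.sub(r"[^\w.\- ]", "_", value) for key, value in metadata.items()}
    return PurePosixPath(template.format(**safe))


path = render_storage_path(
    "{created_year}/{correspondent}/{title}",
    {"created_year": "2024", "correspondent": "ACME Corp", "title": "Invoice 17/03"},
)
print(path)  # 2024/ACME Corp/Invoice 17_03
```

Sanitizing each value before substitution keeps a slash inside a title from silently creating an extra directory level.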
Machine Learning Classification
Let Paperless-ngx learn your organization preferences:
- Automatic correspondent detection from document content
- Document type classification based on content patterns
- Tag suggestions using machine learning models
- Continuous learning from your corrections
- Customizable confidence thresholds
Full-Text Search
Find any document instantly:
- Search across all document text content
- Filter by correspondent, type, tags, or date range
- Saved search views for common queries
- Advanced query syntax for complex searches
- Search result highlighting and previews
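The same full-text search is also reachable over the REST API. As a rough sketch, the helper below builds a search URL. The base URL is the default from the installation steps, and the `query` and `page_size` parameter names reflect my understanding of the API and should be checked against the API documentation of your instance. `build_search_url` itself is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str, page_size: int = 25) -> str:
    """Build a document search URL; the parameter names are assumptions to verify."""
    params = urlencode({"query": query, "page_size": page_size})
    return f"{base_url.rstrip('/')}/api/documents/?{params}"


print(build_search_url("http://localhost:8000", "electricity invoice 2024"))
# http://localhost:8000/api/documents/?query=electricity+invoice+2024&page_size=25
```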
Modern Web Interface
Access your documents from anywhere:
- Responsive design for desktop and mobile
- Document preview with zoom and rotation
- Bulk editing and organization tools
- Dashboard with recent documents and saved views
- Dark mode and customizable themes
Multiple Input Methods
Get documents into Paperless-ngx easily:
- Consume Folder: Drop files into a watched directory
- Email: Forward emails with attachments for automatic import
- Web Upload: Drag and drop through the web interface
- Mobile Apps: Integration with third-party scanner apps
- API: Programmatic document upload
System Requirements
Minimum Requirements
- Operating System: Linux (Docker recommended)
- RAM: 2GB minimum (4GB+ recommended)
- CPU: Any modern processor (multi-core improves OCR speed)
- Storage: Based on document collection size
- Docker: Docker and Docker Compose for easy deployment
Recommended Specifications
- RAM: 4-8GB for large document collections
- CPU: 4+ cores for faster OCR processing
- Storage: SSD for database and index, HDD acceptable for documents
- Network: Gigabit for remote access and scanner uploads
Scanner Recommendations
- Document scanners with automatic document feeders (ADF)
- Network scanners with scan-to-folder capability
- Mobile phone cameras for quick captures
- Multifunction printers with scanning capability
Installation Guide
Docker Compose Installation (Recommended)
The easiest way to deploy Paperless-ngx:
- Ensure Docker and Docker Compose are installed
- Download the docker-compose.yml from the official repository
- Configure environment variables in .env file
- Run docker compose up -d to start all containers
- Access the web interface at http://localhost:8000
- Create an admin account using the createsuperuser command
Configuration Options
Key environment variables to configure:
- PAPERLESS_SECRET_KEY: Unique secret for security
- PAPERLESS_OCR_LANGUAGE: Default OCR language(s)
- PAPERLESS_TIME_ZONE: Your local timezone
- PAPERLESS_ADMIN_USER: Initial admin username
- PAPERLESS_ADMIN_PASSWORD: Initial admin password
- PAPERLESS_CONSUMPTION_DIR: Folder to watch for new documents
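One way to prepare the variables listed above is to generate the .env content with a small script, so the secret key is random rather than hand-typed. This is a minimal sketch: the `render_env` helper and the example values (`eng`, `Europe/Berlin`, `admin`) are assumptions for illustration, and the admin password is deliberately left out so it is not written by a script.

```python
import secrets


def render_env(ocr_language: str, time_zone: str, admin_user: str) -> str:
    """Return .env file content with a freshly generated secret key."""
    settings = {
        # token_urlsafe yields only letters, digits, "-" and "_", so no quoting is needed.
        "PAPERLESS_SECRET_KEY": secrets.token_urlsafe(48),
        "PAPERLESS_OCR_LANGUAGE": ocr_language,
        "PAPERLESS_TIME_ZONE": time_zone,
        "PAPERLESS_ADMIN_USER": admin_user,
    }
    return "".join(f"{key}={value}\n" for key, value in settings.items())


print(render_env("eng", "Europe/Berlin", "admin"))
```

Redirect the output into the .env file next to docker-compose.yml, then add PAPERLESS_ADMIN_PASSWORD and any other variables by hand.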
Alternative Installation Methods
- Bare Metal: Manual installation on Linux systems
- Kubernetes: Helm charts available for K8s deployment
- Synology NAS: Community packages for Synology devices
- Proxmox: Helper scripts for LXC container deployment
Getting Started
Initial Setup
After installation, configure your instance:
- Log in with your admin credentials
- Navigate to Settings to configure preferences
- Set up correspondents for common senders
- Create document types matching your needs
- Design a tag system for organization
- Configure storage paths if desired
Importing Existing Documents
Migrate your document collection:
- Copy documents to the consume folder
- Paperless-ngx will process each file automatically
- Monitor the processing queue in the web interface
- Review and correct automatic classifications
- The system learns from your corrections
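The first step above, copying an existing collection into the consume folder, can be scripted. The following is a minimal sketch: the `copy_to_consume` helper is hypothetical, and the suffix list is an assumed subset of the formats your instance accepts.

```python
import shutil
from pathlib import Path

# Assumed subset of accepted file types; adjust to what your instance supports.
SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def copy_to_consume(source_dir: Path, consume_dir: Path) -> list[Path]:
    """Copy supported files from source_dir (recursively) into consume_dir."""
    copied = []
    for source in sorted(source_dir.rglob("*")):
        if source.is_file() and source.suffix.lower() in SUPPORTED_SUFFIXES:
            target = consume_dir / source.name
            if target.exists():
                continue  # skip name clashes instead of overwriting
            shutil.copy2(source, target)
            copied.append(target)
    return copied
```

Copying rather than moving keeps the originals in place until you have reviewed the imported documents.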
Setting Up a Scanner
Connect your scanner to Paperless-ngx:
- Configure scanner to save to a network folder
- Mount that folder as the consume directory
- Scanned documents automatically import
- Configure scan resolution (300 DPI recommended for OCR)
- Set up scan profiles for different document types
Email Import Setup
Import documents from email:
- Add a mail account (IMAP server, username, password) and mail rules in the web interface
- Set up a dedicated email address for documents
- Forward receipts and documents to this address
- Attachments are automatically extracted and processed
Advanced Features
Workflows and Automation
Create automated processing rules:
- Assign correspondents based on content patterns
- Apply tags based on document content or metadata
- Set document types from matching rules
- Define multiple rules with priority ordering
- Test rules against existing documents
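Rule matching with priority ordering can be sketched as follows. This is an illustration of the general idea only: the three modes (`any`, `all`, `regex`), the function names, and the rule tuples are assumptions, not the matching implementation Paperless-ngx ships.

```python
import re
from typing import Optional


def matches(text: str, pattern: str, mode: str) -> bool:
    """Return True if text satisfies pattern under the given matching mode."""
    lowered = text.lower()
    words = pattern.lower().split()
    if mode == "any":  # at least one word occurs as a substring
        return any(word in lowered for word in words)
    if mode == "all":  # every word occurs as a substring
        return all(word in lowered for word in words)
    if mode == "regex":  # case-insensitive regular expression search
        return re.search(pattern, text, flags=re.IGNORECASE) is not None
    raise ValueError(f"unknown matching mode: {mode}")


def first_matching_rule(text: str, rules: list[tuple[str, str, str]]) -> Optional[str]:
    """Return the label of the first rule, in list order, that matches the text."""
    for label, pattern, mode in rules:
        if matches(text, pattern, mode):
            return label
    return None


rules = [
    ("Invoice", r"invoice\s+no\.?\s*\d+", "regex"),
    ("Utility", "electricity gas", "any"),
]
print(first_matching_rule("Invoice No. 4711 for electricity", rules))  # Invoice
print(first_matching_rule("Your gas bill", rules))  # Utility
```

Because the first match wins, the order of the list acts as the priority: the first example text also contains "electricity", but the invoice rule is checked first.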
Custom Fields
Extend documents with your own metadata:
- Create custom fields for specific data (invoice number, amount, etc.)
- Multiple field types: text, date, number, URL, boolean
- Search and filter by custom fields
- Include in document exports
Share Links
Share documents securely:
- Generate shareable links for specific documents
- Set expiration dates for temporary access
- No account required for recipients
- Track link usage and access
API Access
Integrate with other systems:
- RESTful API for all operations
- Upload documents programmatically
- Query and retrieve documents
- Automate workflows with external tools
- API documentation included
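As an illustration of programmatic upload, the sketch below builds a multipart POST request using only the standard library. It assumes the upload endpoint is `/api/documents/post_document/`, that the file goes in a form field named `document`, and that token authentication is enabled; check these against the API documentation bundled with your instance. `build_upload_request` is a hypothetical helper, and the request is only constructed here, not sent.

```python
import uuid
from pathlib import Path
from urllib.request import Request


def build_upload_request(base_url: str, token: str, document: Path) -> Request:
    """Build (but do not send) a multipart POST that uploads one document."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="document"; filename="{document.name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    body = head + document.read_bytes() + f"\r\n--{boundary}--\r\n".encode()
    return Request(
        f"{base_url.rstrip('/')}/api/documents/post_document/",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, pass the returned object to `urllib.request.urlopen`. Keeping construction separate from sending makes the request easy to inspect before it touches a live instance.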
Use Cases
Personal Document Management
Organize your household documents: scan receipts, bills, tax documents, medical records, and important correspondence. Never lose a document again and retrieve any paper in seconds.
Small Business
Manage invoices, contracts, and business correspondence. Track document history and maintain organized records for accounting and legal purposes.
Home Office
Go truly paperless by scanning and shredding incoming mail. Quickly find warranty information, manuals, and important documents when needed.
Legal and Compliance
Maintain organized, searchable records for compliance requirements. Track document history and access for audit purposes.
Comparison with Alternatives
Paperless-ngx vs Traditional File Storage
- Search: Full-text search vs filename-only search
- Organization: Multi-dimensional tagging vs folder hierarchy
- Processing: Automatic OCR vs manual text extraction
- Classification: Machine learning vs manual sorting
Paperless-ngx vs Commercial DMS
- Cost: Free and open source vs expensive licensing
- Privacy: Self-hosted vs cloud storage
- Customization: Fully customizable vs vendor limitations
- Features: Comparable to many commercial offerings
Paperless-ngx vs Evernote/OneNote
- Focus: Paperless-ngx is document-focused; others are general notes
- OCR: Better document OCR and archival quality
- Organization: More document-specific organization tools
- Self-Hosted: Complete data ownership
Backup and Recovery
Backup Strategy
Protect your document archive:
- Database: Regular database dumps (for example with pg_dump if you use PostgreSQL)
- Media Files: Backup the media directory containing documents
- Configuration: Save docker-compose.yml and .env files
- Export: Use built-in export functionality periodically
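The database dump and the built-in export can be combined into one scripted backup run. This is a sketch under stated assumptions: it presumes a Docker Compose deployment with a PostgreSQL service named `db`, an application service named `webserver`, a database and user both called `paperless`, and an export directory at `../export`. Adjust every one of these to match your own docker-compose.yml.

```python
import subprocess
from pathlib import Path


def backup_commands(db_user: str = "paperless", db_name: str = "paperless") -> dict[str, list[str]]:
    """Return the commands for one backup run as argument lists."""
    return {
        # Dump the database to stdout from the assumed "db" service.
        "database": ["docker", "compose", "exec", "-T", "db", "pg_dump", "-U", db_user, db_name],
        # Run the document exporter in the assumed "webserver" service.
        "export": ["docker", "compose", "exec", "-T", "webserver", "document_exporter", "../export"],
    }


def run_backup(dump_file: Path) -> None:
    """Write the database dump to dump_file, then run the document export."""
    commands = backup_commands()
    with dump_file.open("wb") as out:
        subprocess.run(commands["database"], stdout=out, check=True)
    subprocess.run(commands["export"], check=True)
```

Run `run_backup(Path("paperless-db.sql"))` from the directory that contains docker-compose.yml, and copy the resulting dump and export directory to separate storage.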
Disaster Recovery
Restore from backup:
- Deploy fresh Paperless-ngx instance
- Restore database from backup
- Restore media files to correct location
- Verify document access and search functionality
Community and Support
Paperless-ngx has an active community:
- GitHub: Issue tracking, feature requests, and development
- Discord: Real-time community support and discussion
- Documentation: Comprehensive official documentation
- Reddit: r/selfhosted discussions and tips
Conclusion
Paperless-ngx is a strong choice for anyone serious about digitizing and organizing their documents. Its combination of OCR-based processing, flexible organization, and full-text search makes documents quick to find. Because it is self-hosted, you keep complete control over your sensitive documents, and the software itself is free and open source.