Bates-Labeler: Open-Source PDF Numbering for Legal Discovery
Open-source Python tool for adding Bates numbers to PDF documents, designed for legal professionals. It offers both a Streamlit web interface and a CLI, with support for batch processing, custom formatting, logos, QR codes, and watermarks.
Bates-Labeler: Democratizing Legal Document Management
Legal discovery and document management require precise Bates numbering – a standardized system for uniquely identifying pages in legal documents. Traditional solutions are often expensive proprietary software packages that cost thousands of dollars annually, putting professional-grade tools out of reach for solo practitioners, small firms, and legal tech startups.
I developed Bates-Labeler as a completely free, open-source alternative that delivers enterprise-grade functionality without the enterprise price tag. Licensed under MIT, this Python-based tool provides both an intuitive web interface for non-technical users and a powerful command-line interface for automation and integration, making professional legal document numbering accessible to everyone.
With over 1,000 lines of carefully crafted Python code, comprehensive documentation, and a growing community of contributors, Bates-Labeler represents the future of open-source legal technology.
Key Features
Dual Interface Design
- Streamlit Web UI: Professional drag-and-drop interface requiring zero technical knowledge
- Command-Line Interface: Powerful CLI for automation, scripting, and integration with existing workflows
- Configuration Presets: Pre-built templates for Legal Discovery, Confidential Documents, and Exhibits
- Real-Time Preview: See exactly how your Bates numbers will appear before processing
Advanced Document Processing
- Batch Processing: Handle multiple PDFs simultaneously with continuous numbering across documents
- Password Protection Support: Process encrypted PDFs with secure password handling
- PDF Combining: Merge multiple documents into a single file with uninterrupted Bates sequences
- Index Generation: Automatically create professional tables of contents for combined productions
- Separator Pages: Insert branded divider pages between documents with Bates range summaries
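Continuous numbering across a batch comes down to carrying the next free number from one document to the next. The sketch below is only an illustration of that idea, not Bates-Labeler's actual implementation: `assign_ranges` and its parameters are hypothetical names, and page counts are passed in directly instead of being read from real PDFs.

```python
from typing import List, Tuple


def assign_ranges(
    page_counts: List[Tuple[str, int]], start: int = 1
) -> List[Tuple[str, int, int]]:
    """Return (filename, first_number, last_number) for each document.

    Hypothetical helper: each document starts numbering where the
    previous one ended, so the sequence has no gaps or overlaps.
    """
    ranges = []
    next_number = start
    for filename, pages in page_counts:
        if pages < 1:
            raise ValueError(f"{filename} has no pages to number")
        first = next_number
        last = first + pages - 1  # one number per page, inclusive range
        ranges.append((filename, first, last))
        next_number = last + 1
    return ranges
```

The same per-document ranges are what a separator page or an index would need to display, which is why it can help to compute them once up front.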
Professional Customization
- Flexible Formatting: Custom prefixes, suffixes, padding, and starting numbers (e.g., "PLAINTIFF-PROD-000001")
- Smart Positioning: Place Bates numbers at any corner, center, or custom location on the page
- Typography Control: Choose fonts, sizes, colors, and styles (bold/italic) or upload custom TTF/OTF fonts
- Date Stamping: Include optional timestamps with configurable formatting
- Logo Integration: Upload and position logos (SVG, PNG, JPG, WEBP) on separator pages
- QR Code Generation: Embed scannable QR codes containing Bates numbers for digital tracking
- Watermark Support: Add custom text overlays with opacity and rotation control
- Border Styling: Four decorative border styles for separator pages (solid, dashed, double, asterisks)
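To make the formatting options concrete, here is a minimal sketch of how a prefix, zero-padded number, and suffix could be combined into a label such as "PLAINTIFF-PROD-000001". The function name `format_bates` and its signature are assumptions for illustration; the tool's internal code may differ.

```python
def format_bates(number: int, prefix: str = "", suffix: str = "", padding: int = 6) -> str:
    """Build a Bates label from its parts (hypothetical helper).

    The number is left-padded with zeros to `padding` digits. Numbers that
    need more digits than `padding` are kept whole, never truncated.
    """
    if number < 0:
        raise ValueError("Bates numbers cannot be negative")
    return f"{prefix}{number:0{padding}d}{suffix}"
```

For example, `format_bates(1, prefix="PLAINTIFF-PROD-")` yields `PLAINTIFF-PROD-000001`, and `format_bates(42, prefix="CONFIDENTIAL-", suffix="-AEO")` yields `CONFIDENTIAL-000042-AEO`.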
Workflow Automation
- Bates-Based Filenames: Automatically name output files using their first Bates number
- Mapping Files: Generate CSV and PDF cross-reference documents linking original names to Bates numbers
- ZIP Downloads: Bundle all processed files into a single archive for easy distribution
- Progress Tracking: Real-time status updates with cancellation support for large batches
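A CSV mapping file can be sketched with the standard library alone. The column names and the `mapping_csv` helper below are assumptions chosen for illustration, not the exact layout Bates-Labeler writes; the output filename follows the "first Bates number" naming idea described above.

```python
import csv
import io
from typing import Iterable, Tuple


def mapping_csv(rows: Iterable[Tuple[str, str, str]]) -> str:
    """Render (original_filename, first_bates, last_bates) rows as CSV text.

    Hypothetical layout: the output filename is derived from the first
    Bates number of each document.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["original_filename", "output_filename", "first_bates", "last_bates"])
    for original, first, last in rows:
        writer.writerow([original, f"{first}.pdf", first, last])
    return buffer.getvalue()
```

Using the `csv` module instead of joining strings by hand keeps filenames that contain commas or quotes correctly escaped.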
Who Benefits from Bates-Labeler?
Legal Professionals
Attorneys, Paralegals, and Legal Secretaries: Process discovery documents, prepare exhibits, and manage case files with professional-grade tools at zero cost. Perfect for depositions, trial exhibits, and document productions.
Law Firms of All Sizes
Solo Practitioners to Enterprise Firms: Small firms gain access to enterprise features without enterprise costs, while larger firms can deploy unlimited instances without per-seat licensing fees. Self-host for complete data control.
Corporate Legal Departments
In-House Counsel Teams: Manage internal investigations, regulatory compliance documents, and litigation support without external software costs or data privacy concerns. Deploy on-premises for sensitive matters.
eDiscovery and Legal Tech Vendors
Service Providers and Software Companies: Integrate Bates numbering into your existing platforms, white-label the solution, or use as a foundation for custom legal tech products. MIT license permits commercial use.
Developers and IT Professionals
Legal Tech Builders: Fork the codebase, contribute features, or learn from well-documented Python code. Perfect for building custom document management systems or automating legal workflows.
Academic and Non-Profit Organizations
Law Schools and Legal Aid: Teach document management practices, support pro bono cases, and provide students with professional tools without budget constraints.
How It Works
Web Interface Workflow
The Streamlit-based web interface makes professional Bates numbering accessible to anyone:
- Launch the Application: Run locally with `poetry run streamlit run app.py` or deploy to cloud platforms
- Upload Documents: Drag and drop PDFs or browse to select files – supports single or batch processing
- Choose Configuration: Select from presets (Legal Discovery, Confidential, Exhibit) or customize every detail
- Preview Settings: See real-time preview of how your Bates numbers will appear
- Process Files: Click "Process PDFs" and watch real-time progress with cancellation option
- Download Results: Get individual files or bundled ZIP archive with optional mapping documents
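The ZIP download in the last step can be approximated with Python's `zipfile` module by building the archive in memory. This is a sketch under assumptions: `bundle_zip` is a hypothetical helper, and the processed PDFs are represented as plain bytes keyed by filename.

```python
import io
import zipfile
from typing import Dict


def bundle_zip(files: Dict[str, bytes]) -> bytes:
    """Pack named files into a single in-memory ZIP archive (hypothetical helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    # The archive is finalized when the `with` block exits.
    return buffer.getvalue()
```

In a Streamlit app, bytes like these could be handed to `st.download_button` so nothing has to be written to disk on the server, which suits sensitive documents.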
Command-Line Workflow
For developers and automation enthusiasts, the CLI provides powerful scripting capabilities:
# Single file with custom prefix
poetry run bates --input document.pdf --bates-prefix "CASE2024-"
# Batch processing with continuous numbering
poetry run bates --batch *.pdf --bates-prefix "DISCOVERY-" --start-number 1000
# Combine multiple PDFs with index and separators
poetry run bates --batch doc1.pdf doc2.pdf doc3.pdf \
--combine --document-separators --add-index \
--bates-prefix "PROD-" --output combined.pdf
# Custom formatting with date stamps
poetry run bates --input deposition.pdf \
--bates-prefix "DEP-" --include-date \
--position top-right --font-color red --bold
Technology Stack
Core Architecture
- Python 3.9+: Modern Python with type hints and async support
- pypdf ^4.0.0: Actively maintained successor to PyPDF2 for robust PDF manipulation
- reportlab ^4.0.7: Industry-standard PDF generation with precise layout control
- tqdm ^4.66.1: Professional progress bars for batch processing feedback
Web Interface
- Streamlit ^1.28.0: Rapid development of beautiful, responsive web applications
- Custom CSS: Professional styling with 420px sidebar and collapsible sections
- Real-time Reactivity: Instant preview updates as users adjust settings
Development Tooling
- Poetry: Modern dependency management and packaging for reproducible builds
- pytest: Comprehensive unit testing with 100% test pass rate
- black, flake8, mypy: Code formatting, linting, and type checking for quality assurance
- Docker: Containerized deployment for consistent environments
Deployment Options
- Local Development: Run on any machine with Python 3.9+
- Self-Hosted: Deploy on internal servers for complete data control
- Streamlit Cloud: Free cloud hosting with one-click deployment
- Docker Containers: Consistent deployment across any infrastructure
- Cloud Platforms: Compatible with AWS, Google Cloud, Azure, and DigitalOcean
Open Source Philosophy
Why Open Source Matters for Legal Tech
Legal technology should be transparent, accessible, and community-driven. Bates-Labeler embodies these principles:
- MIT License: Use freely for personal, commercial, or academic purposes; the only condition is that the copyright and license notice stay with copies of the software
- Transparent Development: Every line of code is publicly visible on GitHub for security audits and learning
- Community Contributions: Issues, pull requests, and feature suggestions welcome from all skill levels
- No Vendor Lock-In: Own your tools, modify them as needed, and never worry about subscription changes
- Educational Resource: Well-documented codebase serves as learning material for legal tech developers
- Future-Proof: Even if original development stops, the community can maintain and extend indefinitely
- Data Privacy: Self-hosting ensures complete control over sensitive legal documents
- Cost Savings: Zero licensing fees enable budget reallocation to other critical needs
Contributing to the Project
Bates-Labeler thrives on community involvement. Whether you're a developer, legal professional, or documentation writer, there are many ways to contribute:
- Report bugs or request features through GitHub Issues
- Submit pull requests with code improvements or new features
- Improve documentation and create tutorials
- Share use cases and success stories
- Star the repository to show support and increase visibility
Real-World Use Cases
Legal Discovery Production
poetry run bates --batch discovery/*.pdf \
--bates-prefix "PLAINTIFF-PROD-" \
--start-number 1 --padding 6 \
--position bottom-right \
--output-dir ./productions/
Process hundreds of documents for discovery production with standardized plaintiff/defendant prefixes and sequential numbering starting at 000001.
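An option like `--position bottom-right` ultimately has to become a coordinate on the page. The sketch below shows one plausible mapping, assuming PDF user space (origin at the bottom-left corner, units of 1/72 inch), a US Letter page, and a half-inch margin. The function `anchor_point`, the margin default, and the position names beyond those shown in the CLI examples are assumptions, not the tool's documented behavior.

```python
from typing import Tuple


def anchor_point(
    position: str,
    page_width: float = 612.0,   # US Letter width in points
    page_height: float = 792.0,  # US Letter height in points
    margin: float = 36.0,        # half an inch
) -> Tuple[float, float, str]:
    """Map a position name to (x, y, text_alignment) in PDF points."""
    if position == "center":
        return page_width / 2, page_height / 2, "center"
    vertical, _, horizontal = position.partition("-")
    y_by_edge = {"top": page_height - margin, "bottom": margin}
    x_by_side = {
        "left": (margin, "left"),
        "center": (page_width / 2, "center"),
        "right": (page_width - margin, "right"),
    }
    if vertical not in y_by_edge or horizontal not in x_by_side:
        raise ValueError(f"unknown position: {position!r}")
    x, alignment = x_by_side[horizontal]
    return x, y_by_edge[vertical], alignment
```

Returning the alignment alongside the point matters because right-aligned text must end at `x`, not start there; with reportlab that would correspond to choosing between `drawString`, `drawCentredString`, and `drawRightString`.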
Confidential Document Marking
poetry run bates --input trade_secrets.pdf \
--bates-prefix "CONFIDENTIAL-" \
--bates-suffix "-AEO" \
--font-color red --bold \
--position top-center
Mark sensitive documents with prominent red "CONFIDENTIAL" Bates numbers and "Attorneys' Eyes Only" suffixes for restricted handling.
Trial Exhibit Preparation
poetry run bates --batch exhibits/*.pdf \
--bates-prefix "EXHIBIT-" \
--start-number 101 --padding 3 \
--combine --document-separators \
--add-index \
--output trial_exhibits.pdf
Combine multiple exhibits into a single PDF with separator pages, professional index, and sequential exhibit numbering starting at 101.
Automated Production Pipeline
#!/bin/bash
# Production automation script: one Bates run per case directory
for case in cases/*/; do
    # "$case" already ends in "/", so the glob is appended directly
    name="$(basename "$case")"
    poetry run bates --batch "$case"*.pdf \
        --bates-prefix "${name}-" \
        --bates-filenames \
        --output-dir "productions/${name}/"
done
Integrate Bates-Labeler into automated workflows for processing multiple cases with minimal manual intervention.
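For teams that prefer Python over shell, the same loop can be sketched as a function that builds one command per case directory. It uses only the CLI flags shown in the script above; `build_commands` itself is a hypothetical helper, and it returns the argument lists instead of running them so a pipeline can log or dry-run them first.

```python
from pathlib import Path
from typing import List


def build_commands(cases_root: Path, productions_root: Path) -> List[List[str]]:
    """Build one `bates` invocation per case directory (hypothetical helper)."""
    commands = []
    for case_dir in sorted(p for p in cases_root.iterdir() if p.is_dir()):
        pdfs = sorted(str(p) for p in case_dir.glob("*.pdf"))
        if not pdfs:
            continue  # nothing to number in this case
        commands.append(
            [
                "poetry", "run", "bates",
                "--batch", *pdfs,
                "--bates-prefix", f"{case_dir.name}-",
                "--bates-filenames",
                "--output-dir", str(productions_root / case_dir.name),
            ]
        )
    return commands
```

Each list can then be passed to `subprocess.run(cmd, check=True)`. Passing arguments as a list avoids shell quoting problems with case names that contain spaces.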
Future Development Roadmap
Completed Features (v1.1.0)
- ✅ Professional Streamlit web interface
- ✅ Poetry-based packaging and dependency management
- ✅ Docker containerization support
- ✅ Custom font upload (TTF/OTF)
- ✅ Logo integration with multiple placement options
- ✅ QR code generation for digital tracking
- ✅ Watermark capabilities with opacity control
- ✅ PDF combining with continuous numbering
- ✅ Automated index page generation
- ✅ Comprehensive test suite with 100% pass rate
Planned Enhancements
The roadmap includes exciting features that leverage AI and modern legal tech practices:
Phase 1: Document Intelligence
- OCR support for scanned documents using Tesseract
- Automatic document classification by type
- Intelligent boundary detection in combined PDFs
- Smart metadata extraction (case numbers, dates, parties)
Phase 2: Quality Assurance
- AI-powered continuity verification
- Duplicate and near-duplicate detection
- Auto-suggest Bates prefixes based on content
- PII redaction detection and suggestions
Phase 3: Search & Discovery
- Full-text searchable index generation
- Semantic search capabilities
- Named entity recognition and indexing
- AI document summarization
Phase 4: Enhanced UX
- Natural language configuration ("Number these as plaintiff production documents")
- AI assistant for troubleshooting and optimization
- Smart defaults based on usage patterns
- Workflow template suggestions
Phase 5: Advanced Automation
- Automatic document routing and organization
- Batch processing optimization recommendations
- Anomaly detection and alerting
- Integration with LLMs via Model Context Protocol (MCP)
Privacy-First AI Implementation
Future AI features will support both local and cloud options:
- Local AI: Tesseract, spaCy, and Ollama for sensitive legal documents
- Cloud AI: Optional integration with OpenAI, Claude, and Google Cloud Vision
- Hybrid Architecture: Flexible design supporting mixed deployment
- MCP Integration: Modular tool connectivity for extensibility
Perfect For
Bates-Labeler is the ideal solution for anyone needing professional PDF numbering capabilities:
- Legal document management and eDiscovery
- Litigation support and trial preparation
- Regulatory compliance and document production
- Court exhibit preparation and filing
- Discovery response automation
- Document tracking and organization
- Legal technology development and integration
- Pro bono legal services and legal aid
- Law school clinical programs
- Corporate legal department operations
Why Choose Bates-Labeler?
vs. Commercial Solutions
| Feature | Bates-Labeler | Commercial Software |
|---|---|---|
| Cost | Free (MIT License) | $500-5,000+ annually |
| Source Code | Open & Auditable | Proprietary Black Box |
| Customization | Fully Modifiable | Limited to Vendor Options |
| Data Privacy | Self-Hosted Available | Varies; Often Cloud Upload |
| Deployment | On-Premises or Cloud | Vendor-Controlled Only |
| Updates | Community-Driven | Subscription-Dependent |
| Lock-In | None | Subscription Required |
| Support | Community + GitHub | Paid Support Tiers |
Getting Started
Quick Installation
# Clone the repository
git clone https://github.com/thepingdoctor/Bates-Labeler.git
cd Bates-Labeler
# Install with Poetry (recommended)
poetry install
# Launch web interface
poetry run streamlit run app.py
# Or use CLI directly
poetry run bates --input document.pdf --bates-prefix "CASE-"
Docker Deployment
# Build the image
docker build -t bates-labeler .
# Run the container
docker run -p 8501:8501 bates-labeler
# Access at http://localhost:8501
System Requirements
- Python 3.9 or higher (except 3.9.7)
- 4GB RAM minimum (8GB recommended for large batches)
- Any operating system (macOS, Linux, Windows)
- Modern web browser for UI (Chrome, Firefox, Safari, Edge)
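The Python version requirement above can be checked before installing. This is a small sketch that encodes only the rule as stated (3.9 or higher, except 3.9.7); `python_supported` is a hypothetical helper, not part of the project.

```python
import sys
from typing import Tuple


def python_supported(version: Tuple[int, int, int]) -> bool:
    """Return True for Python 3.9 or higher, excluding 3.9.7."""
    return version >= (3, 9, 0) and version != (3, 9, 7)


# Check the interpreter that is currently running.
current_ok = python_supported(tuple(sys.version_info[:3]))
```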
Comprehensive Documentation
Bates-Labeler includes extensive documentation to help you get started and master advanced features:
- README.md: Complete guide with quick start, installation, and usage examples
- WEB_UI_GUIDE.md: Step-by-step walkthrough of the Streamlit interface
- PACKAGING.md: Developer guide for Poetry, testing, and PyPI publishing
- Code Comments: Thoroughly documented source code for learning and extending
- Example Scripts: Real-world usage patterns and automation examples
Community Feedback
"As a solo practitioner, I couldn't justify $3,000/year for commercial Bates software. Bates-Labeler gave me the same features for free. It's changed my practice."
— Solo Attorney, Family Law Practice
"We integrated Bates-Labeler into our document management system. The MIT license and clean codebase made it perfect for our needs."
— CTO, Legal Tech Startup
"Being able to self-host was critical for our confidential corporate matters. Bates-Labeler gave us control without compromise."
— General Counsel, Fortune 500 Company
Start Using Bates-Labeler Today
Join the growing community of legal professionals, developers, and organizations leveraging open-source technology for professional document management. Whether you're processing a single document or building an enterprise workflow, Bates-Labeler provides the tools you need without the enterprise price tag.
Need Custom Development?
If you require custom features, enterprise support, or integration assistance for Bates-Labeler or similar legal technology solutions, I'm available for consulting and development work. Let's discuss how I can help streamline your legal workflows.