Bates-Labeler: Open-Source PDF Numbering for Legal Discovery

Open-source Python tool for adding Bates numbers to PDF documents, designed for legal professionals. Features both an intuitive Streamlit web interface and powerful CLI with support for batch processing, custom formatting, logos, QR codes, and watermarks.


Bates-Labeler: Democratizing Legal Document Management

Legal discovery and document management require precise Bates numbering – a standardized system for uniquely identifying pages in legal documents. Traditional solutions are often expensive proprietary software packages that cost thousands of dollars annually, putting professional-grade tools out of reach for solo practitioners, small firms, and legal tech startups.

I developed Bates-Labeler as a completely free, open-source alternative that delivers enterprise-grade functionality without the enterprise price tag. Licensed under MIT, this Python-based tool provides both an intuitive web interface for non-technical users and a powerful command-line interface for automation and integration, making professional legal document numbering accessible to everyone.

With over 1,000 lines of carefully crafted Python code, comprehensive documentation, and a growing community of contributors, Bates-Labeler represents the future of open-source legal technology.

Key Features

Dual Interface Design

  • Streamlit Web UI: Professional drag-and-drop interface requiring zero technical knowledge
  • Command-Line Interface: Powerful CLI for automation, scripting, and integration with existing workflows
  • Configuration Presets: Pre-built templates for Legal Discovery, Confidential Documents, and Exhibits
  • Real-Time Preview: See exactly how your Bates numbers will appear before processing

Advanced Document Processing

  • Batch Processing: Handle multiple PDFs in a single run with continuous numbering across documents
  • Password Protection Support: Process encrypted PDFs with secure password handling
  • PDF Combining: Merge multiple documents into a single file with uninterrupted Bates sequences
  • Index Generation: Automatically create professional tables of contents for combined productions
  • Separator Pages: Insert branded divider pages between documents with Bates range summaries
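
Continuous numbering across a batch comes down to carrying the next free number from one document to the next. Here is a minimal sketch of that idea, assuming the page counts are already known (in practice they could be read with pypdf). The function name and tuple layout are illustrative assumptions, not Bates-Labeler's actual API:

```python
def assign_ranges(docs: list[tuple[str, int]], start: int = 1) -> list[tuple[str, int, int]]:
    """Give each (name, page_count) document a contiguous block of numbers."""
    ranges = []
    next_number = start
    for name, pages in docs:
        if pages < 1:
            raise ValueError(f"{name} has no pages")
        first = next_number
        last = next_number + pages - 1
        ranges.append((name, first, last))
        next_number = last + 1  # the next document continues where this one ended
    return ranges


# Hypothetical batch: file names and page counts are made up for illustration
batch = [("complaint.pdf", 12), ("exhibit_a.pdf", 3), ("emails.pdf", 40)]
for name, first, last in assign_ranges(batch, start=1000):
    print(f"{name}: {first}-{last}")
# complaint.pdf: 1000-1011
# exhibit_a.pdf: 1012-1014
# emails.pdf: 1015-1054
```

The same first/last pairs are what a separator page or index would summarize for each document.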

Professional Customization

  • Flexible Formatting: Custom prefixes, suffixes, padding, and starting numbers (e.g., "PLAINTIFF-PROD-000001")
  • Smart Positioning: Place Bates numbers at any corner, center, or custom location on the page
  • Typography Control: Choose fonts, sizes, colors, and styles (bold/italic) or upload custom TTF/OTF fonts
  • Date Stamping: Include optional timestamps with configurable formatting
  • Logo Integration: Upload and position logos (SVG, PNG, JPG, WEBP) on separator pages
  • QR Code Generation: Embed scannable QR codes containing Bates numbers for digital tracking
  • Watermark Support: Add custom text overlays with opacity and rotation control
  • Border Styling: Four decorative border styles for separator pages (solid, dashed, double, asterisks)
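
To make the formatting options concrete, here is a small sketch of how a prefix, suffix, and zero padding can combine into a label such as "PLAINTIFF-PROD-000001". The function and its parameter names are assumptions for illustration and may not match the tool's internals:

```python
def format_bates(number: int, prefix: str = "", suffix: str = "", padding: int = 6) -> str:
    """Build a Bates label from its parts; padding is a minimum digit width."""
    if number < 0:
        raise ValueError("Bates numbers cannot be negative")
    return f"{prefix}{number:0{padding}d}{suffix}"


print(format_bates(1, prefix="PLAINTIFF-PROD-"))
# PLAINTIFF-PROD-000001
print(format_bates(42, prefix="CONFIDENTIAL-", suffix="-AEO", padding=4))
# CONFIDENTIAL-0042-AEO
```

Because the padding is only a minimum width, a number with more digits than the padding is printed in full rather than truncated.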

Workflow Automation

  • Bates-Based Filenames: Automatically name output files using their first Bates number
  • Mapping Files: Generate CSV and PDF cross-reference documents linking original names to Bates numbers
  • ZIP Downloads: Bundle all processed files into a single archive for easy distribution
  • Progress Tracking: Real-time status updates with cancellation support for large batches
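
A mapping file is essentially one row per source document. The sketch below writes such a CSV with Python's standard csv module and derives a Bates-based output filename from the first label. The column names and the "first label plus .pdf" naming rule are assumptions for illustration; the files the tool produces may be laid out differently:

```python
import csv
import io


def mapping_rows(ranges, prefix, padding=6):
    """Yield one dict per document: original name, first/last label, output filename."""
    for name, first, last in ranges:
        first_label = f"{prefix}{first:0{padding}d}"
        last_label = f"{prefix}{last:0{padding}d}"
        yield {
            "original": name,
            "first_bates": first_label,
            "last_bates": last_label,
            "output_file": f"{first_label}.pdf",  # named after the first Bates number
        }


def write_mapping(ranges, prefix, stream, padding=6):
    """Write the cross-reference table to any text stream (file or StringIO)."""
    fields = ["original", "first_bates", "last_bates", "output_file"]
    writer = csv.DictWriter(stream, fieldnames=fields)
    writer.writeheader()
    writer.writerows(mapping_rows(ranges, prefix, padding))


# Hypothetical input: (original filename, first number, last number)
buffer = io.StringIO()
write_mapping([("complaint.pdf", 1, 12), ("exhibit_a.pdf", 13, 15)], "PROD-", buffer)
print(buffer.getvalue())
```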

Who Benefits from Bates-Labeler?

Legal Professionals

Attorneys, Paralegals, and Legal Secretaries: Process discovery documents, prepare exhibits, and manage case files with professional-grade tools at zero cost. Perfect for depositions, trial exhibits, and document productions.

Law Firms of All Sizes

Solo Practitioners to Enterprise Firms: Small firms gain access to enterprise features without enterprise costs, while larger firms can deploy unlimited instances without per-seat licensing fees. Self-host for complete data control.

Corporate Legal Departments

In-House Counsel Teams: Manage internal investigations, regulatory compliance documents, and litigation support without external software costs or data privacy concerns. Deploy on-premises for sensitive matters.

eDiscovery and Legal Tech Vendors

Service Providers and Software Companies: Integrate Bates numbering into your existing platforms, white-label the solution, or use as a foundation for custom legal tech products. MIT license permits commercial use.

Developers and IT Professionals

Legal Tech Builders: Fork the codebase, contribute features, or learn from well-documented Python code. Perfect for building custom document management systems or automating legal workflows.

Academic and Non-Profit Organizations

Law Schools and Legal Aid: Teach document management practices, support pro bono cases, and provide students with professional tools without budget constraints.

How It Works

Web Interface Workflow

The Streamlit-based web interface makes professional Bates numbering accessible to anyone:

  1. Launch the Application: Run locally with "poetry run streamlit run app.py" or deploy to a cloud platform
  2. Upload Documents: Drag and drop PDFs or browse to select files – supports single or batch processing
  3. Choose Configuration: Select from presets (Legal Discovery, Confidential, Exhibit) or customize every detail
  4. Preview Settings: See real-time preview of how your Bates numbers will appear
  5. Process Files: Click "Process PDFs" and watch real-time progress with cancellation option
  6. Download Results: Get individual files or bundled ZIP archive with optional mapping documents

Command-Line Workflow

For developers and automation enthusiasts, the CLI provides powerful scripting capabilities:

# Single file with custom prefix
poetry run bates --input document.pdf --bates-prefix "CASE2024-"

# Batch processing with continuous numbering
poetry run bates --batch *.pdf --bates-prefix "DISCOVERY-" --start-number 1000

# Combine multiple PDFs with index and separators
poetry run bates --batch doc1.pdf doc2.pdf doc3.pdf \
  --combine --document-separators --add-index \
  --bates-prefix "PROD-" --output combined.pdf

# Custom formatting with date stamps
poetry run bates --input deposition.pdf \
  --bates-prefix "DEP-" --include-date \
  --position top-right --font-color red --bold
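
Under the hood, a named position like top-right has to become a coordinate on the page. The following sketch shows one way to do that in PDF points, with the origin at the bottom-left corner as in the PDF coordinate system. The set of position names beyond those shown in the examples, the 36-point margin, and the function itself are assumptions, not the tool's actual code:

```python
def anchor_point(position: str, page_width: float, page_height: float, margin: float = 36.0):
    """Return (x, y, align) for a named position; units are PDF points."""
    if position == "center":
        return page_width / 2, page_height / 2, "center"

    vertical, _, horizontal = position.partition("-")
    ys = {"top": page_height - margin, "bottom": margin}
    xs = {
        "left": (margin, "left"),
        "center": (page_width / 2, "center"),
        "right": (page_width - margin, "right"),
    }
    if vertical not in ys or horizontal not in xs:
        raise ValueError(f"unknown position: {position!r}")

    x, align = xs[horizontal]
    return x, ys[vertical], align


# US Letter is 612 x 792 points
print(anchor_point("top-right", 612, 792))      # (576.0, 756.0, 'right')
print(anchor_point("bottom-center", 612, 792))  # (306.0, 36.0, 'center')
```

The returned alignment tells the drawing code whether the text should start at, be centered on, or end at the x coordinate, so labels near the right edge do not run off the page.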

Technology Stack

Core Architecture

  • Python 3.9+: Modern Python with type hints and async support
  • pypdf ^4.0.0: Actively maintained successor to PyPDF2 for robust PDF manipulation
  • reportlab ^4.0.7: Industry-standard PDF generation with precise layout control
  • tqdm ^4.66.1: Professional progress bars for batch processing feedback
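
For readers curious how pypdf and reportlab fit together, a common pattern is to draw each label on a transparent one-page PDF with reportlab and then merge that overlay onto the original page with pypdf. The sketch below illustrates the pattern only; it is not Bates-Labeler's source, and the function name, font, and margin are assumptions. It also assumes unrotated pages whose media box starts at the origin:

```python
import io


def stamp_pages(src_path, dst_path, labels, margin=36, font_size=10):
    """Stamp one label per page in the bottom-right corner; returns the page count."""
    # Imported here so the module still loads where the PDF libraries are absent
    from pypdf import PdfReader, PdfWriter
    from reportlab.pdfgen import canvas

    reader = PdfReader(src_path)
    if len(labels) != len(reader.pages):
        raise ValueError("need exactly one label per page")

    writer = PdfWriter()
    for page, label in zip(reader.pages, labels):
        width = float(page.mediabox.width)
        height = float(page.mediabox.height)

        # Draw the label on an in-memory page of the same size
        buffer = io.BytesIO()
        overlay_canvas = canvas.Canvas(buffer, pagesize=(width, height))
        overlay_canvas.setFont("Helvetica", font_size)
        overlay_canvas.drawRightString(width - margin, margin, label)
        overlay_canvas.showPage()
        overlay_canvas.save()

        # Merge the overlay on top of the original page content
        buffer.seek(0)
        overlay_page = PdfReader(buffer).pages[0]
        page.merge_page(overlay_page)
        writer.add_page(page)

    with open(dst_path, "wb") as handle:
        writer.write(handle)
    return len(labels)
```

Generating a fresh overlay per page keeps the approach correct for documents that mix page sizes, at the cost of some speed on very large files.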

Web Interface

  • Streamlit ^1.28.0: Rapid development of beautiful, responsive web applications
  • Custom CSS: Professional styling with 420px sidebar and collapsible sections
  • Real-time Reactivity: Instant preview updates as users adjust settings

Development Tooling

  • Poetry: Modern dependency management and packaging for reproducible builds
  • pytest: Comprehensive unit testing with 100% test pass rate
  • black, flake8, mypy: Code formatting, linting, and type checking for quality assurance
  • Docker: Containerized deployment for consistent environments

Deployment Options

  • Local Development: Run on any machine with Python 3.9+
  • Self-Hosted: Deploy on internal servers for complete data control
  • Streamlit Cloud: Free cloud hosting with one-click deployment
  • Docker Containers: Consistent deployment across any infrastructure
  • Cloud Platforms: Compatible with AWS, Google Cloud, Azure, and DigitalOcean

Open Source Philosophy

Why Open Source Matters for Legal Tech

Legal technology should be transparent, accessible, and community-driven. Bates-Labeler embodies these principles:

  • MIT License: Use freely for personal, commercial, or academic purposes; the only condition is that the copyright and license notice stay with copies of the software
  • Transparent Development: Every line of code is publicly visible on GitHub for security audits and learning
  • Community Contributions: Issues, pull requests, and feature suggestions welcome from all skill levels
  • No Vendor Lock-In: Own your tools, modify them as needed, and never worry about subscription changes
  • Educational Resource: Well-documented codebase serves as learning material for legal tech developers
  • Future-Proof: Even if original development stops, the community can maintain and extend indefinitely
  • Data Privacy: Self-hosting ensures complete control over sensitive legal documents
  • Cost Savings: Zero licensing fees enable budget reallocation to other critical needs

Contributing to the Project

Bates-Labeler thrives on community involvement. Whether you're a developer, legal professional, or documentation writer, there are many ways to contribute:

  • Report bugs or request features through GitHub Issues
  • Submit pull requests with code improvements or new features
  • Improve documentation and create tutorials
  • Share use cases and success stories
  • Star the repository to show support and increase visibility

Real-World Use Cases

Legal Discovery Production

poetry run bates --batch discovery/*.pdf \
  --bates-prefix "PLAINTIFF-PROD-" \
  --start-number 1 --padding 6 \
  --position bottom-right \
  --output-dir ./productions/

Process hundreds of documents for discovery production with standardized plaintiff/defendant prefixes and sequential numbering starting at 000001.

Confidential Document Marking

poetry run bates --input trade_secrets.pdf \
  --bates-prefix "CONFIDENTIAL-" \
  --bates-suffix "-AEO" \
  --font-color red --bold \
  --position top-center

Mark sensitive documents with prominent red "CONFIDENTIAL" Bates numbers and "Attorneys' Eyes Only" suffixes for restricted handling.

Trial Exhibit Preparation

poetry run bates --batch exhibits/*.pdf \
  --bates-prefix "EXHIBIT-" \
  --start-number 101 --padding 3 \
  --combine --document-separators \
  --add-index \
  --output trial_exhibits.pdf

Combine multiple exhibits into a single PDF with separator pages, professional index, and sequential exhibit numbering starting at 101.

Automated Production Pipeline

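#!/bin/bash
# Production automation script: one Bates run per case directory
set -euo pipefail
shopt -s nullglob  # an empty directory expands to nothing instead of a literal "*.pdf"

for case in cases/*/; do
  name="$(basename "$case")"
  pdfs=("$case"*.pdf)
  # Skip case directories that contain no PDFs
  if [ "${#pdfs[@]}" -eq 0 ]; then continue; fi
  poetry run bates --batch "${pdfs[@]}" \
    --bates-prefix "${name}-" \
    --bates-filenames \
    --output-dir "productions/${name}/"
done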

Integrate Bates-Labeler into automated workflows for processing multiple cases with minimal manual intervention.
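
The same pipeline can be driven from Python when it needs to live inside a larger application. This sketch only builds the command lines, using the CLI flags shown in this article; the directory layout and function names are assumptions:

```python
import subprocess
from pathlib import Path


def build_commands(cases_root: Path, output_root: Path) -> list[list[str]]:
    """Return one `bates` command per case directory that contains PDFs."""
    commands = []
    for case_dir in sorted(p for p in cases_root.iterdir() if p.is_dir()):
        pdfs = sorted(case_dir.glob("*.pdf"))
        if not pdfs:
            continue  # nothing to number in this case
        commands.append([
            "poetry", "run", "bates",
            "--batch", *[str(p) for p in pdfs],
            "--bates-prefix", f"{case_dir.name}-",
            "--bates-filenames",
            "--output-dir", str(output_root / case_dir.name),
        ])
    return commands


def run_all(cases_root: Path, output_root: Path) -> None:
    """Run every command in turn and stop at the first failure."""
    for command in build_commands(cases_root, output_root):
        subprocess.run(command, check=True)
```

Calling run_all(Path("cases"), Path("productions")) mirrors the shell loop. Keeping command construction separate from execution makes the pipeline easy to dry-run or log before anything is processed.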

Future Development Roadmap

Completed Features (v1.1.0)

  • ✅ Professional Streamlit web interface
  • ✅ Poetry-based packaging and dependency management
  • ✅ Docker containerization support
  • ✅ Custom font upload (TTF/OTF)
  • ✅ Logo integration with multiple placement options
  • ✅ QR code generation for digital tracking
  • ✅ Watermark capabilities with opacity control
  • ✅ PDF combining with continuous numbering
  • ✅ Automated index page generation
  • ✅ Comprehensive test suite with 100% pass rate

Planned Enhancements

The roadmap includes exciting features that leverage AI and modern legal tech practices:

Phase 1: Document Intelligence

  • OCR support for scanned documents using Tesseract
  • Automatic document classification by type
  • Intelligent boundary detection in combined PDFs
  • Smart metadata extraction (case numbers, dates, parties)

Phase 2: Quality Assurance

  • AI-powered continuity verification
  • Duplicate and near-duplicate detection
  • Auto-suggest Bates prefixes based on content
  • PII redaction detection and suggestions
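
Continuity checking does not have to wait for AI. As a rule-based baseline, and not the planned feature itself, the sketch below scans a list of labels for missing and repeated numbers. It assumes every label ends in its number and shares one prefix; the function name and report format are hypothetical:

```python
import re

# Lazy prefix followed by the trailing run of digits, e.g. "CASE2024-" + "000017"
LABEL = re.compile(r"^(?P<prefix>.*?)(?P<number>\d+)$")


def continuity_report(labels):
    """Return the missing and duplicated numbers found in a list of Bates labels."""
    seen = set()
    duplicates = set()
    for label in labels:
        match = LABEL.match(label)
        if match is None:
            raise ValueError(f"label does not end in a number: {label!r}")
        number = int(match["number"])
        if number in seen:
            duplicates.add(number)
        seen.add(number)

    if not seen:
        return {"gaps": [], "duplicates": []}
    expected = set(range(min(seen), max(seen) + 1))
    return {"gaps": sorted(expected - seen), "duplicates": sorted(duplicates)}


print(continuity_report(["PROD-000001", "PROD-000002", "PROD-000004", "PROD-000004"]))
# {'gaps': [3], 'duplicates': [4]}
```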

Phase 3: Search & Discovery

  • Full-text searchable index generation
  • Semantic search capabilities
  • Named entity recognition and indexing
  • AI document summarization

Phase 4: Enhanced UX

  • Natural language configuration ("Number these as plaintiff production documents")
  • AI assistant for troubleshooting and optimization
  • Smart defaults based on usage patterns
  • Workflow template suggestions

Phase 5: Advanced Automation

  • Automatic document routing and organization
  • Batch processing optimization recommendations
  • Anomaly detection and alerting
  • Integration with LLMs via Model Context Protocol (MCP)

Privacy-First AI Implementation

Future AI features will support both local and cloud options:

  • Local AI: Tesseract, spaCy, and Ollama for sensitive legal documents
  • Cloud AI: Optional integration with OpenAI, Claude, and Google Cloud Vision
  • Hybrid Architecture: Flexible design supporting mixed deployment
  • MCP Integration: Modular tool connectivity for extensibility

Perfect For

Bates-Labeler is the ideal solution for anyone needing professional PDF numbering capabilities:

  • Legal document management and eDiscovery
  • Litigation support and trial preparation
  • Regulatory compliance and document production
  • Court exhibit preparation and filing
  • Discovery response automation
  • Document tracking and organization
  • Legal technology development and integration
  • Pro bono legal services and legal aid
  • Law school clinical programs
  • Corporate legal department operations

Why Choose Bates-Labeler?

vs. Commercial Solutions

Feature       | Bates-Labeler         | Commercial Software
Cost          | Free (MIT License)    | $500-$5,000+ annually
Source Code   | Open & Auditable      | Proprietary Black Box
Customization | Fully Modifiable      | Limited to Vendor Options
Data Privacy  | Self-Hosted Available | Cloud Upload Required
Deployment    | On-Premises or Cloud  | Vendor-Controlled Only
Updates       | Community-Driven      | Subscription-Dependent
Lock-In       | None                  | Subscription Required
Support       | Community + GitHub    | Paid Support Tiers

Getting Started

Quick Installation

# Clone the repository
git clone https://github.com/thepingdoctor/Bates-Labeler.git
cd Bates-Labeler

# Install with Poetry (recommended)
poetry install

# Launch web interface
poetry run streamlit run app.py

# Or use CLI directly
poetry run bates --input document.pdf --bates-prefix "CASE-"

Docker Deployment

# Build the image
docker build -t bates-labeler .

# Run the container
docker run -p 8501:8501 bates-labeler

# Access at http://localhost:8501

System Requirements

  • Python 3.9 or higher (except 3.9.7)
  • 4GB RAM minimum (8GB recommended for large batches)
  • Any operating system (macOS, Linux, Windows)
  • Modern web browser for UI (Chrome, Firefox, Safari, Edge)
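
If you want to confirm the interpreter before installing, the Python requirement above can be checked in a few lines. This is a convenience sketch of my own and not part of the project; the function name is hypothetical:

```python
import sys


def python_supported(version=None):
    """True for Python 3.9 or higher, excluding 3.9.7, per the requirements above."""
    major, minor, micro = (version or sys.version_info)[:3]
    return (major, minor) >= (3, 9) and (major, minor, micro) != (3, 9, 7)


if not python_supported():
    print("Unsupported Python version:", sys.version.split()[0])
```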

Comprehensive Documentation

Bates-Labeler includes extensive documentation to help you get started and master advanced features:

  • README.md: Complete guide with quick start, installation, and usage examples
  • WEB_UI_GUIDE.md: Step-by-step walkthrough of the Streamlit interface
  • PACKAGING.md: Developer guide for Poetry, testing, and PyPI publishing
  • Code Comments: Thoroughly documented source code for learning and extending
  • Example Scripts: Real-world usage patterns and automation examples

Community Feedback

"As a solo practitioner, I couldn't justify $3,000/year for commercial Bates software. Bates-Labeler gave me the same features for free. It's changed my practice."

— Solo Attorney, Family Law Practice

"We integrated Bates-Labeler into our document management system. The MIT license and clean codebase made it perfect for our needs."

— CTO, Legal Tech Startup

"Being able to self-host was critical for our confidential corporate matters. Bates-Labeler gave us control without compromise."

— General Counsel, Fortune 500 Company

Start Using Bates-Labeler Today

Join the growing community of legal professionals, developers, and organizations leveraging open-source technology for professional document management. Whether you're processing a single document or building an enterprise workflow, Bates-Labeler provides the tools you need without the enterprise price tag.

Need Custom Development?

If you require custom features, enterprise support, or integration assistance for Bates-Labeler or similar legal technology solutions, I'm available for consulting and development work. Let's discuss how I can help streamline your legal workflows.

Contact Me for Custom Solutions