
Contract Generator: Metadata-Safe PDF Generation with Python, Streamlit & ReportLab

A Python application that generates execution-ready PDF contracts from scratch using Streamlit and ReportLab, eliminating Word metadata leaks and confidentiality risks.

Fresh PDF Contracts: Building a Metadata-Safe Generator with Python, Streamlit & ReportLab

I needed a contract generator that actually solved a real problem instead of creating new ones. Most teams still live in Microsoft Word, copying templates, tweaking clauses, and shipping PDFs that quietly carry metadata from every previous client and version. That's not just sloppy — it's a confidentiality risk.

So I built a clean Python application that generates every execution-ready PDF from scratch. No legacy artifacts. No hidden revision history. Fresh documents, every single time.

The Core Problem: Word Metadata Leaks

When you open an old NDA or MSA, edit a few fields, and export to PDF, the underlying .docx often retains author names, previous client data, tracked changes, and embedded properties, and some of that can carry through into the exported file. Send that PDF and you've potentially exposed information you never intended to share.
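To make the risk concrete, here is a minimal sketch that reads the author fields stored inside a .docx file using only the standard library. A .docx is a ZIP archive, and its core properties live in the docProps/core.xml part. The helper names (read_core_properties, make_sample_docx) are made up for this illustration, and the sample archive is a stand-in that contains only the properties part, not a complete Word document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the core properties part of an Office Open XML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_core_properties(docx_bytes: bytes) -> dict[str, str]:
    """Return the author-related core properties found in a .docx archive."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    wanted = {"creator": "dc:creator", "last_modified_by": "cp:lastModifiedBy"}
    found = {}
    for key, tag in wanted.items():
        node = root.find(tag, NS)
        if node is not None and node.text:
            found[key] = node.text
    return found


def make_sample_docx() -> bytes:
    """Build a stand-in archive holding only docProps/core.xml (demo data)."""
    core_xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}">'
        "<dc:creator>Previous Author</dc:creator>"
        "<cp:lastModifiedBy>Someone Else</cp:lastModifiedBy>"
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("docProps/core.xml", core_xml)
    return buffer.getvalue()


if __name__ == "__main__":
    print(read_core_properties(make_sample_docx()))
```

Pointing read_core_properties at the bytes of a real template that has been reused across clients is a quick way to see which names are still embedded in it.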

This generator fixes it at the root. Every PDF is rendered fresh from code-defined legal text and user-supplied variables. There is no source document to pollute the output.

Tech Stack & How It's Built

Language & Packaging: Pure Python 3.10+, managed with Poetry. The package is contract-generator with a CLI entry point contract-generator. One poetry install and you're ready. Dependencies stay minimal and reproducible: Streamlit, ReportLab, Pillow, and qrcode.

UI: Streamlit multi-page application using st.navigation. Four focused pages right now — MNDA Generator, MSA Generator, SOW Generator, and Fractional CTO (which outputs a specialized SOW). Forms collect strongly-typed data via dataclasses (SecondParty, MSAParty, SOWParty, etc.). Shared sidebar controls handle visual toggles across every document.
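As a rough illustration of the dataclass approach, a form's input type might look something like the sketch below. The field names and validation rules are assumptions made for this example; the post names the SecondParty class but does not show its definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecondParty:
    """Counterparty details collected by a form (hypothetical field names)."""

    legal_name: str
    signatory_name: str
    signatory_title: str
    email: str

    def __post_init__(self) -> None:
        # Reject blank required fields before any PDF rendering happens.
        for field_name in ("legal_name", "signatory_name", "email"):
            if not getattr(self, field_name).strip():
                raise ValueError(f"{field_name} must not be empty")
        if "@" not in self.email:
            raise ValueError("email must contain '@'")


if __name__ == "__main__":
    party = SecondParty("Acme Corp", "Jane Doe", "CEO", "jane@example.com")
    print(party)
```

Validating in __post_init__ means a malformed party can never reach the renderer, regardless of which page or script constructed it.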

PDF Engine: ReportLab with Platypus flowables plus custom canvas handling for headers, footers, page numbering, navy accent bars, classification badges, QR codes, and dual-column digital signature blocks. Professional typography and branding baked in without external template engines.

Shared Options: The DocumentOptions dataclass gives you one place to control the sidebar stripe, confidentiality badge, party summary block, document ID, dashed signature boxes, and QR code behavior for all document types.
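A shared options object along these lines could look like the sketch below. The attribute names and defaults are guesses derived from the list of toggles above, not the project's actual fields, and enabled_features is a hypothetical helper added for the demonstration.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class DocumentOptions:
    """Visual toggles shared by every document type (assumed field names)."""

    sidebar_stripe: bool = True
    confidentiality_badge: bool = True
    party_summary: bool = True
    document_id: bool = True
    dashed_signature_boxes: bool = False
    qr_code: bool = False


def enabled_features(options: DocumentOptions) -> list[str]:
    """List the toggles that are switched on, in field-definition order."""
    return [name for name, value in asdict(options).items() if value]


if __name__ == "__main__":
    # Derive a variant without mutating the shared defaults.
    with_qr = replace(DocumentOptions(), sidebar_stripe=False, qr_code=True)
    print(enabled_features(with_qr))
```

Keeping the object frozen and deriving variants with replace avoids one page's sidebar choices silently changing another page's defaults.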

Adding New Contract Templates Is Straightforward

Want a new agreement type? Add a new form module in forms/, define the input dataclass, drop the legal text into a dedicated _body.py module (or keep it co-located), and wire a generate_xxx_pdf() function that returns raw bytes.

Variables plug in cleanly. Conditional sections, auto-renumbering, fee model logic, MSP-dependent blocks, and deliverable tables are all handled in the renderer. Styling and branding flow through the shared options automatically. No brittle copy-paste between templates.
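Conditional sections with auto-renumbering can be reduced to a small idea: filter first, then number what remains. The sketch below shows only that idea; the Section type and number_sections function are hypothetical names, and the actual renderer's handling of fee models and deliverable tables is not shown in the post.

```python
from dataclasses import dataclass


@dataclass
class Section:
    """One clause of an agreement; include=False drops it from the output."""

    title: str
    body: str
    include: bool = True


def number_sections(sections: list[Section]) -> list[str]:
    """Number only the included sections so omitted ones leave no gaps."""
    included = [section for section in sections if section.include]
    return [
        f"{number}. {section.title}"
        for number, section in enumerate(included, start=1)
    ]


if __name__ == "__main__":
    uses_fixed_fee = False  # example condition driven by form input
    clauses = [
        Section("Definitions", "..."),
        Section("Fixed Fee", "...", include=uses_fixed_fee),
        Section("Term", "..."),
    ]
    print(number_sections(clauses))
```

Because numbers are assigned after filtering, switching a clause off never leaves a hole in the sequence.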

Running Locally or Hosting Securely

Local development is trivial:

poetry install
poetry run contract-generator

This launches the full Streamlit UI. You can also call the Python API directly for headless or scripted use.

For production, the pure Python + Streamlit stack deploys almost anywhere. Streamlit in Snowflake is especially attractive for governed, secure environments where you want the app close to your data and contracts without opening extra network paths. Alternatively, containerize it and run it behind your own auth on AWS, GCP, or internal infrastructure. Either way, the surface area stays small.

Planned LaTeX Integration

I plan to layer in LaTeX support. The goal is advanced TeX microtypography, the ability for legal authors to edit source .tex files directly when they need pixel-perfect control, and standardized cross-document print composition workflows centered on TeX. The current ReportLab foundation provides a stable base; LaTeX becomes an optional high-fidelity path rather than a rewrite.

What This Actually Delivered

I'm proud of this one. It took a persistent, low-grade operational risk — accidental metadata leakage from constantly modified Word documents — and replaced it with a repeatable, auditable generation process. Every contract that leaves the system is created fresh for that moment. Previous client data doesn't ride along because there is no previous document instance in play.

The output quality is high enough for direct client execution. The architecture is simple enough that extending it doesn't require fighting legacy template cruft.

Integration & Future Possibilities

Because the generators return raw PDF bytes, integration is natural. Pipe the output straight into DocuSign, Dropbox Sign, PandaDoc, Agree.com, or similar platforms via their APIs. Generate → send for signature in one flow, with the e-signature platform owning execution and storage.
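Handing raw bytes to an HTTP API usually means base64-encoding them into a JSON body. The sketch below shows that step only. The payload shape is invented for illustration and does not match any specific vendor's schema; each platform's API documentation defines the real field names and authentication.

```python
import base64
import hashlib
import json


def build_signature_request(pdf_bytes: bytes, filename: str, signer_email: str) -> str:
    """Wrap PDF bytes in a JSON payload (hypothetical schema, not a vendor's)."""
    payload = {
        "document": {
            "name": filename,
            "content_base64": base64.b64encode(pdf_bytes).decode("ascii"),
            # A digest lets the receiver, or a later audit, verify integrity.
            "sha256": hashlib.sha256(pdf_bytes).hexdigest(),
        },
        "signers": [{"email": signer_email}],
    }
    return json.dumps(payload)


if __name__ == "__main__":
    fake_pdf = b"%PDF-1.4 placeholder bytes"
    print(build_signature_request(fake_pdf, "mnda.pdf", "jane@example.com")[:80])
```

Recording the SHA-256 digest at generation time also gives you a simple way to confirm later that the executed document is the one that was generated.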

Or extend the app itself to archive executed documents into secure S3-compatible buckets with proper retention and access controls. Add an API layer, build a full contract lifecycle wrapper, or embed generation inside larger internal tools. The foundation is deliberately open.

The possibilities are genuinely wide open once you stop fighting document metadata and start generating clean artifacts on demand.

Interested in Metadata-Safe Contract Generation?

This Contract Generator demonstrates how clean architecture can eliminate operational risk. If you're interested in learning more about this project or discussing how similar solutions could benefit your organization, please reach out.
