AssayPDF

Open source

Four repos. One family.

Benchmark, preflight, view, and extract data from PDFs without locking yourself into one vendor’s monolith. Pick what you need; ignore what you don’t.

assay-pdf

Beta

Open-source GWG 2022 benchmark kit for PDF preflight engines.

assay-pdf generates a deterministic ~175-file GWG 2022 test corpus, runs it through a uniform harness against callas pdfToolbox, Enfocus PitStop Server, and lintPDF, and scores TP/FP/FN/TN per rule, per variant, per engine. Reproducible markdown / HTML accuracy reports replace the vendor-gated GWG 2015 Compliancy Test Suite.
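
As a rough illustration of the scoring step, the sketch below tallies TP/FP/FN/TN per engine, per rule, per variant. It is not assay-pdf's actual code: the `Observation` record, its field names, and the `score` function are hypothetical, and it assumes each result pairs a ground-truth flag (the file is meant to trip the rule) with whether the engine reported it.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One engine verdict for one corpus file (hypothetical record shape)."""
    engine: str
    rule: str
    variant: str
    should_fail: bool  # ground truth: the file is a negative failure mode
    flagged: bool      # the engine reported the rule as violated


def classify(obs: Observation) -> str:
    """Map one observation onto a confusion-matrix cell."""
    if obs.flagged:
        return "TP" if obs.should_fail else "FP"
    return "FN" if obs.should_fail else "TN"


def score(observations) -> dict:
    """Tally TP/FP/FN/TN keyed by (engine, rule, variant)."""
    tallies: dict = {}
    for obs in observations:
        key = (obs.engine, obs.rule, obs.variant)
        tallies.setdefault(key, Counter())[classify(obs)] += 1
    return tallies


if __name__ == "__main__":
    # Engine, rule, and variant names here are placeholders.
    sample = [
        Observation("engine-a", "rule-1", "variant-x", True, True),
        Observation("engine-a", "rule-1", "variant-x", True, False),
    ]
    for key, counts in score(sample).items():
        print(key, dict(counts))
```

Keying the tally on the full (engine, rule, variant) triple keeps the finest grain; coarser per-engine or per-rule totals can be summed from it afterwards.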

  • 39 rules × 23 GWG 2022 variants → ~175 PDFs (23 positive baselines, 152 negative failure modes)
  • Byte-deterministic corpus generation from a seed; SHA-256 verified vendor assets
  • Uniform harness runs callas pdfToolbox, Enfocus PitStop Server, and lintPDF
  • TP/FP/FN/TN scoring per rule, per variant, per engine
  • Every generated PDF verified against PDF/X-4 (ISO 15930-7) with veraPDF
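
The SHA-256 check on vendor assets can be pictured with a minimal sketch like the one below. It uses only Python's standard `hashlib`; the manifest layout (file name mapped to expected hex digest) and the `verify_assets` helper are assumptions for illustration, not assay-pdf's real format or API.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large assets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_assets(manifest: dict, root: Path) -> list:
    """Return the names of assets whose digest differs from the manifest.

    `manifest` maps a relative file name to its expected hex digest
    (a hypothetical layout chosen for this example).
    """
    return [
        name
        for name, expected in manifest.items()
        if sha256_of(root / name) != expected.lower()
    ]
```

An empty return value means every listed asset matched its recorded digest. The same comparison, run over two corpus builds from the same seed, is one way to confirm that generation is byte-deterministic.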

lint-pdf

Beta

Detection-only PDF preflight engine — 500+ checks plus the PDF/X-4 conformance suite.

lint-pdf is the open-source preflight engine. It inspects PDFs against 500+ checks across fonts, color, images, transparency, page geometry, and packaging — plus a 91-check PDF/X-4 conformance suite (ISO 15930-7). Detection-only by design: your originals are never modified.

  • 500+ engine checks + PDF/X-4 (ISO 15930-7) conformance suite
  • Built-in rulesets for GWG sheetfed, GWG digital, PDF/X-4, packaging
  • External imports from PitStop, callas pdfToolbox, Acrobat
  • FastAPI service + CLI + Python SDK
  • Detection-only: your files are never modified

loupe-pdf

Beta

Embeddable PDF viewer with separations, TAC, layers, and annotation overlays.

loupe-pdf is an embeddable web PDF viewer purpose-built for prepress review. It surfaces ink separations, total-area-coverage maps, layer toggles, a single-pixel densitometer, and annotation overlays — the things a prepress operator actually needs when looking at a job.

  • Per-channel ink separations (CMYK + spots)
  • TAC heatmap and densitometer probe
  • Layer toggles + annotation overlays
  • Embeddable in any Next.js / React host
  • Plugin slots for custom toolbars and panels
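
For readers new to TAC: total area coverage at a pixel is the sum of the ink percentages across all channels at that point, and a heatmap highlights where that sum exceeds a limit. The sketch below shows the arithmetic only. It is not loupe-pdf code; the function names are hypothetical, and the 300% default is an assumed placeholder, since the real limit depends on the print condition.

```python
def total_area_coverage(inks) -> float:
    """Sum the ink percentages (0-100 each) for one pixel.

    `inks` holds one value per channel, e.g. (C, M, Y, K) plus any spots.
    """
    return sum(inks)


def over_limit(pixels, limit: float = 300.0) -> list:
    """Return the indices of pixels whose TAC exceeds `limit`.

    The 300.0 default is an illustrative assumption, not a standard value.
    """
    return [
        index
        for index, inks in enumerate(pixels)
        if total_area_coverage(inks) > limit
    ]


if __name__ == "__main__":
    # Three sample pixels as (C, M, Y, K) percentages.
    pixels = [(60, 50, 50, 100), (100, 100, 100, 100), (0, 0, 0, 0)]
    print([total_area_coverage(p) for p in pixels])
    print(over_limit(pixels))
```

A densitometer probe reads the same per-channel values for a single pixel, so both features rest on the same sum.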

codex-pdf

Beta

Structured PDF extraction API that turns complex files into consistent JSON.

codex-pdf is an open-source PDF extraction API that converts complex PDF documents into structured, consistent JSON. It handles text layers, tables, form fields, images, and metadata, turning opaque files into data that pipelines can actually consume.

  • Structured JSON output for text, tables, and document metadata
  • Form field and annotation extraction
  • Image and embedded asset cataloging
  • FastAPI service + CLI + Python SDK
  • Handles complex layouts: columns, sidebars, headers, and footers