The benchmark the print industry never had
AssayPDF is an open-source benchmark kit for PDF preflight engines — a deterministic GWG 2022 test corpus, a uniform harness, and reproducible accuracy reports. MIT-licensed end to end.
Why AssayPDF exists
The Ghent Workgroup’s 2015 Compliancy Test Suite — the industry’s de facto preflight benchmark for the last decade — is gated to GWG vendor members. Buyers can’t run it. Open-source tools can’t be measured against it. And the new GWG 2022 specification ships with no public test corpus at all.
AssayPDF closes that gap. It encodes 39 rules across the 23 GWG 2022 variants into a byte-deterministic corpus of 175 PDFs (23 positive baselines plus 152 negative failure-mode files). It runs that corpus through callas pdfToolbox, Enfocus PitStop Server, and lintPDF under a single harness, and emits true-positive, false-positive, false-negative, and true-negative (TP / FP / FN / TN) scores per rule, per variant, per engine.
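The four-way tally can be sketched roughly as follows. This is a minimal Python illustration, not AssayPDF's actual scorer: the `Observation` record, its field names, and the engine, variant, and rule labels are all hypothetical. It assumes each observation records whether a corpus file is expected to violate a rule and whether the engine flagged it.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One engine verdict on one rule for one corpus file (hypothetical shape)."""
    engine: str
    variant: str
    rule: str
    expected_violation: bool  # ground truth encoded in the corpus
    flagged: bool             # whether the engine reported the rule


def classify(obs: Observation) -> str:
    """Map a single observation to TP, FP, FN, or TN."""
    if obs.expected_violation:
        return "TP" if obs.flagged else "FN"
    return "FP" if obs.flagged else "TN"


def tally(observations):
    """Count outcomes keyed by (engine, variant, rule)."""
    counts = {}
    for obs in observations:
        key = (obs.engine, obs.variant, obs.rule)
        counts.setdefault(key, Counter())[classify(obs)] += 1
    return counts


# Placeholder labels only; these are not real GWG rule or variant names.
sample = [
    Observation("engine-a", "variant-1", "rule-x", True, True),
    Observation("engine-a", "variant-1", "rule-x", True, False),
    Observation("engine-a", "variant-1", "rule-x", False, False),
    Observation("engine-b", "variant-1", "rule-x", False, True),
]

for key, outcome_counts in sorted(tally(sample).items()):
    print(key, dict(outcome_counts))
```

Keeping the key as an (engine, variant, rule) tuple means the same counts can be rolled up per rule, per variant, or per engine without re-running anything.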
Reproducible by construction
Every byte of the AssayPDF run is auditable. Vendor assets are downloaded from GWG canonical URLs and pinned by SHA-256. The corpus is generated deterministically from a seed — clone the repo, run the same seed, get the same 175 PDFs down to the byte.
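The "same seed, same bytes" property can be illustrated with a small sketch. The code below is an assumption-laden stand-in, not the AssayPDF generator: `generate_corpus` and `corpus_digest` are hypothetical names, and the "files" are just seeded pseudo-random byte strings rather than PDFs. A real generator would also have to pin anything a PDF writer tends to vary between runs, such as creation timestamps and document IDs.

```python
import hashlib
import random


def generate_corpus(seed: int, count: int, size: int = 64) -> list:
    """Produce `count` byte strings from a seed (stand-in for PDF generation)."""
    rng = random.Random(seed)  # private RNG, so global random state cannot leak in
    return [bytes(rng.getrandbits(8) for _ in range(size)) for _ in range(count)]


def corpus_digest(files) -> str:
    """Hash the whole corpus in order, length-prefixing each file."""
    digest = hashlib.sha256()
    for blob in files:
        digest.update(len(blob).to_bytes(8, "big"))
        digest.update(blob)
    return digest.hexdigest()


first = corpus_digest(generate_corpus(seed=2022, count=175))
second = corpus_digest(generate_corpus(seed=2022, count=175))
other = corpus_digest(generate_corpus(seed=2023, count=175))

print("same seed matches:", first == second)
print("different seed matches:", first == other)
```

Comparing one digest over the ordered corpus is a cheap way to confirm that two machines produced identical output before any engine is run.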
Every generated PDF is then validated against PDF/X-4 (ISO 15930-7) with verapdf before it enters the corpus, so the baseline is conformant by construction. The harness runs each engine against the same files in the same order, and the report renderer produces identical markdown and HTML output across machines.
That reproducibility is the whole point. Closed benchmarks can’t be challenged. Open benchmarks can be re-run, audited, and improved by anyone in the community.
The CLI surface
One assay binary covers the full pipeline: pull vendor assets, ingest the GWG spec, generate the corpus, run engines, score, report.
fetch
Pull GWG canonical assets with SHA-256 verification.
ingest
Normalize the GWG 2022 spec into the rule × variant matrix.
generate
Emit the deterministic 175-file corpus from a seed.
benchmark
Run the corpus through callas, PitStop, and lintPDF under one harness.
report
Render TP / FP / FN / TN scores into markdown and HTML.
validate
Check every generated PDF against PDF/X-4 with verapdf.
Print with Synergy
AssayPDF is built by Print with Synergy, a product studio focused on developer tools for the print and publishing industry.
AssayPDF also serves as the credibility layer for Print with Synergy’s lintPDF SaaS. By publishing reproducible accuracy comparisons against the incumbents, lintPDF can stand on measurable ground rather than marketing copy. The same benchmark is open to every other engine in the field.
Standards we report against
AssayPDF doesn’t invent rules. It encodes the standards the industry has already agreed on and measures engines against them.
GWG 2022
The Ghent Workgroup 2022 specification — 39 rules across 23 variants, encoded into the AssayPDF rule matrix.
ISO 15930-7 (PDF/X-4)
Every generated PDF is verapdf-validated against PDF/X-4 before it enters the corpus.
verapdf
The PDF Association’s open-source PDF/A and PDF/X validator. AssayPDF uses verapdf as the conformance check on corpus generation.
SHA-256 provenance
Vendor inputs are pinned by hash. Two contributors on two continents get bit-for-bit identical inputs.
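Hash pinning of this kind can be sketched in a few lines of Python. This is an illustrative example rather than AssayPDF's own fetch code: `sha256_of` and `verify_pinned` are hypothetical helper names, and the pinned digest is assumed to be stored as a hex string alongside each asset.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large assets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pinned(path: Path, expected_hex: str) -> None:
    """Raise ValueError if the file's digest differs from the pinned value."""
    actual = sha256_of(path)
    if actual != expected_hex.strip().lower():
        raise ValueError(f"{path}: expected {expected_hex}, got {actual}")
```

Failing loudly on a mismatch, rather than logging and continuing, is what keeps a silently changed upstream file from entering a run.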
Run the benchmark yourself
Clone the repo, run the seed, get the corpus. Pipe it through your engine of choice and read the report. No vendor membership required.