AssayPDF

The benchmark the print industry never had

AssayPDF is an open-source benchmark kit for PDF preflight engines — a deterministic GWG 2022 test corpus, a uniform harness, and reproducible accuracy reports. MIT-licensed end to end.

Why AssayPDF exists

The Ghent Workgroup’s 2015 Compliancy Test Suite — the industry’s de facto preflight benchmark for the last decade — is gated to GWG vendor members. Buyers can’t run it. Open-source tools can’t be measured against it. And the new GWG 2022 specification ships with no public test corpus at all.

AssayPDF closes that gap. It encodes 39 rules across the 23 GWG 2022 variants into a byte-deterministic corpus of 175 PDFs (23 positive baselines plus 152 negative failure-mode files), runs that corpus through callas pdfToolbox, Enfocus PitStop Server, and lintPDF under a single harness, and emits true-positive, false-positive, false-negative, and true-negative (TP / FP / FN / TN) counts per rule, per variant, per engine.
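To make the TP / FP / FN / TN scoring concrete, here is a minimal Python sketch of how one engine verdict per file and rule could be tallied. It is an illustration, not AssayPDF's actual scorer: the `Observation` type, the `engine_a` / `engine_b` names, and the `V01` / `R01` identifiers are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One engine verdict for one rule on one corpus file (hypothetical shape)."""
    engine: str
    variant: str
    rule: str
    expected_violation: bool  # ground truth: does the file break this rule?
    flagged: bool             # did the engine report a violation of this rule?


def classify(obs: Observation) -> str:
    """Map ground truth plus engine verdict to a confusion-matrix cell."""
    if obs.expected_violation:
        return "TP" if obs.flagged else "FN"
    return "FP" if obs.flagged else "TN"


def score(observations):
    """Tally TP / FP / FN / TN per (engine, variant, rule)."""
    table = {}
    for obs in observations:
        key = (obs.engine, obs.variant, obs.rule)
        table.setdefault(key, Counter())[classify(obs)] += 1
    return table


# Hypothetical sample data: engine names and IDs are placeholders.
observations = [
    Observation("engine_a", "V01", "R01", True, True),    # caught a real violation
    Observation("engine_a", "V01", "R01", True, False),   # missed a real violation
    Observation("engine_a", "V01", "R01", False, False),  # clean file, no report
    Observation("engine_a", "V01", "R01", False, True),   # clean file, false alarm
    Observation("engine_b", "V01", "R01", True, True),
]
table = score(observations)
for key, counts in sorted(table.items()):
    print(key, dict(counts))
```

Keeping the key as an (engine, variant, rule) tuple means the same table can be rolled up per rule, per variant, or per engine without re-running anything.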

Reproducible by construction

Every byte of the AssayPDF run is auditable. Vendor assets are downloaded from GWG canonical URLs and pinned by SHA-256. The corpus is generated deterministically from a seed — clone the repo, run the same seed, get the same 175 PDFs down to the byte.
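The "same seed, same bytes" property can be checked mechanically. The sketch below is a toy stand-in, not the AssayPDF generator: `generate_corpus` and `corpus_digest` are hypothetical names, and the "files" are just hash-derived byte strings. The point is that every byte depends only on the seed and the file name, so a single digest over the whole corpus is enough to compare two runs.

```python
import hashlib


def generate_corpus(seed: int, count: int = 3) -> dict[str, bytes]:
    """Toy generator: content depends only on (seed, file name)."""
    corpus = {}
    for i in range(count):
        name = f"file_{i:03d}.bin"
        # No clocks, random UUIDs, or environment lookups: those would
        # break byte-for-byte reproducibility.
        corpus[name] = hashlib.sha256(f"{seed}:{name}".encode()).digest()
    return corpus


def corpus_digest(corpus: dict[str, bytes]) -> str:
    """Hash names and contents in sorted order so iteration order cannot matter."""
    h = hashlib.sha256()
    for name in sorted(corpus):
        h.update(name.encode())
        h.update(b"\0")
        h.update(corpus[name])
    return h.hexdigest()


first = corpus_digest(generate_corpus(seed=42))
second = corpus_digest(generate_corpus(seed=42))
other = corpus_digest(generate_corpus(seed=43))
print(first == second)  # same seed, same corpus digest
print(first == other)   # a different seed gives a different digest
```

A real PDF generator would also have to pin anything that normally varies between runs, such as creation dates and document IDs; how AssayPDF handles those is not described here.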

Every generated PDF is then validated against PDF/X-4 (ISO 15930-7) with verapdf before it enters the corpus, so the baseline is conformant by construction. The harness runs each engine against the same files in the same order, and the report renderer produces identical markdown and HTML output across machines.

That reproducibility is the whole point. Closed benchmarks can’t be challenged. Open benchmarks can be re-run, audited, and improved by anyone in the community.

The CLI surface

One assay binary covers the full pipeline. Pull vendor assets, ingest the GWG spec, generate the corpus, run engines, score, report.

fetch

Pull GWG canonical assets with SHA-256 verification.

ingest

Normalize the GWG 2022 spec into the rule × variant matrix.

generate

Emit the deterministic 175-file corpus from a seed.

benchmark

Run the corpus through callas, PitStop, and lintPDF under one harness.

report

Render TP / FP / FN / TN scores into markdown and HTML.

validate

Check every generated PDF against PDF/X-4 with verapdf.
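As a rough picture of the rule × variant matrix that ingest produces, the Python sketch below models it as a lookup from (rule, variant) to "does this rule apply?". The rule IDs, variant IDs, and applicability pairs are invented for illustration; the real matrix covers 39 rules and 23 variants, and its actual schema may differ.

```python
from itertools import product

# Hypothetical identifiers: the real GWG 2022 rule and variant names differ.
rules = ["R01", "R02", "R03"]
variants = ["V01", "V02"]

# Hypothetical applicability pairs, standing in for the normalized spec.
applies = {("R01", "V01"), ("R01", "V02"), ("R03", "V02")}

# Dense matrix: every (rule, variant) cell is present and holds a boolean.
matrix = {(r, v): (r, v) in applies for r, v in product(rules, variants)}


def rules_for(variant: str) -> list[str]:
    """List the rules that apply to one variant, in declaration order."""
    return [r for r in rules if matrix[(r, variant)]]


print(rules_for("V01"))
print(rules_for("V02"))
```

A dense matrix makes "rule not applicable to this variant" an explicit value rather than a missing key, which keeps downstream scoring from confusing "not tested" with "passed".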

Print with Synergy

AssayPDF is built by Print with Synergy, a product studio focused on developer tools for the print and publishing industry.

AssayPDF also serves as the credibility layer for Print with Synergy’s lintPDF SaaS. By publishing reproducible accuracy comparisons against the incumbents, lintPDF can stand on measurable ground rather than marketing copy. The same benchmark is open to every other engine in the field.

Standards we report against

AssayPDF doesn’t invent rules. It encodes the standards the industry has already agreed on and measures engines against them.

GWG 2022

The Ghent Workgroup 2022 specification — 39 rules across 23 variants, encoded into the AssayPDF rule matrix.

ISO 15930-7 (PDF/X-4)

Every generated PDF is validated against PDF/X-4 with verapdf before it enters the corpus.

verapdf

The PDF Association’s open-source PDF/A and PDF/X validator. AssayPDF uses verapdf as the conformance check on corpus generation.

SHA-256 provenance

Vendor inputs are pinned by hash. Two contributors on two continents get bit-for-bit identical inputs.
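Hash pinning of this kind can be sketched in a few lines of Python using only the standard library. The `verify_pin` helper and the sample asset are hypothetical and do not reflect how the fetch command is implemented; they only show the check itself: hash the downloaded bytes and refuse to continue if the digest differs from the pinned value.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Stream a file through SHA-256 so large assets never sit fully in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_pin(path: Path, expected_hex: str) -> None:
    """Raise ValueError if the file's digest differs from the pinned one."""
    actual = sha256_of(path)
    if actual != expected_hex.lower():
        raise ValueError(f"{path.name}: expected {expected_hex}, got {actual}")


# Demo with a throwaway file standing in for a downloaded vendor asset.
with tempfile.TemporaryDirectory() as tmp:
    asset = Path(tmp) / "asset.bin"
    asset.write_bytes(b"example vendor asset")
    pin = hashlib.sha256(b"example vendor asset").hexdigest()

    verify_pin(asset, pin)  # matches the pin, so no exception

    asset.write_bytes(b"tampered")
    try:
        verify_pin(asset, pin)
        tamper_detected = False
    except ValueError:
        tamper_detected = True

print(tamper_detected)
```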

Run the benchmark yourself

Clone the repo, run the seed, get the corpus. Pipe it through your engine of choice and read the report. No vendor membership required.

View on GitHub