AssayPDF AssayPDF

title: "Usage walkthrough" description: "End-to-end walkthrough: fetch vendor assets, generate the deterministic PDF corpus, benchmark a preflight engine, render reports, and validate against verapdf." group: "Getting started" order: 3

Usage walkthrough

End-to-end: fetch → generate → benchmark → report → validate. For installation, see install.md. For per-flag CLI detail, see cli.md. For common errors, see troubleshooting.md.

1. Fetch vendor assets

uv run assay fetch                # add --force to re-download even if checksums match

Downloads the GWG 2022 spec docs, GOS 5.0 suites, and the Processing Steps Test Suite into vendor/. Each download is verified against vendor/checksums.json (SHA-256). Skipped on subsequent runs unless --force is passed.

2. Generate the corpus

uv run assay generate                                 # all 175 files
uv run assay generate --only-rule R0014               # just R0014 negatives
uv run assay generate --only-variant sheetcmyk-cmyk   # just one variant
uv run assay generate --seed 42                       # alternate deterministic seed

Writes PDFs into corpus/positive/ and corpus/negative/, plus corpus/manifest.json (per-file SHA-256 and expected outcome). Variant kebab names live in src/assay_pdf/generator/variants.py.

3. Benchmark an engine

uv run assay benchmark --engine pdftoolbox            # run all variants
uv run assay benchmark --engine pitstop --profile webcmyk-cmyk
uv run assay benchmark --engine lintpdf               # stub — emits warnings until the API ships

Each run writes both raw EngineResult JSON and a confusion-matrix *.score.json to results/. Engine selection requires the engine binary on PATH (or pointed to via env var — see reproducing.md). Exits 2 with a clear message if the runner isn't installed.

Aggregate output looks like:

✓ pdftoolbox score: TP=143 FP=2 FN=7 TN=3045 (12834ms aggregate runtime)

4. Render a report

uv run assay report --format md > REPORT.md
uv run assay report --format html --output REPORT.html

assay report aggregates every results/*.score.json it finds, so to compare engines run assay benchmark once per engine before rendering.

5. (Optional) Validate

uv run assay validate                # full verapdf PDF/X-4 walk
uv run assay validate --schema-only  # skip verapdf; check schemas only

Used in CI on every commit. Exits 1 if any corpus PDF fails verapdf.

Convenience shortcuts (Justfile)

just install                          # uv sync --all-extras
just check-deps                       # verify ghostscript/qpdf/mutool/exiftool/imagemagick/verapdf
just build                            # ingest → generate → validate
just bench pdftoolbox sheetcmyk-cmyk  # uv run assay benchmark --engine ... --profile ...
just report md                        # uv run assay report --format md

Run just --list for the full task list.