CodexPDF

Features

Everything CodexPDF provides

Built for teams that need an authoritative facts contract for PDFs, not duplicated extraction logic across services.

Boundary clarity

Read-only extraction by design

CodexPDF focuses on facts extraction and avoids hidden product behavior.

  • No rendering mutations or rule adjudication in extraction.

  • Consumer-agnostic payload shape for all callers.

  • Stable extraction boundary for long-term system evolution.

Contract-first model

CodexDocument as the source of truth

Downstream systems consume one shared facts contract.

  • Versioned CodexDocument root model.

  • Schema files published under schemas/v1.

  • SemVer policy aligns schema evolution with runtime usage.
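One way a downstream consumer might act on a SemVer-versioned contract is to gate on the document's version before reading it. The sketch below is illustrative only: the function names and the compatibility rule (same major version, at least a minimum minor and patch) are assumptions, not CodexPDF's published policy, and the parser handles plain MAJOR.MINOR.PATCH strings without pre-release tags.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Parse a plain MAJOR.MINOR.PATCH string, with an optional leading 'v'."""
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    return major, minor, patch


def can_consume(document_version: str, minimum_version: str) -> bool:
    """Assumed policy: same major version, and at least the minimum minor.patch."""
    doc = parse_version(document_version)
    minimum = parse_version(minimum_version)
    # A different major version signals a breaking change, so reject it.
    # Within a major version, newer minor/patch releases are treated as additive.
    return doc[0] == minimum[0] and doc[1:] >= minimum[1:]
```

Under this assumed rule, a consumer that depends on a field introduced in a given minor release would pass that release as `minimum_version` and reject older documents rather than silently reading missing data.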

Validation and trust

Schema-validated workflows

Validate outputs in local tooling and CI to catch drift early.

  • CLI validation against published schema bundles.

  • Portable JSON payloads for reproducible checks.

  • Contract-aware output for downstream adapters.

Operational tooling

CLI and parity built in

Designed for real workflows beyond one-off extraction.

  • extract and probe commands for raw output and lightweight summaries.

  • parity profiles for baseline comparison.

  • baseline command mode for external adapter checks.

Image intelligence · v1.17.0

Effective Image DPI

Placement-aware resolution for every image on every page.

  • Captures effective_resolution_dpi using the actual placed rect from the PDF — not a page-size estimate.

  • A 300 DPI image enlarged 2× correctly reports 150 DPI; shrunk to 0.5× reports 600 DPI.

  • Reported per-placement across all pages so downstream tools can flag specific occurrences, not just file-level averages.
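The arithmetic behind these figures can be sketched as follows. This is a minimal illustration, not CodexPDF's implementation: the `Placement` record and its field names are hypothetical, and it assumes the placed rect is measured in PDF points at the default 72 points per inch.

```python
from dataclasses import dataclass

POINTS_PER_INCH = 72.0  # default PDF user space: 1 point = 1/72 inch


@dataclass(frozen=True)
class Placement:
    """Hypothetical record for one image placement (not the CodexDocument schema)."""
    pixel_width: int
    pixel_height: int
    placed_width_pt: float
    placed_height_pt: float


def effective_dpi(p: Placement) -> tuple[float, float]:
    """Return (horizontal, vertical) effective DPI: pixels per placed inch."""
    if p.placed_width_pt <= 0 or p.placed_height_pt <= 0:
        raise ValueError("placed rect must have positive width and height")
    x_dpi = p.pixel_width / (p.placed_width_pt / POINTS_PER_INCH)
    y_dpi = p.pixel_height / (p.placed_height_pt / POINTS_PER_INCH)
    return x_dpi, y_dpi


# A 600 x 600 px image covers 2 in (144 pt) per side at 300 DPI.
native = Placement(600, 600, 144.0, 144.0)
enlarged = Placement(600, 600, 288.0, 288.0)  # placed at 2x the native size
shrunk = Placement(600, 600, 72.0, 72.0)      # placed at 0.5x the native size

print(effective_dpi(native))    # (300.0, 300.0)
print(effective_dpi(enlarged))  # (150.0, 150.0)
print(effective_dpi(shrunk))    # (600.0, 600.0)
```

Because the calculation depends on the placed rect rather than the image file alone, the same image placed twice at different sizes yields two different values, which is why per-placement reporting matters.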