Structured Extraction
How SubOps turns uploaded documents into source-linked data, with AI reserved for the messy edges.
import { Cards, Card } from "fumadocs-ui/components/card";
SubOps is deterministic-first. Standard settlement PDFs, payroll CSVs, and fuel CSVs are parsed through structured pipelines whenever the layout can be validated. AI is reserved for the messy edges: unknown layouts, degraded scans, variable Contract Pay Table schedules, repair invoices, DVIR text, and plain-English narrative drafting.
What happens by default
| Artifact | Default path | AI role |
|---|---|---|
| Standard settlement PDF | Layout fingerprint → deterministic parser → checksum validation | Fallback only when the layout is unknown or validation fails |
| Charge and Credit detail | Deterministic parser with source references | Fallback only |
| Payroll CSV / fuel CSV | Deterministic CSV parser | None |
| Contract Pay Table / rate schedule | Deterministic table extraction when the schedule layout is known; structured extraction for variable schedules | Draft facts only until an operator or bookkeeper confirms the rates |
| Repair invoice | Structured extraction with OCR/AI | Used by default, because shop invoice layouts vary widely |
| DVIR note or photo | Deterministic when structured; OCR/AI for handwritten or photo evidence | Defect text interpretation only |
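
The settlement path in the table above (layout fingerprint → deterministic parser → checksum validation, with fallback otherwise) could be sketched roughly as follows. This is a minimal illustration, not the SubOps implementation: `RawTable`, `layoutFingerprint`, `parsers`, `checksumValid`, and `routeSettlement` are hypothetical names, and fingerprinting by header row is an assumed simplification.

```typescript
// Hypothetical sketch: every type and function name here is an assumption,
// not part of the SubOps codebase.
type RawTable = { header: string[]; rows: string[][]; statedTotal: string };

type ParsedSettlement = {
  lineItems: { description: string; amountCents: number }[];
  statedTotalCents: number;
};

type Route =
  | { path: "deterministic"; data: ParsedSettlement }
  | { path: "fallback"; reason: "unknown_layout" | "checksum_failed" };

type Parser = (table: RawTable) => ParsedSettlement;

// Convert a decimal string such as "1250.10" to integer cents.
function toCents(value: string): number {
  return Math.round(Number(value) * 100);
}

// Assumed fingerprint scheme: the normalized header row identifies the layout.
function layoutFingerprint(header: string[]): string {
  return header.map((h) => h.trim().toLowerCase()).join("|");
}

// One deterministic parser per known fingerprint.
const parsers = new Map<string, Parser>([
  [
    "description|amount",
    (table) => ({
      lineItems: table.rows.map(([description, amount]) => ({
        description,
        amountCents: toCents(amount),
      })),
      statedTotalCents: toCents(table.statedTotal),
    }),
  ],
]);

// Checksum validation: line items must sum to the total stated on the document.
function checksumValid(parsed: ParsedSettlement): boolean {
  const sum = parsed.lineItems.reduce((acc, item) => acc + item.amountCents, 0);
  return sum === parsed.statedTotalCents;
}

// Deterministic first; fall back only when the layout is unknown or validation fails.
function routeSettlement(table: RawTable): Route {
  const parser = parsers.get(layoutFingerprint(table.header));
  if (!parser) return { path: "fallback", reason: "unknown_layout" };
  const parsed = parser(table);
  if (!checksumValid(parsed)) return { path: "fallback", reason: "checksum_failed" };
  return { path: "deterministic", data: parsed };
}
```

In this sketch the fallback result carries a reason, so a downstream step can tell an unrecognized layout apart from a recognized layout whose totals do not reconcile.
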
Where AI is used
| Capability | What AI does | How it's constrained |
|---|---|---|
| Unknown settlement layout | Bootstraps a structured parse when the layout fingerprint is not recognized | Output must pass schema and checksum validation before rules can use it |
| Variable contract/rate schedule | Reads Schedule C / Attachment C-1, amendments, and rate-card tables when deterministic extraction cannot prove the fields | Extracted rates are draft facts. They do not become contract truth until an operator or bookkeeper confirms them |
| Repair invoice OCR | Reads scanned or digital repair invoices and extracts parts, labor hours, rates, and totals | Extraction runs through schema validation; every extracted dollar amount is recalculated deterministically |
| DVIR text interpretation | Interprets free-text or photo-based defect notes | Findings remain tied to vehicle, date, and uploaded evidence |
| Narrative drafting | Drafts the plain-English summary in the Owner Brief | The LLM chooses words; it does not choose numbers. Dollar values are injected from the deterministic math layer |
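
One way to enforce "the LLM chooses words; it does not choose numbers" is to have the model write placeholders and fill them from the math layer afterward. The sketch below assumes a `{{name}}` placeholder syntax and amounts held in integer cents; `renderNarrative`, `formatUsd`, and `Figures` are hypothetical names, not SubOps APIs.

```typescript
// Hypothetical sketch: placeholder syntax and all names are assumptions.
type Figures = Record<string, number>; // integer cents from the deterministic math layer

// Format integer cents as US dollars, e.g. 120460 -> "$1,204.60".
function formatUsd(cents: number): string {
  const sign = cents < 0 ? "-" : "";
  const abs = Math.abs(cents);
  const dollars = Math.floor(abs / 100).toLocaleString("en-US");
  const remainder = abs % 100;
  return `${sign}$${dollars}.${remainder < 10 ? "0" : ""}${remainder}`;
}

function renderNarrative(draft: string, figures: Figures): string {
  // Reject any draft in which the model wrote a digit outside a placeholder.
  const proseOnly = draft.replace(/\{\{\w+\}\}/g, "");
  if (/\d/.test(proseOnly)) {
    throw new Error("Draft contains a literal number; only placeholders are allowed");
  }
  // Inject each value from the math layer; unknown placeholders are an error.
  return draft.replace(/\{\{(\w+)\}\}/g, (_match, key: string) => {
    if (!(key in figures)) throw new Error(`Unknown figure: ${key}`);
    return formatUsd(figures[key]);
  });
}
```

Rejecting every literal digit is deliberately strict in this sketch: a draft that mentions a date or a count would also need a placeholder, which keeps all figures on the deterministic side.
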
Where AI is never used
| Capability | What runs instead | Why |
|---|---|---|
| Fitment engine | Deterministic rule-based matching | F1 = 1.0 on the ground-truth corpus. Matching VINs to part SKUs needs no AI, so it carries no AI hallucination risk |
| Rules library | TypeScript deterministic rule executor (R001–R025, FPM-001–FPM-014) | Every rule output is reproducible: same input → same finding. A probabilistic model cannot guarantee this |
| Dollar math | TypeScript arithmetic, pinned to extracted values and contractual rates | Every user-facing dollar figure is traceable to a source document. LLMs can hallucinate numbers; deterministic TypeScript arithmetic cannot |
| Dispute workflow | State machine with explicit transitions | Disputes have contractual deadlines. No probabilistic state tracking |
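
A state machine with explicit transitions, as described for the dispute workflow, might look like the sketch below. The state names, the transition table, and the single submission deadline are illustrative assumptions; the actual SubOps states and deadline rules are not documented here.

```typescript
// Hypothetical sketch: state names, transitions, and the deadline field are assumptions.
type DisputeState = "draft" | "submitted" | "under_review" | "resolved" | "withdrawn";

type Dispute = { state: DisputeState; submitByMs: number };

// Every legal transition is listed explicitly; anything absent is illegal.
const allowedTransitions: Record<DisputeState, readonly DisputeState[]> = {
  draft: ["submitted", "withdrawn"],
  submitted: ["under_review", "withdrawn"],
  under_review: ["resolved", "withdrawn"],
  resolved: [],
  withdrawn: [],
};

function advanceDispute(dispute: Dispute, to: DisputeState, nowMs: number): Dispute {
  if (allowedTransitions[dispute.state].indexOf(to) === -1) {
    throw new Error(`Illegal transition: ${dispute.state} -> ${to}`);
  }
  // The contractual deadline is checked deterministically against the clock.
  if (to === "submitted" && nowMs > dispute.submitByMs) {
    throw new Error("Submission deadline has passed");
  }
  return { ...dispute, state: to };
}
```

Because the transition table is plain data, the same dispute history always replays to the same state, which is the property the table above asks for.
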
Key point: Structured parsers handle the predictable work. AI handles the messy edges. Deterministic TypeScript owns every dollar.
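
To make "deterministic TypeScript owns every dollar" concrete, here is a minimal sketch of recalculating extracted repair-invoice amounts instead of trusting them. It assumes amounts are held in integer cents and that a mismatch is flagged rather than silently corrected; `ExtractedLine` and `recalcInvoice` are hypothetical names.

```typescript
// Hypothetical sketch: field names and the mismatch-handling policy are assumptions.
type ExtractedLine = {
  description: string;
  quantity: number; // parts count or labor hours
  unitPriceCents: number; // integer cents per unit or per hour
  extractedTotalCents: number; // line total as read by OCR/AI
};

type Mismatch = { description: string; extractedCents: number; recalculatedCents: number };

function recalcInvoice(lines: ExtractedLine[], extractedGrandTotalCents: number) {
  const mismatches: Mismatch[] = [];
  let grandTotalCents = 0;
  for (const line of lines) {
    // Recompute the line total from quantity and rate; never trust the extracted total.
    const recalculatedCents = Math.round(line.quantity * line.unitPriceCents);
    if (recalculatedCents !== line.extractedTotalCents) {
      mismatches.push({
        description: line.description,
        extractedCents: line.extractedTotalCents,
        recalculatedCents,
      });
    }
    grandTotalCents += recalculatedCents;
  }
  return {
    grandTotalCents,
    grandTotalMatches: grandTotalCents === extractedGrandTotalCents,
    mismatches,
  };
}
```

Working in integer cents avoids floating-point drift when summing, and each mismatch keeps both values so a reviewer can see what the document said and what the arithmetic says.
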
Confidence and review
When a structured parse validates, findings can proceed through the rules library. When a parser cannot prove the result, the artifact routes through fallback extraction or manual review before it enters the rules pipeline. You can adjust review thresholds per tenant.
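
The routing described above could be expressed as a small pure function. This sketch assumes that fallback extraction reports a confidence score between 0 and 1 and that the per-tenant review threshold is a minimum confidence; both are assumptions, as are the names `routeForReview`, `tenantThresholds`, and the default of 0.9.

```typescript
// Hypothetical sketch: the confidence score, threshold semantics, and names are assumptions.
type ExtractionResult =
  | { kind: "validated" } // structured parse passed validation
  | { kind: "fallback"; confidence: number; schemaValid: boolean };

type Destination = "rules_pipeline" | "manual_review";

const DEFAULT_REVIEW_THRESHOLD = 0.9;

// Per-tenant overrides; tenants without an entry use the default.
const tenantThresholds = new Map<string, number>([["tenant-strict", 0.98]]);

function routeForReview(tenantId: string, result: ExtractionResult): Destination {
  // A validated structured parse proceeds straight to the rules library.
  if (result.kind === "validated") return "rules_pipeline";
  // Fallback output that fails schema validation is never used by rules.
  if (!result.schemaValid) return "manual_review";
  const threshold = tenantThresholds.get(tenantId) ?? DEFAULT_REVIEW_THRESHOLD;
  return result.confidence >= threshold ? "rules_pipeline" : "manual_review";
}
```

For example, under these assumed values a fallback extraction with confidence 0.95 would proceed for a default tenant but go to manual review for `tenant-strict`.
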
Note: SubOps does not train models on customer data. Extraction runs against foundation models via API. Your settlement data is not used to improve anyone else's AI.