Structured Extraction

How SubOps turns uploaded documents into source-linked data, and where AI is used only for the messy edges.

SubOps is deterministic-first. Standard settlement PDFs, payroll CSVs, and fuel CSVs are parsed through structured pipelines whenever the layout can be validated. AI is reserved for the messy edges: unknown layouts, degraded scans, variable Contract Pay Table schedules, repair invoices, DVIR text, and plain-English narrative drafting.

What happens by default

| Artifact | Default path | AI role |
| --- | --- | --- |
| Standard settlement PDF | Layout fingerprint → deterministic parser → checksum validation | Fallback only, when the layout is unknown or validation fails |
| Charge and Credit detail | Deterministic parser with source references | Fallback only |
| Payroll CSV / fuel CSV | Deterministic CSV parser | None |
| Contract Pay Table / rate schedule | Deterministic table extraction when the schedule layout is known; structured extraction for variable schedules | Extracted rates stay draft facts until an operator or bookkeeper confirms them |
| Repair invoice | Structured extraction with OCR/AI | Primary extractor, because shop invoice layouts vary widely |
| DVIR note or photo | Deterministic when structured; OCR/AI for handwritten or photo evidence | Defect text interpretation only |
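
The default path for a standard settlement PDF can be sketched as follows. This is a minimal illustration under stated assumptions, not SubOps' implementation: `layoutFingerprint`, `parseSettlement`, the `Parser` registry, and the toy "label,cents" layout are all hypothetical, and the fingerprint here is simply the normalized column headers. What it shows is the ordering: an unknown layout or a failed checksum never reaches the rules library; it falls back.

```typescript
// Hypothetical types for illustration; SubOps' real schemas are not shown on this page.
interface SettlementLine {
  label: string;
  amountCents: number; // integer cents, so sums are exact
}

interface ParsedSettlement {
  lines: SettlementLine[];
  statedNetCents: number; // net pay as printed on the statement
}

type Parser = (text: string) => ParsedSettlement;

type ParseOutcome =
  | { path: "deterministic"; settlement: ParsedSettlement }
  | { path: "fallback"; reason: "unknown_layout" | "checksum_failed" };

// Hypothetical fingerprint: the statement's column headers, normalized and joined.
function layoutFingerprint(headers: string[]): string {
  return headers.map((h) => h.trim().toLowerCase()).join("|");
}

// Checksum: extracted line items must sum exactly to the stated net amount.
function checksumPasses(s: ParsedSettlement): boolean {
  const total = s.lines.reduce((sum, line) => sum + line.amountCents, 0);
  return total === s.statedNetCents;
}

// Known layout -> deterministic parser -> checksum; anything else falls back.
function parseSettlement(
  headers: string[],
  text: string,
  registry: Map<string, Parser>,
): ParseOutcome {
  const parser = registry.get(layoutFingerprint(headers));
  if (parser === undefined) {
    return { path: "fallback", reason: "unknown_layout" };
  }
  const settlement = parser(text);
  if (!checksumPasses(settlement)) {
    return { path: "fallback", reason: "checksum_failed" };
  }
  return { path: "deterministic", settlement };
}

// Toy parser for a hypothetical "label,cents" layout whose last row is the net total.
const toyParser: Parser = (text) => {
  const rows = text.trim().split("\n").map((row) => row.split(","));
  const lines = rows
    .slice(0, -1)
    .map(([label, cents]) => ({ label: label ?? "", amountCents: Number(cents) }));
  return { lines, statedNetCents: Number(rows[rows.length - 1]?.[1]) };
};

const registry = new Map<string, Parser>([
  [layoutFingerprint(["Description", "Amount"]), toyParser],
]);

const outcome = parseSettlement(
  ["Description", "Amount"],
  "Linehaul,150000\nFuel,-32000\nNET,118000",
  registry,
);
console.log(outcome.path); // prints: deterministic (150000 - 32000 === 118000)
```

A `fallback` outcome would hand the artifact to AI extraction, whose output still has to pass the same checksum before rules can use it.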

Where AI is used

| Capability | What AI does | How it's constrained |
| --- | --- | --- |
| Unknown settlement layout | Bootstraps a structured parse when the layout fingerprint is not recognized | Output must pass schema and checksum validation before rules can use it |
| Variable contract/rate schedule | Reads Schedule C / Attachment C-1, amendments, and rate-card tables when deterministic extraction cannot prove the fields | Extracted rates are draft facts. They do not become contract truth until confirmed |
| Repair invoice OCR | Reads scanned or digital repair invoices and extracts parts, labor hours, rates, and totals | Extraction runs through schema validation; every extracted dollar amount is recalculated deterministically |
| DVIR text interpretation | Interprets free-text or photo-based defect notes | Findings remain tied to vehicle, date, and uploaded evidence |
| Narrative drafting | AI drafts the plain-English summary in the Owner Brief | The LLM chooses words; it does not choose numbers. Dollar values are injected from the deterministic math layer |
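
The rule that the LLM chooses words but not numbers can be enforced mechanically. The sketch below assumes a hypothetical `{{key}}` placeholder syntax: `renderNarrative` rejects any draft that still contains a digit once placeholders are removed, then substitutes dollar values formatted from integer cents supplied by the deterministic layer. `renderNarrative`, `formatUsd`, and `Figure` are illustrative names, not SubOps' API.

```typescript
// Hypothetical figure type: a value computed by the deterministic math layer.
interface Figure {
  cents: number;
}

// Format integer cents as US dollars, e.g. 123456 -> "$1,234.56".
function formatUsd(cents: number): string {
  const sign = cents < 0 ? "-" : "";
  const abs = Math.abs(cents);
  const dollars = String(Math.floor(abs / 100)).replace(/\B(?=(\d{3})+(?!\d))/g, ",");
  const rem = abs % 100;
  const remainder = rem < 10 ? `0${rem}` : String(rem);
  return `${sign}$${dollars}.${remainder}`;
}

const PLACEHOLDER = /\{\{(\w+)\}\}/g;

// The model's draft may refer to money only through {{key}} placeholders.
// Any digit left after removing placeholders is rejected, so the draft cannot supply a number.
function renderNarrative(draft: string, figures: Record<string, Figure>): string {
  if (/\d/.test(draft.replace(PLACEHOLDER, ""))) {
    throw new Error("Draft contains a literal number");
  }
  return draft.replace(PLACEHOLDER, (_match: string, key: string) => {
    const figure = figures[key];
    if (figure === undefined) {
      throw new Error(`Unknown figure: ${key}`);
    }
    return formatUsd(figure.cents);
  });
}

const summary = renderNarrative(
  "Deductions this week totaled {{deductions}}, of which {{disputed}} may be recoverable.",
  { deductions: { cents: 123456 }, disputed: { cents: 4500 } },
);
console.log(summary);
// Deductions this week totaled $1,234.56, of which $45.00 may be recoverable.
```

Rejecting every digit is deliberately blunt; a production template would need an allowance for dates or rule IDs such as R001, which this sketch does not attempt.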

Where AI is never used

| Capability | What runs instead | Why |
| --- | --- | --- |
| Fitment engine | Deterministic rule-based matching | F1 = 1.0 on the ground-truth corpus. Matching VINs to part SKUs needs no AI and carries no hallucination risk |
| Rules library | Deterministic TypeScript rule executor (R001–R025, FPM-001–FPM-014) | Every rule output is reproducible. Same input → same finding. AI cannot guarantee this |
| Dollar math | TypeScript arithmetic, pinned to extracted values and contractual rates | Every user-facing dollar figure is traceable to a source document. LLMs hallucinate numbers; TypeScript doesn't |
| Dispute workflow | State machine with explicit transitions | Disputes have contractual deadlines, so state tracking cannot be probabilistic |
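
A dispute workflow with explicit transitions can be written as a lookup table, so any transition that is not listed is an error rather than a guess. The states, events, and the `submitOrExpire` deadline check below are hypothetical placeholders; this page does not document SubOps' actual dispute states.

```typescript
// Hypothetical states and events; SubOps' actual dispute states are not documented here.
type DisputeState = "open" | "submitted" | "resolved" | "rejected" | "expired";
type DisputeEvent = "submit" | "resolve" | "reject" | "expire";

// Every legal transition is listed explicitly; terminal states have none.
const TRANSITIONS: Record<DisputeState, Partial<Record<DisputeEvent, DisputeState>>> = {
  open: { submit: "submitted", expire: "expired" },
  submitted: { resolve: "resolved", reject: "rejected" },
  resolved: {},
  rejected: {},
  expired: {},
};

function nextState(state: DisputeState, event: DisputeEvent): DisputeState {
  const next = TRANSITIONS[state][event];
  if (next === undefined) {
    throw new Error(`Illegal transition: ${event} from ${state}`);
  }
  return next;
}

// Submitting is legal only on or before the contractual deadline; after it, an open dispute expires.
function submitOrExpire(state: DisputeState, now: Date, deadline: Date): DisputeState {
  const event: DisputeEvent = now.getTime() <= deadline.getTime() ? "submit" : "expire";
  return nextState(state, event);
}

const deadline = new Date("2024-06-14T23:59:59Z");
let disputeState: DisputeState = "open";
disputeState = submitOrExpire(disputeState, new Date("2024-06-10T12:00:00Z"), deadline); // "submitted"
disputeState = nextState(disputeState, "resolve"); // "resolved"
```

Because the table is plain data, the same input sequence always lands in the same state, and an out-of-order event fails loudly.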

Key point: Structured parsers handle the predictable work. AI handles the messy edges. Deterministic TypeScript owns every dollar.
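
As an illustration of deterministic code owning the dollars, here is a sketch of the repair-invoice check from the table above: AI extracts the fields, and TypeScript recomputes the total in integer cents while keeping each input's source location. The field names, the tenths-of-an-hour encoding, the `invoice-123` document ID, and the half-up rounding rule are assumptions for illustration; tax and shop fees are left out.

```typescript
// Hypothetical provenance record: which document and page a value was read from.
interface SourceRef {
  documentId: string;
  page: number;
}

interface Amount {
  cents: number; // integer cents; never floating-point dollars
  source: SourceRef;
}

// Hypothetical shape of an AI-extracted repair invoice (after schema validation).
interface RepairInvoiceExtract {
  laborHoursTenths: number; // 2.5 h is stored as 25 so the math stays in integers
  laborRateCents: number; // per hour
  laborSource: SourceRef; // where the labor line was read
  parts: Amount[];
  statedTotal: Amount;
}

interface Recalculation {
  computedCents: number;
  statedCents: number;
  matches: boolean;
  sources: SourceRef[]; // labor, parts, and stated-total locations
}

// Recompute the total from its components; the extracted total is checked, never trusted.
function recalculateInvoice(inv: RepairInvoiceExtract): Recalculation {
  // Assumed rounding rule: labor rounded half-up to the nearest cent.
  const laborCents = Math.round((inv.laborHoursTenths * inv.laborRateCents) / 10);
  const partsCents = inv.parts.reduce((sum, part) => sum + part.cents, 0);
  const computedCents = laborCents + partsCents;
  return {
    computedCents,
    statedCents: inv.statedTotal.cents,
    matches: computedCents === inv.statedTotal.cents,
    sources: [inv.laborSource, ...inv.parts.map((p) => p.source), inv.statedTotal.source],
  };
}

const page1: SourceRef = { documentId: "invoice-123", page: 1 };
const result = recalculateInvoice({
  laborHoursTenths: 25, // 2.5 h
  laborRateCents: 12500, // $125.00/h
  laborSource: page1,
  parts: [
    { cents: 8999, source: page1 },
    { cents: 4550, source: page1 },
  ],
  statedTotal: { cents: 44799, source: page1 },
});
console.log(result.computedCents, result.matches); // 44799 true
```

A mismatch does not get "corrected" by the model; it surfaces as a discrepancy with its source locations attached.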

Confidence and review

When a structured parse validates, its output proceeds to the rules library. When a parser cannot prove its result, the artifact is routed to fallback extraction or manual review before it enters the rules pipeline. Review thresholds are adjustable per tenant.
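
Per-tenant review thresholds could be modeled as a small routing function. This sketch assumes that fallback extractions carry a confidence score between 0 and 1 and that each tenant sets a single hypothetical `minFallbackConfidence`; neither the score nor the setting name is documented on this page. Deterministic parses that fail validation go to fallback extraction; fallback output reaches the rules pipeline only if it validates and clears the tenant's threshold.

```typescript
// Hypothetical per-tenant setting; the actual configuration key is not documented here.
interface TenantReviewConfig {
  minFallbackConfidence: number; // 0..1
}

// Hypothetical extraction result: deterministic parses either validate or they don't;
// fallback (AI) extractions also carry an assumed confidence score.
type ExtractionResult =
  | { method: "deterministic"; validated: boolean }
  | { method: "fallback"; validated: boolean; confidence: number };

type Route = "rules_pipeline" | "fallback_extraction" | "manual_review";

function routeExtraction(result: ExtractionResult, config: TenantReviewConfig): Route {
  if (result.method === "deterministic") {
    // A parser that cannot prove its result hands off to fallback extraction.
    return result.validated ? "rules_pipeline" : "fallback_extraction";
  }
  // Fallback output must validate and clear the tenant's threshold; otherwise a person reviews it.
  if (result.validated && result.confidence >= config.minFallbackConfidence) {
    return "rules_pipeline";
  }
  return "manual_review";
}

const strictTenant: TenantReviewConfig = { minFallbackConfidence: 0.95 };
const lenientTenant: TenantReviewConfig = { minFallbackConfidence: 0.8 };
const aiResult: ExtractionResult = { method: "fallback", validated: true, confidence: 0.9 };
console.log(routeExtraction(aiResult, strictTenant)); // manual_review
console.log(routeExtraction(aiResult, lenientTenant)); // rules_pipeline
```

The same 0.9-confidence extraction goes to manual review for the stricter tenant and straight to the rules pipeline for the more lenient one; validation failure always blocks the rules pipeline regardless of threshold.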

Note: SubOps does not train models on customer data. Extraction runs against foundation models via API. Your settlement data is not used to improve anyone else's AI.