1. Source of Truth — Standards Ingestion
Convert raw NSPIRE PDFs into canonical, versioned, human-approved JSON. Every node traces to exact PDF coordinates.
Pipeline
- PDF extraction — Text, tables, page numbers, bounding boxes (bbox)
- AI conversion — Draft JSON: deficiencies, severity rules, correction timelines, inspection processes, JSON pointers
- Source linking — Attach sourceRef to every critical node: jsonPointer → PDF file + page + bbox + anchor text (see the sketch after this list)
- Human verification UI — Left: PDF page with highlight. Right: JSON node. Approve, edit, reject. Sign + timestamp
- Automated regression tests — Scenario-based tests (e.g., facts X → D1 severe 24h). Run on every update
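The sourceRef is the backbone of traceability, so a minimal sketch of how it might hang off a canonical node is shown below. The field names (file, page, bbox, anchorText) are illustrative assumptions, not a fixed schema.

```typescript
// Hypothetical sourceRef: every critical node in the canonical JSON points
// back to the exact PDF location it was derived from.
interface SourceRef {
  file: string;        // e.g. "nspire-standards.pdf" (illustrative name)
  page: number;        // 1-based page number in the source PDF
  bbox: [number, number, number, number]; // [x0, y0, x1, y1] in PDF points
  anchorText: string;  // verbatim snippet used to re-locate the highlight
}

// A canonical node carries its own JSON pointer plus its sourceRef, so the
// verification UI can render the PDF highlight and the JSON node side by side.
interface CanonicalNode {
  jsonPointer: string; // e.g. "/deficiencies/D1/severity"
  sourceRef: SourceRef;
  value: unknown;
}
```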
Outputs
- Canonical Standards JSON Store
- Citation Index
- Chunk Store for retrieval
- Regression test bank
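One hedged way a scenario in the regression test bank could be encoded is sketched below; the sample scenario and field names are assumptions that mirror the "facts X → D1 severe 24h" pattern above.

```typescript
// Hypothetical regression scenario: given fixed verified facts, the
// deterministic engine must always reproduce the same outcome.
interface RegressionScenario {
  name: string;
  standardId: string;
  standardVersion: string;
  facts: Record<string, unknown>;  // inputs exactly as the engine receives them
  expected: {
    deficiencyId: string;
    severity: "low" | "moderate" | "severe"; // illustrative severity scale
    correctionHours: number;
    pass: boolean;
  };
}

// Illustrative scenario only; real ids and facts come from the approved standards.
const blockedEgressScenario: RegressionScenario = {
  name: "Blocked egress door scores D1 severe, 24h correction",
  standardId: "D1",
  standardVersion: "2024.1",
  facts: { doorBlocked: true, location: "unit" },
  expected: { deficiencyId: "D1", severity: "severe", correctionHours: 24, pass: false },
};
```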
2. Dual Indexing Strategy
Two separate indexes serve distinct roles: audit traceability vs. contextual retrieval.
| Index | Purpose | Use |
|---|---|---|
| A — Citation Index | Human trust layer: show exact PDF origin | “View in PDF”, audit trail, legal defensibility |
| B — Semantic Retrieval (Mini RAG) | Retrieve relevant rule text for explanation; never for scoring | Clarifications, contextual help, training, dynamic UI phrasing |
Chunk metadata: standardId, chunkType, locationContext, jsonPointer, sourceRef.
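A sketch of that chunk metadata as a flat record per embedded chunk; the enum values and the inlined sourceRef shape are assumptions.

```typescript
// Hypothetical metadata stored alongside each embedded chunk in index B.
// The embedded sourceRef keeps index B aligned with citation index A.
interface ChunkMetadata {
  standardId: string;
  chunkType: "definition" | "severity" | "timeline" | "process"; // illustrative enum
  locationContext: "outside" | "inside" | "unit";                // where the rule applies
  jsonPointer: string; // back-reference into the canonical JSON
  sourceRef: { file: string; page: number; bbox: [number, number, number, number]; anchorText: string };
}
```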
3. Runtime Data Model
Clean separation between truth (canonical JSON) and observation (session, evidence, outcomes).
- Property Inventory — Property, building, units, room/door/fixture counts, expected coverage
- Inspection Session — sessionId, propertyId, assigned units, inspectorId, progress, coverage tracking
- Evidence Store — Photos, audio, notes, metadata (timestamp, item, hash, quality)
- Tasks — Deficiency-driven remediation tasks, due dates, assignees, status, escalation
- Report Store — Final PDFs, machine exports, linked citations, evidence references
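A hedged sketch of how two of these stores (session and evidence) might be typed as flat documents; every field name is a placeholder.

```typescript
// Hypothetical runtime records. These are observations, kept strictly
// separate from the canonical standards JSON, which is the source of truth.
interface InspectionSession {
  sessionId: string;
  propertyId: string;
  inspectorId: string;
  assignedUnits: string[];
  progress: { itemsCompleted: number; itemsTotal: number }; // coverage tracking
}

interface EvidenceRecord {
  evidenceId: string;
  sessionId: string;
  itemId: string;                    // which inspectable item it documents
  kind: "photo" | "audio" | "note";
  capturedAt: string;                // ISO timestamp
  sha256: string;                    // content hash for tamper evidence
  qualityOk: boolean;                // result of automated quality validation
}
```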
4. Agent Tool Contracts
The AI operates only via deterministic tools. The LLM never bypasses these.
- getNextInspectableItem(sessionId)
- getStandardJson(standardId, version)
- openPdfAtSource(sourceRef)
- validateFacts(facts)
- scoreFacts(facts)
- commitVerifiedFacts(sessionId, verifiedFacts)
- saveOutcome(sessionId, outcome)
- createRemediationTasks(outcome)
- generateReport(sessionId)
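The tool names above come from the plan; the signatures and return types below are a hedged TypeScript sketch of what those contracts could look like, not the actual interface.

```typescript
// Hypothetical deterministic tool surface exposed to the agent. The LLM may
// only request these calls; it never reads or writes the stores directly.
type InspectableItem = { itemId: string; standardId: string; location: string };
type Outcome = { deficiencyId?: string; severity?: string; correctionHours?: number; pass: boolean };

interface AgentTools {
  getNextInspectableItem(sessionId: string): Promise<InspectableItem | null>;
  getStandardJson(standardId: string, version: string): Promise<object>;
  openPdfAtSource(sourceRef: object): Promise<void>;          // drives the "View in PDF" panel
  validateFacts(facts: object): Promise<{ ok: boolean; issues: string[] }>;
  scoreFacts(facts: object): Promise<Outcome>;                 // deterministic rules engine only
  commitVerifiedFacts(sessionId: string, verifiedFacts: object): Promise<void>;
  saveOutcome(sessionId: string, outcome: Outcome): Promise<void>;
  createRemediationTasks(outcome: Outcome): Promise<string[]>; // returns created task ids
  generateReport(sessionId: string): Promise<{ pdfUrl: string }>;
}
```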
5. LLM Orchestration Layer
The LLM has limited authority. It sees only a live context snapshot: current session, current item, verified facts, property expectations, standardId/version, and a few retrieved chunks.
Allowed
- Generate next question
- Compose dynamic UI schema
- Explain standards
- Propose candidate facts
- Draft inspection notes
Not allowed
- Decide severity
- Override deterministic engine
- Modify canonical standards
- Persist unvalidated facts
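A sketch of the context snapshot described at the top of this section, assuming it is assembled read-only per turn; field names are illustrative.

```typescript
// Hypothetical read-only snapshot handed to the LLM each turn. It contains
// the items listed above and nothing else; all writes go through tools.
interface ContextSnapshot {
  session: { sessionId: string; propertyId: string };
  currentItem: { itemId: string; standardId: string; standardVersion: string };
  verifiedFacts: Record<string, unknown>;        // only facts that passed validation
  propertyExpectations: Record<string, number>;  // e.g. expected door and fixture counts
  retrievedChunks: { text: string; jsonPointer: string }[]; // a few chunks from the Mini RAG index
}
```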
6. Deterministic Core
The compliance engine. Output is immutable and authoritative.
- Schema Gate — Type validation, required fields, enum checks
- Evidence Gate — Minimum photos, quality validation, required confirmations
- Consistency Gate — Contradictions, missing mandatory inputs
- Rules Engine — Canonical JSON DSL maps facts → deficiencyId, location, severity, correction deadline, pass/fail, rationale
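One hedged way the JSON rules DSL could encode a facts-to-outcome mapping; the condition operators and the sample rule are assumptions, not the actual NSPIRE encoding.

```typescript
// Hypothetical rule entry in the canonical DSL. The engine evaluates the
// conditions against verified facts and emits an immutable outcome.
interface Rule {
  deficiencyId: string;
  when: { fact: string; op: "eq" | "gte" | "lt"; value: unknown }[]; // all conditions must hold
  then: {
    severity: "low" | "moderate" | "severe";
    correctionHours: number;
    pass: boolean;
    rationalePointer: string; // jsonPointer into the standard text, for the citation
  };
}

// Illustrative rule only: a blocked egress door in a unit is severe, 24h to correct.
const blockedEgress: Rule = {
  deficiencyId: "D1",
  when: [
    { fact: "doorBlocked", op: "eq", value: true },
    { fact: "location", op: "eq", value: "unit" },
  ],
  then: { severity: "severe", correctionHours: 24, pass: false, rationalePointer: "/deficiencies/D1" },
};
```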
7. Dual Agent Validation
After scoring, two agents plus human-in-the-loop ensure quality and completeness.
- Agent A — Compliance Guide — Completeness, required evidence, coverage checks
- Agent B — Independent Validator — Different prompt and checks; finds missing items, risks, citation coverage
- Live PDF Validation Panel — Show PDF highlight, JSON node, outcome rationale side by side
- Human Review — Inspector confirms critical facts; supervisor QA for high-risk items
8. Field Inspection Loop
End-to-end flow from session start to report.
- Start session → load property inventory and assignments
- getNextInspectableItem → current standard + location scope
- getStandardJson + Mini RAG retriever → load canonical standard and context
- LLM generates questions, guidance, dynamic UI
- Capture inputs (toggles, counts, voice, photos) → Evidence DB + AI candidates
- validateFacts → if OK: scoreFacts → outcome; else follow-up
- commitVerifiedFacts, saveOutcome, createRemediationTasks
- More items? Loop to getNextInspectableItem. Done? Compile session → report
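The same loop as straight-line orchestration code, assuming the AgentTools and InspectableItem types sketched in section 4; the LLM turns, UI rendering, and error handling are reduced to placeholders.

```typescript
// Hypothetical orchestration of one session using the section 4 tool surface.
async function runInspection(tools: AgentTools, sessionId: string): Promise<void> {
  let item = await tools.getNextInspectableItem(sessionId);
  while (item !== null) {
    const standard = await tools.getStandardJson(item.standardId, "current"); // version lookup assumed
    // LLM turn (not shown): generate questions and dynamic UI from the context
    // snapshot, then capture toggles, counts, voice, and photos as candidate facts.
    let facts = await collectFacts(item, standard);
    let validation = await tools.validateFacts(facts);
    while (!validation.ok) {
      // Follow-up turn: targeted questions, inputs recaptured, then revalidated.
      facts = await collectFacts(item, standard);
      validation = await tools.validateFacts(facts);
    }
    const outcome = await tools.scoreFacts(facts);   // deterministic scoring only
    await tools.commitVerifiedFacts(sessionId, facts);
    await tools.saveOutcome(sessionId, outcome);
    await tools.createRemediationTasks(outcome);
    item = await tools.getNextInspectableItem(sessionId);
  }
  await tools.generateReport(sessionId);
}

// Placeholder for the capture step; in the real flow this is the inspector UI
// plus AI-proposed candidate facts, not a single function.
declare function collectFacts(item: InspectableItem, standard: object): Promise<object>;
```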
9. Reporting & Remediation
Report assembly matches legacy formats; remediation drives tasks and follow-up.
- Report templates — Print-ready layouts, match existing paper format
- Report assembler — Merge outcomes, attach evidence, include citations
- Outputs — Printable PDF, machine exports (JSON, CSV, API payloads)
- Remediation — Notify stakeholders, reminders, repair evidence upload, follow-up inspection, closeout
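A hedged sketch of what the assembler might merge per item so one structure can feed both the print template and the machine exports; the shape is an assumption.

```typescript
// Hypothetical report payload: outcomes, citations, and evidence references
// merged per item, then rendered to PDF or exported as JSON/CSV.
interface ReportEntry {
  itemId: string;
  outcome: { deficiencyId?: string; severity?: string; correctionHours?: number; pass: boolean };
  citations: { jsonPointer: string; page: number; anchorText: string }[]; // from the Citation Index
  evidenceIds: string[];   // references into the Evidence Store
  inspectorNotes?: string;
}

interface SessionReport {
  sessionId: string;
  propertyId: string;
  generatedAt: string;     // ISO timestamp
  entries: ReportEntry[];
}
```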
10. Continuous Updates & Guardrails
Updates
New/updated PDF → diff engine → selective reconversion → targeted human re-approval → re-chunk, re-embed → re-run regression tests → deploy new version. Telemetry on confusion points and missed items feeds prompt and UI improvements — not rule changes.
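A sketch of how that update path could be staged so each step gates the next; every function name below is a placeholder, not an existing service.

```typescript
// Hypothetical update pipeline: only sections whose extracted text changed are
// reconverted and sent for targeted re-approval; unchanged approvals are kept.
async function ingestStandardsUpdate(newPdf: string, previousVersion: string): Promise<void> {
  const changedSections = await diffAgainstVersion(newPdf, previousVersion); // extraction-level diff
  for (const section of changedSections) {
    const draft = await reconvertSection(section);   // AI draft with sourceRefs reattached
    await queueForHumanApproval(draft);              // targeted human re-approval only
  }
  await rechunkAndReembed(changedSections);          // refresh retrieval index for changed chunks
  const results = await runRegressionSuite();        // full scenario bank on every update
  if (results.allPassed) {
    await publishNewVersion(newPdf);                 // new canonical version goes live
  }
}

// Placeholder steps; each would be its own service or worker in practice.
declare function diffAgainstVersion(pdf: string, version: string): Promise<string[]>;
declare function reconvertSection(section: string): Promise<object>;
declare function queueForHumanApproval(draft: object): Promise<void>;
declare function rechunkAndReembed(sections: string[]): Promise<void>;
declare function runRegressionSuite(): Promise<{ allPassed: boolean }>;
declare function publishNewVersion(pdf: string): Promise<void>;
```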
Guardrails
- AI may: propose facts, explain rules, generate UI prompts, draft notes
- AI may not: invent standards, override outcomes, change severity/deadlines, save unverified facts
- Truth comes from: canonical JSON, verified facts + evidence, deterministic rules engine
Current Project Alignment
This site and its backend represent an early-phase implementation aligned with the plan.
| Planned | Current state |
|---|---|
| Canonical JSON from PDFs | 63 standards in MongoDB; detailed_json for selected standards (OpenAI Responses API extraction) |
| Human verification UI | On-demand JSON generation, modal review; full source-link UI planned |
| Dual indexes | Citation index and vector store not yet implemented; structured JSON ready for embedding |
| Inspection flow | Outside → Inside → Unit order documented; ecosystem flow, inspection order guide, and standard detail pages live |
| Deterministic rules engine | Planned; detailed_json structure supports rules DSL |
| Agent tools & LLM | Planned; JSON generation worker and job queue in place for backend orchestration |
Next steps: vector store ingestion, agent tool contracts, rules engine, and field-ready mobile/tablet UI.