1. Source of Truth — Standards Ingestion
Convert raw NSPIRE PDFs into canonical, versioned, human-approved JSON. Every node traces to exact PDF coordinates.
Pipeline
- PDF extraction — Text, tables, page numbers, bounding boxes (bbox)
- AI conversion — Draft JSON: deficiencies, severity rules, correction timelines, inspection processes, JSON pointers
- Source linking — Attach sourceRef to every critical node: jsonPointer → PDF file + page + bbox + anchor text (see the sketch after this list)
- Human verification UI — Left: PDF page with highlight. Right: JSON node. Approve, edit, reject. Sign + timestamp
- Automated regression tests — Scenario-based tests (e.g., facts X → D1 severe 24h). Run on every update
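The sourceRef is the backbone of traceability, so a minimal sketch of how it might hang off a canonical node is shown below. The field names (file, page, bbox, anchorText) are illustrative assumptions, not a fixed schema.

```typescript
// Hypothetical sourceRef: every critical node in the canonical JSON points
// back to the exact PDF location it was derived from.
interface SourceRef {
  file: string;        // e.g. "nspire-standards.pdf" (illustrative name)
  page: number;        // 1-based page number in the source PDF
  bbox: [number, number, number, number]; // [x0, y0, x1, y1] in PDF points
  anchorText: string;  // verbatim snippet used to re-locate the highlight
}

// A canonical node carries its own JSON pointer plus its sourceRef, so the
// verification UI can render the PDF highlight and the JSON node side by side.
interface CanonicalNode {
  jsonPointer: string; // e.g. "/deficiencies/D1/severity"
  sourceRef: SourceRef;
  value: unknown;
}
```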
Outputs
- Canonical Standards JSON Store
- Citation Index
- Chunk Store for retrieval
- Regression test bank
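One hedged way a scenario in the regression test bank could be encoded is sketched below; the sample scenario and field names are assumptions that mirror the "facts X → D1 severe 24h" pattern above.

```typescript
// Hypothetical regression scenario: given fixed verified facts, the
// deterministic engine must always reproduce the same outcome.
interface RegressionScenario {
  name: string;
  standardId: string;
  standardVersion: string;
  facts: Record<string, unknown>;  // inputs exactly as the engine receives them
  expected: {
    deficiencyId: string;
    severity: "low" | "moderate" | "severe"; // illustrative severity scale
    correctionHours: number;
    pass: boolean;
  };
}

// Illustrative scenario only; real ids and facts come from the approved standards.
const blockedEgressScenario: RegressionScenario = {
  name: "Blocked egress door scores D1 severe, 24h correction",
  standardId: "D1",
  standardVersion: "2024.1",
  facts: { doorBlocked: true, location: "unit" },
  expected: { deficiencyId: "D1", severity: "severe", correctionHours: 24, pass: false },
};
```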
2. Dual Indexing Strategy
Two separate indexes serve distinct roles: audit traceability vs. contextual retrieval.
| Index | Purpose | Use |
|---|---|---|
| A — Citation Index | Human trust layer: show exact PDF origin | “View in PDF”, audit trail, legal defensibility |
| B — Semantic Retrieval (Mini RAG) | Retrieve relevant rule text for explanation; never for scoring | Clarifications, contextual help, training, dynamic UI phrasing |
Chunk metadata: standardId, chunkType, locationContext, jsonPointer, sourceRef.
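A sketch of that chunk metadata as a flat record per embedded chunk; the enum values and the inlined sourceRef shape are assumptions.

```typescript
// Hypothetical metadata stored alongside each embedded chunk in index B.
// The embedded sourceRef keeps index B aligned with citation index A.
interface ChunkMetadata {
  standardId: string;
  chunkType: "definition" | "severity" | "timeline" | "process"; // illustrative enum
  locationContext: "outside" | "inside" | "unit";                // where the rule applies
  jsonPointer: string; // back-reference into the canonical JSON
  sourceRef: { file: string; page: number; bbox: [number, number, number, number]; anchorText: string };
}
```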
3. Runtime Data Model
Clean separation between truth (canonical JSON) and observation (session, evidence, outcomes).
- Property Inventory — Property, building, units, room/door/fixture counts, expected coverage
- Inspection Session — sessionId, propertyId, assigned units, inspectorId, progress, coverage tracking
- Evidence Store — Photos, audio, notes, metadata (timestamp, item, hash, quality)
- Tasks — Deficiency-driven remediation tasks, due dates, assignees, status, escalation
- Report Store — Final PDFs, machine exports, linked citations, evidence references
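A hedged sketch of how two of these stores (session and evidence) might be typed as flat documents; every field name is a placeholder.

```typescript
// Hypothetical runtime records. These are observations, kept strictly
// separate from the canonical standards JSON, which is the source of truth.
interface InspectionSession {
  sessionId: string;
  propertyId: string;
  inspectorId: string;
  assignedUnits: string[];
  progress: { itemsCompleted: number; itemsTotal: number }; // coverage tracking
}

interface EvidenceRecord {
  evidenceId: string;
  sessionId: string;
  itemId: string;                    // which inspectable item it documents
  kind: "photo" | "audio" | "note";
  capturedAt: string;                // ISO timestamp
  sha256: string;                    // content hash for tamper evidence
  qualityOk: boolean;                // result of automated quality validation
}
```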
4. Agent Tool Contracts
The AI operates only via deterministic tools. The LLM never bypasses these.
- getNextInspectableItem(sessionId)
- getStandardJson(standardId, version)
- openPdfAtSource(sourceRef)
- validateFacts(facts)
- scoreFacts(facts)
- commitVerifiedFacts(sessionId, verifiedFacts)
- saveOutcome(sessionId, outcome)
- createRemediationTasks(outcome)
- generateReport(sessionId)
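The tool names above come from the plan; the signatures and return types below are a hedged TypeScript sketch of what those contracts could look like, not the actual interface.

```typescript
// Hypothetical deterministic tool surface exposed to the agent. The LLM may
// only request these calls; it never reads or writes the stores directly.
type InspectableItem = { itemId: string; standardId: string; location: string };
type Outcome = { deficiencyId?: string; severity?: string; correctionHours?: number; pass: boolean };

interface AgentTools {
  getNextInspectableItem(sessionId: string): Promise<InspectableItem | null>;
  getStandardJson(standardId: string, version: string): Promise<object>;
  openPdfAtSource(sourceRef: object): Promise<void>;          // drives the "View in PDF" panel
  validateFacts(facts: object): Promise<{ ok: boolean; issues: string[] }>;
  scoreFacts(facts: object): Promise<Outcome>;                 // deterministic rules engine only
  commitVerifiedFacts(sessionId: string, verifiedFacts: object): Promise<void>;
  saveOutcome(sessionId: string, outcome: Outcome): Promise<void>;
  createRemediationTasks(outcome: Outcome): Promise<string[]>; // returns created task ids
  generateReport(sessionId: string): Promise<{ pdfUrl: string }>;
}
```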
5. LLM Orchestration Layer
The LLM has limited authority. It sees only a live context snapshot: current session, current item, verified facts, property expectations, standardId/version, and a few retrieved chunks.
Allowed
- Generate next question
- Compose dynamic UI schema
- Explain standards
- Propose candidate facts
- Draft inspection notes
Not allowed
- Decide severity
- Override deterministic engine
- Modify canonical standards
- Persist unvalidated facts
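A sketch of the context snapshot described at the top of this section, assuming it is assembled read-only per turn; field names are illustrative.

```typescript
// Hypothetical read-only snapshot handed to the LLM each turn. It contains
// the items listed above and nothing else; all writes go through tools.
interface ContextSnapshot {
  session: { sessionId: string; propertyId: string };
  currentItem: { itemId: string; standardId: string; standardVersion: string };
  verifiedFacts: Record<string, unknown>;        // only facts that passed validation
  propertyExpectations: Record<string, number>;  // e.g. expected door and fixture counts
  retrievedChunks: { text: string; jsonPointer: string }[]; // a few chunks from the Mini RAG index
}
```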
6. Deterministic Core
The compliance engine. Output is immutable and authoritative.
- Schema Gate — Type validation, required fields, enum checks
- Evidence Gate — Minimum photos, quality validation, required confirmations
- Consistency Gate — Contradictions, missing mandatory inputs
- Rules Engine — Canonical JSON DSL maps facts → deficiencyId, location, severity, correction deadline, pass/fail, rationale
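One hedged way the JSON rules DSL could encode a facts-to-outcome mapping; the condition operators and the sample rule are assumptions, not the actual NSPIRE encoding.

```typescript
// Hypothetical rule entry in the canonical DSL. The engine evaluates the
// conditions against verified facts and emits an immutable outcome.
interface Rule {
  deficiencyId: string;
  when: { fact: string; op: "eq" | "gte" | "lt"; value: unknown }[]; // all conditions must hold
  then: {
    severity: "low" | "moderate" | "severe";
    correctionHours: number;
    pass: boolean;
    rationalePointer: string; // jsonPointer into the standard text, for the citation
  };
}

// Illustrative rule only: a blocked egress door in a unit is severe, 24h to correct.
const blockedEgress: Rule = {
  deficiencyId: "D1",
  when: [
    { fact: "doorBlocked", op: "eq", value: true },
    { fact: "location", op: "eq", value: "unit" },
  ],
  then: { severity: "severe", correctionHours: 24, pass: false, rationalePointer: "/deficiencies/D1" },
};
```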
7. Dual Agent Validation
After scoring, two agents plus human-in-the-loop ensure quality and completeness.
- Agent A — Compliance Guide — Completeness, required evidence, coverage checks
- Agent B — Independent Validator — Different prompt and checks; finds missing items, risks, citation coverage
- Live PDF Validation Panel — Show PDF highlight, JSON node, outcome rationale side by side
- Human Review — Inspector confirms critical facts; supervisor QA for high-risk items
8. Field Inspection Loop
End-to-end flow from session start to report.
- Start session → load property inventory and assignments
- getNextInspectableItem → current standard + location scope
- getStandardJson + Mini RAG retriever → load canonical standard and context
- LLM generates questions, guidance, dynamic UI
- Capture inputs (toggles, counts, voice, photos) → Evidence DB + AI candidates
- validateFacts → if OK: scoreFacts → outcome; else follow-up
- commitVerifiedFacts, saveOutcome, createRemediationTasks
- More items? Loop to getNextInspectableItem. Done? Compile session → report
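The same loop as straight-line orchestration code, assuming the AgentTools and InspectableItem types sketched in section 4; the LLM turns, UI rendering, and error handling are reduced to placeholders.

```typescript
// Hypothetical orchestration of one session using the section 4 tool surface.
async function runInspection(tools: AgentTools, sessionId: string): Promise<void> {
  let item = await tools.getNextInspectableItem(sessionId);
  while (item !== null) {
    const standard = await tools.getStandardJson(item.standardId, "current"); // version lookup assumed
    // LLM turn (not shown): generate questions and dynamic UI from the context
    // snapshot, then capture toggles, counts, voice, and photos as candidate facts.
    let facts = await collectFacts(item, standard);
    let validation = await tools.validateFacts(facts);
    while (!validation.ok) {
      // Follow-up turn: targeted questions, inputs recaptured, then revalidated.
      facts = await collectFacts(item, standard);
      validation = await tools.validateFacts(facts);
    }
    const outcome = await tools.scoreFacts(facts);   // deterministic scoring only
    await tools.commitVerifiedFacts(sessionId, facts);
    await tools.saveOutcome(sessionId, outcome);
    await tools.createRemediationTasks(outcome);
    item = await tools.getNextInspectableItem(sessionId);
  }
  await tools.generateReport(sessionId);
}

// Placeholder for the capture step; in the real flow this is the inspector UI
// plus AI-proposed candidate facts, not a single function.
declare function collectFacts(item: InspectableItem, standard: object): Promise<object>;
```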
9. Reporting & Remediation
Report assembly matches legacy formats; remediation drives tasks and follow-up.
- Report templates — Print-ready layouts, match existing paper format
- Report assembler — Merge outcomes, attach evidence, include citations
- Outputs — Printable PDF, machine exports (JSON, CSV, API payloads)
- Remediation — Notify stakeholders, reminders, repair evidence upload, follow-up inspection, closeout
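A hedged sketch of what the assembler might merge per item so one structure can feed both the print template and the machine exports; the shape is an assumption.

```typescript
// Hypothetical report payload: outcomes, citations, and evidence references
// merged per item, then rendered to PDF or exported as JSON/CSV.
interface ReportEntry {
  itemId: string;
  outcome: { deficiencyId?: string; severity?: string; correctionHours?: number; pass: boolean };
  citations: { jsonPointer: string; page: number; anchorText: string }[]; // from the Citation Index
  evidenceIds: string[];   // references into the Evidence Store
  inspectorNotes?: string;
}

interface SessionReport {
  sessionId: string;
  propertyId: string;
  generatedAt: string;     // ISO timestamp
  entries: ReportEntry[];
}
```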
10. Continuous Updates & Guardrails
Updates
New/updated PDF → diff engine → selective reconversion → targeted human re-approval → re-chunk, re-embed → re-run regression tests → deploy new version. Telemetry on confusion points and missed items feeds prompt and UI improvements — not rule changes.
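A sketch of how that update path could be staged so each step gates the next; every function name below is a placeholder, not an existing service.

```typescript
// Hypothetical update pipeline: only sections whose extracted text changed are
// reconverted and sent for targeted re-approval; unchanged approvals are kept.
async function ingestStandardsUpdate(newPdf: string, previousVersion: string): Promise<void> {
  const changedSections = await diffAgainstVersion(newPdf, previousVersion); // extraction-level diff
  for (const section of changedSections) {
    const draft = await reconvertSection(section);   // AI draft with sourceRefs reattached
    await queueForHumanApproval(draft);              // targeted human re-approval only
  }
  await rechunkAndReembed(changedSections);          // refresh retrieval index for changed chunks
  const results = await runRegressionSuite();        // full scenario bank on every update
  if (results.allPassed) {
    await publishNewVersion(newPdf);                 // new canonical version goes live
  }
}

// Placeholder steps; each would be its own service or worker in practice.
declare function diffAgainstVersion(pdf: string, version: string): Promise<string[]>;
declare function reconvertSection(section: string): Promise<object>;
declare function queueForHumanApproval(draft: object): Promise<void>;
declare function rechunkAndReembed(sections: string[]): Promise<void>;
declare function runRegressionSuite(): Promise<{ allPassed: boolean }>;
declare function publishNewVersion(pdf: string): Promise<void>;
```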
Guardrails
- AI may: propose facts, explain rules, generate UI prompts, draft notes
- AI may not: invent standards, override outcomes, change severity/deadlines, save unverified facts
- Truth comes from: canonical JSON, verified facts + evidence, deterministic rules engine
Current Project Alignment
This site and its backend represent an early-phase implementation aligned with the plan.
| Planned | Current state |
|---|---|
| Canonical JSON from PDFs | 63 standards in MongoDB; detailed_json for selected standards (OpenAI Responses API extraction) |
| Human verification UI | On-demand JSON generation, modal review; full source-link UI planned |
| Dual indexes | Citation index and vector store not yet implemented; structured JSON ready for embedding |
| Inspection flow | Outside → Inside → Unit order documented; ecosystem flow, inspection order guide, and standard detail pages live |
| Deterministic rules engine | Planned; detailed_json structure supports rules DSL |
| Agent tools & LLM | Planned; JSON generation worker and job queue in place for backend orchestration |
Next steps: vector store ingestion, agent tool contracts, rules engine, and field-ready mobile/tablet UI.