Technical Architecture

NSPIRE AI Inspection System

End-to-end architecture and implementation plan for transforming 60–100+ NSPIRE PDF standards into a structured, deterministic, AI-assisted inspection platform. Built for client review, investor due diligence, and technical team alignment.

Overview — How It Fits Together

Simple block flow: sources → ingestion → stores → runtime → output. AI assists; rules decide.

  1. Sources — HUD Updates · PDF Repository
  2. Ingestion — Extract → AI Convert → Human Verify
  3. Canonical Stores — JSON · Citation Index · Chunk Store
  4. Runtime — Property · Session · Inspector
  5. Agent Tools — Next Item · Load Standard · Validate · Score
  6. LLM — Guidance · UI · Explain
  7. Rules Engine — Deterministic · Pass/Fail
  8. Output — Reports · Tasks · Evidence

Four Non-Negotiable Principles

  1. Standards must be converted into canonical JSON and human verified.
  2. Deterministic rules, not AI, decide compliance outcomes.
  3. Every JSON rule must trace back to exact PDF source coordinates.
  4. AI assists guidance, explanation, and UI orchestration — never final scoring.

Full Architecture Diagram

Split into two parts for clarity: (1) Standards ingestion, canonical stores, dual index, updates, and guardrails; (2) Runtime data, agent tools, LLM layer, deterministic core, field inspection loop, reporting, and remediation.

Architecture — Part 1: Ingestion, Stores & Updates

[Diagram]

Architecture — Part 2: Runtime, Inspection Loop & Remediation

[Diagram]

1. Source of Truth — Standards Ingestion

Convert raw NSPIRE PDFs into canonical, versioned, human-approved JSON. Every node traces to exact PDF coordinates.

Pipeline

  1. PDF extraction — Text, tables, page numbers, bounding boxes (bbox)
  2. AI conversion — Draft JSON: deficiencies, severity rules, correction timelines, inspection processes, JSON pointers
  3. Source linking — Attach sourceRef to every critical node: jsonPointer → PDF file + page + bbox + anchor text (typed sketch after this list)
  4. Human verification UI — Left: PDF page with highlight. Right: JSON node. Approve, edit, reject. Sign + timestamp
  5. Automated regression tests — Scenario-based tests (e.g., facts X → D1 severe 24h). Run on every update
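
As a sketch of what step 3 produces, the source-linked node below assumes a TypeScript backend; the field names (sourceRef, bbox, anchorText) follow the pipeline description, but the exact schema is illustrative, not final.

    // Illustrative shapes only; field names follow the pipeline above,
    // not a finalized schema.
    interface SourceRef {
      pdfFile: string;                         // e.g. "nspire-standards.pdf" (hypothetical)
      page: number;                            // 1-based PDF page number
      bbox: [number, number, number, number];  // x0, y0, x1, y1 in PDF points
      anchorText: string;                      // exact text highlighted during verification
    }

    interface DeficiencyNode {
      deficiencyId: string;                    // e.g. "D1"
      severity: "low" | "moderate" | "severe"; // placeholder enum values
      correctionTimelineHours: number;         // e.g. 24
      jsonPointer: string;                     // e.g. "/deficiencies/0"
      sourceRef: SourceRef;                    // every critical node traces to the PDF
    }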

Outputs

  • Canonical Standards JSON Store
  • Citation Index
  • Chunk Store for retrieval
  • Regression test bank
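
The regression test bank can stay engine-agnostic if each scenario from step 5 is plain data checked against whatever deterministic engine is current. A minimal sketch, with illustrative fact and outcome fields:

    // Hypothetical scenario in the "facts X → D1 severe 24h" style.
    type Facts = Record<string, string | number | boolean>;
    interface Outcome { deficiencyId: string; severity: string; correctionHours: number; }

    const scenario = {
      name: "inoperable egress door scores D1 severe with 24h correction",
      facts: { item: "door", isEgress: true, operable: false } as Facts,
      expected: { deficiencyId: "D1", severity: "severe", correctionHours: 24 },
    };

    // Run on every standards update; the engine is injected so the same
    // scenarios survive engine refactors.
    function runScenario(engine: (f: Facts) => Outcome): boolean {
      const out = engine(scenario.facts);
      return out.deficiencyId === scenario.expected.deficiencyId &&
        out.severity === scenario.expected.severity &&
        out.correctionHours === scenario.expected.correctionHours;
    }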

2. Dual Indexing Strategy

Two separate indexes serve distinct roles: audit traceability vs. contextual retrieval.

  • Index A — Citation Index — human trust layer showing exact PDF origin. Used for “View in PDF”, audit trail, legal defensibility.
  • Index B — Semantic Retrieval (Mini RAG) — retrieves relevant rule text for explanation, never for scoring. Used for clarifications, contextual help, training, dynamic UI phrasing.

Chunk metadata: standardId, chunkType, locationContext, jsonPointer, sourceRef.
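
Assuming those metadata fields, one retrieval chunk could be stored roughly as follows (shape illustrative):

    // One retrievable chunk; metadata mirrors the fields listed above.
    interface RuleChunk {
      standardId: string;                                 // e.g. "door" (placeholder ID)
      chunkType: "definition" | "deficiency" | "process"; // placeholder enum
      locationContext: "outside" | "inside" | "unit";
      jsonPointer: string;                                // back-link into canonical JSON
      sourceRef: { pdfFile: string; page: number };       // abbreviated; see section 1
      text: string;                                       // the rule text that gets embedded
    }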

3. Runtime Data Model

Clean separation between truth (canonical JSON) and observation (session, evidence, outcomes).

  • Property Inventory — Property, building, units, room/door/fixture counts, expected coverage
  • Inspection Session — sessionId, propertyId, assigned units, inspectorId, progress, coverage tracking
  • Evidence Store — Photos, audio, notes, metadata (timestamp, item, hash, quality)
  • Tasks — Deficiency-driven remediation tasks, due dates, assignees, status, escalation
  • Report Store — Final PDFs, machine exports, linked citations, evidence references
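
A typed sketch of two of these observation-side records; the field names are assumptions, not a fixed schema:

    // Session and evidence records (illustrative).
    interface InspectionSession {
      sessionId: string;
      propertyId: string;
      inspectorId: string;
      assignedUnits: string[];
      progress: { itemsCompleted: number; itemsTotal: number }; // coverage tracking
    }

    interface EvidenceItem {
      evidenceId: string;
      sessionId: string;
      itemRef: string;                  // inspectable item this evidence supports
      kind: "photo" | "audio" | "note";
      capturedAt: string;               // ISO-8601 timestamp
      sha256: string;                   // content hash, per the metadata list above
      qualityOk: boolean;               // result of quality validation
    }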

4. Agent Tool Contracts

The AI operates only via deterministic tools. The LLM never bypasses these.

  • getNextInspectableItem(sessionId)
  • getStandardJson(standardId, version)
  • openPdfAtSource(sourceRef)
  • validateFacts(facts)
  • scoreFacts(facts)
  • commitVerifiedFacts(sessionId, verifiedFacts)
  • saveOutcome(sessionId, outcome)
  • createRemediationTasks(outcome)
  • generateReport(sessionId)
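
A typed sketch of that tool surface; parameter and return shapes are assumptions, since the contracts are not yet published:

    // Hypothetical signatures for the deterministic tool surface.
    interface AgentTools {
      getNextInspectableItem(sessionId: string): Promise<{ itemId: string; standardId: string } | null>;
      getStandardJson(standardId: string, version: string): Promise<object>;
      openPdfAtSource(sourceRef: { pdfFile: string; page: number }): Promise<void>;
      validateFacts(facts: object): Promise<{ ok: boolean; errors: string[] }>;
      scoreFacts(facts: object): Promise<{ pass: boolean; deficiencyId?: string; severity?: string }>;
      commitVerifiedFacts(sessionId: string, verifiedFacts: object): Promise<void>;
      saveOutcome(sessionId: string, outcome: object): Promise<void>;
      createRemediationTasks(outcome: object): Promise<string[]>;  // returns task IDs
      generateReport(sessionId: string): Promise<{ reportUrl: string }>;
    }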

5. LLM Orchestration Layer

The LLM has limited authority. It sees only a live context snapshot: current session, current item, verified facts, property expectations, standardId/version, and a few retrieved chunks.
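
That snapshot could look like the following; fields are taken from the list above, and the shape is illustrative:

    // The bounded view handed to the LLM on each turn, nothing more.
    interface ContextSnapshot {
      sessionId: string;
      currentItem: { itemId: string; standardId: string; version: string };
      verifiedFacts: Record<string, unknown>;
      propertyExpectations: Record<string, number>; // e.g. expected door/room counts
      retrievedChunks: string[];                    // a few chunks from the Mini RAG index
    }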

Allowed

  • Generate next question
  • Compose dynamic UI schema
  • Explain standards
  • Propose candidate facts
  • Draft inspection notes

Not allowed

  • Decide severity
  • Override deterministic engine
  • Modify canonical standards
  • Persist unvalidated facts

6. Deterministic Core

The compliance engine. Output is immutable and authoritative.

  1. Schema Gate — Type validation, required fields, enum checks
  2. Evidence Gate — Minimum photos, quality validation, required confirmations
  3. Consistency Gate — Contradictions, missing mandatory inputs
  4. Rules Engine — Canonical JSON DSL maps facts → deficiencyId, location, severity, correction deadline, pass/fail, rationale
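
The DSL itself is not specified in this document; as one illustration of the facts → outcome mapping, a rule can be declared as data and evaluated with no AI in the path. The predicate is written as a function for brevity; in the canonical store it would be declarative JSON.

    // Illustrative rule-as-data and a deterministic evaluator.
    type Facts = Record<string, string | number | boolean>;

    interface Rule {
      deficiencyId: string;
      when: (facts: Facts) => boolean;  // pure predicate; JSON conditions in practice
      severity: "low" | "moderate" | "severe";
      correctionHours: number;
      rationale: string;
    }

    const doorRules: Rule[] = [{
      deficiencyId: "D1",               // placeholder ID
      when: (f) => f.isEgress === true && f.operable === false,
      severity: "severe",
      correctionHours: 24,
      rationale: "Inoperable egress door blocks emergency exit.",
    }];

    // First matching rule wins; no match means pass.
    function evaluate(facts: Facts, rules: Rule[]) {
      const hit = rules.find((r) => r.when(facts));
      if (!hit) return { pass: true };
      const { deficiencyId, severity, correctionHours, rationale } = hit;
      return { pass: false, deficiencyId, severity, correctionHours, rationale };
    }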

7. Dual Agent Validation

After scoring, two agents plus a human-in-the-loop review ensure quality and completeness.

  • Agent A — Compliance Guide — Completeness, required evidence, coverage checks
  • Agent B — Independent Validator — Different prompt and checks; finds missing items, risks, citation coverage
  • Live PDF Validation Panel — Show PDF highlight, JSON node, outcome rationale side by side
  • Human Review — Inspector confirms critical facts; supervisor QA for high-risk items

8. Field Inspection Loop

End-to-end flow from session start to report.

  1. Start session → load property inventory and assignments
  2. getNextInspectableItem → current standard + location scope
  3. getStandardJson + Mini RAG retriever → load canonical standard and context
  4. LLM generates questions, guidance, dynamic UI
  5. Capture inputs (toggles, counts, voice, photos) → Evidence DB + AI candidates
  6. validateFacts → if OK: scoreFacts → outcome; else follow-up
  7. commitVerifiedFacts, saveOutcome, createRemediationTasks
  8. More items? Loop to getNextInspectableItem. Done? Compile session → report
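
Compressed into code, the loop might read as below, reusing the hypothetical AgentTools surface sketched in section 4; captureFacts stands in for steps 4–5 (LLM questions, dynamic UI, evidence capture).

    // Sketch of steps 1–8; error handling and evidence plumbing omitted.
    async function runSession(
      tools: AgentTools,                                  // from the section 4 sketch
      sessionId: string,
      captureFacts: (itemId: string) => Promise<object>,  // UI + LLM guidance live here
    ) {
      let item = await tools.getNextInspectableItem(sessionId);
      while (item) {
        const facts = await captureFacts(item.itemId);
        const check = await tools.validateFacts(facts);
        if (!check.ok) continue;                          // follow-up on the same item
        const outcome = await tools.scoreFacts(facts);    // deterministic core decides
        await tools.commitVerifiedFacts(sessionId, facts);
        await tools.saveOutcome(sessionId, outcome);
        await tools.createRemediationTasks(outcome);
        item = await tools.getNextInspectableItem(sessionId);
      }
      await tools.generateReport(sessionId);              // compile session into report
    }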

9. Reporting & Remediation

Report assembly matches legacy formats; remediation drives tasks and follow-up.

  • Report templates — Print-ready layouts, match existing paper format
  • Report assembler — Merge outcomes, attach evidence, include citations
  • Outputs — Printable PDF, machine exports (JSON, CSV, API payloads)
  • Remediation — Notify stakeholders, reminders, repair evidence upload, follow-up inspection, closeout

10. Continuous Updates & Guardrails

Updates

New/updated PDF → diff engine → selective reconversion → targeted human re-approval → re-chunk, re-embed → re-run regression tests → deploy new version. Telemetry on confusion points and missed items feeds prompt and UI improvements — not rule changes.
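
For the diff step, a minimal sketch: compare old and new canonical JSON node by node and queue only the changed jsonPointer paths for targeted re-approval.

    // Hypothetical selective-reconversion check.
    function diffStandards(
      oldNodes: Map<string, string>,  // jsonPointer -> serialized node
      newNodes: Map<string, string>,
    ): string[] {
      const needsReapproval: string[] = [];
      for (const [pointer, body] of newNodes) {
        if (oldNodes.get(pointer) !== body) needsReapproval.push(pointer);
      }
      return needsReapproval;         // feeds the targeted human re-approval queue
    }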

Guardrails

  • AI may: propose facts, explain rules, generate UI prompts, draft notes
  • AI may not: invent standards, override outcomes, change severity/deadlines, save unverified facts
  • Truth comes from: canonical JSON, verified facts + evidence, deterministic rules engine

Current Project Alignment

This site and its backend represent an early-phase implementation aligned with the plan.

Planned → current state:

  • Canonical JSON from PDFs — 63 standards in MongoDB; detailed_json for selected standards (OpenAI Responses API extraction)
  • Human verification UI — on-demand JSON generation, modal review; full source-link UI planned
  • Dual indexes — citation index and vector store not yet implemented; structured JSON ready for embedding
  • Inspection flow — Outside → Inside → Unit order documented; ecosystem flow, inspection order guide, and standard detail pages live
  • Deterministic rules engine — planned; detailed_json structure supports a rules DSL
  • Agent tools & LLM — planned; JSON generation worker and job queue in place for backend orchestration

Next steps: vector store ingestion, agent tool contracts, rules engine, and field-ready mobile/tablet UI.

For detailed flows, see the HUD–NSPIRE Ecosystem and the Inspection Order Guide.