Technical & Evidentiary Standards

Detailed documentation of the A.R.E. extraction methodology, QA standards, provenance rules, and strict parsing requirements.

The Truth Engine Standard

The foundational evidentiary rule governing every aspect of the A.R.E. pipeline:

"If the public record does not show it, the dataset does not pretend it exists."

This rule applies to every field in every record. There are no exceptions for convenience, completeness, or presentation quality.

• No hallucinated dates
• No guessed outcomes
• No invented inspection activity
• No fabricated enforcement conclusions

Every field carries one of the following provenance values:

• SOURCE: Portal record page
• SOURCE: Timeline / status tab
• SOURCE: Related records table
• SOURCE: Notice of Violation records
• SOURCE: Attachments / PDFs / Findings text
• ABSENT: Not found in public record

Strict Parsing Rules

The following rules govern all data extraction and parsing operations. These rules are enforced at the pipeline level and cannot be overridden by individual run configurations.

RULE 01

Date Parsing

Dates must be extracted verbatim from the source field. No date inference, estimation, or derivation from adjacent fields. If a date field is empty or absent in the source, the output field must be null — not a default value, not a placeholder.
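A minimal sketch of how Rule 01 could be enforced in code. The record shape and field name here are illustrative assumptions, not the pipeline's actual schema:

```python
from typing import Optional

def extract_date(raw_record: dict, field: str) -> Optional[str]:
    """Return the date string exactly as it appears in the source field,
    or None when the field is empty or absent. No inference, no defaults."""
    value = raw_record.get(field)
    if value is None or str(value).strip() == "":
        return None  # null, never a placeholder or derived date
    return str(value).strip()  # verbatim apart from surrounding whitespace
```

Note that the function never falls back to adjacent fields or today's date: an empty source field yields `None`, full stop.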

RULE 02

Status Classification

Status values must be extracted from the portal's own classification vocabulary. No normalization that changes meaning. If a status value is ambiguous, it must be recorded as extracted with an ambiguity flag — not resolved by assumption.
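One way Rule 02 might look in practice, assuming the portal vocabulary is available as a set of known status strings (the vocabulary contents below are hypothetical):

```python
def classify_status(raw_status: str, known_vocabulary: set) -> dict:
    """Record the status exactly as extracted; flag anything outside the
    portal's own vocabulary as ambiguous instead of resolving it."""
    value = raw_status.strip()
    return {
        "status": value,  # portal wording, unchanged in meaning
        "ambiguous": value not in known_vocabulary,
    }
```

An ambiguous value stays in the output as extracted; the flag tells downstream consumers that no assumption was made on its behalf.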

RULE 03

NOV / Violation Issued Detection

A Notice of Violation or Violation Issued record must be confirmed by presence in the Related Records table or a linked NOV record. Absence of a NOV record must be recorded as "not found" — not as "no violation issued," which would be an inference.
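A sketch of the Rule 03 distinction, with an assumed shape for related-record entries; the key point is that the function can only return "confirmed" or "not_found", never "no violation issued":

```python
from typing import Optional

def nov_status(related_records: list, linked_nov: Optional[dict]) -> str:
    """Confirm an NOV only from direct evidence in the public record.
    Absence is recorded as "not_found" -- never as an inferred negative."""
    has_nov = linked_nov is not None or any(
        r.get("type") == "Notice of Violation" for r in related_records
    )
    return "confirmed" if has_nov else "not_found"
```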

RULE 04

Duplicate Detection

Every record_number must appear exactly once in the output. Duplicate detection runs before packaging. Any duplicate triggers an immediate QA failure and halts the run. No exceptions.
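Rule 04 can be sketched as a pre-packaging check that halts the run on the first duplicate; the record shape is an assumption:

```python
from collections import Counter

def enforce_uniqueness(records: list) -> None:
    """Hard stop on any duplicated record_number before packaging."""
    counts = Counter(r["record_number"] for r in records)
    dupes = [rn for rn, n in counts.items() if n > 1]
    if dupes:
        # QA failure: terminate the run immediately, package nothing
        raise SystemExit(f"QA FAILURE: duplicate record_number(s): {dupes}")
```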

RULE 05

Field Provenance Tagging

Every extracted field must carry a provenance tag indicating its source location. Fields without a verifiable source must be tagged as "not_found" and left null in the output. Provenance tags are non-optional and cannot be omitted.
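A minimal sketch of Rule 05, pairing every value with a mandatory provenance tag; the tag strings shown are illustrative, not the pipeline's canonical set:

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class TaggedField:
    value: Optional[Any]  # null when no verifiable source exists
    provenance: str       # e.g. "portal_record_page", or "not_found"

def tag_field(value: Optional[Any], provenance: str) -> TaggedField:
    """A field without a verifiable source is tagged "not_found" and
    left null; there is no untagged path through this constructor."""
    if value is None:
        return TaggedField(value=None, provenance="not_found")
    return TaggedField(value=value, provenance=provenance)
```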

RULE 06

Run Isolation

Each pipeline run must use a fresh output folder. No partial outputs from previous runs may be reused or merged. No manual patching of output files between runs. The pipeline output must be reproducible from the input CSV alone.
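One way to implement Rule 06's fresh-folder requirement; the directory naming scheme is an assumption:

```python
from datetime import datetime, timezone
from pathlib import Path

def new_run_dir(base: Path) -> Path:
    """Create a fresh, empty output folder for this run. Refuses to
    reuse an existing directory, so no prior output can leak in."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    run_dir = base / f"run_{stamp}"
    run_dir.mkdir(parents=True, exist_ok=False)  # fails if it already exists
    return run_dir
```

Because `exist_ok=False`, an attempt to write into a previously used folder raises an error rather than silently merging outputs.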

Organizational Structure

ORGANIZATION: STLCA

Seattle Tenants & Landlord Code Accountability. Created and operates the A.R.E. platform as an independent public accountability initiative.

PLATFORM: A.R.E.

Accountability Record Engine. The city-agnostic infrastructure platform. Designed for multi-city deployment without rebuilding core systems. Powered by 8 specialized modules.

DEPLOYMENT: Seattle A.R.E.

The Seattle-specific deployment of the A.R.E. platform. First operational instance, targeting all Seattle housing enforcement districts.

Quality Assurance Enforcement

The A.R.E. QA framework runs at multiple stages of the pipeline. QA failures are hard stops — the run terminates and no output is packaged.

# QA Enforcement Stages
# ─────────────────────────────────────────────

Stage 1: Schema Validation
  → All required fields present
  → Field types match schema definition
  → Provenance tags present on all extracted fields

Stage 2: Uniqueness Enforcement
  → record_number appears exactly once
  → No duplicate JSONL rows
  → HARD STOP on any duplicate detected

Stage 3: Extraction Integrity
  → Zero JSON decode failures
  → Zero extraction errors
  → HARD STOP on any decode failure

Stage 4: Manifest Validation
  → Record count matches input CSV
  → All districts represented
  → HARD STOP on count mismatch

# If all stages pass → package and seal output
# If any stage fails → terminate, do not package
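The staged hard-stop behavior above can be sketched as a sequential runner where the first failing stage aborts the run. The stage functions and their signatures are hypothetical illustrations of Stages 2 and 4:

```python
class QAFailure(Exception):
    """Raised on any hard-stop condition; the run terminates unpackaged."""

def uniqueness_stage(records: list, input_count: int) -> tuple:
    """Stage 2 sketch: record_number must appear exactly once."""
    seen = set()
    for r in records:
        if r["record_number"] in seen:
            return False, f"duplicate record_number: {r['record_number']}"
        seen.add(r["record_number"])
    return True, "uniqueness ok"

def manifest_stage(records: list, input_count: int) -> tuple:
    """Stage 4 sketch: output record count must match the input CSV."""
    if len(records) != input_count:
        return False, f"count mismatch: {len(records)} != {input_count}"
    return True, "manifest ok"

def run_qa(records: list, input_count: int, stages: list) -> bool:
    """Run every stage in order; the first failure aborts the run."""
    for stage in stages:
        ok, message = stage(records, input_count)
        if not ok:
            raise QAFailure(message)  # terminate, do not package
    return True  # all stages passed: package and seal output
```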

Pipeline Operational Status

The pipeline foundation is operational and validated. Confirmed working components include all core extraction, QA, and packaging systems.

Confirmed Working
  • District batch CSV processing (The Driver)
  • JSONL generation (The Harvester)
  • Schema validation (The Gatekeeper)
  • Pattern recognition (The Signal)
  • Accountability reporting (The Ledger)
  • QA framework & manifest verification (The Auditor)
  • Hard-stop enforcement (The Dispatch)
  • NOV/enforcement calculation (The Reckoning)
Validated Results
  • 0 duplicates in validated runs
  • 0 extraction errors
  • 0 JSON decode failures