Technical & Evidentiary Standards

Detailed documentation of the A.R.E. extraction methodology, QA standards, provenance rules, and strict parsing requirements.

The Truth Engine Standard

The foundational evidentiary rule governing every aspect of the A.R.E. pipeline:

"If the public record does not show it, the dataset does not pretend it exists."

This rule applies to every field in every record. There are no exceptions for convenience, completeness, or presentation quality.

• No hallucinated dates
• No guessed outcomes
• No invented inspection activity
• No fabricated enforcement conclusions

Every field carries one of the following provenance values:

• SOURCE: Portal record page
• SOURCE: Timeline / status tab
• SOURCE: Related records table
• SOURCE: Notice of Violation records
• SOURCE: Attachments / PDFs / Findings text
• ABSENT: Not found in public record

Strict Parsing Rules

The following rules govern all data extraction and parsing operations. These rules are enforced at the pipeline level and cannot be overridden by individual run configurations.

RULE 01

Date Parsing

Dates must be extracted verbatim from the source field. No date inference, estimation, or derivation from adjacent fields. If a date field is empty or absent in the source, the output field must be null — not a default value, not a placeholder.
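A minimal sketch of how Rule 01 could be enforced in code. The record shape and field name here are illustrative assumptions, not the pipeline's actual schema:

```python
from typing import Optional

def extract_date(raw_record: dict, field: str) -> Optional[str]:
    """Return the date string exactly as it appears in the source field,
    or None when the field is empty or absent. No inference, no defaults."""
    value = raw_record.get(field)
    if value is None or str(value).strip() == "":
        return None  # null, never a placeholder or derived date
    return str(value).strip()  # verbatim apart from surrounding whitespace
```

Note that the function never falls back to adjacent fields or today's date: an empty source field yields `None`, full stop.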

RULE 02

Status Classification

Status values must be extracted from the portal's own classification vocabulary. No normalization that changes meaning. If a status value is ambiguous, it must be recorded as extracted with an ambiguity flag — not resolved by assumption.
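One way Rule 02 might look in practice, assuming the portal vocabulary is available as a set of known status strings (the vocabulary contents below are hypothetical):

```python
def classify_status(raw_status: str, known_vocabulary: set) -> dict:
    """Record the status exactly as extracted; flag anything outside the
    portal's own vocabulary as ambiguous instead of resolving it."""
    value = raw_status.strip()
    return {
        "status": value,  # portal wording, unchanged in meaning
        "ambiguous": value not in known_vocabulary,
    }
```

An ambiguous value stays in the output as extracted; the flag tells downstream consumers that no assumption was made on its behalf.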

RULE 03

NOV / Violation Issued Detection

A Notice of Violation or Violation Issued record must be confirmed by presence in the Related Records table or a linked NOV record. Absence of a NOV record must be recorded as "not found" — not as "no violation issued," which would be an inference.
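A sketch of the Rule 03 distinction, with an assumed shape for related-record entries; the key point is that the function can only return "confirmed" or "not_found", never "no violation issued":

```python
from typing import Optional

def nov_status(related_records: list, linked_nov: Optional[dict]) -> str:
    """Confirm an NOV only from direct evidence in the public record.
    Absence is recorded as "not_found" -- never as an inferred negative."""
    has_nov = linked_nov is not None or any(
        r.get("type") == "Notice of Violation" for r in related_records
    )
    return "confirmed" if has_nov else "not_found"
```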

RULE 04

Duplicate Detection

Every record_number must appear exactly once in the output. Duplicate detection runs before packaging. Any duplicate triggers an immediate QA failure and halts the run. No exceptions.
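Rule 04 can be sketched as a pre-packaging check that halts the run on the first duplicate; the record shape is an assumption:

```python
from collections import Counter

def enforce_uniqueness(records: list) -> None:
    """Hard stop on any duplicated record_number before packaging."""
    counts = Counter(r["record_number"] for r in records)
    dupes = [rn for rn, n in counts.items() if n > 1]
    if dupes:
        # QA failure: terminate the run immediately, package nothing
        raise SystemExit(f"QA FAILURE: duplicate record_number(s): {dupes}")
```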

RULE 05

Field Provenance Tagging

Every extracted field must carry a provenance tag indicating its source location. Fields without a verifiable source must be tagged as "not_found" and left null in the output. Provenance tags are non-optional and cannot be omitted.
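A minimal sketch of Rule 05, pairing every value with a mandatory provenance tag; the tag strings shown are illustrative, not the pipeline's canonical set:

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class TaggedField:
    value: Optional[Any]  # null when no verifiable source exists
    provenance: str       # e.g. "portal_record_page", or "not_found"

def tag_field(value: Optional[Any], provenance: str) -> TaggedField:
    """A field without a verifiable source is tagged "not_found" and
    left null; there is no untagged path through this constructor."""
    if value is None:
        return TaggedField(value=None, provenance="not_found")
    return TaggedField(value=value, provenance=provenance)
```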

RULE 06

Run Isolation

Each pipeline run must use a fresh output folder. No partial outputs from previous runs may be reused or merged. No manual patching of output files between runs. The pipeline output must be reproducible from the input CSV alone.
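One way to implement Rule 06's fresh-folder requirement; the directory naming scheme is an assumption:

```python
from datetime import datetime, timezone
from pathlib import Path

def new_run_dir(base: Path) -> Path:
    """Create a fresh, empty output folder for this run. Refuses to
    reuse an existing directory, so no prior output can leak in."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    run_dir = base / f"run_{stamp}"
    run_dir.mkdir(parents=True, exist_ok=False)  # fails if it already exists
    return run_dir
```

Because `exist_ok=False`, an attempt to write into a previously used folder raises an error rather than silently merging outputs.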

Organizational Structure

ORGANIZATION: STLCA

Seattle Tenants & Landlord Code Accountability. Created and operates the A.R.E. platform as an independent public accountability initiative.

PLATFORM: A.R.E.

Accountability Record Engine. The city-agnostic infrastructure platform. Designed for multi-city deployment without rebuilding core systems. Powered by 8 specialized modules.

DEPLOYMENT: Seattle A.R.E.

The Seattle-specific deployment of the A.R.E. platform. First operational instance, targeting all Seattle housing enforcement districts.

Quality Assurance Enforcement

The A.R.E. QA framework runs at multiple stages of the pipeline. QA failures are hard stops — the run terminates and no output is packaged.

# QA Enforcement Stages
# ─────────────────────────────────────────────

Stage 1: Schema Validation
  → All required fields present
  → Field types match schema definition
  → Provenance tags present on all extracted fields

Stage 2: Uniqueness Enforcement
  → record_number appears exactly once
  → No duplicate JSONL rows
  → HARD STOP on any duplicate detected

Stage 3: Extraction Integrity
  → Zero JSON decode failures
  → Zero extraction errors
  → HARD STOP on any decode failure

Stage 4: Manifest Validation
  → Record count matches input CSV
  → All districts represented
  → HARD STOP on count mismatch

# If all stages pass → package and seal output
# If any stage fails → terminate, do not package
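The staged hard-stop behavior above can be sketched as a sequential runner where the first failing stage aborts the run. The stage functions and their signatures are hypothetical illustrations of Stages 2 and 4:

```python
class QAFailure(Exception):
    """Raised on any hard-stop condition; the run terminates unpackaged."""

def uniqueness_stage(records: list, input_count: int) -> tuple:
    """Stage 2 sketch: record_number must appear exactly once."""
    seen = set()
    for r in records:
        if r["record_number"] in seen:
            return False, f"duplicate record_number: {r['record_number']}"
        seen.add(r["record_number"])
    return True, "uniqueness ok"

def manifest_stage(records: list, input_count: int) -> tuple:
    """Stage 4 sketch: output record count must match the input CSV."""
    if len(records) != input_count:
        return False, f"count mismatch: {len(records)} != {input_count}"
    return True, "manifest ok"

def run_qa(records: list, input_count: int, stages: list) -> bool:
    """Run every stage in order; the first failure aborts the run."""
    for stage in stages:
        ok, message = stage(records, input_count)
        if not ok:
            raise QAFailure(message)  # terminate, do not package
    return True  # all stages passed: package and seal output
```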

Pipeline Operational Status

The pipeline foundation is operational and validated. Confirmed working components include all core extraction, QA, and packaging systems.

Confirmed Working
  • District batch CSV processing (The Driver)
  • JSONL generation (The Harvester)
  • Schema validation (The Gatekeeper)
  • Pattern recognition (The Signal)
  • Accountability reporting (The Ledger)
  • QA framework & manifest verification (The Auditor)
  • Hard-stop enforcement (The Dispatch)
  • NOV/enforcement calculation (The Reckoning)
Validated Results
  • 0 duplicates in validated runs
  • 0 extraction errors
  • 0 JSON decode failures