The foundational evidentiary rule governing every aspect of the A.R.E. pipeline:
"If the public record does not show it, the dataset does not pretend it exists."
This rule applies to every field in every record. There are no exceptions for convenience, completeness, or presentation quality.
The following rules govern all data extraction and parsing operations. These rules are enforced at the pipeline level and cannot be overridden by individual run configurations.
Dates must be extracted verbatim from the source field. They must not be inferred, estimated, or derived from adjacent fields. If a date field is empty or absent in the source, the output field must be null, not a default value or a placeholder.
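A minimal sketch of the date rule, assuming each source record has already been read into a dictionary of field name to raw string. The function name `extract_date` and the field names in the usage note are hypothetical, and treating a whitespace-only value as empty is an assumption, not something the rule states.

```python
from typing import Mapping, Optional


def extract_date(source_row: Mapping[str, Optional[str]], field: str) -> Optional[str]:
    """Return the date exactly as it appears in the source, or None.

    Hypothetical helper: the name and signature are illustrative only.
    """
    raw = source_row.get(field)
    # Empty or absent in the source -> null in the output.
    # No default value, no placeholder, no lookup in adjacent fields.
    # Assumption: a whitespace-only value counts as empty.
    if raw is None or raw.strip() == "":
        return None
    # Verbatim: the string is not parsed, reformatted, or normalized.
    return raw
```

For example, `extract_date({"opened_date": "03/14/2023"}, "opened_date")` returns the string unchanged, while a missing `closed_date` key yields `None` rather than a value borrowed from another field.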
Status values must be extracted from the portal's own classification vocabulary. Normalization that changes meaning is not permitted. If a status value is ambiguous, it must be recorded as extracted, with an ambiguity flag, and not resolved by assumption.
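One way the status rule could look in code is sketched below. The vocabulary in `PORTAL_STATUSES` is a made-up placeholder, since the portal's real status values are not listed here, and defining "ambiguous" as "not an exact match for a known vocabulary term" is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical placeholder vocabulary; the portal's actual terms are not given here.
PORTAL_STATUSES = frozenset({"Open", "Closed", "Under Review"})


@dataclass(frozen=True)
class StatusField:
    value: Optional[str]  # the status exactly as extracted, or None if absent
    ambiguous: bool       # True when the value could not be matched with certainty


def extract_status(raw: Optional[str]) -> StatusField:
    """Record the status as extracted and flag ambiguity instead of resolving it."""
    if raw is None or raw.strip() == "":
        return StatusField(value=None, ambiguous=False)
    # Assumption: anything that is not an exact vocabulary match is flagged.
    # The raw value is kept as-is; it is never mapped to the "closest" term.
    return StatusField(value=raw, ambiguous=raw not in PORTAL_STATUSES)
```

The point of the design is that the flag travels with the untouched value, so a reviewer can see both what the portal said and that the pipeline declined to interpret it.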
A Notice of Violation (NOV) or Violation Issued record must be confirmed by its presence in the Related Records table or by a linked NOV record. The absence of a NOV record must be recorded as "not found", not as "no violation issued", which would be an inference.
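The distinction between "not found" and "no violation issued" can be sketched as a function that only ever returns the two outcomes the evidence supports. The function name, its parameters, and the `"confirmed"` label are hypothetical; the inputs are assumed to be the record types listed in the Related Records table and an optional linked NOV record identifier.

```python
from typing import Iterable, Optional

# Record types named by the rule above.
NOV_TYPES = ("Notice of Violation", "Violation Issued")


def nov_status(related_record_types: Iterable[str],
               linked_nov_record: Optional[str]) -> str:
    """Return "confirmed" or "not found"; never "no violation issued".

    Hypothetical helper: names and return labels are illustrative only.
    """
    in_related_table = any(t in NOV_TYPES for t in related_record_types)
    if in_related_table or linked_nov_record:
        return "confirmed"
    # Absence of evidence is recorded as absence of evidence, nothing more.
    return "not found"
```

Note that there is deliberately no third return value: the function has no code path that asserts a violation was not issued.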
Every record_number must appear exactly once in the output. Duplicate detection runs before packaging. Any duplicate triggers an immediate QA failure and halts the run. No exceptions.
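A minimal sketch of the duplicate check, assuming records are dictionaries keyed by field name. Only `record_number` comes from the rule above; `QAFailure` and `check_unique_record_numbers` are hypothetical names, and raising an exception stands in for whatever mechanism actually halts the run.

```python
from collections import Counter
from typing import Iterable, Mapping


class QAFailure(Exception):
    """Hypothetical exception type used here to represent a hard QA stop."""


def check_unique_record_numbers(records: Iterable[Mapping[str, str]]) -> None:
    """Raise QAFailure if any record_number appears more than once."""
    counts = Counter(record["record_number"] for record in records)
    duplicates = sorted(number for number, count in counts.items() if count > 1)
    if duplicates:
        # Assumption: raising here is what prevents packaging from running.
        raise QAFailure(f"duplicate record_number values: {duplicates}")
```

Calling this before packaging means a duplicate never reaches the output: the exception propagates and the packaging step is simply not executed.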
Every extracted field must carry a provenance tag indicating its source location. Fields without a verifiable source must be tagged as "not_found" and left null in the output. Provenance tags are mandatory.
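The provenance rule could be enforced at construction time, as in the sketch below. The `TaggedField` class is hypothetical, and so is the tag format in the usage note; only the `"not_found"` tag and the null requirement come from the rule above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaggedField:
    """Hypothetical container pairing a value with its provenance tag."""
    value: Optional[str]
    provenance: str

    def __post_init__(self) -> None:
        # A field can never be created without a provenance tag.
        if not self.provenance:
            raise ValueError("provenance tag is mandatory")
        # A field with no verifiable source must stay null.
        if self.provenance == "not_found" and self.value is not None:
            raise ValueError("a not_found field must have a null value")
```

Under these assumptions, `TaggedField("Open", "detail_page:Status")` and `TaggedField(None, "not_found")` are both valid, while a value paired with an empty tag, or a non-null value tagged `"not_found"`, is rejected before it can reach the output.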
Each pipeline run must use a fresh output folder. No partial outputs from previous runs may be reused or merged. No manual patching of output files between runs. The pipeline output must be reproducible from the input CSV alone.
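As an illustration of the fresh-folder requirement, the sketch below refuses to start if the target folder already exists. The function name and the idea of a per-run identifier are assumptions; the real folder layout is not described here.

```python
from pathlib import Path


def create_fresh_output_dir(base: Path, run_id: str) -> Path:
    """Create a new, empty output folder for this run.

    Hypothetical helper: `base` and `run_id` are illustrative parameters.
    """
    out_dir = base / run_id
    # exist_ok=False makes mkdir raise FileExistsError if the folder is
    # already there, so a previous run's partial output can never be reused.
    out_dir.mkdir(parents=True, exist_ok=False)
    return out_dir
```

Failing loudly on an existing folder, rather than clearing or merging it, keeps each run's output attributable to exactly one execution over the input CSV.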
Seattle Tenants & Landlord Code Accountability. Created and operates the A.R.E. platform as an independent public accountability initiative.
Accountability Record Engine. The city-agnostic infrastructure platform. Designed for multi-city deployment without rebuilding core systems. Powered by 8 specialized modules.
The Seattle-specific deployment of the A.R.E. platform. First operational instance, targeting all Seattle housing enforcement districts.
The A.R.E. QA framework runs at multiple stages of the pipeline. QA failures are hard stops — the run terminates and no output is packaged.
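The hard-stop behaviour can be sketched as a runner in which packaging is only reachable when every QA check has passed. Everything named here (`QAFailure`, `run_pipeline`, the stage and check callables) is hypothetical and does not describe the actual A.R.E. modules.

```python
from typing import Any, Callable, Optional, Sequence, Tuple


class QAFailure(Exception):
    """Hypothetical exception type representing a hard QA stop."""


# A stage is a (transform, qa_check) pair. The check returns None on success
# or a short problem description on failure.
Stage = Tuple[Callable[[Any], Any], Callable[[Any], Optional[str]]]


def run_pipeline(data: Any, stages: Sequence[Stage],
                 package: Callable[[Any], Any]) -> Any:
    """Run each stage, check its result, and package only if all checks pass."""
    for transform, qa_check in stages:
        data = transform(data)
        problem = qa_check(data)
        if problem is not None:
            # Hard stop: the exception ends the run before packaging.
            raise QAFailure(problem)
    return package(data)
```

Because `package` is called only after the loop completes, there is no path by which a run with a failed check produces a packaged output.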
The pipeline foundation is operational and validated. Confirmed working components include all core extraction, QA, and packaging systems.