Lowry automated processing for the colorado state address dataset

Colorado State Address Dataset
Automated Processing
Nathan Lowry, GIS Outreach Coordinator
State of Colorado
September 24, 2014

Common Data Model
● Allows local and state-wide querying, analysis, and integration …
● Accommodates information exchanges
▪ Hierarchical - City to County, County to Region, Region to State
▪ Among neighboring jurisdictions (e.g., County to County)
● Allows profiles to provide data in standard forms for specific objectives
▪ NENA CLDXF for NG-911
▪ USPS Pub-28 for CASS
▪ ArcGIS Geocoding (for quality comparisons, etc.)
● It’s more efficient (less work) and preserves more quality (less loss)
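
As a rough illustration of the profile idea above, the sketch below renders one fully spelled-out record in two forms: a postal-style pair of lines using a few abbreviations of the kind listed in USPS Pub-28, and a single-line string of the sort a geocoder might accept. The record layout, field names, function names, and the tiny lookup tables are all assumptions made for this example, not the actual common data model or any published profile.

```python
# Hypothetical record in a common model: elements stored separately and spelled out.
# Field names and values are illustrative only.
RECORD = {
    "number": "200",
    "pre_directional": "East",
    "street_name": "Colfax",
    "post_type": "Avenue",
    "city": "Denver",
    "state": "CO",
    "zip_code": "80203",
}

# Small subsets of postal abbreviations; a real profile would need complete tables.
SUFFIX_ABBR = {"STREET": "ST", "AVENUE": "AVE", "BOULEVARD": "BLVD", "ROAD": "RD", "DRIVE": "DR"}
DIRECTION_ABBR = {"NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W"}


def postal_profile(rec):
    """Return (delivery line, last line) in upper case with abbreviations applied."""
    direction = rec["pre_directional"].upper()
    suffix = rec["post_type"].upper()
    parts = [
        rec["number"],
        DIRECTION_ABBR.get(direction, direction),
        rec["street_name"].upper(),
        SUFFIX_ABBR.get(suffix, suffix),
    ]
    delivery_line = " ".join(p for p in parts if p)
    last_line = f"{rec['city'].upper()} {rec['state']} {rec['zip_code']}"
    return delivery_line, last_line


def single_line_profile(rec):
    """Return one spelled-out, comma-separated string."""
    street = " ".join(
        p
        for p in (rec["number"], rec["pre_directional"], rec["street_name"], rec["post_type"])
        if p
    )
    return f"{street}, {rec['city']}, {rec['state']} {rec['zip_code']}"


print(postal_profile(RECORD))
print(single_line_profile(RECORD))
```

Both outputs are derived from the same stored record, which is the point of a profile: the abbreviated form is generated on the way out rather than stored, so nothing is lost in the source data.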

FGDC-STD-016-2011
United States Thoroughfare, Landmark, and Postal Address Data Standard
Of Greatest Significance:
1. Everything* is ‘fully explicit’ (fully spelled-out)
▪ No abbreviations allowed; no ambiguity
▪ *The only exception is two-letter state postal codes (e.g., “CO” = Colorado)
2. You will express exactly how each address will be parsed
▪ Parsing is no longer subject to interpretation
▪ The breakdown is stored in the data for each record
3. Each address must be assigned a Unique Identifier (UID)
▪ Multiple representations of the same address can be “tied together” if and only if (iff) the addresses are assigned UIDs.
These are big changes that few have yet implemented
● Our common data model is designed to accommodate both:
▪ your current state and
▪ this “to be” state
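
The three points above (fully spelled-out values, an explicit parse stored with each record, and a unique identifier) can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the class, its field names, the `parse_simple` helper, and the small lookup tables are hypothetical, the use of a random UUID as the identifier is a choice made for this example, and the parser only handles the simplest "number, optional directional, name, type" pattern.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical, deliberately tiny expansion tables (assumptions for this sketch).
STREET_TYPES = {"ST": "Street", "AVE": "Avenue", "BLVD": "Boulevard", "RD": "Road", "DR": "Drive"}
DIRECTIONALS = {"N": "North", "S": "South", "E": "East", "W": "West"}


@dataclass
class ParsedAddress:
    """One address with its parse stored explicitly, element by element."""
    address_number: str
    pre_directional: str
    street_name: str
    post_type: str
    state: str = "CO"  # two-letter state codes are the one allowed abbreviation
    # Assumption: a random UUID stands in for whatever UID scheme is adopted.
    uid: str = field(default_factory=lambda: str(uuid.uuid4()))

    def full(self):
        """Reassemble the spelled-out street address from its stored elements."""
        parts = [self.address_number, self.pre_directional, self.street_name, self.post_type]
        return " ".join(p for p in parts if p)


def parse_simple(text):
    """Split a simple street address and expand known abbreviations."""
    tokens = text.replace(".", "").split()
    number = tokens.pop(0)
    pre_directional = ""
    if tokens and tokens[0].upper() in DIRECTIONALS:
        pre_directional = DIRECTIONALS[tokens.pop(0).upper()]
    post_type = ""
    if tokens and tokens[-1].upper() in STREET_TYPES:
        post_type = STREET_TYPES[tokens.pop().upper()]
    return ParsedAddress(number, pre_directional, " ".join(tokens), post_type)


address = parse_simple("200 E Colfax Ave.")
print(address.full())
print(address.uid)
```

Because the elements are stored separately, a different rendering of the same address (abbreviated, upper case, and so on) can carry the same `uid` and be tied back to this record.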

Presuppositions:
● SQL Server Integration Services (SSIS)
o Parallel processing - fast translations - True.
o Most Compatible with SQL Server - Irrelevant*
o Developed by DBAs for DBAs - No, developed by app developers for app developers
▪ (i.e., normalization tools) - Hah, hah, hah, hah, hah!
o No Additional Cost - (This one was borne out)
o I learned French instead of Spanish - (SSIS instead of Python)
● No Parsing
o I will translate, but it’ll be the locals’ responsibility to pre-parse... - No parsing, no geocoding*
o In addition, no last lines, no geocoding*
● 6-8 Weeks Processing - 6-8 Months of Processing

Colorado State Address Dataset
Automated and Manual Processes

Observations
● SQL Server Integration Services (SSIS)
○ SSIS is quirky
○ SSIS Expression Language is Swahili
○ A modeling canvas may be more effective for design
○ SSIS can integrate with many other server processes (FTP)
● Parsing and “Last Lining” will give CO jurisdictions a leg up
○ The level of effort can be significant
○ CLDXF Street Naming and Address Numbering Conventions
● Standards
○ Jurisdictional pretypes, sequencers - minor tweaks
○ Subaddress conventions need ... something

Opportunities
● Standards
○ Improvement via implementation
○ Coalescence on Subaddresses
● Common implementations of data models
○ Reduce the cost of development
○ Make sharing of code useful and possible
● Common code
○ Shared parsing tools
○ Shared applications
