Colorado State Address Dataset 
Automated Processing 
Nathan Lowry, GIS Outreach Coordinator 
State of Colorado 
September 24, 2014
Common Data Model 
● Allows local and state-wide querying, analysis, and integration … 
● Accommodates information exchanges 
▪ Hierarchical - City to County, County to Region, Region to State 
▪ Among neighboring jurisdictions (e.g., County to County) 
● Allows profiles to provide data in standard forms for specific 
objectives 
▪ NENA CLDXF for NG-911 
▪ USPS Pub-28 for CASS 
▪ ArcGIS Geocoding (for quality comparisons, etc.) 
● It’s more efficient (less work) and assures more quality (less loss)
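The idea of profiles can be sketched in a few lines of Python: store one fully spelled-out record, then render it into whatever form a given consumer needs. This is only an illustration under stated assumptions. The dictionary keys and function names are hypothetical, and the abbreviation table is a small subset of the USPS Pub-28 abbreviations, not the full tables.

```python
# Hypothetical sketch: one fully spelled-out record rendered for two profiles.
# Keys and function names are illustrative, not taken from any standard.

# A small subset of USPS Publication 28 abbreviations (not the full tables).
PUB28_ABBREVIATIONS = {
    "NORTH": "N",
    "SOUTH": "S",
    "EAST": "E",
    "WEST": "W",
    "STREET": "ST",
    "AVENUE": "AVE",
    "BOULEVARD": "BLVD",
}

ELEMENT_ORDER = ("number", "pre_directional", "street_name", "post_type")


def render_full(record):
    """Join the stored elements as they are, fully spelled out."""
    return " ".join(record[key] for key in ELEMENT_ORDER if record.get(key))


def render_pub28(record):
    """Join the elements in upper case, abbreviating directionals and types."""
    parts = []
    for key in ELEMENT_ORDER:
        value = record.get(key)
        if not value:
            continue
        value = value.upper()
        if key in ("pre_directional", "post_type"):
            value = PUB28_ABBREVIATIONS.get(value, value)
        parts.append(value)
    return " ".join(parts)


# A made-up address used only for illustration.
record = {
    "number": "100",
    "pre_directional": "North",
    "street_name": "Main",
    "post_type": "Street",
}

print(render_full(record))   # 100 North Main Street
print(render_pub28(record))  # 100 N MAIN ST
```

Going from the spelled-out form to an abbreviated one is a simple lookup; going the other way requires guessing, which is where the "less loss" comes from.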
FGDC-STD-016-2011 
United States Thoroughfare, Landmark, and Postal Address Data Standard 
Of Greatest Significance: 
1. Everything* is “fully explicit” (fully spelled out) 
▪ No abbreviations allowed; no ambiguity 
▪ *The only exception is two-letter state postal codes (e.g., “CO” = Colorado) 
2. You will express exactly how each address will be parsed 
▪ Parsing is no longer subject to interpretation 
▪ The breakdown is stored in the data for each record 
3. Each address must be assigned a Unique Identifier (UID) 
▪ Multiple representations of the same address can be “tied together” if and only if the addresses are assigned UIDs 
These are big changes that few have yet implemented 
● Our common data model is designed to accommodate both: 
▪ your current state and 
▪ this “to be” state
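A minimal sketch of what points 2 and 3 could look like in code: the parse is stored as separate elements on each record, and a UID ties alternate representations to that record. The field names are hypothetical (loosely modeled on the standard's element names), and using a random UUID as the identifier is an assumption; the slides do not say how UIDs are generated.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ParsedAddress:
    """One address with its parse stored element by element.

    Field names are hypothetical, loosely modeled on the standard's elements.
    """
    address_number: str
    street_name: str
    street_name_post_type: str
    street_name_pre_directional: str = ""
    # Assumption: a random UUID serves as the UID.
    address_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def full_street_address(self):
        """Reassemble the street address from the stored elements."""
        parts = [
            self.address_number,
            self.street_name_pre_directional,
            self.street_name,
            self.street_name_post_type,
        ]
        return " ".join(part for part in parts if part)


# Alternate representations of an address, keyed by its UID.
representations = {}


def add_representation(address, text):
    """Tie another written form of the address to its UID."""
    representations.setdefault(address.address_id, []).append(text)


# A made-up address used only for illustration.
address = ParsedAddress("100", "Main", "Street", street_name_pre_directional="North")
add_representation(address, address.full_street_address())
add_representation(address, "100 N MAIN ST")
print(representations[address.address_id])
```

Because every element is stored in its own field, nothing has to be re-parsed later, and the abbreviated form is linked to the record through the UID rather than through string matching.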
Presuppositions (what I assumed - how it turned out): 
● SQL Server Integration Services (SSIS) 
o Parallel processing, fast translations - True. 
o Most compatible with SQL Server - Irrelevant* 
o Developed by DBAs for DBAs - No, developed by app developers for app developers 
▪ (i.e., normalization tools) - Hah, hah, hah, hah, hah! 
o No additional cost - (This one bore out) 
o I learned French instead of Spanish - (SSIS instead of Python) 
● No Parsing 
o I will translate, but it’ll be the locals’ responsibility to pre-parse... - No parsing, no geocoding* 
o In addition, no last lines, no geocoding* 
● 6-8 weeks of processing - 6-8 months of processing
Automating Processes
Colorado State Address Dataset 
Automated and Manual Processes
Observations 
● SQL Server Integration Services (SSIS) 
○ SSIS is quirky 
○ SSIS Expression Language is Swahili 
○ A modeling canvas may be more effective for design 
○ SSIS can integrate with many other server processes (FTP) 
● Parsing and “Last Lining” will give CO jurisdictions a leg up 
○ The level of effort can be significant 
○ CLDXF Street Naming and Address Numbering Conventions 
● Standards 
○ Jurisdictional pretypes, sequencers - minor tweaks 
○ Subaddress conventions need ... something
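As an illustration of "last lining", here is a minimal Python sketch, assuming the last line means the place name, state code, and ZIP code appended to each record (the usual postal sense of the term). The function name and the choice of upper case without punctuation are assumptions made for this example, not a description of the SSIS packages used in the project.

```python
def build_last_line(place_name, state_code, zip_code):
    """Compose a last line such as 'DENVER CO 80203'.

    Returns None when any component is missing, so records that cannot
    be geocoded for lack of a last line can be flagged for follow-up.
    """
    parts = [place_name, state_code, zip_code]
    if not all(part and part.strip() for part in parts):
        return None
    return " ".join(part.strip().upper() for part in parts)


print(build_last_line("Denver", "CO", "80203"))  # DENVER CO 80203
print(build_last_line("Denver", "CO", ""))       # None
```

Returning None rather than a partial string keeps incomplete records easy to count and route back to the source jurisdiction.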
Opportunities 
● Standards 
○ Improvement via implementation 
○ Coalescence on Subaddresses 
● Common implementations of data models 
○ Reduce the cost of development 
○ Makes sharing of code useful and possible 
● Common code 
○ Shared parsing tools 
○ Shared applications
Questions? 
Thank You!
