Geoservices activities
      at EDINA
(OR Why the Elephant is
     your Friend)
About - EDINA National Data Centre

    • A designated National Data Centre for Tertiary
      Education since 1995
    • Based at The University of Edinburgh
    • Our mission: to enhance the productivity of research, learning and
      teaching in UK higher and further education by delivering access to
      a range of online data services through a UK academic infrastructure,
      as well as supporting knowledge exchange and ICT capacity building,
      nationally and internationally.
    •    Focus is on service, but we also undertake R&D
    •    History
          – first online GI service, UKBORDERS, launched in 1994
          – flagship Digimap service now a teenager!
          – substantial experience in handling geospatial data on a large
            scale (large db; large user base)
The Geoservices Team


• Largest team within EDINA
• Highly experienced and skilled team
     – provides advice nationally and internationally
     – active in standards development and policy
     – active in GI community nationally and internationally
• Demands of the services offered mean the team has been at the
  leading edge of GI service development in the UK

[Figure: Projects and Services, 1999 vs. today]
Our Service requirements



      •   Fast servicing of requests
      •   Scalable and extensible
          – accommodates steady or increasing demand
      •   Robust (our SLA aspires to 99% uptime!)
      •   Maintainable
      •   Standardised
          – can easily substitute components for repair, upgrade,
            etc.
      •   Rapid prototyping and rollout
      •   All of the above on a tight budget!
What do we use Postgres/PostGIS for?


 • Service operation and management

 • Map creation
    – Data store for vector based maps
    – Indexing service for raster based maps
    – Source for ‘Get Feature Info’ queries


 • Data Delivery
    – Data store for vector products


 • Searching/Querying
    – Advanced place name searching
… for service operation and management


 • Store service critical metadata

 • User data

 • Control user access

 • Log activity
Case Study: Digimap


 • Approx 50,000 active users at any point in time

 • Academic Year 2010/11 stats:
    – c. 400,000 logins
    – Over 10 million maps created
    – 240,000 high quality print maps generated
    – 100,000 data download requests
    – Over 1 million data files downloaded
… as a ‘Data Store’ for mapping


 • From the (very) large
 • Ordnance Survey’s MasterMap (in EDINA’s map schema)

  Table       Rows           Data size   Index size
  Area        107,293,931    49 GB       13 GB
  Lines       278,110,576    73 GB       24 GB
  Boundary        535,039    321 MB      46 MB
  Points        3,984,140    668 MB      399 MB
  Symbols       2,793,680    522 MB      236 MB
  Text         21,004,729    4 GB        1.7 GB
… as a ‘Data Store’ for mapping


 • … via the small but cartographically complex
 • Ordnance Survey’s Strategi

  Only 778,000 rows


  Range of geometries


  Strict layer draw order


  Over 50 layers


  Many drawn multiple times
… as a ‘Data Store’ for mapping


 • … to the complex data schema
 • SeaZone’s Hydrospatial

  Large range of features

  Complex feature relationships

  Scale control for individual layers
… as a ‘Spatial Indexing’ system


 •     Spatial index for 1.4 million historical maps of Great Britain
 •     Covers the late 1840s to early 1990s
 •     Complex file structure
        – Reflects original capture (counties, towns, editions, scale)
        – And the digitisation process
        – … but, critically, not TIME
• However, for historical data the temporal availability was
  critical.

• Use of date information in addition to spatial index allows
  maps to be placed in correct time slot
   – Used publication date because survey date metadata was missing
   – An example of a MapServer layer definition for 1890s maps:

   area from (select * from historic.ancient_roam_tiles b,
     (select county, max(edition) as edition2, a.sheet_no from historic.ancient_roam_tiles a,
       (select max(version) as max_version, sheet_no from historic.ancient_roam_tiles
        where (1890 between (cast((substr(cast(publish_year_start as varchar),1,3)) as int)*10)
                        and (cast((substr(cast(publish_year_end as varchar),1,3)) as int)*10))
          and (scale=10000 or scale=10560) and (version = 'ng' or version = 'cs_ng')
          and st_setsrid(!BOX!,27700) && area
        group by sheet_no) as selection
      where a.version = selection.max_version and a.sheet_no = selection.sheet_no
        and (1890 between (cast((substr(cast(publish_year_start as varchar),1,3)) as int)*10)
                      and (cast((substr(cast(publish_year_end as varchar),1,3)) as int)*10))
        and (scale=10000 or scale=10560)
      group by a.sheet_no, county) as sheet_group
    where b.sheet_no = sheet_group.sheet_no and b.county = sheet_group.county
      and (1890 between (cast((substr(cast(publish_year_start as varchar),1,3)) as int)*10)
                    and (cast((substr(cast(publish_year_end as varchar),1,3)) as int)*10))
      and (scale=10000 or scale=10560) and b.edition = sheet_group.edition2
   ) as subq using unique id using SRID=27700
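The date logic repeated throughout that layer definition is easier to read outside SQL. The Python sketch below is illustrative only: it mirrors the `substr(..., 1, 3) * 10` trick, which truncates a four-digit year to its decade, and then tests whether a target decade falls inside a sheet's publication range. The function names, sheet numbers and years are assumptions made up for the example, not part of the EDINA schema.

```python
def decade_of(year: int) -> int:
    """Truncate a four-digit year to its decade, e.g. 1897 -> 1890.

    Mirrors cast(substr(cast(year as varchar), 1, 3) as int) * 10 in the SQL:
    keep the first three digits of the year, then multiply by 10.
    """
    return int(str(year)[:3]) * 10


def published_in_decade(publish_year_start: int, publish_year_end: int, decade: int) -> bool:
    """True when `decade` lies between the start and end decades (inclusive),
    matching the SQL test `decade BETWEEN start_decade AND end_decade`."""
    return decade_of(publish_year_start) <= decade <= decade_of(publish_year_end)


# Hypothetical sheets: (sheet_no, publish_year_start, publish_year_end)
sheets = [
    ("NT27SE", 1894, 1896),
    ("NT27SW", 1905, 1908),
    ("NT27NE", 1888, 1893),
]

# Keep only the sheets whose publication range touches the 1890s.
in_1890s = [sheet_no for sheet_no, start, end in sheets
            if published_in_decade(start, end, 1890)]
print(in_1890s)
```

A sheet published across a decade boundary (1888 to 1893 here) matches both the 1880s and the 1890s, which is the behaviour the `BETWEEN` test in the layer definition gives.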
• Ease of use with a range of map rendering software

[Figure: OS Strategi rendered with Cadcorp GeognoSIS]

[Figure: OS Open Data (Panorama and Vector Map District products) plus
grid lines and labels, rendered with MapServer]
… for WMS GetFeatureInfo

• Easy to provide information about the selected feature.
• Allows use of additional search parameters, for example proximity
  to the point clicked.
• Can access additional metadata tables for information.

[Figure: example of proximity search (especially useful for point data)]

[Figure: map sheet information stored in metadata tables]

[Figure: bedrock information and selected area highlighted]
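The proximity search mentioned on this slide amounts to "find features within a tolerance of the clicked point, nearest first". In PostGIS that kind of filter is typically written with a function such as `ST_DWithin`; the Python sketch below only illustrates the logic on plain coordinates and is not EDINA's implementation. The feature names, coordinates and tolerance are invented for the example.

```python
import math


def features_near_click(features, click_xy, tolerance):
    """Return (name, distance) pairs for point features within `tolerance`
    of the clicked location, sorted nearest first.

    `features` is an iterable of (name, x, y) tuples in a projected
    coordinate system (e.g. metres), so Euclidean distance is appropriate.
    """
    cx, cy = click_xy
    hits = []
    for name, x, y in features:
        distance = math.hypot(x - cx, y - cy)
        if distance <= tolerance:
            hits.append((name, distance))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical point features in a metric grid
points = [
    ("well", 325010.0, 673995.0),
    ("spring", 325300.0, 674200.0),
    ("cairn", 325003.0, 674004.0),
]

result = features_near_click(points, (325000.0, 674000.0), tolerance=25.0)
print([name for name, _ in result])
```

A tolerance matters most for point data, where a click will almost never land exactly on the feature.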
… update interfaces to reflect current map



  Legend shows only rock
  types in area (over 1000
  in full legend)




  Timeline highlights selected as well as other available decades
… as a ‘Data Store’ for download



  UKBORDERS provides bespoke
  data extraction of vector
  boundary data in custom
  formats (Shape, MIF, KML, DXF)

  Real-time extraction: uses
  GeoServer over PostGIS as WFS,
  piped through FME

  Metamodel built around
  PostGIS (formerly Oracle).
  Migration resulted in a more
  scalable service (multiple
  dev/live/failover instances) with
  easier desktop prototyping

  OpenBoundaries – same
  engine, different data (all
  based around derived OS
  Open Data) and skin
… for querying

 •   Unlock provides an Application Programming Interface (API)
     for querying over 11 million geographic names across a variety
     of gazetteers:
     •   GeoNames (world coverage)
     •   Pleiades ancient place names (world coverage)
     •   Natural Earth (world coverage)
     •   OS products (UK coverage): 1:50,000 Placename Gazetteer, Meridian 2,
         Boundary-Line, BN Grid references


 •   Placename outlines and attribution extracted from mapping
     data or published gazetteers

 •   Outlines are a unique service feature enabling further spatial
     data extraction and analysis

 •   Unlock Places makes extensive use of stored database procedures for:
     •   Writing dynamic queries.
     •   Complex data filtering and parsing.
Outline of Southampton returned by Unlock Places
How do we use Postgres/PostGIS to best effect?


 • Ensure data schemas are determined by functionality
    – Do NOT accept defaults from loaders
    – Use INTs for primary selection attributes


 • Tailor data processing to task
    – For mapping do NOT include non-mapped features or attributes


 • Indexes are your friend
    – Ensure all search attributes are indexed


 • Clustered indexes are your best pal
    – Critical for our mapping schemas


 • Bad or unnecessary indexes are your worst enemy
    – Can cause severe slowdown resulting in a bad user experience
    – Make use of EXPLAIN
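Checking a query plan before and after adding an index is easy to try on a small scale. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in so that it runs without a database server. SQLite's `EXPLAIN QUERY PLAN` is not PostgreSQL's `EXPLAIN`, and the table, column and index names are invented. The habit it illustrates is the same: look at the plan, index the search attribute, then look again.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a real Postgres instance.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE features (id INTEGER PRIMARY KEY, feature_code INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO features (feature_code, name) VALUES (?, ?)",
    [(i % 50, f"feature {i}") for i in range(1000)],
)


def plan(sql):
    """Return the human-readable 'detail' column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Integer selection attribute, as recommended on the slide.
query = "SELECT * FROM features WHERE feature_code = 7"

plan_before = plan(query)  # no index yet, so the table is scanned
conn.execute("CREATE INDEX idx_features_code ON features (feature_code)")
plan_after = plan(query)   # the plan should now name the new index

print(plan_before)
print(plan_after)
```

The same before-and-after comparison also exposes the opposite problem, where an index exists but the planner never uses it.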
• Hide internal complexity behind database views – makes
  applications more portable

• Use schemas to roll out data updates (just set the search path to
  look in the new default schema); this also makes rolling back to the
  previous data version easy.

• Take advantage of stored procedures. If SQL is hidden in
  application code, it might be impossible to roll out changes
  instantly: the application has to be re-compiled and re-deployed,
  and downtime might be required. By storing SQL within procedures,
  any changes become immediate and more seamless.

• Use built-in data replication per instance – feel more protected
  from bad luck!
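Switching data versions by schema comes down to a couple of SQL statements. This Python sketch only builds those statements as strings and does not connect to a database. The role and schema names are hypothetical, and it assumes each data release is loaded into its own schema with identical table names.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier: wrap in double quotes, doubling any embedded ones."""
    return '"' + name.replace('"', '""') + '"'


def search_path_statements(role: str, data_schema: str) -> list:
    """Build statements that point `role` at `data_schema`.

    The first sets the default for future sessions of the role;
    the second changes the current session immediately.
    """
    path = f"{quote_ident(data_schema)}, public"
    return [
        f"ALTER ROLE {quote_ident(role)} SET search_path TO {path};",
        f"SET search_path TO {path};",
    ]


# Roll forward to a new release; rolling back is the same call with the old schema.
roll_forward = search_path_statements("mapserver", "mastermap_2012_01")
roll_back = search_path_statements("mapserver", "mastermap_2011_07")

for statement in roll_forward:
    print(statement)
```

Because applications refer to unqualified table names, nothing in the application changes between releases; only the search path does.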
What we like about Postgres/PostGIS


 • Reliable

 • Performant

 • Scalable

 • Easier replication

 • Standards compliant

 • Comes with good tools

 • Superb 3rd party support

[Figure: ... and the elephants ...]
The future: What are we planning?


 • Migrating to Postgres 9.1
    – Currently we have a mix of 8.3 and 8.4 installs
    – Take advantage of new functionality and bug fixes


 • Exploring the new functionality in PostGIS 2.0 to enhance
   existing services and possible new ones
    – Raster capabilities
    – Topology
    – Generalisation with topological consistency constraints

[Figure: highly generalised Census 2001 OAs in Nottingham. All input
features are present post generalisation, with no overlaps or new
slivers introduced.]
Conclusion


 • Postgres and PostGIS have been used to power EDINA geo-services
   for over 8 years

 • During late 2011 the last major service was migrated.

 • All geo-services (and some non-geo ones!) at EDINA rely on
   Postgres/PostGIS as either the sole or principal database

 • It will continue to form the core of our services for the
   foreseeable future.

 • The elephant is our friend; it certainly could be yours!
