1) Postgres and PostGIS have been used at EDINA for over 8 years to power major geospatial services like Digimap.
2) They are used for data storage, mapping, spatial indexing, querying, and data downloads. Postgres allows EDINA to handle large amounts of geospatial data and large user bases.
3) EDINA finds Postgres reliable, performant, scalable, and standards-compliant with good support tools. It will continue being the core database for EDINA's geoservices.
2. About - EDINA National Data Centre
• A designated National Data Centre for Tertiary
Education since 1995
• Based at The University of Edinburgh
• Our mission...
to enhance the productivity of research, learning and
teaching in UK higher and further education
BY
delivering access to a range of online data services through a
UK academic infrastructure, as well as supporting knowledge
exchange and ICT capacity building, nationally and
internationally.
• Focus is on service, but we also undertake R&D
• History
– first online GI service, UKBORDERS, launched in 1994
– flagship Digimap service now a teenager!
– substantial experience in handling geospatial data on a large
scale (large db; large user base)
3. The Geoservices Team
• Largest team within EDINA
• Highly experienced and skilled team
– provides advice nationally and internationally
– active in standards development and policy
– active in GI community nationally and internationally
• Demands of the services offered mean the team has been at the leading edge of GI service development in the UK
[Figure labels: 1999 and Today, each showing Projects and Services]
4. Our Service requirements
• Fast servicing of requests
• Scalable and extensible
– accommodates steady or increasing demand
• Robust (our SLA aspires to 99% uptime!)
• Maintainable
• Standardised
– can easily substitute components for repair, upgrade,
etc.
• Rapid prototyping and rollout
• All of the above on a tight budget!
5. What do we use Postgres/PostGIS for?
• Service operation and management
• Map creation
– Data store for vector based maps
– Indexing service for raster based maps
– Source for ‘Get Feature Info’ queries
• Data Delivery
– Data store for vector products
• Searching/Querying
– Advanced place name searching
6. … for service operation and management
• Store service critical metadata
• User data
• Control user access
• Log activity
7. Case Study: Digimap
• Approx 50,000 active users at any point in time
• Academic Year 2010/11 stats
• c. 400,000 logins
• Over 10 million maps created
• 240,000 high quality print maps generated
• 100,000 data download requests
• Over 1 million data files downloaded
8. … as a ‘Data Store’ for mapping
• From the (very) large
• Ordnance Survey’s MasterMap (in EDINA’s map schema)
Feature    Data rows      Data size   Index size
Area       107,293,931    49 GB       13 GB
Lines      278,110,576    73 GB       24 GB
Boundary   535,039        321 MB      46 MB
Points     3,984,140      668 MB      399 MB
Symbols    2,793,680      522 MB      236 MB
Text       21,004,729     4 GB        1.7 GB
9. … as a ‘Data Store’ for mapping
• … via the small but cartographically complex
• Ordnance Survey’s Strategi
Only 778,000 rows
Range of geometries
Strict layer draw order
Over 50 layers
Many drawn multiple times
10. … as a ‘Data Store’ for mapping
• … to the complex data schema
• SeaZone’s Hydrospatial
Large range of features
Complex feature relationships
Scale control for individual layers
11. … as a ‘Spatial Indexing’ system
• Spatial index for 1.4 million historical maps of Great Britain
• Covers the late 1840s to early 1990s
Complex file structure
Reflects original capture
Counties
Towns
Editions
Scale
And the digitisation process
… but not critically TIME
12. • However, for historical data the temporal availability was
critical.
• Use of date information in addition to spatial index allows
maps to be placed in correct time slot
– Used publication date, as survey date metadata was missing
– An example of a MapServer layer definition for 1890s maps:
area from (
  select * from historic.ancient_roam_tiles b,
    (select county, max(edition) as edition2, a.sheet_no
     from historic.ancient_roam_tiles a,
       (select max(version) as max_version, sheet_no from historic.ancient_roam_tiles
        where (1890 between (cast((substr(cast(publish_year_start as varchar),1,3)) as int)*10)
                        AND (cast((substr(cast(publish_year_end as varchar),1,3)) as int)*10))
          and (scale=10000 or scale=10560) and (version = 'ng' or version = 'cs_ng')
          and st_setsrid(!BOX!,27700) && area
        group by sheet_no) as selection
     where a.version = selection.max_version and a.sheet_no = selection.sheet_no
       and (1890 between (cast((substr(cast(publish_year_start as varchar),1,3)) as int)*10)
                     AND (cast((substr(cast(publish_year_end as varchar),1,3)) as int)*10))
       and (scale=10000 or scale=10560)
     group by a.sheet_no, county) as sheet_group
  where b.sheet_no = sheet_group.sheet_no and b.county = sheet_group.county
    and (1890 between (cast((substr(cast(publish_year_start as varchar),1,3)) as int)*10)
                  AND (cast((substr(cast(publish_year_end as varchar),1,3)) as int)*10))
    and (scale=10000 or scale=10560) and b.edition = sheet_group.edition2
) as subq using unique id using SRID=27700
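The decade test that appears three times in the layer definition above takes the first three digits of each publication year, multiplies by ten, and checks that the target decade (1890) lies between the two results. A minimal Python sketch of that logic follows, for illustration only: the function names and sample sheets are hypothetical, and it assumes years always have four digits, as the substr-based SQL does.

```python
def decade_floor(year: int) -> int:
    """Truncate a four-digit year to the start of its decade (1894 -> 1890)."""
    # Mirrors cast(substr(cast(year as varchar), 1, 3) as int) * 10 in the SQL;
    # like the SQL, this only makes sense for four-digit years.
    return int(str(year)[:3]) * 10


def sheet_in_decade(publish_year_start: int, publish_year_end: int, decade: int) -> bool:
    """True if decade lies between the decade floors of the publication range."""
    return decade_floor(publish_year_start) <= decade <= decade_floor(publish_year_end)


# Hypothetical sheets: (sheet_no, publish_year_start, publish_year_end)
sheets = [("A1", 1888, 1893), ("B2", 1895, 1899), ("C3", 1901, 1905)]
in_1890s = [sheet_no for sheet_no, start, end in sheets
            if sheet_in_decade(start, end, 1890)]
print(in_1890s)
```

Sheets A1 and B2 are selected because their publication ranges touch the 1890s once truncated to decades; C3 is not.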
13. • Ease of use with a range of map rendering software
– OS Strategi (Cadcorp GeognoSIS)
– OS Open Data: Panorama and Vector Map District products plus grid lines and labels (MapServer)
14. … for WMS GetFeatureInfo
• Easy to provide information about selected feature.
• Allow use of additional search parameters, for example proximity to point clicked.
• Access additional metadata tables for information.
Example of proximity search (especially useful for point data)
Map sheet information stored in metadata tables.
Bedrock information and selected area highlighted.
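A proximity search of the kind mentioned on this slide can be sketched in plain Python. This is an illustration only: the feature list, tolerance and function name are hypothetical, and coordinates are assumed to be British National Grid metres. In PostGIS the same filter would normally be expressed in SQL with a spatial predicate such as ST_DWithin rather than computed in application code.

```python
import math
from typing import Optional

# Hypothetical point features: (name, easting, northing) in metres.
FEATURES = [
    ("Borehole A", 325000.0, 673000.0),
    ("Borehole B", 325040.0, 673030.0),
    ("Borehole C", 326000.0, 674000.0),
]


def nearest_within(x: float, y: float, tolerance_m: float) -> Optional[str]:
    """Return the name of the closest feature within tolerance_m of (x, y), or None."""
    best_name: Optional[str] = None
    best_dist = tolerance_m
    for name, fx, fy in FEATURES:
        dist = math.hypot(fx - x, fy - y)  # straight-line distance in metres
        if dist <= best_dist:
            best_name, best_dist = name, dist
    return best_name


# A click about 14 m from Borehole B, then a click far from every feature.
print(nearest_within(325030.0, 673020.0, 25.0))
print(nearest_within(330000.0, 680000.0, 25.0))
```

A tolerance around the clicked point matters most for point data, where an exact hit on the feature is unlikely.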
15. … update interfaces to reflect current map
Legend shows only rock types in area (over 1000 in full legend)
Timeline highlights the selected decade as well as other available decades
16. … as a ‘Data Store’ for download
• UKBORDERS provides bespoke data extraction of vector boundary data in custom formats (Shape, MIF, KML, DXF)
• Real-time extraction uses GeoServer over PostGIS as a WFS, piped through FME
• Metamodel built around PostGIS (formerly Oracle). Migration resulted in a more scalable setup (multiple dev/live/failover instances) with easier desktop prototyping
• OpenBoundaries – same engine, different data (all based around derived OS Open Data) and skin
17. … for querying
• Unlock provides an Application Programming Interface (API)
for querying over 11 million geographic names across a variety
of gazetteers:
• GeoNames (world coverage)
• Pleiades ancient place names (world coverage)
• Natural Earth (world coverage)
• OS products (UK coverage): 1:50,000 Placename Gazetteer, Meridian 2, Boundary-Line, BN Grid references
• Placename outlines and attribution extracted from mapping
data or published gazetteers
• Outlines are a unique service feature, enabling further spatial
data extraction and analysis
• Unlock Places makes extensive use of stored database procedures for:
• Writing dynamic queries.
• Complex data filtering and parsing.
19. How do we use Postgres/PostGIS to best effect?
• Ensure data schemas are determined by functionality
– Do NOT accept defaults from loaders
– Use INTs for primary selection attributes
• Tailor data processing to task
– For mapping do NOT include non-mapped features or attributes
• Indexes are your friend
– Ensure all search attributes are indexed
• Clustered indexes are your best pal
– Critical for our mapping schemas
• Bad or unnecessary indexes are your worst enemy
– Can cause severe slowdown, resulting in a bad user experience
– Make use of EXPLAIN
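The advice to index search attributes and then check the outcome with EXPLAIN can be tried in a few lines. The sketch below uses Python's built-in sqlite3 module purely as a stand-in so that it runs without a database server. SQLite's EXPLAIN QUERY PLAN is not PostgreSQL's EXPLAIN and its output format differs, and the table, column and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE features (id INTEGER PRIMARY KEY, feature_code INTEGER, name TEXT)"
)
# An integer selection attribute, in the spirit of "Use INTs" above.
conn.executemany(
    "INSERT INTO features (feature_code, name) VALUES (?, ?)",
    [(n % 50, f"feature {n}") for n in range(1000)],
)

QUERY = "SELECT name FROM features WHERE feature_code = 7"


def plan(sql: str) -> str:
    """Return the plan detail text SQLite reports for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


plan_before = plan(QUERY)  # no index yet, so the table is scanned
conn.execute("CREATE INDEX idx_features_code ON features (feature_code)")
plan_after = plan(QUERY)   # the planner can now search via the index

print("before:", plan_before)
print("after: ", plan_after)
```

The same habit applies in Postgres: run the plan before and after adding or dropping an index, and keep only the indexes the planner actually uses.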
20. • Hide internal complexity behind database views; this makes
applications more portable.
• Use schemas to roll out data updates (just set the search path to
look in the new default schema); this also makes rolling back to the
previous data version easy.
• Take advantage of stored procedures. If SQL is hidden in
application code, it might be impossible to roll out changes
instantly because of the need to re-compile and re-deploy the
application, and downtime might be required. By storing SQL
within procedures, any changes become immediate and more
seamless.
• Use built-in data replication per instance – feel more protected
from bad luck!
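The view and schema bullets above share one idea: applications query a stable name while the data behind it is swapped. The sketch below illustrates that idea with Python's built-in sqlite3 module as a stand-in, re-pointing a view between two versioned tables. SQLite has no search_path, so this is an analogy for the schema-switching approach rather than the mechanism itself, and all table, view and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two loads of the same dataset, kept side by side.
conn.execute("CREATE TABLE places_v1 (name TEXT)")
conn.execute("CREATE TABLE places_v2 (name TEXT)")
conn.executemany("INSERT INTO places_v1 VALUES (?)", [("Edinburgh",), ("Leith",)])
conn.executemany(
    "INSERT INTO places_v2 VALUES (?)",
    [("Edinburgh",), ("Leith",), ("Portobello",)],
)


def publish(version: int) -> None:
    """Point the stable view at the requested data version."""
    if version not in (1, 2):
        raise ValueError("unknown data version")
    conn.execute("DROP VIEW IF EXISTS places")
    conn.execute(f"CREATE VIEW places AS SELECT name FROM places_v{version}")


def application_query() -> list:
    """The application only ever refers to the view, never a versioned table."""
    return [row[0] for row in conn.execute("SELECT name FROM places ORDER BY name")]


publish(2)
after_rollout = application_query()
publish(1)  # roll back without touching application code
after_rollback = application_query()
print(after_rollout, after_rollback)
```

Because the application query never changes, rolling forward or back is a single database-side operation.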
21. What we like about Postgres/PostGIS ... and the elephants ...
• Reliable
• Performant
• Scalable
• Easier replication
• Standards compliant
• Comes with good tools
• Superb 3rd party support
22. The future: What are we planning?
• Migrating to Postgres 9.1
– Currently we have a mix of 8.3 and 8.4 installs
– Take advantage of new functionality and bug fixes
• Exploring the new functionality in PostGIS 2.0 to enhance
existing services and possible new ones
– Raster capabilities
– Topology
– Generalisation with topological consistency constraints
[Figure: Highly generalised Census 2001 OAs in Nottingham. All input features are present post generalisation, with no overlaps or new slivers introduced.]
23. Conclusion
• Postgres and PostGIS have been used to power EDINA geo-services
for over 8 years
• During late 2011 the last major service was migrated.
• All geo-services (and some non-geo ones!) at EDINA rely on
Postgres/PostGIS as either the sole or principal database
• It will continue to form the core of our services for the
foreseeable future.
• The elephant is our friend; it could certainly be yours!