Wide access to spatial
Citizen Science data
ECSA 2016, Berlin
Paul van Genuchten, Lieke Verhelst, Clemens Portele
Wide access to spatial Citizen Science data - ECSA Berlin 2016
About the authors
Paul van Genuchten is a software engineer at “GeoCat BV”, supporting
governments to publish (spatial/open) data on the web.
Lieke Verhelst is the owner of “Linked Data Factory”. Lieke is a linked data expert
and has developed multiple ontologies in the fields of food safety, soil science,
nature reserves and water management.
Clemens Portele is managing director of “interactive instruments GmbH”.
interactive instruments is a software engineering company in the spatial data
infrastructure domain and is an active contributor to multiple OGC standards.
COBWEB
COBWEB is a research project to empower citizens with the ability to collect
environmental information using mobile devices, which will then be made suitable
for use in research, decision making and policy formation.
GeoCat improves GeoNetwork opensource, targeting citizen science data
discovery and visualisation in the scope of the COBWEB FP7 project.
The project has received funding from the European Union under grant agreement
No 308513
The open data challenges
- Discovery: people can’t find the data
- Format: the data is exposed in complex services/formats
- License: the license is restrictive
- Aggregation level: “raw data now” *
* Rufus Pollock, 2007 http://blog.okfn.org/2007/11/07/give-us-the-data-raw-and-give-it-to-us-now/
Background
One of the objectives of COBWEB is to publish citizen science data to GEOSS
GEOSS focuses on spatial standards (CSW, SensorWeb, WMS/WFS)
A major part of the citizen science community is not aware of these standards
Average users rely on search engines to discover data and on common formats to
analyse data
How can we bridge the gap between services in GEOSS and search engines?
Geonovum testbed
The gap between OGC and web standards is a general challenge
W3C and OGC have set up a joint working group to develop best practices
At the start of 2016 Geonovum (Dutch national government) organised a testbed to
move the ‘spatial data on the web’ best practices forward.
What search engines expect
HTML (text) output on unique, persistent URLs
An index that lists links to all URLs to be discovered
HTML documents annotated with “schema.org” markup, which turns web pages into
structured data
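The index of URLs mentioned above could, for example, take the form of an XML sitemap. The slides do not name a specific index format, so the sitemap choice, the function name `build_sitemap` and the example.org URLs are assumptions made for illustration only. A minimal sketch:

```python
import xml.etree.ElementTree as ET

# Namespace of the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    # Serialise the sitemap namespace as the default namespace (no prefix).
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical persistent URLs, one per published record or observation.
record_urls = [
    "https://example.org/observations/1",
    "https://example.org/observations/2",
]
sitemap_xml = build_sitemap(record_urls)
print(sitemap_xml)
```

A crawler that reads such a file can reach every record page without having to understand the underlying OGC service.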
Schema.org and Citizen Science
The Schema.org ontology currently does not provide classes for citizen science
projects and observations
An extension to schema.org can be proposed to model citizen science
communities and observations, for example based on schema.org/Measurement
A proxy approach
A proxy layer translates WFS/CSW output into HTML annotated with schema.org
The CSW proxy approach is implemented in GeoNetwork opensource
For the WFS proxy approach a new open source product has been released by
interactive instruments, called ‘LDproxy’
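To illustrate the idea of the proxy layer, the sketch below turns one feature into an HTML page that carries a schema.org annotation. It is not the GeoNetwork or LDproxy implementation. It assumes the feature has already been fetched from the WFS and converted to a GeoJSON-style dictionary, it uses an embedded JSON-LD block with the schema.org `Place` and `GeoCoordinates` types as one possible annotation style, and the function name, feature content and URL are hypothetical.

```python
import html
import json


def feature_to_html(feature, page_url):
    """Render one GeoJSON-style point feature as HTML with schema.org JSON-LD."""
    lon, lat = feature["geometry"]["coordinates"]  # GeoJSON order is lon, lat
    name = feature["properties"].get("name", "Unnamed feature")
    annotation = {
        "@context": "https://schema.org",
        "@type": "Place",
        "@id": page_url,
        "name": name,
        "geo": {"@type": "GeoCoordinates", "latitude": lat, "longitude": lon},
    }
    # Escape "</" so the JSON cannot close the script element early.
    json_ld = json.dumps(annotation, indent=2).replace("</", "<\\/")
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{html.escape(name)}</title>\n"
        f'<script type="application/ld+json">\n{json_ld}\n</script>\n'
        f"</head><body><h1>{html.escape(name)}</h1>\n"
        f"<p>Latitude {lat}, longitude {lon}</p></body></html>"
    )


# Hypothetical feature, shaped as it might look after conversion from a WFS response.
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [13.4, 52.52]},
    "properties": {"name": "Sample observation site"},
}
page = feature_to_html(feature, "https://example.org/features/42")
print(page)
```

Each feature gets its own page at a persistent URL, which is the kind of output search engines expect.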
{image of google structured data testing tool}
A proxy approach to reach other communities
A similar approach can be used to expose OGC services to other communities,
such as the citizen science developer community
- CSW/ISO 19139 metadata exposed as DCAT/VoID in RDFa or RDF/XML
- SOS/WFS/GML exposed as Darwin Core in RDFa or JSON-LD
- A JSON API for web developers
It would also be interesting to look at the reverse approach, in which a proxy is
used to expose unstructured citizen science data to the GEOSS community as
WFS/SOS.
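As a rough sketch of the first mapping in the list above, the function below converts a simplified metadata record into a DCAT description serialised as JSON-LD. It assumes the ISO 19139 XML has already been parsed into a flat dictionary; the field names (`title`, `abstract`, `keywords`, `links`), the function name and the example.org URLs are hypothetical, and only a handful of DCAT and Dublin Core terms are covered.

```python
import json


def record_to_dcat(record):
    """Map a simplified metadata record to a DCAT dataset in JSON-LD form."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@id": record["url"],
        "@type": "dcat:Dataset",
        "dct:title": record["title"],
        "dct:description": record["abstract"],
        "dcat:keyword": record.get("keywords", []),
        # Each online resource of the record becomes one distribution.
        "dcat:distribution": [
            {"@type": "dcat:Distribution", "dcat:accessURL": {"@id": link}}
            for link in record.get("links", [])
        ],
    }


# Hypothetical record, as it might look after parsing a CSW response.
record = {
    "url": "https://example.org/datasets/flooding-observations",
    "title": "Flooding observations",
    "abstract": "Observations collected by volunteers using mobile devices.",
    "keywords": ["citizen science", "flooding"],
    "links": ["https://example.org/wfs?service=WFS"],
}
dcat_dataset = record_to_dcat(record)
print(json.dumps(dcat_dataset, indent=2))
```

The same dictionary could also be served directly as the JSON API for web developers mentioned in the list.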
Privacy and the search engines
Some search engines are generally perceived as a challenge for privacy
However, in this case it is the campaign organiser who should take measures
A complicating factor is that citizens tend to like to advertise that they made a
contribution, or even claim ownership of a contribution
Privacy by design
Minimise the transport and storage (timespan) of data that could be used to derive
identity (minimise, separate, aggregate & hide*)
Communicate transparently about the transport and storage strategy
Offer users the ability to review and remove their personal data
Transport location and timestamp only at the level of detail that is required for
the use case
Use a wallet with reliability-credits instead of keeping a user history for reliability
assessment
* https://www.pilab.nl/wp-content/uploads/2013/12/Privacy-design-strategies-JHH-5-12-2013.pdf
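The advice to transport location and timestamp only at the required level of detail can be sketched as a small coarsening step applied before an observation leaves the device. The function name, the default of two decimal places (roughly a kilometre in latitude) and the sample values are assumptions chosen for illustration; the right precision depends on the use case.

```python
from datetime import datetime


def coarsen_observation(lat, lon, timestamp, decimals=2, keep_time=False):
    """Reduce location and time precision before an observation is transported."""
    if keep_time:
        coarse_time = timestamp
    else:
        # Keep only the date; drop the time of day.
        coarse_time = timestamp.replace(hour=0, minute=0, second=0, microsecond=0)
    return round(lat, decimals), round(lon, decimals), coarse_time


# Hypothetical observation with full device precision.
obs = coarsen_observation(52.520008, 13.404954, datetime(2016, 5, 19, 14, 37, 12))
print(obs)
```

Coarsening at the source means the detailed values are never transported or stored, which fits the "minimise" strategy listed above.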
“Privacy awareness is growing,
it’s comparable with the stage of environmental awareness 40 years ago” *
*Jaap-Henk Hoepman, Privacy & Identity Lab, Radboud University Nijmegen
Conclusions
A proxy approach for CSW is a good way to make existing published datasets
more widely discoverable via alternative channels
A proxy approach for WFS/SOS has the potential to bridge the gap between OGC
services and search engines; however, search engines currently make only limited
use of schema.org annotations
Adopting an established standard helps to make data more widely available.
A growing number of tools is available to help people engage with open data