Linked Data – The Future for Open Repositories? 8th June 2011, Open Repositories 2011, Austin, Texas, USA. Adrian Stevenson, LOCAH Project Manager (UKOLN).
“The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web.” “the Semantic Web is the goal or end result… Linked Data provides the means to reach that goal.” From ‘Linked Data: The Story So Far’, Heath, Bizer and Berners-Lee, 2009.
The goal of Linked Data is to enable people to share structured data on the Web as easily as they can share documents today. (Bizer/Cyganiak/Heath, Linked Data Tutorial, linkeddata.org)
But haven’t we been putting linked data on the web for years? In CSV, relational databases, XML etc.? Well yes, but these approaches are not so easy to integrate. Web 2.0 mashups work against a fixed set of data sources. Linked Data applications operate on top of an unbound, global data space.
Data.gov.uk: officially launched 21st January 2010.
BBC Music
Linked Data Design Issues: URIs and triples. http://www.w3.org/DesignIssues/LinkedData.html
URIs and HTTP: A ‘Uniform Resource Identifier’ (URI) provides a simple and extensible means for identifying a resource (RFC 3986). HTTP URIs may be ‘de-referenced’. A URL is a type of URI. HTTP URIs are used for “real world” things, e.g. http://adrianstevenson.com/id/me and http://dbpedia.org/resource/Love
RDF (Resource Description Framework): a language for representing information about resources on the Web. RDF can be used to represent things identified on the Web, even when they cannot be directly retrieved on the Web. Describes relations using ‘triples’. http://www.w3.org/TR/REC-rdf-syntax/
Triples: triple statements. ‘Things’ have ‘properties’ with ‘values’: Subject – Predicate – Object. Triples are the basis of RDF. Examples: Repository – Provides Access To – Archival Resource; Keith Richards – Is Member Of – The Rolling Stones.
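
The subject, predicate and object pattern on this slide can be sketched in a few lines of Python. This is only an illustration: plain labels stand in for the URIs that real RDF would use, and the `Triple` type and `objects_of` helper are hypothetical names, not part of any LOCAH code.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One statement: a 'thing' has a 'property' with a 'value'."""
    subject: str
    predicate: str
    object: str


# The two example statements from the slide, using labels instead of URIs.
graph = [
    Triple("Repository", "Provides Access To", "Archival Resource"),
    Triple("Keith Richards", "Is Member Of", "The Rolling Stones"),
]


def objects_of(subject: str, predicate: str, triples: List[Triple]) -> List[str]:
    """Return every value asserted for the given thing and property."""
    return [t.object for t in triples
            if t.subject == subject and t.predicate == predicate]
```

Calling `objects_of("Keith Richards", "Is Member Of", graph)` returns the single matching value; a graph is just a collection of such statements, which is what makes merging data from several sources straightforward.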
BBC Music
What is the LOCAH Project? Linked Open Copac and Archives Hub. Funded by #JiscEXPO 2/10 ‘Expose’ call. 1 year project, started August 2010. http://blogs.ukoln.ac.uk/locah/ Tag: #locah
What are the Archives Hub and Copac? Archives Hub is an aggregation of archival descriptions from archive repositories across the UK: http://archiveshub.ac.uk. Copac provides access to the merged library catalogues of libraries throughout the UK, including all national libraries: http://copac.ac.uk
What is LOCAH Doing? Part 1: Exposing Archives Hub & Copac data as Linked Data. Part 2: Creating a prototype visualisation. Part 3: Reporting on opportunities and barriers.
LOCAH Linked Data: If something is identified, it can be linked to. We can then take items from one dataset and link them to items from other datasets: BBC, VIAF, DBPedia, Archives Hub, Copac, GeoNames.
Archives Hub Model [diagram]. Entities: Archival Resource, Finding Aid, EAD Document, Biographical History, Agent, Family, Person, Organisation, Repository (Agent), Place, Concept, Concept Scheme, Genre, Function, Book, Object, Language, Level, Postcode Unit, Extent, Creation, Birth, Death, Temporal Entity. Relations: maintainedBy/maintains, origination, associatedWith, accessProvidedBy/providesAccessTo, topic/page, hasPart/partOf, encodedAs/encodes, administeredBy/administers, hasBiogHist/isBiogHistFor, foaf:focus, is-a, level, language, inScheme, representedBy, extent, participates in, at time, product of, in.
HTTP URI Patterns: Need to decide on patterns for URIs, following guidance from W3C ‘Cool URIs for the Semantic Web’ and UK Cabinet Office ‘Designing URI Sets for the UK Public Sector’. http://data.archiveshub.ac.uk/id/findingaid/gb1086skinner (‘thing’ URI) is HTTP 303 ‘See Other’ redirected to http://data.archiveshub.ac.uk/doc/findingaid/gb1086skinner (document URI), which is then content negotiated to http://data.archiveshub.ac.uk/doc/findingaid/gb1086skinner.html, http://data.archiveshub.ac.uk/doc/findingaid/gb1086skinner.rdf, http://data.archiveshub.ac.uk/doc/findingaid/gb1086skinner.turtle or http://data.archiveshub.ac.uk/doc/findingaid/gb1086skinner.json. Guidance: http://www.w3.org/TR/cooluris/ http://www.cabinetoffice.gov.uk/resource-library/designing-uri-sets-uk-public-sector
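
The redirect and content negotiation steps on this slide could be modelled roughly as follows. This is a minimal sketch, not the Archives Hub server logic: the media-type table, the HTML fallback and the function names are assumptions, and `q` values in the `Accept` header are deliberately ignored.

```python
BASE = "http://data.archiveshub.ac.uk"

# Assumed mapping from requested media type to the file extensions on the slide.
EXTENSIONS = {
    "text/html": ".html",
    "application/rdf+xml": ".rdf",
    "text/turtle": ".turtle",
    "application/json": ".json",
}


def see_other(thing_uri: str) -> str:
    """Return the document URI a 303 'See Other' would point to for a 'thing' URI."""
    prefix = BASE + "/id/"
    if not thing_uri.startswith(prefix):
        raise ValueError("not a 'thing' URI: " + thing_uri)
    return BASE + "/doc/" + thing_uri[len(prefix):]


def negotiate(doc_uri: str, accept: str) -> str:
    """Pick a representation for the first known media type listed in Accept.

    Simplification: q-values are ignored and HTML is the assumed default.
    """
    for part in accept.split(","):
        media_type = part.split(";")[0].strip()
        if media_type in EXTENSIONS:
            return doc_uri + EXTENSIONS[media_type]
    return doc_uri + ".html"
```

For example, `negotiate(see_other(BASE + "/id/findingaid/gb1086skinner"), "text/turtle")` walks both steps and ends at the `.turtle` document. Keeping the ‘thing’ URI separate from the document URI is what lets statements about the finding aid be distinguished from statements about the web page describing it.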
Enhancing our data. Already have some links: Language: lexvo.org URIs for languages of archival materials. Time: reference.data.gov.uk URIs for time periods. Location: using both UK Postcodes URIs and Ordnance Survey URIs. Names: Virtual International Authority File, which matches and links widely-used authority files (http://viaf.org/). Names: DBPedia. Also looking at: Subjects: Library of Congress Subject Headings and DBPedia.
http://data.archiveshub.ac.uk/
How are we creating the Visualisation Prototype? Based on researcher use cases. Data queried from SPARQL endpoint. Use tools such as Simile, Many Eyes, Google Charts. Also looking at custom-built prototype.
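
Querying a SPARQL endpoint over HTTP typically means sending the query text as a `query` parameter. The sketch below only builds such a request URL and sends nothing; the endpoint address is a hypothetical placeholder rather than a confirmed LOCAH URL, and the query simply lists resources typed as `foaf:Document`.

```python
from urllib.parse import urlencode

# Hypothetical endpoint address, used for illustration only.
ENDPOINT = "http://data.archiveshub.ac.uk/sparql"

# List up to ten resources typed as foaf:Document.
QUERY = """\
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?doc
WHERE { ?doc a foaf:Document }
LIMIT 10
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as an HTTP GET URL using the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)
```

A visualisation tool would fetch `url` with an `Accept` header for a results format it can parse, then feed the bindings into the charting or mapping library.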
Visualisation Prototype: Using Timemap (Googlemaps and Simile), http://code.google.com/p/timemap/ Early stages with this. Will give location and ‘extent’ of archive. Will link through to Archives Hub.
Data Modelling: Complexity. Archival description is hierarchical and multi-level. Dirty data. Licensing: ‘Ownership’ of data. Hard to track attribution. CC0 for Archives Hub data. Copac license decision in progress.
Sustainability: Can you rely on data sources long-term? http://lcsh.info Provenance: data ‘watermarked’, e.g. <http://data.archiveshub.ac.uk/doc/archivalresource/gb1086skinner> rdf:type foaf:Document. Scaling issues.
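
The statement on this slide uses the prefixed names `rdf:type` and `foaf:Document`. As a rough sketch of what those abbreviations stand for, the code below expands them with the standard RDF and FOAF namespace URIs and writes the statement as one N-Triples line. The helper names are hypothetical.

```python
# Standard namespace URIs for the two prefixes used on the slide.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(term: str) -> str:
    """Turn a prefixed name into a full <URI>; leave <URI> terms untouched."""
    if term.startswith("<") and term.endswith(">"):
        return term
    prefix, _, local = term.partition(":")
    return "<" + PREFIXES[prefix] + local + ">"


def to_ntriples(subject: str, predicate: str, obj: str) -> str:
    """Serialise one statement as an N-Triples line."""
    return " ".join(expand(t) for t in (subject, predicate, obj)) + " ."


line = to_ntriples(
    "<http://data.archiveshub.ac.uk/doc/archivalresource/gb1086skinner>",
    "rdf:type",
    "foaf:Document",
)
```

Because every term ends up as a full URI, a consumer can tell which source published the document regardless of where the triple is later copied.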
Future for Open Repositories? Repository data can ‘work harder’. New channels into your data. New connections with other data sources. Researchers are more likely to discover sources. ‘Hidden’ collections of repositories become of the Web.
Attribution and CC License: Sections of this presentation adapted from materials created by other members of the LOCAH Project. This presentation available under Creative Commons Non Commercial-Share Alike: http://creativecommons.org/licenses/by-nc/2.0/uk/

Editor's Notes

  1. Has been described as a ‘data commons’, or more usually a Web of Data.
  2. http://www.w3.org/DesignIssues/LinkedData.html
  3. Uses predicate logic. Goes back to Aristotle. Conceptualises things, and the relationships between things
  4. Copac is a union catalogue. Both are successful JISC services that have been running for many years now. LOCAH is a research project; it remains to be seen whether it will go into service with a Linked Data interface.
  5. In hypertext web sites it is considered generally rather bad etiquette not to link to related external material. The value of your own information is very much a function of what it links to, as well as the inherent value of the information within the web page.  So it is also in the Semantic Web. Remember, this is about machines linking – machines need identifiers; humans generally know when something is a place or when it is a person. BBC + DBPedia + GeoNames + Archives Hub + Copac + VIAF = the Web as an exploratory space
  6. The 303 redirect and content negotiation approach comes from ‘Cool URIs for the Semantic Web’.
  7. “Lower level” units are interpreted in the context of the higher levels of description, and are arguably “incomplete” without the contextual data. Relations are asserted, e.g. member-of/component-of, but there is no requirement or expectation that data consumers will follow the links describing the relations.
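
To make the last note concrete, the sketch below shows what a consumer would have to do to recover the context of a lower-level unit: follow the part-of links upwards. The level names and the `PART_OF` table are invented for illustration and are not taken from Archives Hub data.

```python
from typing import Dict, List

# Hypothetical hierarchy: each unit points at the unit it is part of.
PART_OF: Dict[str, str] = {
    "item": "file",
    "file": "series",
    "series": "collection",
}


def context_chain(unit: str, part_of: Dict[str, str]) -> List[str]:
    """Return the unit followed by every higher level it belongs to."""
    chain = [unit]
    seen = {unit}
    while chain[-1] in part_of:
        parent = part_of[chain[-1]]
        if parent in seen:
            raise ValueError("cycle in part-of links at " + parent)
        chain.append(parent)
        seen.add(parent)
    return chain
```

A consumer that reads only the triples about `"item"` and never walks this chain sees the description without the higher-level context it depends on, which is the modelling risk the note describes.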