Chalice – Linked Data and Historic Place-names
Jo Walsh [email_address], Kate Byrne, Richard Tobin, Claire Grover
Overview of the Edinburgh Geoparser
  • A system to automatically recognise place names in text and disambiguate them with respect to a gazetteer (e.g. Athens, Springfield).
  • Patchy development over the past few years, funded by a variety of projects and applied to a range of data sets:
      GeoCrossWalk
      BOPCRIS
      GeoDigRef (Histpop, BOPCRIS, BL)
      Embedding GeoCrossWalk (Stormont Papers)
      SYNC3 (online news)
      Chalice (EPNS)
      Unlock
  • The main concern has been to keep it generally usable while applying it to specific data sets.
Overview of the Edinburgh Geoparser
  • Input: .txt, .html, .xml
  • Geotagging (output: .geotagged.xml): format conversion, tokenisation, POS tagging, lemmatisation, named entity recognition
  • Georesolution (input: .geotagged.xml, output: .gaz.xml): gazetteer lookup, resolution
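The two stages on this slide, geotagging followed by georesolution, can be illustrated with a toy sketch. This is not the Edinburgh Geoparser's code: the gazetteer entries, coordinates, populations, identifiers and the ranking heuristic (nearest to unambiguous places in the same text, otherwise largest population) are all assumptions made for illustration.

```python
import math
import re

# Toy gazetteer: name -> candidate records. All values are illustrative
# assumptions, not data taken from Unlock or GeoNames.
GAZETTEER = {
    "Athens": [
        {"id": "athens-gr", "lat": 37.98, "lon": 23.73, "population": 660000},
        {"id": "athens-us-ga", "lat": 33.96, "lon": -83.38, "population": 120000},
    ],
    "Atlanta": [
        {"id": "atlanta-us-ga", "lat": 33.75, "lon": -84.39, "population": 500000},
    ],
}


def geotag(text):
    """Stand-in for NER: return capitalised tokens that are gazetteer keys."""
    return [tok for tok in re.findall(r"[A-Z][a-z]+", text) if tok in GAZETTEER]


def distance_km(a, b):
    """Great-circle (haversine) distance between two candidate records."""
    lat1, lon1, lat2, lon2 = map(
        math.radians, (a["lat"], a["lon"], b["lat"], b["lon"])
    )
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def georesolve(names):
    """Pick one gazetteer candidate per name.

    Names with a single candidate act as anchors. Each name then takes the
    candidate closest to any anchor, or the most populous one if the text
    contains no anchors.
    """
    anchors = [GAZETTEER[n][0] for n in names if len(GAZETTEER[n]) == 1]
    resolved = {}
    for name in names:
        candidates = GAZETTEER[name]
        if anchors:
            best = min(
                candidates,
                key=lambda c: min(distance_km(c, a) for a in anchors),
            )
        else:
            best = max(candidates, key=lambda c: c["population"])
        resolved[name] = best["id"]
    return resolved
```

Calling `georesolve(geotag("The train from Atlanta reached Athens at noon."))` picks the Georgia candidate for Athens because Atlanta anchors the text; with no other place mentioned, the population fallback picks the Greek one.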
 
 
Chalice
  • Connecting Historical Authorities with Linked Data, Contexts, and Entities.
  • Part of jiscEXPO: "exposing digital content for education and research".
  • The project is exploring the viability of creating a historical gazetteer from digitized volumes of the English Place-Name Society (EPNS).
  • Partners:
      CDDA, Queen’s University Belfast
      School of Informatics, Edinburgh
      EDINA, Edinburgh
      CeRch, King’s College London
English Place-Name Survey
  • At the Institute of Name Studies in Nottingham
  • 80+ volumes covering English counties
  • Over 1000 years of place-name history
  • Started in 1925 and still going!
 
Archaeology and Place-names and History
"The first point, already noted repeatedly but so important that it cannot be too strongly emphasised, is that historical evidence is documentary and therefore direct evidence only of a state of mind; that archaeological evidence is material and therefore direct evidence only of practical skills, technological processes, aesthetic interests and physical sequences; and that place-name evidence is linguistic and therefore direct evidence only of language and speech habits. Indirect inferences may be drawn in each case, and the evidence of place-names may be used to throw light on the date, nature and extent of settlements, on the movements of peoples and their relationships to each other, on certain aspects of their organisation and on many of the other problems that concern the historian and the archaeologist. But in all these cases the inferences depend to some extent on assumptions and they must be examined carefully before they are accepted as valid."
– F.T. Wainwright
Chalice data
  • Cheshire:
      Cheshire Part I. EPNS Volume 44, 1970
      Cheshire Part II. EPNS Volume 45, 1970
      Cheshire Part III. EPNS Volume 46, 1971
      Cheshire Part IV. EPNS Volume 47, 1972
      Cheshire Part V (1:i). EPNS Volume 48, 1981
      Cheshire Part V (1:ii). EPNS Volume 54, 1981
  • Small samples from: Berkshire, Buckinghamshire (Vol. 2), Cambridgeshire (Vol. 19), Derbyshire (Vols. 27-29), Hertfordshire (Vol. 15)
  • Shropshire: Pimhill Hundred (born digital)
EPNS
  • Parishes are organised by the hundred to which they belong.
  • Towns and villages are referred to as townships and are organised by the parish to which they belong.
  • Township descriptions often contain descriptions of buildings, bridges, lanes, woods and farms.
  • River and major road names are described separately from the inhabited places.
  • Entries give the names and spellings attested in historical sources, and the etymology of names or name parts.
  • In Chalice we focus on capturing parishes, townships, sub-townships and attestations.
 
The start of the entry for the township of Willaston in the parish of Neston in Wirral Hundred.
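The hundred, parish and township nesting described for the EPNS volumes, with attestations attached to each place, could be modelled roughly as below. This is a minimal sketch and not the project's data model: the class names, field names and the `path` helper are assumptions, and the `kind` given to Bosley is a placeholder because the slides do not state it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Attestation:
    form: str    # historical spelling as printed in the volume
    source: str  # source abbreviation, e.g. "DB"
    date: str    # kept as text so uncertain or ranged dates can be stored


@dataclass
class Place:
    name: str
    kind: str  # e.g. "hundred", "parish", "township", "sub-township"
    parent: Optional["Place"] = None
    attestations: List[Attestation] = field(default_factory=list)

    def path(self):
        """Names from the outermost unit down to this place."""
        chain = []
        node = self
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return list(reversed(chain))


# The Willaston entry from the slide above.
wirral = Place("Wirral", "hundred")
neston = Place("Neston", "parish", parent=wirral)
willaston = Place("Willaston", "township", parent=neston)

# The Domesday attestation used on the Turtle slide; "unspecified" is a
# placeholder because the slides do not say what kind of unit Bosley is.
bosley = Place("Bosley", "unspecified")
bosley.attestations.append(Attestation("Boselega", "DB", "1086"))
```

Here `willaston.path()` walks the parent links and yields the hundred, parish and township names in order.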
 
 
 
 
Turtle-ish version

    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    @prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
    @prefix gn: <http://www.geonames.org/ontology#> .
    @prefix dc: <http://purl.org/dc/elements/1.1/> .
    @prefix chalice: <http://made.up.domain.name/chalice/> .
    # The slide never declares the empty prefix; a document-relative one is assumed here.
    @prefix : <#> .

    :Bosley a chalice:Place ;
        dc:title "Bosley" ;
        owl:sameAs <http://sws.geonames.org/2655141/> .

    :Boselega a chalice:PlaceName ;
        dc:title "Boselega" .

    <#attested> a chalice:PlaceNameAttestation ;
        chalice:place :Bosley ;
        chalice:known_as :Boselega ;
        chalice:source :DB ;
        chalice:date 1086 .

    :DB a chalice:Source ;
        dc:title "Domesday Book" .
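Statements of this shape could be generated from extracted records with a few lines of string formatting. The sketch below is an assumption about how that might look, not the project's exporter: the function names are hypothetical, the vocabulary is the slide's made-up `chalice:` namespace, the prefix header is left out, and a blank node stands in for the attestation identifier.

```python
def turtle_literal(value):
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def attestation_turtle(place, form, source_id, source_title, year):
    """Serialise one attestation using the slide's made-up vocabulary.

    Assumes place, form and source_id are plain ASCII words that are usable
    as Turtle local names; no prefix declarations are emitted.
    """
    return "\n".join([
        f":{place} a chalice:Place ;",
        f"    dc:title {turtle_literal(place)} .",
        "",
        f":{form} a chalice:PlaceName ;",
        f"    dc:title {turtle_literal(form)} .",
        "",
        f":{source_id} a chalice:Source ;",
        f"    dc:title {turtle_literal(source_title)} .",
        "",
        "[] a chalice:PlaceNameAttestation ;",
        f"    chalice:place :{place} ;",
        f"    chalice:known_as :{form} ;",
        f"    chalice:source :{source_id} ;",
        f"    chalice:date {int(year)} .",
    ])


bosley_turtle = attestation_turtle("Bosley", "Boselega", "DB", "Domesday Book", 1086)
```

For anything beyond a demonstration, an RDF library would be a safer way to serialise, since it handles escaping and name validation.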
Linking Data
  • A URI for each place-name
  • Links to information about each attestation
  • Links to nearby places
  • Links to other sources of place-name references:
      Geonames.org (variable quality, wide usage)
      Ordnance Survey Open Data (also variable quality)
  • Then links from and between documentary sources
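Giving every place-name a URI and linking it to an external gazetteer might look like the sketch below. The URI pattern, the `place/` path segment and both function names are assumptions; the base is the placeholder domain from the Turtle slide, and 2655141 is the GeoNames identifier quoted there for Bosley.

```python
from urllib.parse import quote

# Placeholder domain taken from the Turtle slide, not a real service.
BASE = "http://made.up.domain.name/chalice/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def mint_uri(*path_parts):
    """Build a place URI from hierarchy parts, percent-encoding each segment."""
    segments = [
        quote(part.strip().lower().replace(" ", "-"), safe="")
        for part in path_parts
    ]
    return BASE + "place/" + "/".join(segments)


def same_as_triple(place_uri, geonames_id):
    """Return a (subject, predicate, object) link to a GeoNames record."""
    return (place_uri, OWL_SAME_AS, f"http://sws.geonames.org/{geonames_id}/")


willaston_uri = mint_uri("Wirral", "Neston", "Willaston")
bosley_link = same_as_triple(mint_uri("Bosley"), 2655141)
```

Encoding the hundred and parish in the path is only one option; opaque identifiers may hold up better if the hierarchy is later revised.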
 
 
 
Issues
  • OCR quality needs to be high: not just recognising characters correctly but getting font and layout information right.
  • Volumes vary in how they use layout and font to indicate structure.
  • Different volumes reflect different decisions about where place-name information should be put.
  • Consider long-term preservation of URIs.
  • Need to share vocabularies with other projects (Pleiades, SPQR, geodataverse?)
Integrating (with) other sources
  • Series of use cases by Stuart Dunn at KCL:
      Victoria County History
      Clergy of the Church of England Database
      Archaeology Data Service
 
 
 
 
GAP & Ancient Place-names
  • Based on the Pleiades set of ancient place names, but extended in two ways:
      by matching Pleiades place names against GeoNames place names in the same location and adding the GeoNames alternative names to the Pleiades+ list.
  • Example: this adds three alternative names for the single Pleiades entry for "Autricum" ("Chartrez", "Chartres", "Shartr"), because "Autricum" is present in both Pleiades and GeoNames, with the same approximate location.
  • (We don't want to simply take places directly from GeoNames because, when we tried it, we were swamped with irrelevant modern places having names corresponding to ancient toponyms.)
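The matching rule described here (same name, same approximate location, then copy across the alternative names) can be sketched as follows. It is not the GAP implementation: the record layout, the 10 km threshold, the identifiers and all coordinates are assumptions, and the second GeoNames record is a made-up same-named place included only to show the location filter at work.

```python
import math


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    h = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def build_pleiades_plus(pleiades, geonames, max_km=10.0):
    """Map each Pleiades id to its name plus matching GeoNames names.

    A GeoNames record matches when it carries the Pleiades name (as its main
    or an alternative name) and lies within max_km of the Pleiades location.
    """
    plus = {}
    for place in pleiades:
        names = {place["name"]}
        for rec in geonames:
            rec_names = {rec["name"], *rec["alternatenames"]}
            near = distance_km(
                place["lat"], place["lon"], rec["lat"], rec["lon"]
            ) <= max_km
            if place["name"] in rec_names and near:
                names |= rec_names
        plus[place["id"]] = names
    return plus


# Illustrative records; ids and coordinates are assumptions.
PLEIADES = [{"id": "autricum", "name": "Autricum", "lat": 48.45, "lon": 1.49}]
GEONAMES = [
    {"name": "Chartres", "alternatenames": ["Autricum", "Chartrez", "Shartr"],
     "lat": 48.45, "lon": 1.48},
    # Hypothetical same-named modern place far away; the distance test rejects it.
    {"name": "Autricum", "alternatenames": ["Autricum Heights"],
     "lat": 40.0, "lon": -75.0},
]

PLEIADES_PLUS = build_pleiades_plus(PLEIADES, GEONAMES)
```

Starting from the Pleiades entries and pulling names in, rather than starting from GeoNames, is what keeps irrelevant modern places out of the list.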
Pleiades+(+)
  • Pleiades+: get alternative names for places that match in GeoNames.
  • Pleiades++ is a runtime supercharging bit: if place X isn't in Pleiades+, look at the "synonym ring" of alternative names in GeoNames and try all of those against Pleiades+.

    mysql> select distinct p.name, p.plid, p.geonameId, p.fclass, p.fcode, p.country,
               p.latitude, p.longitude, p.population, p.normname
           from plplus p
           join geonames.alternatename a on p.name = a.alternatename
           join geonames.geoname g on a.geonameid = g.geonameid
           join geonames.alternatename a2 on a2.geonameid = g.geonameid
           where a2.alternatename = "Egypt";
    +----------+---------+-----------+--------+-------+---------+------------+------------+------------+----------+
    | name     | plid    | geonameId | fclass | fcode | country | latitude   | longitude  | population | normname |
    +----------+---------+-----------+--------+-------+---------+------------+------------+------------+----------+
    | Aegyptus |     766 |         0 |        |       |         | 32.5000000 | 32.5000000 |          0 | aegyptus |
    | Aegyptus |  981503 |         0 |        |       |         | 27.5000000 | 26.5476190 |          0 | aegyptus |
    | Aigyptos | 1001943 |         0 |        |       |         | 32.5000000 | 32.5000000 |          0 | aigyptos |
    +----------+---------+-----------+--------+-------+---------+------------+------------+------------+----------+
    3 rows in set (0.05 sec)
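The same fallback can be written out in memory to make the join easier to follow. This sketch assumes a hypothetical layout (a dict from name to Pleiades ids, and a list of GeoNames id and alternate-name pairs); the ids 766, 981503 and 1001943 come from the query result above, while the GeoNames id `1` is a placeholder.

```python
def synonym_ring(name, geonames_alternates):
    """All alternate names that share a GeoNames id with `name`."""
    ids = {gid for gid, alt in geonames_alternates if alt == name}
    return {alt for gid, alt in geonames_alternates if gid in ids}


def lookup_plus_plus(name, pleiades_plus, geonames_alternates):
    """Try Pleiades+ directly, then fall back to the GeoNames synonym ring."""
    if name in pleiades_plus:
        return sorted(pleiades_plus[name])
    hits = set()
    for alt in synonym_ring(name, geonames_alternates):
        hits.update(pleiades_plus.get(alt, ()))
    return sorted(hits)


# Pleiades+ names and plids as shown in the query result.
PLPLUS = {"Aegyptus": [766, 981503], "Aigyptos": [1001943]}
# (geonameid, alternatename) pairs; the id 1 is a placeholder, not a real id.
ALTERNATES = [(1, "Egypt"), (1, "Aegyptus"), (1, "Aigyptos")]

egypt_hits = lookup_plus_plus("Egypt", PLPLUS, ALTERNATES)
```

In terms of the SQL, `a2` is the name being looked up, `g` is the GeoNames record it belongs to, `a` ranges over that record's synonym ring, and `p` holds the Pleiades+ rows the ring names hit.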
Thanks
  http://chalice.blogs.edina.ac.uk
  http://unlock.edina.ac.uk/text.html


Chalice / Edinburgh Geoparser at GISRUK
