Force 11 Scholarly Communications Institute Summer School
31 July – 4 August 2017
University of California, San Diego
Data in the Scholarly Communications Lifecycle
Natasha Simons
Senior Research Data Specialist
Wednesday 2 August
Session two – persistent identifiers for research (data)
• Why do we need PiDs?
• What are PiDs?
• Why use PiDs?
• Why are there so many PiDs?
• Examples: Handles, DOIs, ORCIDs
• Which PiD to choose?
• Power of linking PiDs
• PiD fails
• PiD community
Duration: 30 mins
What’s the problem?
What are Persistent Identifiers (PiDs)?
A persistent identifier is a long-lasting reference to a digital resource
Photo attribution: Jan Hettenhausen - j.hettenhausen@griffith.edu.au (reproduced with permission)
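The core idea, separating an identifier from the location it points to, can be sketched as a toy in-memory registry. This is a hypothetical illustration (the class `PidRegistry` and its methods are invented for this example); real PiD systems such as Handles and DOIs are distributed services with their own registration interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class PidRegistry:
    """Hypothetical toy resolver: maps each PiD to its resource's current URL."""

    locations: dict[str, str] = field(default_factory=dict)

    def register(self, pid: str, url: str) -> None:
        # Mint a new identifier; an existing PiD must never be reassigned.
        if pid in self.locations:
            raise ValueError(f"{pid} is already registered")
        self.locations[pid] = url

    def update_location(self, pid: str, new_url: str) -> None:
        # The owner's (social) obligation: when the resource moves or is
        # withdrawn, point the unchanged PiD at the new URL or a tombstone page.
        if pid not in self.locations:
            raise KeyError(pid)
        self.locations[pid] = new_url

    def resolve(self, pid: str) -> str:
        # Citations use the PiD, so readers never depend on the old URL.
        return self.locations[pid]


registry = PidRegistry()
registry.register("example/123", "https://old.example.edu/report.pdf")
registry.update_location("example/123", "https://repository.example.edu/items/123")
print(registry.resolve("example/123"))  # https://repository.example.edu/items/123
```

If the owner never performs the update step, the PiD still "exists" but resolves to a dead link, which is why PiDs are social as well as technical infrastructure.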
Use PiDs to connect…
Researchers
Publications
Data
Software
Methods
Equipment
???
Why use PiDs?
PiDs play a key role in the discoverability,
accessibility and reproducibility of research.
Why are there so many PiDs?
Marked by differences in:
• Purpose
• Scope
• Underlying technology
• Governance and social infrastructure
• Metadata collected
• Cost
• Extent of use
Examples: ARK, PURL, NLA party ID
Example: The Handle System
• Run by CNRI
• Robust system
• Widely used in publication repositories
• Used to identify research datasets
How do Handles work?
Example: http://hdl.handle.net/11343/130078
http://hdl.handle.net = resolver service
11343 = prefix identifying the assigning body (University of Melbourne)
130078 = suffix identifying the resource (a University of Melbourne report)
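The prefix/suffix split can be shown with a short Python sketch. The function name `split_handle_url` is hypothetical; it assumes a URL of the form `http(s)://<resolver host>/<prefix>/<suffix>` and splits only on the first slash of the handle, because a Handle's local name (suffix) may itself contain slashes.

```python
from urllib.parse import urlparse


def split_handle_url(url: str) -> tuple[str, str, str]:
    """Split a Handle proxy URL into (resolver, prefix, suffix)."""
    parsed = urlparse(url)
    resolver = f"{parsed.scheme}://{parsed.netloc}"
    handle = parsed.path.lstrip("/")
    # Only the first '/' separates the naming authority from the local name.
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix handle: {handle!r}")
    return resolver, prefix, suffix


print(split_handle_url("http://hdl.handle.net/11343/130078"))
# ('http://hdl.handle.net', '11343', '130078')
```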
Example: Digital Object Identifiers (DOIs)
• Run by the International DOI Foundation
• Robust – built on the Handle System
• Origins in the publishing industry
• Used to identify and cite publications and research datasets
• The most widely used PiD for research data
How do DOIs work?
This is an example from Griffith University: http://doi.org/10.4225/01/4F3DB08617645
http://doi.org = resolver service
10.4225 = prefix identifying the assigning body (ANDS)
01 = suffix part 1, the institution identifier (Griffith University)
4F3DB08617645 = suffix part 2, the resource item or collection identifier (a dataset held in the Griffith data repository)
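A hedged Python sketch of the same structure: it joins the prefix and suffix parts listed above into a DOI name and builds a resolver link. The helper names are hypothetical. Under DOI syntax everything after the first slash is the suffix, so treating it as two parts (institution and item) is a registrant convention used in this ANDS example rather than part of the DOI standard. The sketch uses `https://doi.org/`, which current DOI display guidance favours; the `http://doi.org` form also works.

```python
from urllib.parse import quote


def doi_from_parts(prefix: str, *suffix_parts: str) -> str:
    """Join a DOI prefix and suffix segments (hypothetical helper)."""
    if not prefix.startswith("10."):
        raise ValueError("a DOI prefix starts with '10.'")
    if not suffix_parts:
        raise ValueError("a DOI needs a suffix")
    return prefix + "/" + "/".join(suffix_parts)


def doi_to_url(doi: str, resolver: str = "https://doi.org/") -> str:
    # Percent-encode characters such as '#' or '?' that would otherwise break
    # the URL; '/' is kept because it separates prefix and suffix.
    return resolver + quote(doi, safe="/")


def same_doi(a: str, b: str) -> bool:
    # DOI names are case-insensitive (for ASCII letters), so compare
    # upper-cased forms.
    return a.upper() == b.upper()


doi = doi_from_parts("10.4225", "01", "4F3DB08617645")
print(doi)              # 10.4225/01/4F3DB08617645
print(doi_to_url(doi))  # https://doi.org/10.4225/01/4F3DB08617645
print(same_doi(doi, "10.4225/01/4f3db08617645"))  # True
```

Keeping the resolver separate from the identifier mirrors the slide: the DOI name stays fixed even if the preferred resolver address changes.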
More about DOIs
• Metadata required! Example: DataCite Metadata Schema
https://schema.datacite.org/
• DOI search services e.g. DataCite
https://search.datacite.org/
• There is a cost involved, but some agencies, such as ANDS, offer a free service
• To get a DOI through the ANDS service: machine-to-machine (M2M) or manual minting
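To make "metadata required" concrete: version 4 of the DataCite Metadata Schema lists six mandatory properties (Identifier, Creator, Title, Publisher, PublicationYear and ResourceType). The Python sketch below checks a draft record for them before minting. The record is a simplified, hypothetical dict keyed by property name, not DataCite's actual XML or JSON format, and all field values are placeholders.

```python
# Mandatory properties in DataCite Metadata Schema 4.x.
MANDATORY_PROPERTIES = (
    "Identifier",
    "Creator",
    "Title",
    "Publisher",
    "PublicationYear",
    "ResourceType",
)


def missing_mandatory(record: dict) -> list[str]:
    """Return the mandatory properties that are absent or empty in `record`."""
    return [name for name in MANDATORY_PROPERTIES if not record.get(name)]


# Hypothetical draft record with placeholder values.
draft = {
    "Identifier": "10.5555/hypothetical-dataset",
    "Title": "Example dataset title",
    "Publisher": "Example University",
}
print(missing_mandatory(draft))  # ['Creator', 'PublicationYear', 'ResourceType']
```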
Example: ORCIDs
• Run by ORCID, a not-for-profit organisation
• Identifier for people (researchers)
• Links people with their research ‘works’
• Widely used internationally
• Australian research sector-wide endorsement
• Embedded in scholarly workflows
How do ORCIDs work?
https://orcid.org/0000-0003-0635-1998
• 16-digit identifier based on an ISNI block
• Prototype: Thomson Reuters ResearcherID
• Most metadata fields are optional
• Free for researchers, fee for members (organisations)
• Public API (free) and premium API (members)
• Transparent governance and development process
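The sixteenth character of an ORCID iD is a check digit calculated with the ISO 7064 MOD 11-2 algorithm (a value of 10 is written as `X`), which lets systems catch most typing errors before they reach a record. A minimal Python sketch of validation, with hypothetical function names, checked against the iD shown above:

```python
import re

# Optional https://orcid.org/ prefix, then four hyphen-separated blocks.
ORCID_PATTERN = re.compile(
    r"(?:https://orcid\.org/)?([0-9]{4})-([0-9]{4})-([0-9]{4})-([0-9]{3}[0-9X])"
)


def orcid_check_char(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    match = ORCID_PATTERN.fullmatch(orcid)
    if match is None:
        return False
    chars = "".join(match.groups())  # the 16 characters without hyphens
    return orcid_check_char(chars[:15]) == chars[15]


print(is_valid_orcid("https://orcid.org/0000-0003-0635-1998"))  # True
print(is_valid_orcid("0000-0003-0635-1999"))  # False
```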
The power of linking PiDs
• International efforts to link ORCIDs (researchers) with DOIs (publications and data)
• The Scholix initiative:
  • a global framework to improve the links between publications and data
  • beneficial for all, especially publishers (display this link in journals) and repositories (link back to data held in repositories)
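Why linking matters can be sketched as a graph: once identifiers are connected, a reader or machine can walk from a researcher's ORCID to their article and on to the underlying data and software. The Python below is a simplified illustration, not the Scholix schema or any real API; the `Link` class, the relation labels and all identifiers are hypothetical.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One hypothetical link between two PiDs (not the Scholix schema)."""

    source: str
    relation: str
    target: str


def index_links(links: list[Link]) -> dict[str, list[Link]]:
    # Group links by source identifier so they can be followed quickly.
    index: dict[str, list[Link]] = defaultdict(list)
    for link in links:
        index[link.source].append(link)
    return index


def reachable(index: dict[str, list[Link]], start: str) -> set[str]:
    """Breadth-first search: every PiD reachable from `start` by following links."""
    seen, queue = {start}, deque([start])
    while queue:
        current = queue.popleft()
        for link in index.get(current, []):
            if link.target not in seen:
                seen.add(link.target)
                queue.append(link.target)
    return seen - {start}


# Hypothetical identifiers and relation labels, for illustration only.
links = [
    Link("orcid:researcher-A", "creator_of", "doi:10.9999/article-1"),
    Link("doi:10.9999/article-1", "references", "doi:10.9999/dataset-1"),
    Link("doi:10.9999/dataset-1", "references", "doi:10.9999/software-1"),
]
print(sorted(reachable(index_links(links), "orcid:researcher-A")))
# ['doi:10.9999/article-1', 'doi:10.9999/dataset-1', 'doi:10.9999/software-1']
```

Each missing link breaks the chain, which is one way to see why publishers and repositories both benefit from exchanging these links.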
Which PiD to choose?
Evaluate the PiD service:
• Purpose
• Scope
• Underlying technology
• Governance and social infrastructure
• Metadata collected
• Cost
• Extent of use
• Trustworthiness?
Choose the best-fit PiD for the type of resource and its point in the research lifecycle
Better to choose one than none!
PiDs sound great, but hang on…?
Erm…
• Recent PiD crises: PURL, LSID
• “Zombie PiDs”?
Remember:
• PiDs are both social and technical systems
• Governance/organisations can be the Achilles' heel of PiD systems
See: Klump, J. & Huber, R. (2017). 20 Years of Persistent Identifiers – Which Systems are Here to Stay? Data Science Journal, 16, p. 9. DOI: http://doi.org/10.5334/dsj-2017-009
Have PiD systems ever failed? What’s the guarantee they will stay “long-lasting”?
Cool and groovy international PiD community
Summary
• PiDs play a key role in the discovery, accessibility and reproducibility of research.
• There are many PiD systems, which vary in purpose, scope, underlying technology, governance and social infrastructure, metadata collected, cost and extent of use.
• When evaluating which PiD to assign to a resource, consider:
  • The differences above and, importantly, trustworthiness
  • Better to assign one PiD (or more) than no PiD at all
• Remember that PiDs are about social as well as technical infrastructure. It is the responsibility of the PiD owner (e.g. a university) to update the PiD if the resource location changes.
• PiDs are evolving, so get your geek on and join in the discussions!
Want more?
Have a go at:
• Thing 14 – Identifiers and linked data
Read:
• ANDS website for PiD Guides, DOI service, Handle service:
• More about DataCite
• More about ORCID
• ICSU/CODATA Data Science Journal special issue: 20 years of
Persistent Identifiers
Watch:
• ANDS PiDs short bites webinar series (persistent identifiers playlist); more to come in this series!
• THOR Project webinar series
With the exception of logos, third-party images or where otherwise indicated, this work is licensed under the Creative Commons Attribution 3.0 Australia Licence.
ANDS is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy Program.
Monash University leads the partnership with the Australian National University and CSIRO.
Natasha Simons
natasha.simons@ands.org.au
Tw: @n_simons
ORCID: https://orcid.org/0000-0003-0635-1998

Editor's Notes

  1. First of all, what’s the problem that persistent identifiers are trying to address? Everyone will be familiar with this: clicking on a web link that takes you either to a ‘page not found’ error page like this one or to content that is unrelated to the link you clicked. Both usually happen because a web resource has been moved to another location and you have the old link. A ‘page not found’ error is frustrating; in the context of research, it is disastrous. It means a scholarly resource, which may have been cited, cannot be found, verified or cited again. This is the problem that persistent identifiers are there to address.
  2. A persistent identifier is simply a long-lasting reference to a digital resource. Even if the resource moves location on the web, the persistent identifier is there to make sure the link still resolves. So if a PiD is used as a citation link in scholarly literature, it will always resolve to information about the resource: a descriptive metadata page, the resource itself, or information about the removal of the resource from the web. PiDs are key to facilitating the discovery of scholarly resources like journal articles and research data. They also play a role in linking scholarly resources (e.g. publications and data) and in tracking the impact of these resources. It’s important to note that PiDs do not guarantee a link will never break; rather, they create a technical and social framework that makes long-term resolution far more likely.
  3. PiDs play a key role in the discoverability, accessibility and reproducibility of research. How do they do this? They provide social and technical infrastructure to identify a research output over time; enable machine readability; apply to a variety of research objects and related “things” (researchers, institutions, outputs); enable research objects and things to be labelled uniquely and disambiguated from one another; and facilitate the linking of research objects, related people and things, so a reader may discover a publication and its related dataset, software, methods etc. PiDs are an integral part of the semantic web.
  4. So why are there so many PiD systems? Well, each PiD system is different from the others. They vary in: purpose (for example, they can be general, covering all scholarly resource types, e.g. DOIs, or discipline-specific, e.g. Life Science Identifiers); underlying technology (more on this shortly); governance (e.g. non-profit, cross-sector collaborative effort or company-driven); metadata collected (some require more than others); cost (some are free, some not); and extent of use (PiDs vary in uptake).
  5. Most PiDs for research work by separating the identity of a scholarly object from its location on the web. Let’s look at the Handle System as an example. Handles are run by the Corporation for National Research Initiatives (CNRI) in the USA. CNRI is a not-for-profit organisation formed in 1986 to undertake, foster and promote research in the public interest. The Handle System is very robust and is widely used internationally among repositories. It also provides the underlying infrastructure for Digital Object Identifiers (DOIs). Its characteristics are: a central handle registry where handle identifiers are recorded; a distributed computer system including handle proxy servers; a model of one Handle per resource; minimal cost (usually borne by the Handle issuer, such as an institution running a Handle proxy server); and identifiers that are unique, global, scalable and reliable. Note that PiDs are both technical AND social infrastructure, so if the URL of a resource changes, the owner must update the URL in the Handle System.
  6. A Handle link is made up of: a prefix that identifies the “naming authority”; a suffix that identifies the “local name” of the resource; and a resolver service: http://hdl.handle.net
  7. Let’s look at another example of persistent identifiers: DOIs. These came from the scholarly publishing industry, and DOIs are routinely assigned by publishers to identify journal articles and other published works. A great deal of technical and social infrastructure has been invested in DOIs, and according to recent research by the THOR project they are by far the most widely used persistent identifier for research objects, including research data. DOIs are: an implementation of the Handle System; applicable to a variety of digital objects, e.g. in research: publications, data, software, methods, “grey literature”, theses etc.; governed by the International DOI Foundation, a not-for-profit organisation; issued by DOI Registration Agencies or their agents (CrossRef for scholarly publications; DataCite for datasets, software and “grey literature”; agent examples include EZID (CDL), the British Library (BL) and ANDS); and unique, global, scalable and reliable.
  8. Like the Handle System they are built on, DOIs have: a prefix that identifies the “naming authority”; a suffix that identifies the “local name” of the resource; and a resolver service.
  9. More metadata is required to mint a DOI than a Handle. For Handles, you can get away with little more than the URL and title of the resource. For DOIs, much more is required, and there are many optional and recommended metadata elements as well; the DataCite schema, for example, has 6 mandatory, 6 recommended and 6 optional fields. Because more metadata is collected, DataCite also offers a search service covering all datasets, software, grey literature etc. minted with a DOI, in the one search portal. The cost is minimal and may be covered by the DOI agent, e.g. ANDS. The ANDS service is accessed by institutions, not individuals, with M2M (machine-to-machine) and manual options for minting and managing DOIs. As with Handles, DOIs require a commitment from the owner of the resource to manage updates to the location of the resource within the DOI infrastructure: if the resource moves, update the DOI; if it is removed, update the DOI with the location of your tombstone record. You can see from this that persistent identifiers do not guarantee the long life of the resource itself; they work to guarantee long-lived access to information about the resource.
  10. An example of a different type of identifier is ORCID. ORCID is an identifier for people. It enables researchers to distinguish themselves unambiguously from other researchers with the same name AND to link all of their scholarly works in the one record, regardless of the work type (important for credit and attribution). ORCID is a not-for-profit organisation supported by members. Its identifiers are unique, global, scalable and reliable; it collects metadata, the majority of which is optional; it has Australian research sector-wide endorsement (plus a consortium); and it has fast become the international standard for researcher identifiers, embedded in scholarly publishing workflows and endorsed and supported by every stakeholder in the research sector.
  11. ORCID iDs are 16-digit identifiers based on an ISNI block; the prototype was Thomson Reuters ResearcherID. Most metadata is optional: some is synced to the record from systems like CrossRef and DataCite, and some is entered manually. ORCID is free for researchers, with a fee for members (organisations). There is a public API (free) and a premium API (members). Governance and development processes are transparent (see the public Trello boards).
  12. Linking persistent identifiers plays a key role in research reproducibility, discovery and accessibility. That’s why there are international efforts to do this; I will mention two here. The THOR project, based in Europe, has undertaken efforts to link ORCIDs with DOIs. The Scholix initiative is new and comes out of the World Data System and the Research Data Alliance. It involves publishers and infrastructure providers and provides a global framework to improve links between publications and data. This is beneficial for all, especially publishers (who can display these links in journals) and repositories (which can link back to data they hold). There is more information on the ANDS website and in a recent webinar on this topic.
  13. Evaluate the PiD service: purpose, scope, underlying technology, governance and social infrastructure, metadata collected, cost, extent of use and trustworthiness. Choose the best-fit PiD for the type of resource and its point in the research lifecycle. Better to choose one than none! Over time, a resource may even get two PiDs, and in the future these PiDs may be linked via a provenance trail, e.g. this dataset had a Handle and now it has a DOI. When minting the PiD, include as much metadata as you can: more metadata helps linkages, attribution and discovery.
  14. PiD crises: PURL was introduced by OCLC, and there are over 16,000 PURLs in Google Scholar. Around 2015 OCLC lost interest and there was a technology freeze of about 18 months; eventually the Internet Archive took over and has brought PURL back from the brink. LSID was strongly supported by the biodiversity informatics community’s standardisation authority, but in recent years the technology became the topic of heated debate and the system went into crisis. Maintenance was terminated, and a resolver was made available in the interim while discussions continue. Meanwhile, about 14,000 LSIDs are listed in Google Scholar and their future is in doubt. Many PiD systems were developed by various communities and, for different reasons, have failed to withstand the test of time, eventually sliding into paralysis and what Jens Klump from CSIRO calls a ‘zombie’ stage, where identifiers continue to exist but the PiD system loses its resolution service. Klump and his colleague Huber suggest PiD governance is the key: they propose that PiD systems have exit plans and that a universal set of evaluation criteria be developed for assessing PiD systems.
  15. There is a cool and groovy international PiD community and a lot is happening. PiDapalooza, a 2-day “festival” (conference) of everything PiD-related, was in Iceland last year and is on again next January in California. The THOR project, funded by the European Commission, has the goal that every researcher has seamless access to persistent identifiers and that works will be uniquely attributed to them; it has done nice work on PiD usage statistics and targeted uptake of PiDs for research, and runs a fantastic webinar series. CrossRef, DataCite and ORCID have a collaboration project on persistent identifiers for organisations, among other PiD-related collaborations. There is also an international RDA PiD Interest Group. And there is a new PiD on the block in Australia: RAID, the Research Activity Identifier, built on the ANDS Handle service to identify activity as it happens at different points in the research lifecycle. The first customer will be the University of Queensland.
  16. If you’ve found this talk exciting, come and join me and be a PiD nerd too! The resources on this slide will help you get started. Even if you don’t want to join the legions of PiD nerds, I hope you have all learned something from this talk. Thanks for listening.