Ensuring the Integrity (& Continuity)
of Our Record of Scholarship
+ 7 Good Things To Do
‘Standing on the Digits of Giants: Research data, preservation and innovation’
Funding from the Andrew W. Mellon Foundation
Peter Burnhill
EDINA, University of Edinburgh
ALPSP/DPC, London 8th March 2016
Focus on Scholarly Statement =
Content + References to Content
Content Scholarly
References to Content
=> Back into Scholarly Publications
Has ‘fixity’: DOI, ISSN
CLOCKSS, Portico, LOCKSS, etc
E-Journal Archiving
How brittle are those Digits (of Giants) we presume to stand upon?
Focus on Scholarly Statement =
Content + References to Content
Content Scholarly Record
References to Content
=> Back into Scholarly Publications: has ‘fixity’ (DOI, ISSN); E-Journal Archiving (CLOCKSS, Portico, LOCKSS, etc)
=> Out onto the Web at Large: dynamic, lacks fixity; ‘Web today, gone tomorrow’; Reference Rot
How brittle are those Digits (of Giants) we presume to stand upon?
#keepers #hiberlink
Klein M, Van de Sompel H, Sanderson R, Shankar H, Balakireva L, et al. (2014) Scholarly Context Not Found: One in Five Articles Suffers from
Reference Rot. PLoS ONE 9(12): e115253. doi:10.1371/journal.pone.0115253
Scholarly Articles increasingly link to Wild Web
Resources not just back to other Articles
[Figure: panels for arXiv, Elsevier corpus and PMC; dark solid lines represent URIs to Web-at-large, from 1997/2011 *]
* Artefact of XML supplied

‘book-length work’
‘Gov Docs’
The Scholarly Record always had a fuzzy edge
conference proceedings
‘data as findings’
+ access to the resources needed for Scholarship
… including what is cited on the World Wild Web
e-methods: software
Unintended Consequences of The Web/Internet:
Digital back copy no longer in the custody of libraries
Libraries boast of ‘e-collections’,
but do they only have ‘e-connections’?
Good News: We do have some digital shelving
• Web-scale not-for-profit archiving agencies:
• National institutions (usually national libraries) …
• Consortia of university libraries & specialist centres …
We now have means to discover who is looking
after what e-serials, via the Keepers Registry
“Tales from the Keepers Registry” Serials Review 39.1 (2013)

… to discover who is looking after what
ISSN-L as kernel field
Global Monitor
Keepers Registry gives evidence on progress:
content from 30,769 titles being ingested with archival intent by at least one keeper
Key Statistics using Titles Ingested / Titles with ISSN
Ingest Ratio = titles ingested by one or more Keeper / total ‘online serials’ in ISSN Register
= 30,769 / 177,631 => 17%
KeepSafe Ratio = titles with 3+ Keepers / total ‘online serials’ in ISSN Register
= 11,312 / 177,631 => 6%
Total number of Online Serials in ISSN Register has been increasing: 177,631 in January 2016
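The two ratios above are simple divisions, and a minimal sketch makes the arithmetic easy to rerun as the ISSN Register grows. The function and constant names here are illustrative assumptions, not part of the Keepers Registry; only the three counts come from the slide.

```python
def coverage_ratio(titles_covered: int, titles_in_register: int) -> float:
    """Share of 'online serials' in the ISSN Register that meet a coverage test."""
    if titles_in_register <= 0:
        raise ValueError("the register must contain at least one title")
    return titles_covered / titles_in_register


# Counts quoted on the slide (ISSN Register total as of January 2016).
ISSN_REGISTER_ONLINE_SERIALS = 177_631
INGESTED_BY_ONE_OR_MORE_KEEPERS = 30_769
HELD_BY_THREE_OR_MORE_KEEPERS = 11_312

ingest_ratio = coverage_ratio(INGESTED_BY_ONE_OR_MORE_KEEPERS, ISSN_REGISTER_ONLINE_SERIALS)
keepsafe_ratio = coverage_ratio(HELD_BY_THREE_OR_MORE_KEEPERS, ISSN_REGISTER_ONLINE_SERIALS)

# Rounded to whole percentages these match the 17% and 6% quoted on the slide.
print(f"Ingest Ratio:   {ingest_ratio:.0%}")
print(f"KeepSafe Ratio: {keepsafe_ratio:.0%}")
```

Because the denominator has been increasing, both ratios can fall even while the number of archived titles rises, which is why the counts are kept separate from the percentages.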
T&F, OUP, etc
Wiley etc
Mostly Big Publisher content that’s being archived
Big Variation in Archival Status of Online Continuing
Resources (assigned ISSN) by Country, July 2015
very many ‘at risk’ e-journals from many (small &
not so small) publishers
BIG publishers act early but incompletely
Priority: find economic way to
archive content from
Standing on the Digits of Giants?
- Not if we don’t keep the digits …

① Go to the Keepers Registry =>
• Search on Title/ISSN
  – Check key volumes & issues are being archived
• Browse by publisher
• Use the Title List Comparison tool [Member Services]
  – Are your Titles being archived?
• Consider the Linking Options to display ‘archival status’
for each Title on your website
So, First Good Thing To Do – today/now
“when links to web resources no longer point
to what was intended”
There is Threat to Integrity of our Scholarly Record
Reference Rot = Link Rot + Content Drift
Research Report: What recent findings tell us
Funding: Andrew W. Mellon Foundation
Link Rot
‘Link Rot’ is known to be scary
Content Drift may be even scarier!
When what is at the end of a cited URL has changed, or gone!!
(a) Dynamic content: values on the webpage change over time
(b) Static content, but very different (often unrelated) web pages

Hiberlink analysed 1 million URI links to Web-at-large,
not links to publisher & access platforms (DOI etc)
Methodology: answer to 2 questions
1. Do those links (URIs) still work on the ‘Live Web’?
2. Is there a ‘Memento’ of that reference in the ‘Archived Web’?
If a Memento cannot be found in a Web Archive within N days of the date of
publication, but the URI is still active on the Live Web, then there is risk of loss (& rot)
If a Memento cannot be found in a Web Archive within N days of the date
of publication, and the URI is not active on the Live Web,
then it is lost / rotten
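The two methodology questions combine into a small decision rule, sketched below under stated assumptions. The function name and the labels "archived", "at risk" and "lost" are illustrative, and "within N days" is read here as an absolute difference either side of the publication date; the study's own implementation may differ.

```python
from datetime import date, timedelta
from typing import Optional


def classify_reference(
    is_live: bool,
    memento_date: Optional[date],
    publication_date: date,
    n_days: int = 14,
) -> str:
    """Classify one cited URI from the answers to the two questions.

    is_live       answer to question 1 (does the URI still work on the Live Web?)
    memento_date  date of the closest Memento found, or None if none exists
    """
    archived_in_window = (
        memento_date is not None
        and abs(memento_date - publication_date) <= timedelta(days=n_days)
    )
    if archived_in_window:
        return "archived"
    # No Memento close enough to publication: the outcome depends on the Live Web.
    return "at risk" if is_live else "lost"


published = date(2012, 3, 1)
print(classify_reference(True, date(2012, 3, 5), published))   # Memento 4 days after publication
print(classify_reference(True, None, published))               # live, but never archived
print(classify_reference(False, date(2013, 6, 1), published))  # dead, and the only Memento is too late
```

Passing the two answers in as plain values keeps the rule testable without any network access; checking the Live Web and querying web archives would sit in separate code.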
Klein M, Van de Sompel H, Sanderson R, Shankar H, Balakireva L, et al. (2014)
Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot.
PLoS ONE 9(12): e115253. doi:10.1371/journal.pone.0115253
Hiberlink Results: within 14 days of publication date …
                                 PMC      Elsevier
‘Not Archived’                   74.5%    75.2%
Of those ‘Not Archived’:
  still ‘Live’ on the Web        80%      67.3%
  ‘No longer Live’ on the Web    20%      32.7%
Reference Rot is already significant
Most referenced URIs at risk of loss
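Since the ‘Live’ and ‘No longer Live’ rows are percentages of the ‘Not Archived’ subset, they can be multiplied through to give shares of all referenced URIs. The sketch below does that arithmetic; the derived shares are a reading of the table, not figures reported on the slide, and the names are illustrative.

```python
def share_of_all_uris(not_archived_pct: float, subset_pct: float) -> float:
    """Convert a percentage of the 'Not Archived' subset into a percentage of all URIs."""
    return not_archived_pct * subset_pct / 100


# Percentages as shown in the results table (14-day window).
results = {
    "PMC": {"not_archived": 74.5, "still_live": 80.0, "no_longer_live": 20.0},
    "Elsevier": {"not_archived": 75.2, "still_live": 67.3, "no_longer_live": 32.7},
}

derived = {}
for corpus, row in results.items():
    derived[corpus] = {
        # Not archived but still live: at risk of loss.
        "at_risk": share_of_all_uris(row["not_archived"], row["still_live"]),
        # Not archived and no longer live: already lost / rotten.
        "rotten": share_of_all_uris(row["not_archived"], row["no_longer_live"]),
    }
    print(
        f"{corpus}: at risk {derived[corpus]['at_risk']:.1f}% of URIs, "
        f"already rotten {derived[corpus]['rotten']:.1f}% of URIs"
    )
```

On this reading, the "at risk" share is the majority of referenced URIs in both corpora, which is consistent with the slide's "Most referenced URIs at risk of loss".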
Team at Harvard Law School establishing similar evidence
“We documented a serious problem of reference rot:
• more than 70% of the URLs within the above mentioned [law] journals, and
• 50% of the URLs within U.S. Supreme Court opinions suffer reference rot
— meaning, again, that they do not produce the information originally cited.”
Jonathan Zittrain, Kendra Albert and Lawrence Lessig (2014).
Perma: Scoping and Addressing the Problem of Link and Reference Rot in Legal Citations.
Legal Information Management 14. doi:10.1017/S1472669614000255.
=> Content of Citations Rot over Time!!
… leading to rotten references for the reader

Rot in References means a Defective Article!
undermines the integrity of the scholarly record
Things To Do?
Hint: Remedy for fish is ‘Quick Freeze & Store’
What should the Publisher do?
To give assurance that
the fish / references / articles
sold are not rotten!
② Help the Reader at the Point of Sale & Use
with ‘Link Decoration’: JavaScript + Memento API
Demo -
robustlinks.js -
… Reader is then taken to web content nearest in time
to the submission date of the article
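"Nearest in time" can be illustrated with a short sketch. This is not robustlinks.js, which is JavaScript; it is a Python illustration of the selection rule, plus the `Accept-Datetime` request header that the Memento protocol (RFC 7089) uses to ask a TimeGate for a copy near a given moment. The function names are assumptions made for this example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict, Sequence


def nearest_memento(memento_datetimes: Sequence[datetime], target: datetime) -> datetime:
    """Pick the archived copy whose capture time is closest to the target datetime."""
    if not memento_datetimes:
        raise ValueError("no Mementos available for this URI")
    return min(memento_datetimes, key=lambda captured: abs(captured - target))


def accept_datetime_header(target: datetime) -> Dict[str, str]:
    """Build the Memento 'Accept-Datetime' header for a timezone-aware datetime."""
    as_utc = target.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(as_utc, usegmt=True)}


# Hypothetical capture times for one cited URI.
captures = [
    datetime(2014, 12, 1, tzinfo=timezone.utc),
    datetime(2015, 2, 20, tzinfo=timezone.utc),
    datetime(2015, 6, 1, tzinfo=timezone.utc),
]
submitted = datetime(2015, 2, 19, 9, 46, 36, tzinfo=timezone.utc)

print(nearest_memento(captures, submitted).date())
print(accept_datetime_header(submitted))
```

In practice a TimeGate performs the nearest-in-time selection on the archive side; the local `min` is shown only to make the rule explicit.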

But best done at earliest moment of capture
… for what an Author regards as significant
… or needs to provide as evidence
③ Snapshot & Save: Proactive/Transactional archiving
• Use web-scale archives for on-demand snapshots of URIs:
–; Internet Archive;;
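As a rough illustration of an on-demand snapshot, the sketch below builds the two kinds of URI involved when the Internet Archive is the web-scale archive. The `/save/` and `/web/<timestamp>/` patterns reflect the Wayback Machine's URL conventions as understood at the time of writing and should be checked against its current documentation; no request is sent, and the function names are illustrative.

```python
from datetime import datetime, timezone

WAYBACK_BASE = "https://web.archive.org"


def save_request_uri(uri: str) -> str:
    """URI that asks the Wayback Machine to capture the page now (assumed pattern)."""
    return f"{WAYBACK_BASE}/save/{uri}"


def snapshot_uri(uri: str, captured_at: datetime) -> str:
    """URI of a capture, addressed by a 14-digit UTC timestamp (assumed pattern)."""
    stamp = captured_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_BASE}/web/{stamp}/{uri}"


cited = "http://example.org/report.html"  # placeholder for a cited web resource
print(save_request_uri(cited))
print(snapshot_uri(cited, datetime(2015, 2, 19, 9, 46, 36, tzinfo=timezone.utc)))
```

The snapshot URI and its capture datetime are the two values an author needs to keep alongside the original URI.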
④ Turn a simple URI - to article in New Yorker magazine (say)
into a hiberlink URI
Snapshot URI + Original URI + DateTime [Robust Link syntax]
More Things To Do: ‘Hiberlink Remedy’
<a href="" data-versionurl="" data-versiondate="2015-02-19T09:46:36">Cobweb</a>
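A minimal sketch of producing such an anchor from its three parts (original URI, snapshot URI, datetime) follows. It uses only the attributes shown on the slide, `data-versionurl` and `data-versiondate`; the function name and the example.org / archive.example URIs are placeholders, not real resources.

```python
from datetime import datetime
from html import escape


def robust_link(original_uri: str, snapshot_uri: str,
                snapshot_datetime: datetime, link_text: str) -> str:
    """Render an HTML anchor that carries the snapshot URI and its datetime."""
    return (
        '<a href="{href}" data-versionurl="{version}" '
        'data-versiondate="{when}">{text}</a>'
    ).format(
        href=escape(original_uri, quote=True),
        version=escape(snapshot_uri, quote=True),
        when=snapshot_datetime.strftime("%Y-%m-%dT%H:%M:%S"),
        text=escape(link_text),
    )


link = robust_link(
    "http://example.org/article",
    "http://archive.example/20150219094636/http://example.org/article",
    datetime(2015, 2, 19, 9, 46, 36),
    "Cobweb",
)
print(link)
```

Escaping the URIs matters because cited web addresses often contain `&` in query strings, which would otherwise produce invalid HTML.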

Author workflow: note-taking software
Semi-automatic archiving of referenced web content when noted
e.g. EndNote, Mendeley, Reference Manager, Zotero, RefME
Zotero plug-in
Software stores those ‘hiberlinks’ for use in citations
Two Things Publishers should do in Workflow?
⑤ Accept Robust Links in Cited References!
⑥ Avoid reference rot by triggering archiving of
snapshots & inserting Hiberlinks / Robust Links
at the point of Ingest in the submission system
Submission -> Editing -> (Revision) -> Acceptance -> Issue
OJS plug-in
1. A parser converts .pdf to .html & extracts URIs
2. Triggers archiving of content for each reference
• Author & Editor need to work together to determine the archival copy to use
3. Produces an HTML version that includes a Robust Link for each cited reference.
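The three plug-in steps on this slide (extract URIs, trigger archiving, emit Robust Links) start from the HTML version of a submission. The sketch below covers the extraction step and shows where archiving would be triggered; it is not the OJS plug-in itself, and `archive` is a hypothetical callable supplied by the caller.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List


class LinkExtractor(HTMLParser):
    """Collect each distinct http(s) URI found in <a href="..."> tags, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.uris: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.startswith(("http://", "https://")) and href not in self.uris:
            self.uris.append(href)


def extract_uris(html_text: str) -> List[str]:
    """Step 1 (after PDF-to-HTML conversion): list the URIs cited in the document."""
    parser = LinkExtractor()
    parser.feed(html_text)
    return parser.uris


def trigger_archiving(uris: List[str], archive: Callable[[str], str]) -> Dict[str, str]:
    """Step 2: request a snapshot per URI; `archive` returns the snapshot URI."""
    return {uri: archive(uri) for uri in uris}


sample = (
    '<p><a href="http://example.org/a">A</a> <a href="#note">note</a> '
    '<a href="https://example.org/b">B</a> <a href="http://example.org/a">A again</a></p>'
)
found = extract_uris(sample)
# Stand-in archiver for illustration only; a real one would call a web archive.
snapshots = trigger_archiving(found, lambda uri: "http://archive.example/" + uri)
print(found)
print(snapshots)
```

The mapping from original URI to snapshot URI is what step 3 needs in order to rewrite each cited reference as a Robust Link, after the author and editor have agreed which archival copy to use.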
⑦ Engage with us in HiberActive Infrastructure *
to act as middleware between
existing software & web archives
[Diagram labels: Workflow software; HiberActive; Web archival service (e.g. Internet Archive)]
• Asynchronous, returns hiberlink in Robust Link syntax
• Distributed, enables archiving with different web archives
• Lightweight, leverages HTTP & what already exists
* In development
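To make "asynchronous" and "distributed" concrete, here is a purely illustrative sketch of middleware that asks several web archives for a snapshot concurrently and collects what each returns. HiberActive is described as in development, so nothing here reflects its actual interface: every name, the fake archive hosts and the fixed datetime are assumptions made for this example.

```python
import asyncio
from datetime import datetime, timezone
from typing import Dict, List


async def request_snapshot(archive_name: str, uri: str) -> Dict[str, str]:
    """Stand-in for one archive's snapshot request; a real client would make an HTTP call."""
    await asyncio.sleep(0)  # yield control, as a network call would
    captured = datetime(2015, 2, 19, 9, 46, 36, tzinfo=timezone.utc)
    return {
        "archive": archive_name,
        "versionurl": f"https://{archive_name}.example/snapshot/{uri}",
        "versiondate": captured.strftime("%Y-%m-%dT%H:%M:%S"),
    }


async def archive_everywhere(uri: str, archive_names: List[str]) -> List[Dict[str, str]]:
    """Send the same URI to every archive at once and wait for all the answers."""
    requests = [request_snapshot(name, uri) for name in archive_names]
    return await asyncio.gather(*requests)  # results keep the order of archive_names


if __name__ == "__main__":
    for result in asyncio.run(archive_everywhere("http://example.org/page",
                                                 ["archive-a", "archive-b"])):
        print(result)
```

Each result carries the snapshot URI and datetime needed to write a Robust Link, so the calling workflow software would not need to know which archive answered.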
Standing on the Digits of Giants?
- only if published references
have Robust Links to what the author intended
Questions welcome
& any interest in working together

Ensuring the Integrity (& Continuity) of Our Record of Scholarship

  • 1. Ensuring the Integrity (& Continuity) of Our Record of Scholarship + 7 Good Things To Do ‘Standing on the Digits of Giants: Research data, preservation and innovation’ Funding from the Andrew W. Mellon Foundation Peter Burnhill EDINA, University of Edinburgh ALPSP/DPC, London 8th March 2016
  • 2. Focus on Scholarly Statement = Content + References to Content Content Scholarly References to Content => Back into Scholarly Publications Has ‘fixity’ DOI, ISSN CLOCKSS, Portico, LOCKSS, etc E-Journal Archiving How brittle are those Digits (of Giants) we presume to stand upon? #keepers
  • 3. Focus on Scholarly Statement = Content + References to Content Content Scholarly Record References to Content => Back into Scholarly Publications => Out onto the Web at Large Has ‘fixity’ dynamic, lacks fixity DOI, ISSN CLOCKSS, Portico, LOCKSS, etc ‘Web today, gone tomorrow’ Reference Rot E-Journal Archiving How brittle are those Digits (of Giants) we presume to stand upon? #keepers #hiberlink
  • 4. Klein M, Van de Sompel H, Sanderson R, Shankar H, Balakireva L, et al. (2014) Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot. PLoS ONE 9(12): e115253. doi:10.1371/journal.pone.0115253 arXiv Elsevier corpus PMC Scholarly Articles increasingly link to Wild Web Resources not just back to other Articles Dark solid lines represent URIs to Web-at-large, from 1997/2011 * * Artefact of XML supplied
  • 5. ‘e-journals’ Websites, Databases, Repositories ‘book-length work’ ‘Gov Docs’ The Scholarly Record always had a fuzzy edge conference proceedings ‘e-magazines’ ‘e-newsmedia’ ‘data as findings’ + access to the resources needed for Scholarship … including what is cited on the World Wild Web e-theses e-methods: software
  • 6. Unintended Consequences of The Web/Internet: Digital back copy no longer in the custody of libraries Picture credit:
  • 7. Unintended Consequences of The Web/Internet: Digital back copy no longer in the custody of libraries Picture credit: Libraries boast of ‘e-collections’, but do they only have ‘e-connections’?
  • 8. Good News: We do have some digital shelving • Web-scale not-for-profit archiving agencies: • National institutions (usually national libraries) … • Consortia of university libraries & specialist centres … We now have means to discover who is looking after what e-serials, via the Keepers Registry “Tales from the Keepers Registry” Serials Review 39.1 (2013)
  • 9. … to discover who is looking after what ISSN-L as kernel field Global Monitor Keepers Registry gives evidence on progress content from 30,769 titles being ingested with archival intent by at least one keeper
  • 10. Key Statistics using Titles Ingested / Titles with ISSN Ingest Ratio = titles ingested by one or more Keeper / total ‘online serials’ in ISSN Register = 30,769 / 177,631 => 17% KeepSafe Ratio = with 3+ Keepers / ISSN Register = 11,312 / 177,631 => 6% Total number of Online Serials in ISSN Register has been increasing: 177,631 in January 2016
  • 11. Elsevier Hindawi T&F, OUP, etc Wiley etc Springer Karger Mostly Big Publisher content that’s being archived Big Variation in Archival Status of Online Continuing Resources (assigned ISSN) by Country, July 2015
  • 12. very many ‘at risk’ e-journals from many (small & not so small) publishers BIG publishers act early but incompletely Priority: find economic way to archive content from Standing on the Digits of Giants? - Not If we don’t keep the digits …
  • 13. ① Go to the Keepers Registry => • Search on Title/ISSN – Check key volumes & issues are being archived • Browse by publisher • Use the Title List Comparison tool [Member Services] – Are your Titles being archived? • Consider the Linking Options to display ‘archival status’ for each Title on your website So, First Good Thing To Do – today/now
  • 14. “when links to web resources no longer point to what was intended” There is Threat to Integrity of our Scholarly Record Reference Rot = Link Rot + Content Drift Research Report: What recent findings tell us Funding: Andrew W. Mellon Foundation
  • 15. Link Rot ‘Link Rot’ is known to be scary
  • 16. Content Drift may be even scarier! When what is at the end of a cited URL has changed, or gone!! 2000 2004 2005 2008 (a) Dynamic content: values on the webpage change over time (b) Static content, but very different (often unrelated) web pages
  • 17. Hiberlink analysed 1 million URI links to Web-at-large, not links to publisher & access platforms (DOI etc) Methodology: answer to 2 questions 1. Do those links (URIs) still work on the ‘Live Web’? 2. Is there a ‘Memento’ of that reference in the ‘Archived Web’?
  • 18. Hiberlink analysed 1 million URI links to Web-at-large, not links to publisher & access platforms (DOI etc) Methodology: answer to 2 questions 1. Do those links (URIs) still work on the ‘Live Web’? 2. Is there a ‘Memento’ of that reference in the ‘Archived Web’? If a Memento cannot be found in a Web Archive within N days of the date of publication, but the URI is still active, then there is risk of loss (& rot). If a Memento cannot be found in a Web Archive within N days of the date of publication, and the URI is not active on the Live Web, then it is lost / rotten
  • 19. Klein M, Van de Sompel H, Sanderson R, Shankar H, Balakireva L, et al. (2014) Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot. PLoS ONE 9(12): e115253. doi:10.1371/journal.pone.0115253 Hiberlink Results: within 14 days of publication date … ‘Not Archived’: PMC 74.5%, Elsevier 75.2%. Of those ‘Not Archived’, still ‘Live’ on the Web: PMC 80%, Elsevier 67.3%; ‘No longer Live’ on the Web: PMC 20%, Elsevier 32.7%. Reference Rot is already significant Most referenced URIs at risk of loss Team at Harvard Law School establishing similar evidence “We documented a serious problem of reference rot: • more than 70% of the URLs within the above mentioned [law] journals, and • 50% of the URLs within U.S. Supreme Court opinions suffer reference rot — meaning, again, that they do not produce the information originally cited.” Jonathan Zittrain, Kendra Albert and Lawrence Lessig (2014). Perma: Scoping and Addressing the Problem of Link and Reference Rot in Legal Citations. Legal Information Management 14. doi:10.1017/S1472669614000255.
  • 20. => Content of Citations Rot over Time!! … leading to rotten references for the reader
  • 21. Rot in References means a Defective Article! undermines the integrity of the scholarly record
  • 22. Things To Do? Hint: Remedy for fish is ‘Quick Freeze & Store’
  • 23. What should the Publisher do? To give assurance that the fish / references / articles sold are not rotten!
  • 24. ② Help the Reader at the Point of Sale & Use with ‘Link Decoration’: JavaScript + Memento API Demo - robustlinks.js - … Reader is then taken to web content nearest in time to the submission date of the article
  • 25. But best done at earliest moment of capture
  • 26. … for what an Author regards as significant
  • 27. … or needs to provide as evidence
  • 28. ③ Snapshot & Save: Proactive/Transactional archiving • Use web-scale archives for on-demand snapshots of URIs: –; Internet Archive;; ④ Turn a simple URI - to article in New Yorker magazine (say) into a hiberlink URI Snapshot URI + Original URI + DateTime [Robust Link syntax] More Things To Do: ‘Hiberlink Remedy’ <a href="" data-versionurl="" data-versiondate="2015-02-19T09:46:36">Cobweb</a>
  • 29. Author workflow: note-taking software Semi-automatic archiving of referenced web content when noted eg EndNote, Mendeley, Reference Manager, Zotero, RefME Zotero plug-in Software stores those ‘hiberlinks’ for use in citations
  • 30. Two Things Publishers should do in Workflow? ⑤ Accept Robust Links in Cited References! ⑥ Avoid reference rot by triggering archiving of snapshots & inserting Hiberlinks / Robust Links at the point of Ingest in the submission system Submission -> Editing -> (Revision) -> Acceptance -> Issue OJS plug-in 1. A parser converts .pdf to .html & extracts URIs 2. Triggers archiving of content for each reference • Author & Editor need to work together to determine the archival copy to use 3. Produces an HTML version that includes a Robust Link for each cited reference.
  • 31. ⑦ Engage with us in HiberActive Infrastructure * Workflow software HiberActive Web archival service (e.g. Internet Archive) to act as middleware between existing software & web archives • Asynchronous, returns hiberlink in Robust Link syntax • Distributed, enables archiving with different web archives • Lightweight, leverages HTTP & what already exists * In development
  • 32. Standing on the Digits of Giants? - only if published references have Robust Links to what the author intended Questions welcome & any interest in working together