Preserving The Integrity of The Scholarly Record
Peter Burnhill, EDINA @ University of Edinburgh
National Library of Scotland, George IV Bridge
5.30pm, 16th February
http://www.flickr.com/photos/shinez/5000985919/
Take Home Message:
1) Archive Streams of Issued Content
2) Avoid Reference Rot
The Scholarly Record & Serials … [a focus on the digital]
‘The Scholarly Record’ has a fuzzy edge
Continuing Resources, inc. Serials
Issued in Parts (Serials)
Content changes over time (Integrating)
Other ‘resources needed for scholarship’
‘e-journals’
Websites, Databases, Repositories
‘Book-length work’
‘Gov Docs’
The following questions are implicit:
1. What exactly is the scholarly record?
•  What of that now ‘issued on the Web’?
•  And what if we limit focus to what could get an ISSN?
2. Whose responsibility is it to act as steward?
Each research library; library consortia; national/state libraries/archives?
& is this a national, or a trans-national challenge?
An Article, once available in print on-shelf locally …
… is now online & accessed remotely, ‘anytime/anywhere’
=> Improved Ease of Access :-)
But what of Continuity of Access?
Will it still be there tomorrow?
Libraries boast of ‘e-collections’,
but maybe now they only have ‘e-connections’
Picture credit: http://somanybooksblog.com/2009/03/27/library-tour/
=> real & present danger for the integrity
of what is published as scholarly record
This is a global challenge: trans-national action
Percentage of the 132,806 ISSNs issued for e-serials (December 2013):
US: 20%   UK: 8.6%   Rest of World: 71%
Researchers (& libraries/publishers) in any one country
are dependent upon content written and published as
serials in countries other than their own
So, who is offering digital shelving?
①  Web-scale not-for-profit archiving agencies:
②  National libraries …
③  Research libraries: consortia & specialist centres …
Ingesting content with archival intent …
National Science Library,
Chinese Academy of Sciences
Many archiving organisations a Good Thing
“Digital information is best preserved by replicating it at multiple
archives run by autonomous organizations”
B. Cooper and H. Garcia-Molina (2002)
Some bad stuff will happen!
A Project to Pilot an E-journal Preservation Registry Service
Need to know who is looking after what & how?
[Diagram: E-J Preservation Registry Service. Labels: ISSN Register; E-Journal Preservation Registry; user requirements; (a); (b); ISSN-L as kernel field; METADATA on extant e-serials; METADATA on preservation action; Digital Preservation Agencies]
Pilot: CLOCKSS, Portico; BL, KB; UK LOCKSS Alliance
The Keepers Registry
"Tales from the Keepers Registry", Serials Review 39.1 (2013)
… to discover who is looking after what
thekeepers.org as Global Monitor
*New in 2014*
Library of Congress and Scholars Portal now reporting in
e-journals should be easy – right?
Some signs of Progress: the Keepers Registry recorded
In 2011, 16,558 titles ‘ingested & archived’ by at least 1 ‘keeper’
in 2013, 21,557
in 2014, 26,195, now 26,712
9,731 ‘ingested & archived’ by 3+
… more archiving & as more archives report into Registry!
Written & produced by Julie Brown, 1989
“Are we there yet?” … “Don’t think so”
‘Ingest Ratio’ = titles being ingested by one or more Keepers
/ ‘online serials’ in ISSN Register
= 26,195 / 136,965 [in March 2014]
=> 19%
(We do not know the archival status of about 80% of all resources having an ISSN)
‘KeepSafe Ratio’ = titles being ingested by 3+ Keepers
/ ‘online serials’ in ISSN Register
= 9,656 / 136,965
=> 7%
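The two ratios on this slide are simple divisions, and a minimal sketch makes the arithmetic easy to re-run as the Registry counts change. The function and constant names below are illustrative assumptions, not part of the Keepers Registry; the figures are the ones quoted on the slide.

```python
def coverage_ratio(titles_ingested: int, online_serials: int) -> float:
    """Share of 'online serials' in the ISSN Register that are being ingested."""
    if online_serials <= 0:
        raise ValueError("online_serials must be positive")
    return titles_ingested / online_serials


# Figures quoted on the slide (March 2014).
ONLINE_SERIALS = 136_965      # 'online serials' in ISSN Register
INGESTED_BY_1_PLUS = 26_195   # titles ingested by one or more Keepers
INGESTED_BY_3_PLUS = 9_656    # titles ingested by 3+ Keepers

ingest_ratio = coverage_ratio(INGESTED_BY_1_PLUS, ONLINE_SERIALS)
keepsafe_ratio = coverage_ratio(INGESTED_BY_3_PLUS, ONLINE_SERIALS)

print(f"Ingest Ratio:   {ingest_ratio:.1%}")
print(f"KeepSafe Ratio: {keepsafe_ratio:.1%}")
print(f"Not known to be ingested: {1 - ingest_ratio:.1%}")
```

Rounded to whole percentages, these give the 19% and 7% shown on the slide.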
Evidence on what libraries care about
Using Title List Comparison tool in Members Area of Keepers Registry
In 2011/12 three major research libraries in the USA (Columbia, Cornell & Duke) checked archival status of serial titles regarded as important
‘Ingest Ratio’ = 22% to 28%, ie about a quarter
=> fate of c.75% is unknown
As reported in: P. Burnhill (2013) Tales from The Keepers Registry: Serial Issues About Archiving & the Web. Serials Review 39 (1), 3–20. http://www.sciencedirect.com/science/article/pii/S0098791313000178, & https://www.era.lib.ed.ac.uk/handle/1842/6682
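A title list comparison of this kind can be sketched as a set operation. This is only an illustration of the idea, not the Keepers Registry tool itself: the function name, the data layout (identifier mapped to a set of keeper names) and the sample identifiers T1 to T5 are all hypothetical.

```python
from typing import Dict, Iterable, Set


def compare_title_list(wanted: Iterable[str],
                       holdings: Dict[str, Set[str]],
                       safe_threshold: int = 3) -> dict:
    """Compare a library's title list against reported keeper holdings.

    wanted:   identifiers (e.g. ISSN-L) of titles the library regards as important
    holdings: identifier -> set of keepers reporting ingest (hypothetical layout)
    """
    wanted_set = set(wanted)
    if not wanted_set:
        raise ValueError("title list is empty")
    # Titles held by at least one keeper.
    ingested = {t for t in wanted_set if holdings.get(t)}
    # Titles held by safe_threshold or more keepers.
    keepsafe = {t for t in ingested if len(holdings[t]) >= safe_threshold}
    return {
        "ingest_ratio": len(ingested) / len(wanted_set),
        "keepsafe_ratio": len(keepsafe) / len(wanted_set),
        "unknown": sorted(wanted_set - ingested),
    }


# Placeholder data, invented purely for illustration.
sample_holdings = {
    "T1": {"CLOCKSS", "Portico", "BL"},
    "T2": {"Portico"},
    "T5": {"KB"},
}
report = compare_title_list(["T1", "T2", "T3", "T4"], sample_holdings)
print(report)
```

Titles listed under "unknown" are the ones whose fate cannot be established from the holdings data.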
  
BIG publishers act early but incompletely
Priority: find economic way to archive content from …
very many ‘at risk’ e-journals from many small publishers
Evidence based on what Researchers Use
… logs for the UK OpenURL Router*
•  8.5m full text requests in UK during 2012
=> 53,311 online titles requested
Analysis in 2013:
‘Ingest Ratio’ = 32% (16,985/53,311)
=> over two thirds, 68% (36,326 titles), held by none!
* As reported in Keepers Registry Blog. OpenURL Router passes ‘discovery’ requests to commercial OpenURL resolver services; developed & delivered by EDINA as part of Jisc support for UK universities & colleges
“I believe we've … a problem here.” [John Swigert, Jr.]
Another threat to the integrity of the record
‘Reference Rot’
When what was referenced & cited ceases to say the same thing, or ‘has ceased to be’
http://www.snorgtees.com/this-parrot-has-ceased-to-be
Reference Rot = Link Rot + Content Drift
“when links to web resources no longer point to what they once did”
Language Technology Group
Funded by the Andrew W. Mellon Foundation
‘Link Rot’
+ Content Drift: What is at end of URI has changed, or gone!
[Screenshots of http://dl00.org as captured in 2000, 2004, 2005 and 2008]
(a) Dynamic content as values on webpage changes over time
(b) Static content but very different (often unrelated) web pages
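The split Reference Rot = Link Rot + Content Drift can be sketched as a small classifier. This is a rough illustration under stated assumptions, not the Hiberlink method: it assumes a digest of the page was recorded when it was cited, and it treats any byte-level change as drift, which is far cruder than comparing what the page actually says. All names are hypothetical.

```python
import hashlib
from enum import Enum
from typing import Optional


class RefStatus(Enum):
    INTACT = "intact"
    LINK_ROT = "link rot"            # nothing is returned for the URI any more
    CONTENT_DRIFT = "content drift"  # something is returned, but it has changed


def digest(body: bytes) -> str:
    """Fingerprint of a page body (assumed to be recorded at citation time)."""
    return hashlib.sha256(body).hexdigest()


def classify_reference(cited_digest: str,
                       current_body: Optional[bytes]) -> RefStatus:
    """Classify a cited URI from what it returns today.

    current_body is None when the URI no longer resolves.
    """
    if current_body is None:
        return RefStatus.LINK_ROT
    if digest(current_body) != cited_digest:
        return RefStatus.CONTENT_DRIFT
    return RefStatus.INTACT


cited = digest(b"Digital Libraries 2000 conference page")
print(classify_reference(cited, None).value)
print(classify_reference(cited, b"Unrelated page").value)
print(classify_reference(cited, b"Digital Libraries 2000 conference page").value)
```

A realistic check would need a similarity measure rather than an exact hash, since dynamic pages change a little on every load.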
  
Hiberlink: Time Travel for The Scholarly Web
1.  Threat: Creating evidence on extent of ‘Reference Rot’
–  Main focus: references (& URIs) made in Journal Articles
•  "Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot"
–  PLOS One paper published on 26 December 2014.
•  Harvard Law Library & Perma.cc: reference rot in Supreme Court judgments
•  http://www.newyorker.com/magazine/2015/01/26/cobweb
–  Also looked at Reference Rot & the e-Thesis, ETD2014
2.  Remedy: Opportunities for productive intervention
–  Identify workflows: preparation, publication, ingest
–  Prototype tools to avoid or limit reference rot
–  Pro-active or ‘transactional’ archiving as remedy
•  Embedding such ‘solutions’ in existing tools & infrastructure
•  Propose/test new infrastructure for temporal referencing
–  supporting & using the Memento protocol
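In the Memento protocol (RFC 7089) a client asks a TimeGate for a version of a resource as it was around a given date by sending an Accept-Datetime request header. The sketch below only builds that header and makes no network call; the function name is an assumption for illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_header(when: datetime) -> dict:
    """Build the Accept-Datetime request header for a Memento TimeGate.

    RFC 7089 expects an HTTP-date expressed in GMT.
    """
    if when.tzinfo is None:
        raise ValueError("when must be timezone-aware")
    as_utc = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(as_utc, usegmt=True)}


# The date on which a reference was made, e.g. 19 May 2014.
headers = accept_datetime_header(datetime(2014, 5, 19, tzinfo=timezone.utc))
print(headers)
```

These headers would then be sent with an ordinary HTTP GET or HEAD request to a TimeGate for the cited URI.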
Preserving the integrity of the scholarly record
Peter Burnhill, EDINA
http://www.restfulliving.com/wp-content/uploads/2013/12/Time-1024x861.jpg
‘Infrastructure’ to Enable Remedy
•  Robust Link - re-factor the HTML link that is returned
a)  Take simple URI - to French National Library (say)
<a href="http://www.bnf.fr">
Link to the BNF
</a>
b)  Augment Link with a set of Datetime & location pairs
<a href="http://www.bnf.fr"
mset="2014-05-19,
http://archive.today/zdpAn 2014-05-15 memento">
Link to the BNF
</a>
http://robustlinks.mementoweb.org/
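Step (b) can be sketched as a small function that turns a simple link into an augmented one. The layout of the mset attribute here is an assumption read off the slide's single example (the linking date, then for each snapshot its URI, its datetime and the word memento); it should not be taken as the final Robust Links syntax, and the function name is hypothetical.

```python
from datetime import date
from html import escape
from typing import Iterable, Tuple


def robust_link(href: str, text: str, linked_on: date,
                snapshots: Iterable[Tuple[str, date]]) -> str:
    """Return an <a> element augmented with datetime & location pairs.

    Attribute layout follows the slide's example and is an assumption.
    """
    entries = [f"{uri} {taken.isoformat()} memento" for uri, taken in snapshots]
    mset = ", ".join([linked_on.isoformat(), *entries])
    return (f'<a href="{escape(href, quote=True)}" '
            f'mset="{escape(mset, quote=True)}">{escape(text)}</a>')


link = robust_link(
    "http://www.bnf.fr",
    "Link to the BNF",
    date(2014, 5, 19),
    [("http://archive.today/zdpAn", date(2014, 5, 15))],
)
print(link)
```

The original href is left untouched, so the link still works for readers whose tools ignore the extra attribute.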
  
Remedy for The Integrity of The Scholarly Record
Envisage the best opportunities for Intervention to make Remedy, to ‘flash-freeze’, either to avoid reference rot or to ‘stop the rot’.
3 basic workflows:
① Study: Preparation -> (Review) -> Submission
② Publication: Editorial -> (Revision) -> Acceptance -> Issue
③ Post-Publication: Deposit/Ingest -> Provide/Access -> Use
Identify the Actors involved in:
① Composition: author/creator
② Public Release: editor/referee/copy
③ Curation: librarian / repository manager / archivist
Two-step Remedy To Avoid Reference Rot
Hiberlink Plug-in: help authors & middle-folk do the right thing:
①  Triggers archiving of referenced web content when it is noted in:
–  Zotero - used by authors to manage references https://www.zotero.org/
–  Open Journal Systems (OJS) - used by OA publishers https://pkp.sfu.ca/ojs/
②  Returns Datetime URI for archived content that can be used in the citation
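The two steps can be sketched in a few lines. This is not the Hiberlink plug-in's code: the archiving call is passed in as a function, the stand-in archive below is fake and returns a made-up URI on archive.example.org, and the citation format is an assumption for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass(frozen=True)
class ArchivedReference:
    original_uri: str
    archived_uri: str
    archived_at: datetime


def note_reference(uri: str,
                   archive: Callable[[str], str],
                   now: Optional[datetime] = None) -> ArchivedReference:
    """Step 1: when a reference is noted, trigger archiving of its content."""
    archived_at = now or datetime.now(timezone.utc)
    return ArchivedReference(uri, archive(uri), archived_at)


def cite(ref: ArchivedReference) -> str:
    """Step 2: use the returned datetime and archived URI in the citation."""
    return (f"{ref.original_uri} "
            f"(archived {ref.archived_at:%Y-%m-%d} at {ref.archived_uri})")


def fake_archive(uri: str) -> str:
    """Stand-in for a real web archive; returns a made-up snapshot URI."""
    return "https://archive.example.org/snapshot/1"


ref = note_reference("http://www.bnf.fr", fake_archive,
                     now=datetime(2014, 5, 15, tzinfo=timezone.utc))
print(cite(ref))
```

Passing the archiving call in as a parameter keeps the sketch independent of any particular archive service.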
Time’s Up!
thekeepers.org
hiberlink.org
•  See also
•  thekeepers.blogs.edina.ac.uk
•  safenet.blogs.edina.ac.uk/
HelpDesk: edina@ed.ac.uk
