Open Repositories and Interoperability Challenges in UK
Peter Burnhill, Director, EDINA National Data Centre, University of Edinburgh, Scotland UK
DL.org Workshop, The British Academy, London UK, 4 February 2011
Digital Libraries and Open Access: Interoperability Strategies
+ expertise, in areas such as geo-enabling, access management, etc
UK funding councils   JISC Sub-Committees JISC Collections acting as platform for network-level services  &   helping to build the JISC Integrated Information Environment  research, learning & teaching in UK  universities & colleges Research  Councils UK National Data Centres
Began as a data manufacturer
  Scottish Education Data Archive, late 1970s – mid ‘80s
  Survey statistician: for school leaver, YTS & 16-19 cohort surveys
  curated as databases: the working capital for research group + ‘guests’
Became a data broker
  Edinburgh University Data Library, mid-1980s & on
  Providing library and ease of access to data held elsewhere
  Connected to IASSIST, international group for data librarians/archivists
Learning about interesting spaces [time/place referencing]
  ESRC Regional Research Laboratory for Scotland 1986/90
  Co-director: early days of Geographical Information Systems (GIS)
  Inter-Agency Committee on Global Environmental Change Data Task Force
Moved into national data services; learnt more about data curation
  EDINA national data centre, mid-1990s to present
  Director: set-up and continuing development of access/deposit services
  Digital Curation Centre, 2004 & 2005
  Director: set-up & digital curation = ‘data curation’ & ‘digital preservation’
autobiography as commentary on data facilities
Spot the repository!
Researchers’ viewpoint: a cultural shift?
Preserve or Perish
“You are not finished until you have done the research, published the results, and published the data, receiving formal credit for everything.”
  Mark A. Parsons (2006), International Polar Year
“A scholar’s positive contribution is measured by the sum of the original data that he contributes. Hypotheses come and go but data remain.”
  Santiago Ramón y Cajal (Nobel Prize winner, 1906), in Advice to a Young Investigator (1897)
document tradition & computation tradition
Emergence of Digital Library: Information Science
“considerable simplification, … helpful to think … of two traditions, or mentalities, even cultures, co-exist in area of Information Science
“Approaches based on a concern with documents, with signifying records: archives, bibliography, documentation, librarianship, records management, and the like
“approaches based on uses for formal techniques, whether mechanical (such as punch cards and data-processing equipment) or mathematical (as in algorithmic procedures).”
  Michael Buckland (UC Berkeley), Presidential Address, American Society for Information Science, JASIS’s 50th (1998)
  http://people.ischool.berkeley.edu/~buckland/asis62.html
Semantics of ‘Open Repositories & Interoperability’
R is for Repository
  “university-based institutional repository is a set of services … for the management and dissemination of digital materials created by the institution & its community members. … organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access ...” (C. Lynch, 2003)
  Digital repository differs from other digital collections in that:
    “content is deposited, whether by content creator, owner or third party
    “architecture manages content as well as metadata
    “repository offers a minimum set of basic services
    “must be sustainable & trusted, well-supported & well-managed”
  Digital Repositories Review (R. Heery and S. Anderson, 2005)
O is for Open
  OA (for publications) not the only ‘open’ policy:
  OER: Open Educational Resources
    Open means ‘not closed’: making teaching & learning materials visible
    Open CourseWare – often as open stack of webpages
  Open Data
    Datasets tradition (IASSIST); ‘open/privilege access to databases; open data.gov
  Open Source Software
    OSS has its own way of doing things
‘Open Repositories & Interoperability’ [in the UK]
Heretical and Haphazard Thinking About The Brief …
Are Repositories the (only) way to support an Open Agenda?
and
Is Open really what Repositories are for?
Or
Is this usage just intended to help us avoid issues of IP and access management?
And should the focus be on:
  Interoperability between Repositories?
  or
  Interoperability of Repositories with the wider Internet?
Interoperability Strategies; Interoperability Challenges
Whose strategy, and towards what purpose?
  ‘within & for the research & education sector’? Or beyond?
  for the institution, the UK, EU, global anybody; for the researcher?
  for the machine as user [“Provider/Consumer”]
Interoperability as technical [& semantic] means to support interworking by persons or systems has challenges
  policy/technology/infrastructure/management/metadata
  Internet engineering & semantic web
Repositories and Linked Data
  Beyond PUT, KEEP and GET of the singular Repository
  Connecting Repositories with NOTIFY & EXCHANGE; TRANSFER of objects or metadata only
  So that the content of the object (or its metadata) may be re-used
Linking & Identifiers Really Matter!
  and Registries have a key role as authorities and cross-walks
Maybe we can agree our shared & central task…
  to ensure ease & continuity of access to (online/digital) scholarly resources for researchers, students and their teachers, now and into the future
My perspective
  In a University-based organisation (EDINA) that is a provider of content services & infrastructure services within national (UK) policy framework
  number of content services based on use of repository software: Eprints, DSpace, IntraLibrary
  number of infrastructure activities: OpenDepot.org, OA Repository Junction; OpenURL Router
  member of SONEX and indirectly of COAR and UK-CORR
Later focus on repository-related progress in the UK; where is the value, how is this assessed/expressed?
  Size of investment in recent times; cost-effectiveness and ‘impact’
  Effort at institutional & inter/national level and the ‘shared services’ agenda?
Nostalgia for Days of Plenty as we worry about the future
JISC as well-funded agent for change
  JISC Repositories & Preservation Programme - April 2006; March 2009
  “£14m investment in H.E. repository and digital content infrastructure”
A drive to assist institutions, including JISC RepositoryNet?
  Repository Support Project; Repository Research Project
  Intute Repository Search; ‘interim repository’ | the Depot | OpenDepot.org
  Services/Tools like OpenDOAR; Romeo; SWORD; OARJ
Check the JISC website http://www.jisc.ac.uk/whatwedo/programmes/inf11/sue2.aspx
  under the heading of ‘key digital repository activities’ are 21 funding programmes and 226 funded projects.
& then there were many meetings, including a new ‘regular’ street event:
  RepoFringe2010: Repository Fringe 2/3 September, Edinburgh
SONEX
Began with focus on ‘deposit opportunities’
  categorisation of repository types into which authors deposit
Began with research paper use case: multi-author & multi-institutional
Looking at onward interoperability (SWORD)
  not just technical interoperability but workflow
Day Job: as provider of services/tools & user of software
EDINA and Edinburgh University Data Library run repositories, with and without JISC
  Jorum: for learning materials [with Mimas]
    OER and turnstile (UK); using DSpace
  OpenDepot (the Depot): for research papers
    OA (world); using Eprints
  ShareGeo: for geo-spatial data
    Open Data and turnstile (UK); using DSpace
  DataShare: for research data (institutional, U of Ed)
    Open Data; using DSpace
  OA Repository Junction as shared service tool
    using own code and Eprints as an 'escrow' repository during the transfer process.
All seek to be ‘standards-based’, reducing need to be mediated
 
DSpace as open repository software
Open Architecture: supports flexibility
Active international development community
  Developer Community provides software add-on and pool of experience
  e.g. creating de-referenceable URIs (RDF/XML) for all in metadata store, to support both registry function and also engage with semantic web
Implications for Jorum: self-deposit of learning materials
  Particular requirements, as per Jorum Roadmap, based upon:
    mediated ingest for multiple items (RSS for metadata; OAI-PMH for objects in IntraLibrary)
    unmediated deposit (Selenium & SWORD/OARJ) in test-bed development
  cost of ownership, as per JISC OSS Watch:
    “.. important that when procuring open source software solutions you also plan to properly resource collaboration work ..”
  need to embed developments back into DSpace codebase:
    enables others to use, maintain and develop the new features
    ensures that your extensions are in main code
    increases the return on investment by making them available to all
[HCI]  User Interface for Unmediated Re-direction/Routing + option for Unmediated Deposit
 
DataShare as institution’s data repository
Theo Andrew  & Ian Stuart  (EDINA) http://oarepojunction.wordpress.com/
 
task for the Broker is to accept an item for deposit, package it and transfer it
Junction is a deduction tool via database of repositories
  takes a deposit object and extracts location information from object to deduce a list of potential targets.
Theo Andrew & Ian Stuart (EDINA)
Junction API service-quality, documented at http://oarepojunction.wordpress.com/junction-api/
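The deduction step described on this slide might be sketched as below. This is only an illustration under stated assumptions: the registry entries, the field names (`affiliation`, `email`) and the matching rules are hypothetical, and none of it reflects the actual Junction API or its database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Repository:
    name: str
    institution: str
    email_domains: tuple


# Hypothetical registry; a real service would hold this in a database of repositories.
REGISTRY = (
    Repository("Example University Repository", "Example University", ("example.ac.uk",)),
    Repository("Sample Institute Archive", "Sample Institute", ("sample.ac.uk",)),
)


def extract_locations(deposit):
    """Collect institution names and email domains from the deposit's author metadata."""
    names, domains = set(), set()
    for author in deposit.get("authors", []):
        affiliation = author.get("affiliation")
        if affiliation:
            names.add(affiliation.strip().lower())
        email = author.get("email", "")
        if "@" in email:
            domains.add(email.rsplit("@", 1)[1].lower())
    return names, domains


def deduce_targets(deposit, registry=REGISTRY):
    """Return names of repositories whose institution or email domain matches the deposit."""
    names, domains = extract_locations(deposit)
    targets = []
    for repo in registry:
        if repo.institution.lower() in names or domains.intersection(repo.email_domains):
            targets.append(repo.name)
    return targets
```

A multi-author, multi-institutional paper would typically yield more than one target, which is the case the broker exists to handle.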
Theo Andrew & Ian Stuart (EDINA)
OARJ Project activity: Organisations that hold content for deposit in IRs:
a) as Proof of Concept
  transfer of content from UKPMC subject repository to IRs. now working:
  1. Data manually extracted
  2. Institutional affiliation deduced
  3. Imported into broker
  4. into METS* for SWORD transfer
     * Some problems with some metadata
  5. Test export to both Eprints (OpenDepot.org) and DSpace (ERA)
  6. Confirm deposit sent back to broker
b) As Demonstrator (working with IRs in 7 universities)
  Set up daily XML export of new records added to UKPMC to OARJ Broker.
Earlier work with Nature Pub Gp
  NPG supply author-submitted manuscripts from journals (with embargo information)
  OARJ Broker transfer to authors’ institutional repository
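The package, transfer and confirm steps of the proof of concept could be outlined roughly as follows. This is a sketch only: the XML envelope is a simplified stand-in and not real METS, `send` is a hypothetical callable standing in for a SWORD deposit client, and the record fields are assumptions.

```python
import xml.etree.ElementTree as ET


def package(record):
    """Wrap record metadata in a minimal XML envelope (a stand-in, not real METS)."""
    root = ET.Element("package")
    for field in ("title", "affiliation"):
        ET.SubElement(root, field).text = record[field]
    return ET.tostring(root, encoding="unicode")


def transfer(record, targets, send):
    """Package the record once, send it to each target, and collect the confirmations.

    `send(target, payload)` is a hypothetical callable; in a real broker it
    would perform the deposit and return the repository's response.
    """
    payload = package(record)
    confirmations = {}
    for target in targets:
        confirmations[target] = send(target, payload)
    return confirmations


def fake_send(target, payload):
    """Test double that accepts every deposit."""
    return True


record = {"title": "An example article", "affiliation": "Example University"}
confirmations = transfer(record, ["eprints-test", "dspace-test"], fake_send)
```

Passing the deposit function in as a parameter keeps the sketch testable without a network, and mirrors the slide's point that the same package is exported to both Eprints and DSpace targets.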
Est. Number Of Articles For Transfer During Six Month Period
based upon the number of papers published in journals in the NPG portfolio during Jan - June 2010, as recorded in PubMed Central and ISI Web of Knowledge.
**Still to be confirmed as participating institutions
OARJ Demonstrator with NPG
http://www.rsp.ac.uk/start/software-survey/results-2010/
Have JISC [programme managers] moved on?
“Dealing with institutional processes now, rather than repository technology. Depending on type of content, the projects would fit much more closely in:
  managing research data programme
  research information programme
  open educational resources programme
as they have much more in common with those projects than they do with each other.”
“repositories have found their core business proposition via the REF and making sure Universities list research outputs to obtain research ratings - have not succeeded in making the business case that IRs should be doing the job of archiving, a core library platform, or the job of an institutional demonstrator/poster space. Repositories fit in the ‘University Enterprise Stack’ by virtue of being a system that delivers a business solution to a real financial problem.”
Re-stating our shared task, to (re-)include data:
To ensure ease & continuing access to record of scholarship
  research publications and research data
Consider at least three types of (research) data:
  Supplementary data
    multimedia files: part of the published article that presents research argument and conclusions
    more than linear text, limited tabular and graphical display
    enhances user experience with various multimedia objects
  Research dataset(s) upon which conclusions based
    check analysis of those data to support statements made
  Database(s) from which datasets were assembled
    for reproducibility (exposure to refutation) and new work via alternative analysis and updates to the database(s)
Citation, then linking
Citation of database(s) (Type C data)
  for reproducibility (exposure to refutation)
  to prompt new work via alternative analysis and updates to the database(s)
  to credit those who curate the data needed for scholarship
Citation of the datasets used (Type B data)
  verification of analysis, that the figures and conclusions accurately reflect those data
Plus hyperlink to the dataset from the published article … and back again from the dataset to the published article
+ Links to presentations, blogs, websites, funders etc related to the same research activity and same researcher(s) (Type D data?)
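The "and back again" linking on this slide amounts to recording every link in both directions. A minimal sketch, assuming hypothetical identifier strings and an in-memory store rather than any particular repository's link service:

```python
from collections import defaultdict


class LinkRegistry:
    """Keeps article-to-dataset links navigable from either end."""

    def __init__(self):
        self._links = defaultdict(set)

    def link(self, article_id, dataset_id):
        # Record the relationship in both directions so neither side is a dead end.
        self._links[article_id].add(dataset_id)
        self._links[dataset_id].add(article_id)

    def related(self, identifier):
        """Return everything linked to the given identifier, in sorted order."""
        return sorted(self._links.get(identifier, ()))


registry = LinkRegistry()
registry.link("article:example-1", "dataset:example-1")
registry.link("article:example-1", "dataset:example-2")
```

The same structure would extend to the "Type D" links (presentations, blogs, funders) simply by registering further identifier pairs.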
Standards to cite data (A long running saga)
There is no universal standard for citing data and computer files, but …
Dodd, Sue. (1979) “Bibliographic references for numeric social science data files: Suggested guidelines.” Journal of the American Society for Information Science, 30(2), 77-82.
ISO 690:1987 Bibliographic references - Content, form and structure
Dodd, Sue. (1990) “Bibliographic References for Computer Files in the Social Science: A Discussion Paper.” Chapel Hill, NC: Institute for Research in Social Science, University of North Carolina. Presented to IASSIST 1990, Poughkeepsie, N.Y. http://www.people.virginia.edu/~pm9k/info/compRef.html
ISO 690-2:1997 Bibliographic references, Part 2: Electronic documents
Schneider, Jeri. (2006) “Why we need a data citation standard: Lessons learned from compiling ICPSR’s Bibliography of Data-Related Literature.” ICPSR Bulletin, 26(2), 9-12. http://www.icpsr.umich.edu/org/publications/bulletin/spr06.pdf
Obtaining the citation at source
CIESIN
  “Most of our datasets and products contain a suggested citation on the Web site as to where the data was obtained”
  “Whenever possible, we urge you to cite the use of data and web resources in the reference section”
  http://sedac.ciesin.columbia.edu/citations/
How to Cite Statistics Canada Products:
  “This guide has been developed for authors, editors, researchers, academics, students, librarians and data librarians.
  “It describes, in three steps, how to build your reference when citing Statistics Canada products”
  http://www.statcan.gc.ca/pub/12-591-x/12-591-x2006001-eng.htm
Get it from those who make the data available: the data publishers
  cf Cataloguing in Publication!
Linked Data … Is this shared understanding?
A note from Tim Berners-Lee now in circulation proposes 4 steps:
  1. Use URIs as names for things
  2. Use http URIs so that people [& computers?] can look up those names
  3. When someone looks up a URI, provide useful information using the standards (RDF, SPARQL)
  4. Include links to other URIs, so that they can discover more things.
may become the principles/rules/definition of ‘Linked Data’
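As a rough illustration of steps 1, 2 and 4, the sketch below names an article and a dataset with http URIs and links them using Dublin Core terms, serialised as N-Triples lines. The example.org URIs are placeholders, the literal escaping is deliberately minimal, and step 3 (serving useful information when a URI is looked up) is outside what this fragment shows.

```python
DCTERMS = "http://purl.org/dc/terms/"


def uri(value):
    """Format an absolute URI as an N-Triples term."""
    return f"<{value}>"


def literal(value):
    """Format a plain string literal; only backslashes and double quotes are escaped here."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def triple(subject, predicate, obj):
    """Join three already-formatted terms into one N-Triples statement."""
    return f"{subject} {predicate} {obj} ."


# Placeholder identifiers: http URIs used as names for things (steps 1 and 2).
article = "http://example.org/id/article/1"
dataset = "http://example.org/id/dataset/1"

triples = [
    triple(uri(article), uri(DCTERMS + "title"), literal("An example article")),
    # Links to other URIs so that more things can be discovered (step 4).
    triple(uri(article), uri(DCTERMS + "references"), uri(dataset)),
    triple(uri(dataset), uri(DCTERMS + "isReferencedBy"), uri(article)),
]
```

In practice an RDF library would usually handle serialisation and escaping; the hand-rolled strings are only meant to make the shape of the statements visible.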
Research publications as research data DISC-UK DataShare Project (Edinburgh, LSE,  Oxford, Southampton) From informal storage and sharing To formal publishing into  data infrastructure
Research publications as research data DataShare2 from formal  institutional  arrangement to formal publishing into  (linked) data infrastructure
Time for me to stop
Hoping that I have left some space/place for questions
Thank you
Acknowledgements: Theo Andrew & Ian Stuart, Pablo de Castro, Gareth Waller & Robin Rice, Dave Flanders & Andy McGregor
Ease and Continuity of Access to Data in Difficult Times
End of an era? End of the R word?
  Embedded in domain-specific processes, but with wider context
Engage, connect and get leverage from Internet Engineering
  W3C and the commercial/retail world
  Linkage and Identifiers really, really matter in m2m world
Moving from technology to policy & practice
  some domain-specific, some common to repositories
Collection management: active curation & Linked relationships
  versions, of data | articles | learning materials
  Collections, ‘see also’
  Curation as value-added linkage between items
  First point of public issue (availability); Take-down regimes
Institutional stewardship responsibility
  for content that ‘we need’ for research and education
  including data and other materials manufactured from within ‘our world’
  born-digital [and digitised] content
What of the (new) shared services imperative?
  Who does what, at what level/scale?
COAR: Confederation of Open Access Repositories
48 members drawn largely from Europe, but including both JISC & CNI, and also EDINA (University of Edinburgh)
Work Plan for 2010/12, including
  Advocacy on behalf of OA and repositories (Rs) [both together?]
  Populating (OA) Rs
  Best practice documents
  Facilitate and ensure data interoperability
    of (across?) Rs
    interoperability with other systems (such as CRIS systems)
  Support national helpdesks
  Guidance on how Rs will form essential elements for global e-infrastructure
  Promote R manager profession
  Provide advice & guidance on suitable R infrastructure technologies
  Global (meta)data store
  Strategic partner other infrastructure-related initiatives worldwide
Sound & Pictures: access to new data sources
20th Century is the first fully audio-visual century
  With new forms of research material to use and to master
EDINA as platform for downloadable film, video and audio
  Licensed for use in learning, teaching and research
  Wide range of subject coverage, including documentary film
Film & Sound Online
  600 hours of film, digitised for downloading
NewsFilm Online
  3000 hours of material from ITN & Reuters
  Over 4TBs of clips to download
Plus Education Image Gallery of still photography
Visual and Sound Materials Portal
  Discovering all sorts of audio-visual material
UK-CORR: UK Council of Research Repositories
individual rather than institutional
UK has ‘rich heterogeneous repository landscape’ (C. Awre)
lurk following comment from Dorothea Salo: US mainly about OA full texts; UK mainly about … serving research assessment!
Is there more to IRs than the REF: lots of bibliographic records & little full text?
Should IRs only accept full text, not metadata only?
  in absence of a CRIS, our IR had to do REF (Lancaster & Northampton)
  was OA but then RAE2008, but should aim to include all (OU)
  motive for IR was digital preservation, with different REF system; funder mandate compliance for OA; visibility via OA (Oxford/Bodleian)
  RAE/REF is opportunity to engage institution-wide (Warwick)
  Advent of CRIS (which don’t manage outputs well) may be opportunity for IRs to have role, including use of ‘metadata only’ as lever to obtain full text (Hull)
  REF & research management information allows IRs to be embedded as platform for OA (Southampton)
  RAE/REF has different goals to OA and IRs with low % of full text may undermine OA movement (Nottingham)

More Related Content

Open Repositories and Interoperability Challenges in UK

  • 1. Open Repositories and Interoperability Challenges in UK Peter Burnhill Director, EDINA National Data Centre, University of Edinburgh, Scotland UK DL.org Workshop, The British Academy, London UK, 4 February 2011 Digital Libraries and Open Access: Interoperability Strategies
  • 2. + expertise, in areas such as geo-enabling, access management, etc
  • 3. UK funding councils JISC Sub-Committees JISC Collections acting as platform for network-level services & helping to build the JISC Integrated Information Environment research, learning & teaching in UK universities & colleges Research Councils UK National Data Centres
  • 4. Began as a data manufacturer Scottish Education Data Archive, late 1970s – mid ‘80s Survey statistician: for school leaver, YTS & 16-19 cohort surveys curated as databases : the working capital for research group + ‘guests’ Became a data broker Edinburgh University Data Library, mid- 1980s & on Providing library and ease of access to data held elsewhere Connected to IASSIST , international group for data librarians/archivists Learning about interesting spaces [time/place referencing] ESRC Regional Research Laboratory for Scotland 1986/90 Co-director: early days of Geographical Information Systems (GIS) Inter-Agency Committee on Global Environmental Change Data Task Force Moved into national data services; learnt more about data curation EDINA national data centre, mid-1990s to present Director: set-up and continuing development of access/deposit services Digital Curation Centre, 2004 & 2005 Director: set-up & digital curation = ‘data curation’ & ‘digital preservation’ autobiography as commentary on data facilities Spot the repository!
  • 5. Researchers’ viewpoint: a cultural shift? Preserve or Perish “ You are not finished until you have done the research, published the results, and published the data, receiving formal credit for everything.” Mark A. Parsons (2006) International Polar Year “ A scholar’s positive contribution is measured by the sum of the original data that he contributes. Hypotheses come and go but data remain.” in Advice to a Young Investigator (1897) Santiago Ramón y Cajal (Nobel Prize winner, 1906)
  • 6. document tradition & computation tradition Emergence of Digital Library: Information Science “ considerable simplification, … helpful to think … of two traditions, or mentalities, even cultures, co-exist in area of Information Science “ Approaches based on a concern with documents, with signifying records: archives, bibliography, documentation, librarianship, records management, and the like “ approaches based on uses for formal techniques, whether mechanical (such as punch cards and data-processing equipment) or mathematical (as in algorithmic procedures).” Michael Buckland ( UC Berkeley), Presidential Address, American Society for Information Science, JASIS’s 50th (1998) http://people.ischool.berkeley.edu/~buckland/asis62.html
  • 7. Semantics of ‘Open Repositories & Interoperability’ R is for Repository ” university-based institutional repository is a set of services … for the management and dissemination of digital materials created by the institution & its community members. … organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access ...” (C. Lynch, 2003 ) Digital repository differs from other digital collections in that: "content is deposited, whether by content creator, owner or third party "architecture manages content as well as metadata "repository offers a minimum set of basic services "must be sustainable & trusted, well-supported & well-managed” Digital Repositories Review (R.Heery and S.Anderson, 2005) O is for Open OA (for publications) not the only ‘open’ policy: OER: Open Educational Resources Open means ‘not closed’: making teaching & learning materials visible Open CourseWare – often as open stack of webpages Open Data Datasets tradition (IASSIST); ‘open/privilege access to databases; open data.gov Open Source Software OSS has its own way of doing things
  • 8. ‘ Open Repositories & Interoperability’ [in the UK] Heretical and Haphazard Thinking About The Brief … Are Repositories the (only) way to support an Open Agenda? and Is Open really what Repositories are for? Or Is this usage just intended to help us avoid issues of IP and access management? And should the focus be on: Interoperability between Repositories? or Interoperability of Repositories with the wider Internet?
  • 9. Interoperability Strategies; Interoperability Challenges Whose strategy, and towards what purpose? ‘ within & for the research & education sector’? Or beyond? for the institution, the UK, EU, global anybody; for the researcher? for the machine as user [“Provider/Consumer”] Interoperability as technical [& semantic] means to support interworking by persons or systems has challenges policy/technology/infrastructure/management/metadata Internet engineering & semantic web Repositories and Linked Data Beyond PUT, KEEP and GET of the singular Repository Connecting Repositories with NOTIFY & EXCHANGE; TRANSFER of objects or metadata only So that the content of the object (or its metadata) may be re-used Linking & Identifiers Really Matter! and Registries have a key role as authorities and cross-walks
  • 10. Maybe can we agree our shared & central task… to ensure ease & continuity of access to (online/digital) scholarly resources for researchers, students and their teachers, now and into the future My perspective In a University-based organisation (EDINA) that is a provider of content services & infrastructure services within national (UK) policy framework number of content services based on use of repository software Eprints, DSpace, IntraLibrary number of infrastructure activities OpenDepot.org, OA Repository Junction; OpenURL Router member of SONEX and indirectly of COAR and UK-CORR Later focus on on repository-related progress in the UK; where is the value, how this is assessed/expressed? Size of investment in recent times; cost-effectiveness and ‘impact’ Effort at institutional & inter/national level and the ‘shared services’ agenda?
  • 11. Nostalgia for Days of Plenty as we worry about the future JISC as well-funded agent for change JISC Repositories & Preservation Programme - April 2006; March 2009 “ £14m investment in H.E. repository and digital content infrastructure” A drive to assist institutions, including JISC RepositoryNet? Repository Support Project; Repository Research Project Intute Repository Search; ‘interim repository’ | the Depot | OpenDepot.org Services/Tools like OpenDOAR; Romeo; SWORD; OARJ Check the JISC website http://www.jisc.ac.uk/whatwedo/programmes/inf11/sue2.aspx under the heading of ‘key digital repository activities’ are 21 funding programmes and 226 funded projects. & then there were many meetings, including a new ‘regular’ street event: RepoFringe2010: Repository Fringe 2/3 September, Edinburgh
  • 12. SONEX Began with focus on ‘deposit opportunities’ categorisation of repository types into which authors deposit Began with research paper use case: multi-author & multi-institutional Looking at onward interoperability (SWORD) not just technical interoperability but workflow
  • 13. Day Job: as provider of services/tools & user of software EDINA and Edinburgh University Data Library run repositories, with and without JISC Jorum : for learning materials [with Mimas] OER and turnstile (UK); using DSpace OpenDepot (the Depot): for research papers OA (world); using Eprints ShareGeo : for geo-spatial data Open Data and turnstile (UK); using DSpace DataShare : for research data (institutional, U of Ed) Open Data; using DSpace OA Repository Junction as shared service tool using own code and Eprints as an 'escrow' repository during the transfer process. All seek to be ‘standards-based’ , reducing need to be mediated
  • 14.  
  • 15. DSpace as open repository software Open Architecture: supports flexibility Active international development community Developer Community provides software add-on and pool of experience e.g. creating de-referenceable URIs (RDF/XML) for all in metadata store, to support both registry function and also engage with semantic web Implications for Jorum: self-deposit of learning materials Particular requirements, as per Jorum Roadmap, based upon: mediated ingest for multiple items (RSS for metadata; OAI-PMH for objects in IntraLibrary) unmediated deposit (Selenium & SWORD/OARJ) in test-bed development cost of ownership, as per JISC OSS Watch: “ .. important that when procuring open source software solutions you also plan to properly resource collaboration work .. ” need to embed developments back into DSpace codebase: enables others to use, maintain and develop the new features ensures that your extensions are in main code increases the return on investment by making them available to all
  • 16. [HCI] User Interface for Unmediated Re-direction/Routing + option for Unmediated Deposit
  • 17.  
  • 18. DataShare as institution’s data repository
  • 19. Theo Andrew & Ian Stuart (EDINA) http://oarepojunction.wordpress.com/
  • 20.  
  • 21. task for the Broker is to accept an item for deposit, package it and transfer it Junction is a deduction tool via database of repositories takes a deposit object and extracts location information from object to deduce a list of potential targets. Theo Andrew & Ian Stuart (EDINA) Junction API service-quality, documented at http://oarepojunction.wordpress.com/junction-api/
  • 22. Theo Andrew & Ian Stuart (EDINA) OARJ Project activity: Organisations that hold content for deposit in IRs: a) as Proof of Concept transfer of content from UKPMC subject repository to IRs. now working: 1. Data manually extracted 2. Institutional affiliation deduced 3. Imported into broker 4. into METS* for SWORD transfer * Some problems with some metadata 5. Test export to both Eprints (OpenDepot.org) and DSpace (ERA) 6. Confirm deposit sent back to broker OARJ Project activity: Organisations that hold content for deposit in IRs: b) As Demonstrator (working with IRs in 7 universities) Set up daily XML export of new records added to UKPMC to OARJ Broker . Earlier work with Nature Pub Gp NPG supply author-submitted manuscripts from journals (with embargo information) OARJ Broker transfer to authors’ institutional repository
  • 23. Est. Number Of Articles For Transfer During Six Month Period based upon the number of papers published in journals in the NPG portfolio during Jan - June 2010, as recorded in PubMed Central and ISI Web of Knowledge. **Still to be confirmed as a participating institutions OARJ Demonstrator with NPG
  • 25. Have JISC [programme managers] moved on? “ Dealing with institutional processes now, rather than repository technology. Depending on type of content, the projects would fit much more closely in: managing research data programme research information programme open educational resources programme as they have much more in common with those projects than they do with each other.” “ repositories have found their core business proposition via the REF and making sure Universities list research outputs to obtain research ratings - have not succeeded in making the business case that IRs should be doing the job of archiving, a core library platform, or the job of an institutional demonstrator/poster space. Repositories fit in the ‘University Enterprise Stack’ by virtue of being a system that delivers a business solution to a real financial problem.”
  • 26. Re-stating our shared task, to (re-)include data: To ensure ease & continuing access to record of scholarship research publications and research data Consider at least three types of (research) data: Supplementary data multimedia files: part of the published article that presents research argument and conclusions more than linear text, limited tabular and graphical display enhances user experience with various multimedia objects Research dataset(s) upon which conclusions based check analysis of those data to support statements made Database(s) from which datasets were assembled for reproducibility (exposure to refutation) and new work via alternative analysis and updates to the database(s)
  • 27. Citation, then linking Citation of d atabase(s) (Type C data) for reproducibility (exposure to refutation) to prompt new work via alternative analysis and updates to the database(s) to credit those who curate the data needed for scholarship Citation of the datasets used (Type B data) verification of analysis, that the figures and conclusions accurately reflect those data Plus hyperlink to the dataset from the published article … and back again from the dataset to the published article + Links to presentations, blogs, websites, funders etc related to the same research activity and same researcher(s) (Type D data?)
  • 28. Standards to cite data (A long-running saga) There is no universal standard for citing data and computer files, but … Dodd, Sue. (1979) “Bibliographic references for numeric social science data files: Suggested guidelines.” Journal of the American Society for Information Science, 30(2), 77-82. ISO 690:1987 Bibliographic references - Content, form and structure Dodd, Sue. (1990) “Bibliographic References for Computer Files in the Social Science: A Discussion Paper.” Chapel Hill, NC: Institute for Research in Social Science, University of North Carolina. Presented to IASSIST 1990, Poughkeepsie, N.Y. http://www.people.virginia.edu/~pm9k/info/compRef.html ISO 690-2:1997 Bibliographic references, Part 2: Electronic documents Schneider, Jeri. (2006) “Why we need a data citation standard: Lessons learned from compiling ICPSR’s Bibliography of Data-Related Literature.” ICPSR Bulletin, 26(2), 9-12. http://www.icpsr.umich.edu/org/publications/bulletin/spr06.pdf
  • 29. Obtaining the citation at source CIESIN “Most of our datasets and products contain a suggested citation on the Web site as to where the data was obtained” “Whenever possible, we urge you to cite the use of data and web resources in the reference section” http://sedac.ciesin.columbia.edu/citations/ How to Cite Statistics Canada Products: “This guide has been developed for authors, editors, researchers, academics, students, librarians and data librarians.” “It describes, in three steps, how to build your reference when citing Statistics Canada products” http://www.statcan.gc.ca/pub/12-591-x/12-591-x2006001-eng.htm Get it from those who make the data available: the data publishers cf Cataloguing in Publication!
  • 30. Linked Data … Is this shared understanding? A note from Tim Berners-Lee now in circulation proposes 4 steps: Use URIs as names for things Use HTTP URIs so that people [& computers?] can look up those names When someone looks up a URI, provide useful information using the standards (RDF, SPARQL) Include links to other URIs, so that they can discover more things. may become the principles/rules/definition of ‘Linked Data’
  • 31. Research publications as research data DISC-UK DataShare Project (Edinburgh, LSE, Oxford, Southampton) From informal storage and sharing To formal publishing into data infrastructure
  • 32. Research publications as research data DataShare2 from formal institutional arrangement to formal publishing into (linked) data infrastructure
  • 33. Time for me to stop Hoping that I have left some space/place for questions Thank you Acknowledgements Theo Andrew & Ian Stuart, Pablo de Castro, Gareth Waller & Robin Rice, Dave Flanders & Andy McGregor
  • 34. Ease and Continuity of Access to Data in Difficult Times End of an era? End of the R word? Embedded in domain-specific processes, but with wider context Engage, connect and get leverage from Internet Engineering W3C and the commercial/retail world Linkage and Identifiers really, really matter in m2m world Moving from technology to policy & practice some domain-specific, some common to repositories Collection management: active curation & Linked relationships versions, of data | articles | learning materials Collections, ‘see also’ Curation as value-added linkage between items First point of public issue (availability); Take-down regimes Institutional stewardship responsibility for content that ‘we need’ for research and education including data and other materials manufactured from within ‘our world’ born-digital [and digitised] content What of the (new) shared services imperative? Who does what, at what level/scale?
  • 35. COAR: Confederation of Open Access Repositories 48 members drawn largely from Europe, but including both JISC & CNI, and also EDINA (University of Edinburgh) Work Plan for 2010/12, including Advocacy on behalf of OA and repositories (Rs) [both together?] Populating (OA) Rs Best practice documents Facilitate and ensure data interoperability of (across?) Rs interoperability with other systems (such as CRIS systems) Support national helpdesks Guidance on how Rs will form essential elements for global e-infrastructure Promote R manager profession Provide advice & guidance on suitable R infrastructure technologies Global (meta)data store Strategic partner other infrastructure-related initiatives worldwide
  • 36. Sound & Pictures: access to new data sources 20th Century is the first fully audio-visual century With new forms of research material to use and to master EDINA as platform for downloadable film, video and audio Licensed for use in learning, teaching and research Wide range of subject coverage, including documentary film Film & Sound Online 600 hours of film, digitised for downloading NewsFilm Online 3000 hours of material from ITN & Reuters Over 4TBs of clips to download Plus Education Image Gallery of still photography Visual and Sound Materials Portal Discovering all sorts of audio-visual material
  • 38. UK-CORR: UK Council of Research Repositories individual rather than institutional, [email_address] UK has ‘rich heterogeneous repository landscape’ (C.Awre); lurk following comment from Dorothea Salo: US mainly about OA full texts; UK mainly about … serving research assessment! Is there more to IRs than the REF: lots of bibliographic records & little full text? Should IRs only accept full text, not metadata only? in absence of a CRIS, our IR had to do REF (Lancaster & Northampton) was OA but then RAE2008, but should aim to include all (OU) motive for IR was digital preservation, with different REF system; funder mandate compliance for OA; visibility via OA (Oxford/Bodleian) RAE/REF is opportunity to engage institution-wide (Warwick) Advent of CRIS (which don’t manage outputs well) may be opportunity for IRs to have role, including use of ‘metadata only’ as lever to obtain full text (Hull) REF & research management information allows IRs to be embedded as platform for OA (Southampton) RAE/REF has different goals to OA and IRs with low % of full text may undermine OA movement (Nottingham)
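The four Linked Data steps on slide 30, together with the two-way article and dataset links asked for on slide 27, can be sketched in a few lines. This is only an illustration under stated assumptions: the `example.org` URIs and the dataset title are hypothetical placeholders, the helper names `triple` and `link_article_and_dataset` are invented here, and the choice of Dublin Core Terms properties (`references`, `isReferencedBy`, `title`) is one possible vocabulary rather than anything the slides prescribe. Output is plain N-Triples, one RDF statement per line.

```python
# Hypothetical HTTP URIs naming the two "things" (steps 1 and 2).
# example.org is a placeholder domain, not a real repository.
ARTICLE = "http://example.org/article/123"
DATASET = "http://example.org/dataset/456"

# Dublin Core Terms namespace (an assumed vocabulary choice).
DCT = "http://purl.org/dc/terms/"


def triple(subject, predicate, obj, literal=False):
    """Return one N-Triples line; obj is treated as a URI unless literal=True."""
    if literal:
        # Escape backslashes and double quotes inside the literal.
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        return f'<{subject}> <{predicate}> "{escaped}" .'
    return f"<{subject}> <{predicate}> <{obj}> ."


def link_article_and_dataset(article, dataset, dataset_title):
    """Describe the dataset (step 3) and link both ways (step 4)."""
    return [
        triple(dataset, DCT + "title", dataset_title, literal=True),
        triple(article, DCT + "references", dataset),      # article -> dataset
        triple(dataset, DCT + "isReferencedBy", article),  # ... and back again
    ]


if __name__ == "__main__":
    for line in link_article_and_dataset(
        ARTICLE, DATASET, "Survey dataset (illustrative)"
    ):
        print(line)
```

Because each statement names both ends with an HTTP URI, a repository serving these lines at the dataset's address would let a machine follow the link back to the article without any out-of-band knowledge.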

Editor's Notes

  1. EDINA may be less familiar, at least to some of you. It is a national academic data centre, established in 1995 following the success of the University of Edinburgh putting forward its Data Library in an open competition to set up three data centres capable of hosting and providing access to bibliographic datasets and numeric research data. The other two were BIDS, which subsequently moved into the private sector as Ingenta, and MIDAS, the data centre at the University of Manchester, now renamed Mimas. The mission of EDINA, which incidentally is the older poetic name for Edinburgh, is to enhance productivity of research, learning and teaching in the UK. It used to host a range of key A&I databases like BIOSIS Previews, Compendex, Inspec, Art Abstracts etc, but now the services on journal …. As you can see, EDINA is funded by JISC … <click>
  2. As many of you will know, JISC is the Joint Information Systems Committee of the UK funding bodies for higher and further education. It has a number of sub-committees which help inform policy and also watch over programmes of funding and the operation of services, such as those provided by the two National Data Centres. It has also set up a company, JISC Collections, as a legal body to broker licences.
  3. Present: what is Jorum UI/home page Find Share
  6. UK & EU EU and ASEAN comparison?
  7. Type A: if it is part of the published work then should we look to the preservation agencies, the national deposit libraries and CLOCKSS, LOCKSS and Portico, for access over the longer term? And do publishers see these as costly files to maintain in the short to medium term? Or do publishers want to hand the responsibility to subject and institutional data repositories? Type B: and with knowledge of UKDA, ADS but also the (growing but problematic ??) call for data to be held in institutional repositories, some recommendations on what is the 'right thing' to do, and how that can be done with ease - a Repository Junction task! For Type C, I intend to propose what editors should require by way of citation and URL link. I am on the hunt for such editorial practice.
  8. Largest team within EDINA: mixture of GIS specialists and software engineers. Major content provider within academia, including the ESRC Census Geography Data Unit. Highly experienced and skilled team: provides advice nationally and internationally; active in standards development; active in GI community nationally and internationally. Substantial experience in handling and delivering key geospatial data and geo-referenced information (including critical social science data such as census boundaries and postcode directories). First online GI service, UKBORDERS, launched in 1994. Demands of the services offered mean the team has been at leading edge of GI service development in UK. Strategic move toward interoperability & shared services role (e-Framework). Value added component = making data usable.