Supporting the development of a national
Research Data Discovery Service – a Pilot
Project
Stuart Macdonald
EDINA & Data Library
University of Edinburgh
stuart.macdonald@ed.ac.uk
• University of Edinburgh
• Background and context
• UK Research Data Discovery Service
• PhD Interns
• Observations
• Closing remarks
University of Edinburgh
• Founded in 1582 - 6th oldest university in the English-speaking world and one of
Scotland's 4 ancient universities.
• 3 Colleges (MVM, CSE, CHSS) , 22 Schools
• Over 60 disciplinary/cross-disciplinary Institutes and Research Centres
• 34000 students, 4500 researchers, 6000 research students
Background
• EDINA and Data Library are a division within Information Services (IS) of the University of
Edinburgh.
• EDINA is a Jisc-funded centre for digital expertise providing national online resources for
education and research.
• Data Library & Consultancy assists Edinburgh University users in the discovery, access, use and
management of research datasets.
• The Data Library is part of the new Research Data Service – the culmination of a 36 month RDM
Roadmap (phase 1 and 2) to implement the University’s RDM Policy and develop a suite of RDM
Services that map onto the research lifecycle
Data Library Services: http://www.ed.ac.uk/is/data-library
EDINA: http://edina.ac.uk/
• In order to be reused, research data must be discoverable.
• The EPSRC Research Data Expectations* require research organisations to maintain a
data catalogue to record metadata about research data generated by EPSRC-funded
research projects.
• Universities are increasingly making research data assets available through repositories
or other data portals.
• The requirement for a UK research data discovery service has grown as universities
become more involved in RDM and capacity develops.
* https://www.epsrc.ac.uk/about/standards/researchdata/expectations/
Context
UK Research Data Discovery Service (RDDS)
In 2013, the Digital Curation Centre (DCC) and the UK Data Service piloted a registry
service to aggregate metadata for research data held within a sample of UK universities
and national, discipline specific data centres.
This 6-month pilot tested an existing data registry architecture developed by the
Australian National Data Service (ANDS).
This was followed up with Phase 2 funding from Jisc to evaluate technical solutions and
further develop a national Research Data Discovery Service
• https://www.jisc.ac.uk/rd/projects/uk-research-data-discovery
• http://ckan.data.alpha.jisc.ac.uk/
As part of Phase 2, the University of Edinburgh received funding from Jisc to support the development of
UKRDDS.
PhD interns from the 3 Colleges were hired through a ‘streamlined’ e-recruitment* process, as part of
IS’s plan to recruit 500 PhD interns per academic year, complete with formal eligibility-to-work checks, inductions,
probation reports, and end-of-employment / continuation-of-employment processes!
Their role was to engage with local researchers in order to make metadata and full data sets available for harvest
into the pilot service for discovery and potential reuse.
This work was co-ordinated jointly by EDINA & Data Library and Library & University Collections.
Progress was reported back to Jisc via monthly UKRDDS meetings and F-2-F workshops as well as
representation on UKRDDS Technical and Metadata Advisory Groups.
PhD Interns: responsibilities
To develop plans for getting researchers in schools engaged
with recording or sharing their data
To work closely with researchers and School administrators
to assist in the description and upload of research data into:
• PURE, the University’s proprietary Current Research
Information System, used as a data catalogue where
descriptive metadata about datasets can be added to link to
related research outputs, publications or projects.
• Work needed to convert PURE ver. 5 API into OAI-PMH endpoint
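As a rough illustration of what exposing catalogue entries through an OAI-PMH endpoint involves, the sketch below wraps one dataset description in an OAI-PMH record carrying Dublin Core metadata. The dictionary layout and the function name `to_oai_record` are assumptions made for this example; they do not reflect the PURE API or the conversion work actually carried out.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for OAI-PMH 2.0 and unqualified Dublin Core.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def to_oai_record(dataset):
    """Wrap a dataset description (hypothetical dict layout) in an OAI-PMH <record>."""
    record = ET.Element(f"{{{OAI_NS}}}record")

    # The header carries the unique identifier and the last-modified datestamp.
    header = ET.SubElement(record, f"{{{OAI_NS}}}header")
    ET.SubElement(header, f"{{{OAI_NS}}}identifier").text = dataset["id"]
    ET.SubElement(header, f"{{{OAI_NS}}}datestamp").text = dataset["modified"]

    # The metadata section holds one Dublin Core element per value.
    metadata = ET.SubElement(record, f"{{{OAI_NS}}}metadata")
    dc = ET.SubElement(metadata, f"{{{OAI_DC_NS}}}dc")
    for field in ("title", "creator", "description", "date"):
        for value in dataset.get(field, []):
            ET.SubElement(dc, f"{{{DC_NS}}}{field}").text = value
    return record


example = {
    "id": "oai:example.org:dataset-1",  # made-up identifier
    "modified": "2016-04-01",
    "title": ["Example dataset"],
    "creator": ["Researcher, A.", "Researcher, B."],
}
print(ET.tostring(to_oai_record(example), encoding="unicode"))
```

A full endpoint would also need to answer the protocol's verbs (for example `Identify` and `ListRecords`); this sketch covers only the record mapping.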
Edinburgh DataShare - the University’s OA multidisciplinary data repository hosted by the Data Library
• It allows University researchers to upload, share, and
license their data resources for online discovery and
re-use by others.
• OAI-PMH compliant
• Built on the DSpace platform
• http://datashare.is.ed.ac.uk
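Because the repository is OAI-PMH compliant, an aggregator can harvest its metadata with a plain HTTP request. The sketch below builds a `ListRecords` request URL and parses a response into identifier and title pairs. The base URL `https://example.org/oai/request` and the sample response are placeholders invented for the example, not DataShare's real endpoint or data; the verb, the `oai_dc` metadata prefix, and the namespaces come from the OAI-PMH 2.0 protocol.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample response, used so the example runs without network access.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:123</identifier>
        <datestamp>2016-04-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for an OAI-PMH base URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_text):
    """Return ([(identifier, title), ...], resumption_token_or_None)."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        records.append((identifier, title))
    # A non-empty resumptionToken means more pages remain to be fetched.
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, token or None


print(list_records_url("https://example.org/oai/request"))
print(parse_records(SAMPLE))
```

A real harvester would fetch each page over HTTP and repeat the request with the returned resumption token until none is returned.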
Other responsibilities:
To validate and quality control metadata records ingested into both PURE and DataShare
for the purpose of being harvested by UKRDDS
To develop or enhance the quality of metadata records to the standard set for UKRDDS
To assist in the identification and deposit of research datasets deemed suitable or
appropriate for open publication and long-term preservation into DataShare
To record their own observations and provide periodic reports on data sharing and
cataloguing practices within respective Schools.
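The validation and quality-control task can be pictured as a completeness check run over each record before harvest. The sketch below is illustrative only: the list of required fields is an assumption made for the example, since the metadata standard set for UKRDDS is not given here.

```python
# Assumed set of mandatory fields; NOT the actual UKRDDS metadata standard.
REQUIRED_FIELDS = ("title", "creator", "description", "date", "identifier")


def find_problems(record):
    """Return a list of problems for one metadata record (a plain dict)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        # Treat absent, empty, and whitespace-only values as missing.
        if value is None or not str(value).strip():
            problems.append(f"missing {field}")
    return problems


record = {
    "title": "Example dataset",
    "creator": "Researcher, A.",
    "description": "   ",
    "date": "2016",
    "identifier": "oai:example.org:123",  # made-up identifier
}
print(find_problems(record))
```

Records that return an empty list pass the check; anything else can be sent back to the depositor or enhanced by hand.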
Observations
1st tranche of PhD interns (Dec. 15 – April 16)
• School of Literatures, Languages and Cultures
• Roslin Institute
• School of Social and Political Science
2nd tranche (Mar. 15 – Sept. 16)
• Division of Infection and Pathway Medicine, School of Medicine
• School of Literatures, Languages and Cultures (2nd intern)
3rd tranche (June 16 – Sept. 16)
• School of Divinity
• School of Engineering
Literatures, Languages, and Cultures (LLC)
• 3 datasets described in PURE, 2 datasets deposited into DataShare and described in PURE
• 14 researchers interviewed for LLC ( + 7 researchers for Philosophy, Psychology and Language
Science)
• LLC has dedicated RDM webpages
• Communications with researchers within the two Schools were conducted via Research
Administrators
• Research Administrators and researchers were happy to talk once the intern was not seen as an
‘enforcing figure’
• Researchers expressed discomfort or unfamiliarity concerning online distribution of data and
unease about upsetting publishers by making their data available online
• Due to the nature of humanities research, where interpretation of existing artefacts (books,
historic texts, manuscripts) is itself the research output, researchers did not tend to regard this as
data
• Copyright was seen as one of the main issues hindering dataset deposit – a limiting factor when
researchers’ data is based on texts and other archival material.
• Also, some documents no longer under copyright are restricted from imaging due to preservation
efforts
• When texts themselves are a researcher’s own ‘data’ (as is often the case in Humanities) there is
still a reluctance to share
Roslin Institute
67 researchers interviewed belonging to 4 divisions (70% of total)
• Infection and Immunity
• Genetics and genomics
• Neurobiology and Developmental Biology
• Clinical researchers from Veterinary School
0 datasets deposited in DataShare. Linking data in e.g. NCBI to PURE unrealistic (see next slide)
• PhD interns worked closely with dedicated Data Manager, PURE Administrator and Research
Administrator.
• Roslin have dedicated RDM webpages
• c. 60% researchers kept their research outputs up-to-date in PURE though very few had updated
research data metadata or were aware that they could.
• c. 90% of researchers submitted data to journals and open access domain repositories e.g. 50%
submitted to NCBI , 20% submitted to EBI
The number of datasets deposited into NCBI from
Roslin Institute is large (e.g. over 55,000
expressed sequence tags, over 73,000 protein
sequences, over 132,000 genome survey
sequences)
Unwieldy proposition to record metadata
from NCBI into PURE
Currently no automated processes in place.
• The main reasons stated for using these repositories were:
• Funder requirement
• Default repository within their discipline
• Recommendation by peers
• c. 40% of researchers were confident about the safety of their data and long-term guarantees
provided by the domain repositories, whereas c. 60% did not know or were not sure
• Researchers working with industry partners indicated that due to confidential nature of the data
they do not upload data to open access repositories
• Only one third of researchers had heard about DataShare (with only one researcher who had
used it). Two thirds hadn’t heard of it.
• In general there was no interest in using DataShare due to well established domain repositories
Social and Political Science
• 19 datasets held in the UK Data Archive described in PURE, 0 datasets deposited into DataShare
• 15 researchers identified as having made data available via the UK Data Archive were sent a
questionnaire – only 2 knew about DataShare
• 12 ESRC funded PhD students interviewed (about making their data available in UKDA /
DataShare) - No Data Management Plans written by ESRC funded PhDs at start of research (this is
now mandatory)
• 10 researchers interviewed (different from those that answered the questionnaire)
• Research Assistants are regularly employed to manage, clean and publish datasets. The
temporary nature of contracts often means that the knowledge and practice of curating datasets
is not retained within the School
• Among the challenges cited by researchers for making datasets available both in a quantitative
and qualitative sense, the most common is that of ethics and anonymisation
• Of c. 300 researchers in the School between 2008-2016 only 19 had deposited data in the UK Data
Archive
• This confirmed (in the eyes of the PhD intern) that making datasets available in open access or
domain repositories is not necessarily a widespread practice nor of primary importance
Closing remarks
• Internships instrumental in starting RDM conversations within Schools
• Mixed economy of research culture, practice and behaviour
• Speed and process of data generation, description and deposit varies
• Are we surprised? Old habits die hard.
• Build it and they will come!
• From a service provision perspective there is no one-size-fits-all solution
• With more emphasis placed on ‘as required’ service solutions
• Greater understanding needed of disciplinary and sub-disciplinary practice
• Rethink outreach, formal and informal training strategies
• Targeted approach, local data managers, 6FTEs
• OA has taken c. 10 years to become embedded as common practice within the
scholarly communication process
• Arguably it is early days for RDM
• We’ll await observations from other Schools with interest !
Questions!
Special thanks to:
Rodrigo Bacigalupe
Cleo Davies
James Jafali
Natalie Lankester-Carthy
Bridget Moynihan


Editor's Notes

  1. Culmination of
  2. Key expectation 1: The data should be securely stored for at least 10 years Key expectation 2: An online record should be created within 12 months of the data being generated that describes the research data and how to access it. Key expectation 3: Published research papers should include a short statement describing how and on what terms any supporting research data may be accessed.