Introduction to Research Data Management
8 October 2014 
Hardy Schwamm & Masud Khokhar
Overview 
1. What is Research Data Management (RDM)? 
– What is data? 
– Data lifecycle 
2. Why does the University engage in RDM? 
– Funder requirements 
– Lancaster’s RDM Policy 
– Current RDM practices and researcher attitudes 
– What do other universities do? 
– N8 RDM 
3. What RDM services is Lancaster University going to offer? 
– Storage 
– Role of Pure 
– Advocacy & Governance 
– Metadata 
– Training & Support 
– Data Management Plans 
– Data Preservation 
– JISC
1. What is RDM? 
What are data? 
• Facts, observations or experiences on which an argument, 
theory or test is based. Data may be numerical, descriptive or 
visual. Data may be raw or analysed, experimental or 
observational. 
• Data can be “analogue” (hardcopy) or digital. 
• Digital data can be: 
– created in a digital form ("born digital") 
– converted to a digital form (digitised) 
• What counts as data is very much discipline-specific
Introduction to Research Data Management at Lancaster University
Data Lifecycle & Data Management Plans
Lifecycle stages:
1. Create
2. Active Use
3. Documentation
4. Publication & Deposit
5. Preservation & Re-Use
Related DMP questions:
1. What data will you produce?
2. How will you organise the data?
3. Can you/others understand the data?
4. What data will be deposited and where?
5. Who will be interested in re-using the data?
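The numbered stages and questions above can be pictured as a small lookup. This Python sketch rests on two assumptions: that each question belongs to the stage with the same number, and that the lifecycle loops back from re-use to the creation of new data. The names `Stage`, `DMP_QUESTION` and `next_stage` are hypothetical.

```python
# A minimal sketch, not an official model: the five lifecycle stages from the
# slide, each paired (by number, an assumption) with a DMP question.
from enum import Enum


class Stage(Enum):
    CREATE = 1
    ACTIVE_USE = 2
    DOCUMENTATION = 3
    PUBLICATION_AND_DEPOSIT = 4
    PRESERVATION_AND_REUSE = 5


DMP_QUESTION = {
    Stage.CREATE: "What data will you produce?",
    Stage.ACTIVE_USE: "How will you organise the data?",
    Stage.DOCUMENTATION: "Can you/others understand the data?",
    Stage.PUBLICATION_AND_DEPOSIT: "What data will be deposited and where?",
    Stage.PRESERVATION_AND_REUSE: "Who will be interested in re-using the data?",
}


def next_stage(stage: Stage) -> Stage:
    """Return the following stage, wrapping from re-use back to creation (assumed loop)."""
    return Stage(stage.value % len(Stage) + 1)


if __name__ == "__main__":
    stage = Stage.CREATE
    for _ in range(len(Stage)):
        # Print each stage with the DMP question it prompts.
        print(f"{stage.value}. {stage.name}: {DMP_QUESTION[stage]}")
        stage = next_stage(stage)
```

Walking the stages in order like this mirrors how a DMP is usually drafted: one question per stage, answered before the project starts.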
What is RDM? 
• Research Data Management involves maintaining, preserving 
and adding value to digital research data throughout its 
lifecycle.
2. Why does the University engage in RDM?
The bigger picture. Drivers include: 
• Open Data: part of the “Open” movement, which gathered 
momentum in the early 2000s. 
• Funders' Data Policies draw on key documents such as the OECD 
Principles and Guidelines for Access to Research Data from Public 
Funding (2007) and the Toronto Statement (2009). 
• Data as a public good: data are research outputs created with 
taxpayers’ money. 
• Research Integrity: accurate and efficient collection of data and its 
storage in order to reproduce research results. 
• Compliance with Data Protection and Freedom of 
Information legislation
RCUK Common Principles on Data Policy
Summary of Research Councils UK - Common Principles on Data Policy 
• Public good: Publicly funded research data are produced in the public interest and should be 
made openly available with few restrictions 
• Planning for preservation: Institutional and project-specific data management policies 
and plans are needed to ensure valued data remains usable 
• Discovery: Metadata should be available and discoverable; Published results should 
indicate how to access supporting data 
• Confidentiality: Research organisation policies and practices should ensure that legal, ethical and 
commercial constraints are assessed and that the research process is not damaged by inappropriate 
release 
• First use: Provision for a period of exclusive use, to enable research teams to publish 
results 
• Recognition: Data users should acknowledge data sources and terms & conditions of 
access 
• Public funding: Using public funds to manage and share research data is appropriate, but it must be efficient and cost-effective.
Funding requirements 
• RCUK released its Common Principles on Data Policy in April 2011 
• The EPSRC issued its Policy Framework on Research Data in April 2011, 
setting out nine expectations concerning the management of, and provision of 
access to, EPSRC-funded research data. 
– V: Research organisations will ensure that appropriately structured 
metadata describing the research data they hold is published (normally 
within 12 months of the data being generated) and made freely 
accessible on the internet 
– VII: Research organisations will ensure that EPSRC-funded research data is 
securely preserved for a minimum of 10 years from the date that any 
researcher ‘privileged access’ period expires 
• Funder requirements have led universities to develop “roadmaps” towards RDM 
compliance. 
• The deadline for compliance with EPSRC’s policy is May 2015.
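To make expectations V and VII concrete, here is a small Python sketch that turns dates into deadlines. It assumes "12 months" and "10 years" are counted as calendar months and years, and it clamps days that do not exist in the target month (such as 29 February in a non-leap year) to the month's end. The function names and example dates are hypothetical; this is an illustration, not an EPSRC tool.

```python
# A minimal sketch of the two EPSRC time limits quoted above.
# Assumptions (not from the policy text): calendar-month arithmetic, and
# day-of-month clamped to the last valid day of the target month.
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Add calendar months to a date, clamping the day to the month's length."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def metadata_due(data_generated: date) -> date:
    """Expectation V: metadata normally published within 12 months of data generation."""
    return add_months(data_generated, 12)


def retain_until(privileged_access_ends: date) -> date:
    """Expectation VII: keep data at least 10 years after privileged access expires."""
    return add_months(privileged_access_ends, 10 * 12)


if __name__ == "__main__":
    generated = date(2014, 10, 8)     # hypothetical date the data were generated
    embargo_end = date(2015, 10, 8)   # hypothetical end of privileged access
    print("Publish metadata by:", metadata_due(generated))          # 2015-10-08
    print("Keep data at least until:", retain_until(embargo_end))   # 2025-10-08
```

Note that the 10-year clock in expectation VII starts when privileged access ends, not when the data are created, so the retention date moves whenever the exclusive-use period changes.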
Funder requirements: overview of data policies
Table from DCC
Horizon 2020 data policy 
• Projects must “aim to deposit the research data needed to validate 
the results presented in the deposited scientific publications, known 
as ‘underlying data’”. 
• Any Horizon 2020 project is invited to submit a DMP as an early 
project deliverable if it is relevant to its research. 
• All projects submitting a research proposal to ‘Research and 
Innovation Actions’ and ‘Innovation Actions’ are required to include 
a short outline of their general data management policy. 
• All projects which are successfully funded under the Pilot on Open 
Research Data are expected to produce an initial DMP deliverable 
within the first six months of the project. 
Horizon 2020 Guidelines on Data Management
Lancaster University’s Research Data Policy
• Approved by Senate in March 2013 (SEC/2013/2/0776) 
• “Management of data is an essential part of good research practice and all researchers 
in the University have an obligation to record, store and archive their data 
appropriately.” 
• Expectations include: 
– “Each project will have a data management plan”. 
– “All research data will be stored in either electronic or paper form for a minimum of 
10 years” 
– “Research data will be submitted to national or international data services and 
repositories where available or required by either funders or publishers and this will 
replace the need for local archiving” 
• The University will 
– “aim to provide mechanisms whereby research data […] can be archived 
appropriately” 
– “provide guidance and training where necessary”
But what is current RDM practice at Lancaster?
• We have talked to some Associate Deans for Research. 
• Informal discussions with selected researchers show diverse 
data management practices. 
• Example: the Department of Linguistics and English Language runs 
the University Centre for Computer Corpus Research on 
Language (UCREL). 
– UCREL research is data heavy, often uses copyrighted texts 
– Their local solution for data storage & sharing is the Corpus Query 
Processor
Researcher attitudes 
• Disciplines have very different attitudes towards data sharing. 
• In some research fields - such as genetics and physics - data 
sharing is well-established. 
Attitudes vary: 
“Yet one more thing that gets in the way of doing actual 
research!” 
“My data is confidential and I can’t share it anyway.” 
“I’d welcome your help in managing our data.”
What do other universities do? 
• Jisc ran two Managing Research Data Programmes (JISCMRD), 
2009-2013, covering infrastructure, planning & training 
• Infrastructure: data.bris Research Data Repository, Databank 
(Oxford) 
• Data Management Planning: DMPonline (Digital Curation 
Centre) 
• Training: Mantra (University of Edinburgh) 
• Extensive RDM websites: Bath, Leeds, Leicester 
• But also very basic RDM support/websites: York, Warwick
Example from Exeter data audit 
“How do you currently archive the important elements of your 
research data once you have finished with it?” 
Taken from the University of Exeter Data Asset Framework (DAF) survey, p. 33
N8 Research Data Management Group
• N8 Partners: Durham, Lancaster, Leeds, Liverpool, Manchester, 
Newcastle, Sheffield and York 
• Collaboration on topics like: 
– N8 data catalogue 
– Resource discovery metadata for ‘published’ data sets 
– Storage options for research data 
– RDM training 
– Case studies of current data practices
RDM building blocks (diagram from DCC):
• RDM policies
• Advocacy (senior management & researchers)
• £ (financial backing)
• Storage, back-up & access
• Research environment & systems
• Tools
• Metadata and documentation
• Archive, preserve & share
• Training and guides
• Support staff & services
3. What services is Lancaster University going to offer?
Storage 
• At the moment, ISS provides research data storage where 
large-scale datasets can be securely stored and archived. 
– On demand only 
– Data sharing only among Lancaster colleagues 
• University cloud storage (Box.com) is being investigated 
• Collaboration with external researchers?
Research environment: Pure 
Role of Pure in RDM 
• Pure will remain the data entry point for research outputs, 
including datasets. 
• The Pure User Group has developed a data specification that is now 
being implemented. 
• Pure is not a preservation tool, so an additional service is needed.
Advocacy & Governance 
• Liaising with Associate Deans for Research 
• Investigating current data practices with recommended 
researchers 
• Collaborating with other research-intensive universities: N8 
• Formation of a Research, Open Access, Data Management and 
Pure (ROADMaP) steering group incl. RDM Working Group
Training & Support 
• An RDM website with guidance for researchers has been developed 
• Working together with OED on training 
– DCC Workshop “Introduction to RDM” on 8 December as part of the 
Research Development Programme 
• Similar events to follow 
• RDM input to ResearchBites with shorter “How to” sessions 
• “On demand” training sessions in departments
Data Management Plans 
What is a Data Management Plan (DMP)? 
DMPs vary but they tend to include the following elements: 
1. Description of the data to be collected / created 
2. Standards / methodologies for data collection and 
management 
3. Ethics and Intellectual Property considerations 
4. Plans for data sharing and access 
5. Strategy for long-term preservation
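The five elements above can be treated as a simple checklist. This Python sketch holds one field per element and reports which sections are still blank. The class name and field names are hypothetical and do not reproduce any funder or DMPonline template.

```python
# A minimal sketch of a DMP skeleton with one field per common element.
# Field names and the `missing_sections` check are illustrative assumptions.
from dataclasses import dataclass, fields


@dataclass
class DataManagementPlan:
    data_description: str = ""        # 1. data to be collected / created
    standards_and_methods: str = ""   # 2. standards / methodologies
    ethics_and_ip: str = ""           # 3. ethics and IP considerations
    sharing_and_access: str = ""      # 4. plans for data sharing and access
    preservation_strategy: str = ""   # 5. strategy for long-term preservation

    def missing_sections(self) -> list:
        """Return the names of sections that are empty or whitespace only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


if __name__ == "__main__":
    plan = DataManagementPlan(
        data_description="Survey responses (CSV) and interview audio",
        ethics_and_ip="Participants consent to anonymised sharing",
    )
    print(plan.missing_sections())
    # ['standards_and_methods', 'sharing_and_access', 'preservation_strategy']
```

Funder templates differ in wording and order, so a real plan would map these generic sections onto whichever template applies.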
DMPonline 
• DMPonline is a free web-based tool developed by the DCC 
which helps researchers create their DMPs according to funder 
context. 
• There are a number of templates within the tool that 
represent the requirements of 11 different funders (RCUK, 
Wellcome Trust, Horizon 2020 etc.) and institutions. 
• We will create a custom Lancaster DMP template for research that is not 
externally funded
Data storage & publication 
Is there a data centre supported by the funder? 
• Yes! Deposit there, e.g. NERC Data Centre, UK Data Service 
• No: deposit in the University Data Archive (Pure / Hydra) or an external data centre: 
– Disciplinary: GoGeo 
– General repository: Figshare
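The decision on the slide can be sketched as a small Python function. Only the destinations come from the slide; the function name, its arguments and the idea of returning a list of candidates are assumptions. The list follows the slide's layout and is not meant to signal which option is preferred.

```python
# A minimal sketch of the storage & publication decision shown on the slide.
# `choose_deposit` and its arguments are hypothetical; only the destination
# names (NERC Data Centre, UK Data Service, Pure / Hydra, GoGeo, Figshare)
# come from the slide.
from typing import List, Optional


def choose_deposit(funder_data_centre: Optional[str],
                   disciplinary_repository: Optional[str] = None) -> List[str]:
    """Return candidate deposit destinations in the order the slide lists them."""
    if funder_data_centre:
        # "Yes!": the funder supports a data centre (e.g. NERC Data Centre,
        # UK Data Service), so that is where the data go.
        return [funder_data_centre]
    # "No": the University Data Archive, or an external data centre.
    options = ["University Data Archive (Pure / Hydra)"]
    if disciplinary_repository:
        options.append(disciplinary_repository)  # e.g. GoGeo
    options.append("Figshare")                    # general repository
    return options


if __name__ == "__main__":
    print(choose_deposit("UK Data Service"))
    print(choose_deposit(None, "GoGeo"))
```

Modelling the "No" branch as a list rather than a single answer reflects that the slide offers several acceptable routes there, while the "Yes" branch has one.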
Testing Hydra
Conclusions 
• RDM is an issue with many, sometimes conflicting, viewpoints 
(a “wicked problem”) 
• RDM is high on agenda of policy makers, funders and the 
University. It won’t go away. 
• A cultural shift is needed if we want to be successful (as with 
Open Access). 
• Establishing an RDM service needs close collaboration between ISS, 
RSO, the Library and departments.
How can you help? 
• RSO staff will be crucial for successful research data 
management practices at the University. 
• Support us in talking to researchers. Who are the data producers 
in different departments? 
• Help the information flow. We need to know what RDM needs 
researchers (academic staff, PhD students, postdocs) have. 
• Communicate effectively between stakeholders on the 
topic of research data management.

Editor's Notes

  1. The first bullet point is a quote from the University of Melbourne, used in the Lancaster RDM Policy (Appendix 1: Definitions).
  2. Examples of research data: documents (text, Word) and spreadsheets; laboratory notebooks, field notebooks and diaries; questionnaires, transcripts and codebooks; audiotapes and videotapes; photographs and films; slides, artefacts, specimens and samples; collections of digital objects acquired and generated during the process of research; database contents (video, audio, text, images); models, algorithms and scripts; contents of an application (input, output, log files for analysis software, simulation software, schemas); methodologies and workflows; standard operating procedures and protocols.
  3. Here you can see a graphical, high-level overview of the stages required for successful curation and preservation of data. The model can be used to plan activities related to data management. The questions you see relate to sections of a Data Management Plan; more on that later.
  4. From: N8 (David Golding, Leeds): Towards a reference architecture for Research Data Management – DRAFT (which was based on UK Data Archive data lifecycle model) Life cycle “flattened out” with different roles of stakeholders
  5. Definition from DCC: http://www.dcc.ac.uk/digital-curation/what-digital-curation
  6. The goals of the open data movement are similar to those of other "Open" movements such as open source, open hardware, open content and open access. The integrity quote is from the UK Research Integrity Office (UKRIO) Code of Practice: “Organisations should have in place procedures, resources (including physical space) and administrative support to assist researchers in the accurate and efficient collection of data and its storage in a secure and accessible form.” 2003 Fort Lauderdale meeting: pre-publication release of genome datasets. Toronto Statement: extending the practice of rapid release of pre-publication data to other biological datasets. Research is global: the Royal Society has reported that over a third of all articles published in international journals are internationally collaborative, up from a quarter 15 years ago. Researchers need data management tools and services to work this way. Research integrity: honesty in all aspects of research, including data, transparency and open communication.
  7. All Research Councils agreed on these Principles; however, their expectations of universities and grant holders vary. EPSRC: Engineering and Physical Sciences Research Council.
  8. Terminology clarifications:
     - Published outputs: a policy on published outputs, e.g. journal articles and conference papers
     - Data: a policy or statement on access to and maintenance of electronic resources
     - Time limits: set timeframes for making content accessible or preserving research outputs
     - Data plan: requirement to consider data creation, management or sharing in the grant application
     - Access/sharing: promotion of OA journals, deposit in repositories, data sharing or reuse
     - Long-term curation: stipulations on long-term maintenance and preservation of research outputs
     - Monitoring: whether compliance is monitored or action taken, such as withholding funds
     - Guidance: provision of FAQs, best practice guides, toolkits and support staff
     - Repository: provision of a repository to make published research outputs accessible
     - Data centre: provision of a data centre to curate unpublished electronic resources or data
     - Costs: a willingness to meet publication fees and data management / sharing costs
  9. RDM Policy firmly puts the responsibility for compliance on individual researchers, with the University providing support for infrastructure and processes.
  10. We have talked to Associate Deans for Research and to some academics from LICA, English Language and LEC. Sharing of research data is not a new concept. A number of disciplines have a long history of data sharing and use of repositories, e.g. arXiv.org in physics, mathematics & computer science; the International Council for Science World Data System for geophysics and biodiversity data; the Protein Data Bank and GenBank in molecular biology; Dryad for ecology and evolution.
  11. Other universities, of course, face the same issues that we do. A number of universities have carried out data audits, which show that RDM is fragmented and often ad hoc. Lancaster was involved in one Jisc bid: MaRDI-Gross (Managing Research Data Infrastructures in Big Science) (http://mardigross.jiscinvolve.org/wp/). The DCC is funded by Jisc and others, and its core partners are the Universities of Edinburgh, Glasgow and Bath.
  12. From: SUMMARY FINDINGS OF THE OPEN EXETER DATA ASSET FRAMEWORK SURVEY (2012), p. 33 https://ore.exeter.ac.uk/repository/handle/10036/3689
  13. The N8 Research Partnership is a collaboration of the eight most research intensive universities in the North of England: Durham, Lancaster, Leeds, Liverpool, Manchester, Newcastle, Sheffield and York.
  14. This is not a definitive list; it is just an idea of the building blocks involved and how they might be put together.
     - Storage is often thought of first. It should be properly backed up, with appropriate access controls and the ability to access it from anywhere.
     - We also need an appropriate environment for research (instruments, hardware, software, VREs), plus tools and systems, e.g. for grants.
     - Aside from current work environments, we also need to consider facilities for archiving, to preserve and share data.
     - There is an inherent need to access and share data, so we need standards, tools and approaches for metadata across the lifecycle.
     - We have the basics of a system, but none of this works without people to keep things running and to provide guidance and training.
     - We also need policies to provide overarching governance.
     - To ensure uptake and maintenance, you need buy-in across the board, incentives and financial backing.
  15. Storage (working data) versus preservation (long-term storage) Colour green indicates preferred option
  16. Hydra@Lancaster: http://lib-dev.lancs.ac.uk:3000/
  17. A wicked problem is a social or cultural problem that is difficult or impossible to solve for as many as four reasons: incomplete or contradictory knowledge, the number of people and opinions involved, the large economic burden, and the interconnected nature of these problems with other problems. (https://www.wickedproblems.com/1_wicked_problems.php)
  18. Shared inbox