The document provides an overview of research data management (RDM) and the RDM services that Lancaster University plans to offer. It discusses that RDM involves maintaining and preserving digital research data throughout its lifecycle. It also notes that funder requirements and policies are driving universities to improve RDM practices to ensure long-term access and reuse of research data. Lancaster University plans to offer storage, advocate for RDM, provide training and support, help with data management plans, and collaborate with other universities and groups like N8 on RDM issues.
Introduction to Research Data Management at Lancaster University
2. Overview
1. What is Research Data Management (RDM)?
– What is data?
– Data lifecycle
2. Why does the University engage in RDM?
– Funder requirements
– Lancaster’s RDM Policy
– Current RDM practices and researcher attitudes
– What do other universities do?
– N8 RDM
3. What RDM services is Lancaster University going to offer?
– Storage
– Role of Pure
– Advocacy & Governance
– Metadata
– Training & Support
– Data Management Plans
– Data Preservation
– JISC
3. 1. What is RDM?
What are data?
• Facts, observations or experiences on which an argument,
theory or test is based. Data may be numerical, descriptive or
visual. Data may be raw or analysed, experimental or
observational.
• Data can be “analogue” (hardcopy) or digital.
• Digital data can be:
– created in a digital form ("born digital")
– converted to a digital form (digitised)
• Very much discipline specific
5. Data Lifecycle & Data Management Plans
[Diagram: data lifecycle stages, each with the related Data Management Plan question]
1. Create: What data will you produce?
2. Active Use: How will you organise the data?
3. Documentation: Can you/others understand the data?
4. Publication & Deposit: What data will be deposited and where?
5. Preservation & Re-Use: Who will be interested in re-using the data?
7. What is RDM?
• Research Data Management involves maintaining, preserving
and adding value to digital research data throughout its
lifecycle.
8. 2. Why does the University engage
in RDM?
The bigger picture. Drivers include:
• Open Data is part of the “Open” movement, which gathered
momentum in the early 2000s.
• Funders' Data Policies draw on key documents such as the OECD
Principles and Guidelines for Access to Research Data from Public
Funding (2007) and the Toronto Statement (2009).
• Perception of data as a public good. Data as research output
created with taxpayers’ money.
• Research Integrity: accurate and efficient collection of data and its
storage in order to reproduce research results.
• Compliance with Data Protection and Freedom of
Information legislation
9. RCUK Common Principles on Data
Policy
Summary of Research Councils UK - Common Principles on Data Policy
• Public good: Publicly funded research data are produced in the public interest and should be
made openly available with few restrictions
• Planning for preservation: Institutional and project specific data management policies
and plans needed to ensure valued data remains usable
• Discovery: Metadata should be available and discoverable; Published results should
indicate how to access supporting data
• Confidentiality: Research organisation policies and practices to ensure legal, ethical and
commercial constraints assessed; research process not damaged by inappropriate
release
• First use: Provision for a period of exclusive use, to enable research teams to publish
results
• Recognition: Data users should acknowledge data sources and terms & conditions of
access
• Public funding: Investment is appropriate and must be efficient and cost-effective.
10. Funding requirements
• RCUK released its Common Principles on Data Policy in April 2011
• The EPSRC issued its Policy Framework on Research Data in April 2011,
setting out nine expectations concerning the management and provision of
access to EPSRC-funded research data, including:
– V: Research organisations will ensure that appropriately structured
metadata describing the research data they hold is published (normally
within 12 months of the data being generated) and made freely
accessible on the internet
– VII: Research organisations will ensure that EPSRC-funded research data is
securely preserved for a minimum of 10 years from the date that any
researcher ‘privileged access’ period expires
• Funder requirements made universities develop “roadmaps” towards RDM
compliance.
• The deadline for compliance with EPSRC’s Policy is May 2015.
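Expectations V and VII above come down to simple date arithmetic. The following is a minimal sketch under stated assumptions, not an official EPSRC calculation: the function names `add_years` and `epsrc_dates` are hypothetical, "normally within 12 months" is treated as a fixed one-year limit, and the end of the privileged access period is supplied by the caller because it is specific to each project.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only reached for 29 February when the target year is not a leap year.
        return d.replace(year=d.year + years, day=28)


def epsrc_dates(data_generated: date, privileged_access_ends: date) -> dict:
    """Illustrative reading of expectations V and VII (assumption, not policy text)."""
    return {
        # V: metadata published normally within 12 months of data generation.
        "metadata_due": add_years(data_generated, 1),
        # VII: data preserved for at least 10 years after privileged access ends.
        "preserve_until": add_years(privileged_access_ends, 10),
    }


# Hypothetical project dates, for illustration only.
example = epsrc_dates(date(2014, 3, 1), date(2015, 3, 1))
print(example["metadata_due"], example["preserve_until"])
```

The preservation date is a minimum, so a real retention schedule might well extend beyond it.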
12. Horizon 2020 data policy
• Projects must “aim to deposit the research data needed to validate
the results presented in the deposited scientific publications, known
as ‘underlying data’”.
• Any Horizon 2020 project is invited to submit a DMP as an early
project deliverable if it is relevant to their research.
• All projects submitting a research proposal to ‘Research and
Innovation Actions’ and ‘Innovation Actions’ are required to include
a short outline of their general data management policy.
• All projects which are successfully funded under the Pilot on Open
Research Data are expected to produce an initial DMP deliverable
within the first six months of the project.
Horizon 2020 Guidelines on Data Management
13. Lancaster University’s Research
Data Policy
• Approved by Senate in March 2013 (SEC/2013/2/0776)
• “Management of data is an essential part of good research practice and all researchers
in the University have an obligation to record, store and archive their data
appropriately.”
• Expectations include:
– “Each project will have a data management plan”.
– “All research data will be stored in either electronic or paper form for a minimum of
10 years”
– “Research data will be submitted to national or international data services and
repositories where available or required by either funders or publishers and this will
replace the need for local archiving”
• The University will
– “aim to provide mechanisms whereby research data […] can be archived
appropriately”
– “provide guidance and training where necessary”
14. But what is current RDM practice
at Lancaster?
• We have talked to some Associate Deans for Research.
• Informal discussions with selected researchers show diverse
data management practices.
• Example: the Department of Linguistics and English Language runs
the University Centre for Computer Corpus Research on
Language (UCREL).
– UCREL research is data heavy, often uses copyrighted texts
– Local solution of data storage & sharing is Corpus Query
Processor
16. Researcher attitudes
• Disciplines have very different attitudes towards data sharing.
• In some research fields - such as genetics and physics - data
sharing is well-established.
Attitudes vary:
“Yet one more thing that gets in the way of doing actual
research!”
“My data is confidential and I can’t share it anyway.”
“I’d welcome your help in managing our data.”
17. What do other universities do?
• Jisc had 2 Managing Research Data Programmes (JISCMRD)
2009-2013, including infrastructure, planning & training
• Infrastructure: data.bris Research Data Repository, Databank
(Oxford)
• Data Management Planning: DMPonline (Digital Curation
Centre)
• Training: Mantra (University of Edinburgh)
• Extensive RDM websites: Bath, Leeds, Leicester
• But also very basic RDM support/websites: York, Warwick
18. Example from Exeter data audit
“How do you currently archive the important elements of your
research data once you have finished with it?”
Taken from University of Exeter DAF, p. 33
19. N8 Research Data Management
Group
• N8 Partners: Durham, Lancaster, Leeds, Liverpool, Manchester,
Newcastle, Sheffield and York
• Collaboration on topics like:
– N8 data catalogue
– Resource discovery metadata for ‘published’ data sets
– Storage options for research data
– RDM training
– Case studies of current data practices
20. 3. What services is Lancaster University going to offer?
[Diagram, source: DCC: components of an RDM service]
• RDM policies
• Advocacy (senior mgmt & researcher) and funding (£)
• Storage, back-up & access
• Research environment, systems & tools
• Metadata and documentation
• Archive, preserve & share
• Training and guides
• Support staff & services
21. Storage
• At the moment, ISS provides research data storage where
large-scale data sets can be securely stored and archived.
– On demand only
– Data sharing only among Lancaster colleagues
• University cloud storage (Box.com) is being investigated
• Collaboration with external researchers?
22. Research environment: Pure
Role of Pure in RDM
• Pure will remain data entry point for research outputs,
including datasets.
• Pure User Group has developed data specification that is now
being implemented.
• Pure is not a preservation tool, so an additional service is needed.
26. Advocacy & Governance
• Liaising with Associate Deans for Research
• Investigating current data practices with recommended
researchers
• Collaborating with other research-intensive universities: N8
• Formation of a Research, Open Access, Data Management and
Pure (ROADMaP) steering group incl. RDM Working Group
27. Training & Support
• RDM website with guidance for researchers has been developed
• Working together with OED on training
– DCC Workshop “Introduction to RDM” on 8 December as part of
Research Development Programme
• Similar events to follow
• RDM input to ResearchBites with shorter “How to” sessions
• “On demand” training sessions in departments
29. Data Management Plans
What is a Data Management Plan (DMP)?
DMPs vary but they tend to include the following elements:
1. Description of the data to be collected / created
2. Standards / methodologies for data collection and
management
3. Ethics and Intellectual Property considerations
4. Plans for data sharing and access
5. Strategy for long-term preservation
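The five elements listed above can be treated as a simple checklist. The sketch below is one way to flag the sections of a draft plan that are still empty. It is an illustration only: the class `DataManagementPlan`, its field names and the sample entries are hypothetical, and it is not the format used by DMPonline or by any funder.

```python
from dataclasses import dataclass, fields


@dataclass
class DataManagementPlan:
    """Hypothetical container with one free-text field per typical DMP element."""
    data_description: str = ""
    standards_and_methods: str = ""
    ethics_and_ip: str = ""
    sharing_and_access: str = ""
    long_term_preservation: str = ""


def missing_sections(plan: DataManagementPlan) -> list:
    """Return the names of the sections that are empty or whitespace-only."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name).strip()]


# Made-up draft with two of the five sections filled in.
draft = DataManagementPlan(
    data_description="Interview transcripts and audio recordings",
    sharing_and_access="Anonymised transcripts deposited in a data centre",
)
print(missing_sections(draft))
```

A check like this only confirms that a section has some text in it. Whether the text meets a funder's expectations still needs a human reviewer.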
30. DMPonline
• DMPonline is a free web-based tool developed by the DCC
which helps researchers create their DMPs according to funder
context.
• There are a number of templates within the tool that
represent the requirements of 11 different funders (RCUK,
Wellcome Trust, Horizon 2020 etc.) and institutions.
• We will create a custom Lancaster DMP template for research that is not
externally funded
33. Data storage & publication
[Flowchart] Is there a data centre supported by the funder?
• Yes: deposit there, e.g. NERC Data Centre, UK Data Service
• No:
– University Data Archive (Pure / Hydra)
– External Data Centre: disciplinary (GoGeo) or general repository (Figshare)
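One reading of the flowchart above is a single decision: if the funder supports a data centre, deposit there; otherwise choose between the University Data Archive and an external data centre. The sketch below encodes that reading. The function name `deposit_options` and the exact wording of the returned options are assumptions made for illustration, not part of any Lancaster system.

```python
from typing import List, Optional


def deposit_options(funder_data_centre: Optional[str] = None) -> List[str]:
    """Return candidate deposit locations, following one reading of the flowchart."""
    if funder_data_centre:
        # A funder-supported data centre (e.g. NERC Data Centre, UK Data Service).
        return [funder_data_centre]
    # No funder-supported centre: local archive or an external service.
    return [
        "University Data Archive (Pure / Hydra)",
        "External data centre: disciplinary (e.g. GoGeo) or general repository (e.g. Figshare)",
    ]


print(deposit_options("UK Data Service"))
print(deposit_options())
```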
36. Conclusions
• RDM is an issue with many, sometimes conflicting, viewpoints
(“wicked problem”)
• RDM is high on agenda of policy makers, funders and the
University. It won’t go away.
• A cultural shift is needed if we want to be successful (in a
similar way as Open Access).
• Establishing an RDM service needs close collaboration between ISS,
RSO, Library and Departments.
37. How can you help?
• RSO staff will be crucial for successful research data
management practices at the University.
• Support us talking to researchers. Who are the data producers
in different departments?
• Help the information flow. We need to know what RDM needs
researchers (academic staff, PhD students, postdocs) have.
• Communicate effectively between stakeholders on the
topic of research data management.
Editor's Notes
1st bullet point is quote from University of Melbourne, used in Lancaster RDM Policy: Appendix 1: Definitions
Examples of Research Data
Documents (text, Word), spreadsheets
Laboratory notebooks, field notebooks, diaries
Questionnaires, transcripts, codebooks
Audiotapes, videotapes
Photographs, films
Slides, artefacts, specimens, samples
Collection of digital objects acquired and generated during the process of research
Database contents (video, audio, text, images)
Models, algorithms, scripts
Contents of an application (input, output, logfiles for analysis software, simulation software, schemas)
Methodologies and workflows
Standard operating procedures and protocols
Here you can see a graphical, high-level overview of the stages required for successful curation and preservation of data. The model can be used to plan activities related to data management.
The questions you see are the ones that relate to sections of a Data Management Plan. More of that later.
From: N8 (David Golding, Leeds): Towards a reference architecture for Research Data Management – DRAFT (which was based on UK Data Archive data lifecycle model)
Life cycle “flattened out” with different roles of stakeholders
Definition from DCC: http://www.dcc.ac.uk/digital-curation/what-digital-curation
The goals of the open data movement are similar to those of other "Open" movements such as open source, open hardware, open content, and open access.
Integrity quote: the UK Research Integrity Office (UKRIO) states in its Code of Practice:
“Organisations should have in place procedures, resources (including physical space) and administrative support to assist researchers in the accurate and efficient collection of data and its storage in a secure and accessible form.”
2003 Fort Lauderdale meeting: pre-publication release of genome datasets
Toronto statement: extending the practice of rapid release of prepublication data to other biological data sets.
Research is global: The Royal Society has reported that over a third of all articles published in international journals are internationally collaborative, up from a quarter 15 years ago. Researchers need data management tools and services to work this way.
Research Integrity: Honesty in all aspects of research including data, transparency and open communication
All Research Councils agreed on these Principles. However, their expectations concerning universities and grants holders vary.
EPSRC - Engineering and Physical Sciences Research Council
Terminology Clarifications
Published outputs: a policy on published outputs e.g. journal articles and conference papers
Data: a dataset’s policy or statement on access to and maintenance of electronic resources
Time limits: set timeframes for making content accessible or preserving research outputs
Data plan: requirement to consider data creation, management or sharing in the grant application
Access/sharing: promotion of OA journals, deposit in repositories, data sharing or reuse
Long-term curation: stipulations on long-term maintenance and preservation of research outputs
Monitoring: whether compliance is monitored or action taken such as withholding funds
Guidance: provision of FAQs, best practice guides, toolkits, and support staff
Repository: provision of a repository to make published research outputs accessible
Data centre: provision of a data centre to curate unpublished electronic resources or data
Costs: a willingness to meet publication fees and data management / sharing costs
RDM Policy firmly puts the responsibility for compliance on individual researchers, with the University providing support for infrastructure and processes.
We have talked to Associate Deans for Research, some academics from LICA, English Language and LEC.
Sharing of research data is not a new concept. A number of disciplines have a long history of data sharing and use of repositories, e.g. arXiv.org in physics, mathematics & computer science; the International Council for Science World Data System for geophysics and biodiversity data; Protein Data Bank and GenBank in molecular biology; Dryad for ecology and evolution.
Other universities of course face the same issues that we do. A number of universities have done data audits which show that RDM is fragmented and often ad hoc.
Lancaster was involved in one Jisc bid: MaRDI-Gross (Managing Research Data Infrastructures in Big Science) http://mardigross.jiscinvolve.org/wp/
DCC is funded by Jisc and others and has at its core partners from University of Edinburgh, Glasgow and Bath
From: SUMMARY FINDINGS OF THE OPEN EXETER DATA ASSET FRAMEWORK SURVEY (2012), p. 33
https://ore.exeter.ac.uk/repository/handle/10036/3689
The N8 Research Partnership is a collaboration of the eight most research intensive universities in the North of England: Durham, Lancaster, Leeds, Liverpool, Manchester, Newcastle, Sheffield and York.
This is not a definitive list. It’s just an idea of the building blocks involved and how they might be put together.
- Storage is often thought of first. It should be properly backed up with appropriate access controls and ability to access from anywhere
- Also need an appropriate environment for research (instruments, hardware, software, VREs), plus tools and systems, e.g. for grants
- Aside from current work environments, we also need to consider facilities for archiving to preserve and share data
- There’s an inherent need to access/share data, so we need standards, tools and approaches for metadata across the lifecycle
- We have the basics of a system, but none of this works without people to keep things running and provide guidance and training
- Also need policies to provide overarching governance
- And to ensure uptake and maintenance you need buy-in across the board, incentives and financial backing
Storage (working data) versus preservation (long-term storage)
Colour green indicates preferred option
Hydra@Lancaster: http://lib-dev.lancs.ac.uk:3000/
A wicked problem is a social or cultural problem that is difficult or impossible to solve for as many as four reasons: incomplete or contradictory knowledge, the number of people and opinions involved, the large economic burden, and the interconnected nature of these problems with other problems. (https://www.wickedproblems.com/1_wicked_problems.php)