EPSRC research data expectations and
research software management
Stuart Macdonald
Associate Data Librarian
University of Edinburgh
stuart.macdonald@ed.ac.uk
Research Committee Meeting, School of Mathematics, University of Edinburgh, 25 Jan. 2016
What is research data?
Research data is defined by EPSRC as recorded factual material commonly used in
the scientific community as necessary to validate research findings
Although the majority of such data is created in digital format, all research data is
included irrespective of the format in which it is created.
Note that EPSRC does not expect every piece of data produced during a project to
be retained – decisions about what to keep should be taken on a case by case
basis.
There is however a clear expectation that data which underpins published research
outputs will be retained and managed.
• EPSRC have introduced a policy framework concerning the management
and provision of access to publicly-funded research data.
• EPSRC Principal Investigators and the University must demonstrate to
EPSRC that its expectations are being met. The 9 expectations are
detailed at: http://www.epsrc.ac.uk/about/standards/researchdata/
• EPSRC began monitoring compliance on 1st May 2015 on a case-by-case
basis. If it judges sharing of research data is being obstructed then it
reserves the right to impose sanctions.
EPSRC policy framework on research data
The expectations arise from 7 core principles which align with
the core RCUK principles on data sharing, namely:
• EPSRC-funded research data is a public good produced in the public
interest and should be made freely and openly available with as few
restrictions as possible in a timely and responsible manner.
• EPSRC recognises that there are legal, ethical and commercial constraints
on release of research data
• Sharing research data is an important contributor to the impact of publicly
funded research.
• EPSRC-funded researchers should be entitled to a limited period of
privileged access to the data they collect to allow them to work on and
publish their results.
• Data management policies and plans should be in accordance with
relevant standards and community best practice and should exist for all
data
• Sufficient metadata should be recorded and made openly available to
enable other researchers to understand the potential for further research
and re-use of the data
• It is appropriate to use public funds to support the preservation and
management of publicly-funded research data.
What do PIs and researchers need to
know?
• All researchers or research students funded by EPSRC will be required to
comply with these expectations.
• Data that is not generated in digital format will be stored in a manner to
facilitate it being shared in the event of a valid request for access.
• A link to digital research data is expected to be included in the metadata.
• Where access to data is restricted published metadata should give the
reason and summarise the conditions which must be satisfied for access
to be granted.
What do PIs and researchers need to do?
• Key expectation 1: The data should be securely stored for at least 10 years.
• Key expectation 2: An online record should be created within 12 months of
the data being generated that describes the research data and how to
access it.
• Key expectation 3: Published research papers should include a short
statement describing how and on what terms any supporting research
data may be accessed.
Key expectation 1: store data securely
• Research data that underpins a publication must be stored safely and securely, and
made accessible.
• Data may already be managed by a trusted domain archive outside of the university,
in which case data may not need to be stored locally.
• If not then data must be stored in a suitable University of Edinburgh storage
solution. Minimal compliance is achieved by having your data on DataStore and
then making a secure copy of it into the Data Vault (this service is currently in
development).
• For those who wish to openly publish data (and a snapshot of their research
software), Edinburgh DataShare is the university’s open online digital repository of
data produced by local researchers (policies, DOI, licence, citation).
• Datasets added to DataShare will be allocated persistent identifiers (DOIs) for
citations.
Key expectation 2: a record describing
the data must be freely available online
• The University is using PURE to record descriptive data (metadata) about
the research data in order to meet this expectation. Research staff are
therefore expected to add a metadata record for any EPSRC-funded
research data into PURE, normally within 12 months of the data being
generated.
• To enter a new dataset description in PURE, click on the green ‘Add new’
button, and select ‘Dataset’.
• Once added to PURE via the dataset content type, the resulting record
should link to the funding source and also link to any associated
publications.
Data Catalogue in PURE
• If the dataset is available online, for example in DataShare, the URL (or
DOI) of that dataset should also be added.
• Where access to the data is to be restricted, the published dataset
metadata in PURE should give the reason and summarise the conditions
which must be satisfied to grant access.
• Dataset metadata added to PURE will ultimately be publicly accessible via
the Edinburgh Research Explorer subject to confidentiality and other such
restrictions.
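As a rough illustration of what the slides ask a dataset record to contain, the sketch below models a record and lists what is still missing. It is not PURE's actual schema or API: the class `DatasetRecord`, its field names and the helper `missing_items` are all hypothetical, and the rule that an unrestricted record should carry a URL or DOI is an assumption drawn from the expectation that a link to digital data is included in the metadata.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical stand-in for a dataset description; not PURE's real schema."""
    title: str
    funder: str = ""
    url_or_doi: Optional[str] = None
    restricted: bool = False
    restriction_reason: str = ""
    access_conditions: str = ""


def missing_items(record: DatasetRecord) -> List[str]:
    """List the items named on the slides that the record still lacks."""
    missing = []
    if not record.funder:
        missing.append("link to funding source")
    if record.restricted:
        # Restricted data: the metadata must say why, and how access is granted.
        if not record.restriction_reason:
            missing.append("reason for restricting access")
        if not record.access_conditions:
            missing.append("conditions for granting access")
    elif not record.url_or_doi:
        # Assumption: openly available data should carry its URL or DOI.
        missing.append("URL or DOI of the online dataset")
    return missing


record = DatasetRecord(title="Example dataset", funder="EPSRC", restricted=True)
print(missing_items(record))
# prints ['reason for restricting access', 'conditions for granting access']
```

A check of this kind could be run informally before a record is entered by hand; it does not replace the PURE entry form.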
Key expectation 3: include a statement in
published papers underpinned by EPSRC-funded data
• EPSRC state that this expectation ‘could be satisfied by citing such data in the
published research and including in such citations direct links to the data or to
supporting documentation that describes the data in detail, how it may be
accessed and any constraints that may apply.’ Such links should be persistent URLs
such as DOIs.
• An example of a basic data citation would be of the form: ‘Creator (Publication
Year): Title. Publisher. DOI’ Further details can be found at:
https://www.datacite.org/services/cite-your-data.html
• If commercial, legal or ethical reasons exist to protect access to the data these
should be noted in a statement included in the published research paper. A simple
direction to interested parties to ‘contact the author for access’ may not be
considered sufficient.
• The paper must also be made Open Access in PURE.
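The basic citation form quoted on this slide, ‘Creator (Publication Year): Title. Publisher. DOI’, can be assembled mechanically. The sketch below is only an illustration: the function name `format_data_citation` is hypothetical, the example values (including the DOI `10.1234/example`) are placeholders, and writing the DOI as an `https://doi.org/` link is an assumption based on the slide's advice that links should be persistent URLs.

```python
def format_data_citation(creator, year, title, publisher, doi):
    """Build 'Creator (Publication Year): Title. Publisher. DOI'."""
    # Assumption: show the DOI as a resolvable persistent URL.
    doi_url = doi if doi.startswith("http") else f"https://doi.org/{doi}"
    # Avoid a doubled full stop if the title already ends with one.
    clean_title = title.rstrip(".")
    return f"{creator} ({year}): {clean_title}. {publisher}. {doi_url}"


# Placeholder values, not a real dataset.
print(format_data_citation(
    "Smith, J.", 2016, "Example dataset",
    "University of Edinburgh", "10.1234/example",
))
# prints Smith, J. (2016): Example dataset. University of Edinburgh. https://doi.org/10.1234/example
```

The resulting string can be placed in the reference list, with the same link repeated in the paper's data access statement.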
Does research data include software?
This depends on the research which is being carried out.
As noted in the definition of research data, the deciding factor is whether the software is
necessary to validate research findings, such as those published in a journal paper.
As a rule of thumb, if your research can’t be replicated without your code then the code
should be included and shared as part of the research data.
Often the software, script or simulation may be the research output and the data
produced considered ancillary content. In this case it is more important to store and share
your code than the actual data.
Additionally, even if you don’t need to preserve software, it is good practice to make
software available with adequate documentation to enable others to validate your
research findings, and to access and reuse your research data.
Who should make the decision about
what research data should be
preserved?
Each research organisation will have specific policies and associated processes
to determine what and how publicly-funded research data will be stored and
managed.
Normally it will be the PI of the research project and/or Head of
Department/School who will make the decision about what research data
should be preserved and made available.
It is important to recognise that not all research data can or should be freely
shared – ethical, legal or confidentiality issues may constrain who may have
access.
What about software that was not produced
by my project but is required to validate my
research results?
It is prudent, in terms of providing access to your research data and of enabling
your own future research, to take reasonable steps to assure continued availability
of the software you use.
This may include taking a copy of open-source software and preserving it if the
licence allows, or using commercial software where a multi-year support
agreement is available.
The requirement to preserve research data for 10 years from the date of last
access by a third party provides a compelling reason to use open data
formats and open-source software.
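Because the 10-year period runs from the date of last third-party access, the retention date moves each time someone accesses the data. A minimal sketch of that arithmetic follows; the function name `retain_until` is hypothetical, and rolling 29 February forward to 1 March (so the period is never shorter than 10 years) is an assumption, not something the policy specifies.

```python
from datetime import date

RETENTION_YEARS = 10  # from the slide: 10 years from date of last access


def retain_until(last_access: date) -> date:
    """Earliest date on which the retention period has elapsed."""
    target_year = last_access.year + RETENTION_YEARS
    try:
        return last_access.replace(year=target_year)
    except ValueError:
        # 29 Feb with a non-leap target year: roll forward to 1 March
        # (assumption) so that at least 10 full years are covered.
        return date(target_year, 3, 1)


# Each new third-party access resets the clock.
print(retain_until(date(2016, 1, 25)))   # prints 2026-01-25
print(retain_until(date(2016, 2, 29)))   # prints 2026-03-01
```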
What licence should I choose for my
data and software?
Following the principle that publicly funded research data should be made
openly available with as few restrictions as possible, you should consider
applying an appropriate open licence to the data and software generated by
your research (GNU General Public Licence, MIT, CC, ODC).
The Digital Curation Centre and Software Sustainability Institute have written
guides to help you license research data and choose an open-source licence for
your software.
How to license research data (DCC) - http://www.dcc.ac.uk/resources/how-guides/license-research-data
Choosing an open-source licence - http://www.software.ac.uk/resources/guides/adopting-open-source-licence
Where should I deposit my research data
and software?
EPSRC does not provide a publications repository, research data repository, or
software repository.
Researchers are expected to use institutional or disciplinary/domain repositories
available to them (e.g. GitHub, Subversion).
It is important that deposited objects can be referenced and accessed via a persistent
identifier (e.g. a DOI) and that appropriately structured metadata describing the
objects is easily discoverable.
A good way to make data and software discoverable is to cite it in research
publications, and to include the persistent identifier (DOI) in the citation.
Writing and using a software
management plan
At present software management plans are relatively new for research
proposals.
The EPSRC Software for the Future call explicitly requires software
management plans as part of the Pathways to Impact.
NSF SI2 funding requires software to be addressed as part of the mandatory
data management plan.
A software management plan is a way of formalising a set of structures and
goals to ensure your software is accessible and reusable in the short, medium
and long term.
Software management plans should
minimally include:
• information on what software outputs (including documentation and other related
material) are expected to be produced
• who is responsible for releasing the software (e.g. PI / lead developer)
• the revision control process to be used [Note: it is important to choose a revision
control / configuration management system that all members of the team will use]
• what licence will be used for each output
A software management plan could also:
• identify the software development model
• identify external software and any associated licences
• identify dependencies and risks
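The minimal and optional contents listed above could be kept as a simple checklist alongside the code. The sketch below is one possible representation, not a format required by EPSRC or the Software Sustainability Institute: the key names and the helper `missing_sections` are hypothetical, and the example plan values are placeholders.

```python
# Minimal contents of a software management plan, as listed on the slide.
REQUIRED_SECTIONS = {
    "software_outputs": "software outputs expected (incl. documentation)",
    "release_owner": "who is responsible for releasing the software",
    "revision_control": "the revision control process to be used",
    "licences": "what licence will be used for each output",
}

# Items a plan could also cover.
OPTIONAL_SECTIONS = {
    "development_model": "the software development model",
    "external_software": "external software and associated licences",
    "dependencies_and_risks": "dependencies and risks",
}


def missing_sections(plan: dict) -> list:
    """Return the required section keys that are absent or empty."""
    return [key for key in REQUIRED_SECTIONS if not plan.get(key)]


# Placeholder plan for illustration only.
plan = {
    "software_outputs": ["simulation code", "user guide"],
    "release_owner": "PI",
    "revision_control": "Git, used by all team members",
}
print(missing_sections(plan))  # prints ['licences']
```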
Support
Implementation of the EPSRC Policy at Edinburgh is being supported by the Research
Data Service delivered by Information Services
For help about meeting this policy requirement contact:
• Email: IS.Helpline@ed.ac.uk with “Help with EPSRC data policy framework” in
your subject line.
• Email: PURE@ed.ac.uk if you have questions about PURE.
For help about research data management in general contact:
• Email: IS.Helpline@ed.ac.uk with “Help with Research Data Management in
general” in your subject line.
• Email: IS.Helpline@ed.ac.uk if you would like to arrange an RDM training or
awareness raising session.
For help with research software contact:
• Email: info@software.ac.uk at the Software Sustainability Institute
Thanks!
URLs
• EPSRC Policy Framework on Research Data:
http://www.epsrc.ac.uk/about/standards/researchdata/impact/
• DataStore: https://www.wiki.ed.ac.uk/x/Np9FD
• Edinburgh DataShare: http://datashare.is.ed.ac.uk/
• Data Catalogue in PURE: http://www.pure.ed.ac.uk
• Digital preservation and curation - the danger of overlooking software -
http://www.software.ac.uk/resources/guides/digital-preservation-and-curation-danger-overlooking-software
• Choosing a repository for your software project -
http://software.ac.uk/resources/guides/choosing-repository-your-software-project
• How to cite and describe software - http://software.ac.uk/so-exactly-what-software-did-you-use
• Writing and using a software management plan -
http://www.software.ac.uk/resources/guides/software-management-plans
