Robin Rice
EDINA and Data Library, University of Edinburgh
Repository Fringe 2013: 1-2 August, Edinburgh
*
Six Use Cases for Edinburgh DataShare
*The data repository and
University RDM policy
“9. Research data of future historical interest, and all
research data that represent records of the University,
including data that substantiate research findings, will be
offered and assessed for deposit and retention in an
appropriate national or international data service or
domain repository, or a University repository.”
*
Edinburgh DataShare is
seen by the RDM Steering
Group as one of the key
RDM services offered by
Information Services, and
as such has challenged us
to meet the requirements
of a number of pilot
submissions from a range
of different types of
research communities with
particular types of data.
*
Single item deposit,
the dataset behind an
article.
Desire to get students
to deposit their data
from theses as norm -
need unambiguous
deposit workflow.
Fieldwork in NHS
means much data is
‘sensitive’. Permanent
embargoes?
Dr. Nuno Ferreira,
Teaching Fellow
*
Dr. Bert Remijsen
Chancellor’s Fellow
Village of Fafanlap, Indonesia,
on Bert’s home page
Dinka Songs of South Sudan
collection, 62 items.
Used DSpace collection
template for metadata; files
uploaded by assisted deposit.
Also deposited in Max Planck
specialist language repository.
Annotations in specialist
format, requiring software
from Max Planck to read.
User happy with download
statistics, referring colleagues.
*
*The Listening Talker
collection identified for
deposit, ongoing.
*Very large video files
plus software as VM
image. Tar gzipped files
containing millions of
files. Several GB in size.
*Desires user
registration, non-
standard licenses and
checksums with
downloads.
Prof. Simon King
*
*Lots of ‘omics data:
not as many subject
repositories to hold
these as thought –
storage cost concerns.
*Interested in push-pull
of metadata to
websites, from CRIS
*Spearheaded by Data
Manager
Dolly the Sheep
*
*Fish4Knowledge EU-funded
research project
*Long-term sustainability
issues for observational data
*Search engine maintained on
their website – using METS
feed to locate items
*Testing SWORD implementation,
5% sample >10K files,
video + SQL rows (3 TB)
*Efficiency & performance
Prof. Bob Fisher
*
*New member of
University
*Digital asset
management needs
*Nature of research
data in the arts
*Streaming & display
requirements (high
quality desired)
*
*Usability & user education
*Encouraging user to document and future-proof
*Relationship of IRs and subject repositories
*Closed collections, length of embargoes, user
registration in an open access service
*Enhancing repo. functionality while developing new
systems (storage, data asset registry)
*Repository as golden copy/format
*Preservation procedures and SIPs, AIPs, and DIPs

Editor's Notes

  1. Edinburgh DataShare is a free-at-point-of-use data repository service which allows University researchers to upload, share, and license their data resources for online discovery and re-use by others. It was set up in 2008 as an exemplar of institutional data repositories, using DSpace, with partners at Oxford and Southampton working with Fedora and EPrints.
  2. The University RDM Policy has implications for the provision of the data repository service.
  3. Edinburgh DataShare is a key part of the Data Stewardship component of the University RDM Roadmap. Legacy datasets can pose challenges for deposit and are not considered important for policy.
  4. This pilot has challenged us on a number of usability issues for deposit: easing the burden of making decisions, making our instructions and hints as clear as possible, and making it easy to skip fields that are not relevant. We provided a user guide with screenshots and a checklist for deposit.
  5. This pilot user had an audio archive that was well-curated and ready to be made open. Collection already had a ‘home’ in a trusted disciplinary repository, though ours was made public first. User was happy to give the collection greater visibility, as long as he didn’t have to upload files one by one.
  6. User is already delivering files to specialist peers via website. Legacy datasets have existing licences embedded in headers; customised by University lawyers ten years ago. We are grappling with the desire for user registration in an open repository.
  7. Some research considered ‘sensitive’ because of use of animals: not wanting to attract unwanted attention. Many large datasets saved in various places without archiving. Can/should the repository offer their storage solution for large and exponentially growing datasets, so long as they make it open, or should some appraisal step be introduced? The institute is wondering if they should be serving their own data for a price.
  8. The research centre in Taiwan which serves the data during the life of the project may not feel obliged to make the data available long-term. The PI has offered to deposit a 5% sample of the data only. Could this be a good example for an external website maintained by others providing the search mechanism to retrieve objects within the repository? Do we need to alter some aspects of repository behaviour to accommodate this collection and the balance for searching across the repository, pagination of item listings, etc.?
  9. This community is considering Edinburgh DataShare as one of several options for solving a range of problems to do with its research data.