User engagement in research data preservation and curation. Stuart Macdonald, EDINA National Data Centre, University of Edinburgh; Luis Martinez-Uribe, Oxford e-Research Centre, University of Oxford. ECDL, Corfu, 30 September 2009. Image courtesy of Flickr – http://www.flickr.com/photos/laszlo-photo/1899390628/
Data deluge: An updated IDC white paper reported that the digital universe in 2007 was 281 exabytes and in 2011 should be 1,800 exabytes (or 10 times that produced in 2006). *“The Diverse and Exploding Digital Universe - an updated forecast of worldwide information growth through 2011” (Mar. 2008) - http://www.emc.com/collateral/analyst-reports/diverse-exploding-digital-universe.pdf. BBSRC strategic plan (2010-2015) consultation document. Image courtesy of Flickr – http://www.flickr.com/photos/timothygreigdotcom/2571912269/
Research data definitions: Words, pictures, numbers, sounds. Workflows, methodologies, protocols, standard operating procedures, instrumentation, models, questionnaires, code books, set-up files, algorithms, transcripts. The US Office of Management and Budget defines research data as “the recorded factual material commonly accepted in the scientific community as necessary to validate research findings”. Image courtesy of Flickr – http://www.flickr.com/photos/sgrantarch/3563676104/
“it is becoming increasingly clear that effective and efficient management and reuse of research data will be a key component in the UK knowledge economy in years to come, essential for the efficient conduct of research ….” *JISC (2008) “Identifying the benefits of curating and sharing research data” - http://www.jisc.ac.uk/whatwedo/programmes/digitalrepositories2007/databenefits.aspx. Research methods experiencing a radical transformation; new tools & infrastructures generating research data; new ways to use, share and re-use; growing importance of preserving and curating research data. Image courtesy of Flickr – http://www.flickr.com/photos/williamhook/2650009682/
Data deposition and publication: Departmental websites; domain-specific repositories; centralised data repositories (UKDA, NERC, MRC); libraries and computing/IT services within academic institutions working together to develop and customise institutional repositories to curate research data. Image courtesy of Flickr – http://www.flickr.com/photos/kuzeytac/3043180722/
Institutional Repositories: open access; built for academic publications; technology-led; no formal requirements analysis procedures. User engagement required to develop systems that will meet researchers’ needs. Bottom-up approach to inform top-down thinking. Researchers – key user community overlooked. Image courtesy of Flickr – http://www.flickr.com/photos/jahdakinebrah/1094697578/ Image courtesy of the periodic table printmaking project – http://azuregrackle.com/periodictable/table/105.html
Barriers to sharing: time taken to prepare datasets for deposit; concerns over making data available before full academic exploitation; misuse / misinterpretation (non-academics); loss of ownership, loss of commercial or competitive advantage; repositories will cease to exist; unwillingness to change working practices; uncertainty about IPR and confidentiality. Open data – realism versus altruism. DISC-UK DataShare - to contribute to new models, workflows and tools for academic data sharing. Image courtesy of the periodic table printmaking project – http://azuregrackle.com/periodictable/table/
RIN-funded disciplinary case studies: Charting individual researchers’ information practices across 7 sub-disciplines of the life sciences - http://www.rin.ac.uk/case-studies. DCC / ISSTI (University of Edinburgh). Deployed a range of methodologies and tools including short-term ethnographic techniques and semi-structured instruments: diaries (x55), face-to-face interviews (x24), cognitive mapping (1 per case), focus groups (1 per case). Image courtesy of Flickr – http://www.flickr.com/photos/pierre_pouliquin/464849560/
Some findings from the RIN disciplinary case studies project: Some disciplines lend themselves more than others to ‘open’ data sharing. Research data are varied, specific and complex. Data curation and/or sharing only becomes crucial at certain stages of the research lifecycle. Feeling that only researchers have the subject knowledge to curate their own data. Keen sense of ‘ownership’ and protectiveness towards data. Image courtesy of Flickr – http://www.flickr.com/photos/justvodka/4592437/
Scoping digital repository services for research data management - http://www.ict.ox.ac.uk/odit/projects/digitalrepository/. Scope requirements for services to manage research data generated by Oxford researchers from a variety of disciplines: Interviews (x37) conducted to learn about data management practices and identify top requirements. Workshop (x46) held to complement findings and to gather examples of good practice regarding use of repository services for research data management. Consultation with service units (ORA, Data Library, NGS, Oxford Digital Library) to identify gaps in service and validate researchers’ requirements. Image courtesy of the periodic table printmaking project – http://azuregrackle.com/periodictable/table/58.html
Scoping digital repository services - top requirements: Advice on practical issues related to managing data across their life cycle, incl. data management plans and assistance with formatting. Secure storage for large datasets generated by high-throughput instruments. Sustainable & authenticated infrastructure that allows publication and long-term preservation of research data. Now followed up by the intra-institutional, JISC-funded Embedding Institutional Data Curation Services in Research (EIDCSR) project – http://eidcsr.oucs.ox.ac.uk/
Tools – Data Audit Framework (DAF) - http://www.data-audit.eu/. DAF helps to establish relationships with research communities around the issues of data curation. Allows institutions to identify, locate, describe and assess how they are managing their research data. Provides information specialists who wish to extend support for research data with a vehicle for engaging with researchers, e.g. through local research data management training. "staff had numerous comments and suggestions for improvement of data management at different levels indicating an awareness of the issues, even where it had not been made a priority to address" - Edinburgh Data Audit Implementation Project
Summary: Repository development distant from current research needs, due to lack of iterative requirements analysis with researchers. Open data ethos detached from disciplinary research needs in some cases. Trusted relationships - dialogue with researchers early in the research process. Image courtesy of Flickr – http://www.flickr.com/photos/59414209@N00/3367225630/
Thank you. stuart.macdonald@ed.ac.uk; [email_address]. Creative Commons images courtesy of Flickr. Image courtesy of Flickr – http://www.flickr.com/photos/hippie/2556161507/
