Research Data: Opportunities and Challenges
for Universities and Information Producers
Online Information Conference 2013
“Research Data Management” - 20 November 2013
Alain Frey
Scientific & Scholarly Research
alain.frey@thomsonreuters.com
TODAY’S DISCUSSION
Research Data

• Big Data – the growth of data
• Opportunities and Challenges for Universities

• Response from Thomson Reuters: the Data Citation
Index initiative

RESEARCH DATA:
PRICELESS, DIVERSE, DISPARATE
Sound discovery relies upon solid supporting data.
• In recent years the movement toward open sharing of research
data has accelerated tremendously, and a large and growing
volume of research data is now available.
NATURE | News

Gene data to hit milestone*
With close to one million gene-expression data sets now in publicly accessible
repositories, researchers can identify disease trends without ever having to
enter a laboratory.
This article describes how Stanford investigators used the publicly available Gene Expression Omnibus
research data repository to identify a new drug target for diabetes.

The investigators explain that the “beauty of analyzing data from multiple experiments” is that
biases should cancel out between data sets: “there is safety in numbers”.
* Nature News, Nature Publishing Group, Jul 18, 2012.
Copyright © 2012, Rights Managed by Nature Publishing Group
Research Data – Influences on Sharing
The impact of the National Science Foundation mandate
on the research community has been extended to the
library community, as information professionals find a
greater need to serve as trusted consultants in advising
university faculty on data management planning.
RESEARCH DATA:
PRICELESS, DIVERSE, DISPARATE
Sound discovery relies upon solid supporting data.
• The value of available data to research communities across
all disciplines is massive given the potential for its re-use.
• However, gaining a clear understanding of what data exists,
and where, is a challenge.
• Research data repositories are numerous and separately
maintained, and they present a variety of organization schemes
and search capabilities.
Research Data – Influences on Sharing
– In recent years a significant motivating factor for data sharing
has been the influence of funding organizations.
– In the United States, a 2010 mandate from the National Science
Foundation affected all aspects of research data curation
and sharing:
– All funding proposals submitted or due on or after
January 18, 2011, must include a “Data Management Plan”
describing how the proposal will conform to NSF policy on
the dissemination and sharing of research results.
– Globally, other funding organizations have produced similar
mandates.

Scientific data: open access to research
results will boost Europe's innovation capacity
• The European Commission (July 2012) outlined
measures to improve access to scientific
information produced in Europe.
• Broader and more rapid access to scientific papers
and data will make it easier for researchers and
businesses to build on the findings of publicly funded
research.
• This will boost Europe's innovation capacity and give
citizens quicker access to the benefits of scientific
discoveries.

Challenges for Universities to Manage
Research Data
Data Management is one of the essential areas of
• Issues addressed at the university level
– Meet funding body requirements

– Ensure research integrity and repeatability
– Accurate, complete records; authentic and reliable

• Benefits for the university
– Increases university research efficiency and effectiveness
– Saves time and money in the long run
– Prevents unnecessary duplication of research by sharing

– Complies with international standards in research
practices
The Process for Universities
• Data Management Planning
• Documenting Data
• Issues for Data Storage
• Data Security Issues

Data Planning
• Type of Data to be housed in repository
• Audience for the data
• Control of data (deposit by PI, researchers,
university management)
• Sharing requirements – planning
• How long should data be available (impacts
storage and planning)
• Directory naming and conventions

• Strategies for backup
Documenting Data
• Impacts retrieval and sharing
• Identify naming conventions at the beginning
– Is the data the result of research protocols, or is it survey data
where naming is essential for retrieval and use?

• Interoperability
• Administrative – preservation of data and rights
management

Data Storage Issues
• Networked drive storage – eases backup
and management by IS
• Available when required

• Storage in a central repository
• Minimizes risks associated with data loss

Research Data – Diverse and Disparate Sources
Today there are many quality repositories
maintained for the purpose of providing access to
research data. Available mechanisms for research
data curation and sharing are proliferating.
However, gaining a clear understanding of what
exists can be a challenge – and repositories are
separately maintained, with varying schemes of
organization and search capabilities.
RESEARCH DATA: MAKING IT DISCOVERABLE,
ACCESSIBLE, & CITABLE: The Information Provider’s
Response
• The need for a multidisciplinary resource to bring this content
into the mainstream
• There is clearly a need for a single point of access to
quality research data from repositories across
disciplines and across the globe.
• This is the objective of Thomson Reuters’ Data Citation Index

• The Data Citation Index is a resource that resides on the
well-known Web of Knowledge platform alongside gold-standard
resources such as Web of Science and BIOSIS Citation
Index.
• As with all Thomson Reuters resources, quality is extremely
important. Therefore our approach with the Data Citation
Index is to identify, evaluate, and select key repository
content for inclusion.
REPOSITORY EVALUATION & SELECTION
For the first phase of the Data Citation Index we have identified
data repositories that have some of the most relevant, widely
applicable data and prioritized these for early stage inclusion.
• We will closely monitor usage trends and feedback from our
customers to ensure our content strategy aligns with users’
needs.
• For example, regional needs will be monitored to determine
the best way to meet the expectations of researchers
worldwide.
• Though at this time we cannot yet state growth “numbers”, we
have aggressive goals associated with indexing additional
data repositories going forward.
Thomson Reuters Indexing of Research Data Repositories

• TR takes a descriptive metadata feed from the repository
• The repository's raw metadata is analyzed by TR
• TR adds metadata
• TR DCI record: data repository, data study, data set, microcitation
DOCUMENT TYPES
Data Citation Index will include all records within repositories. More than 2
million records are included at launch.
These are organized within four Document Types:
• Repository:
The resource comprising data studies, data sets, and/or
microcitations. Stores, presents, and provides access to the data.

• Data Study:
Descriptions of studies or experiments with associated data which
have been used in the data study. Includes serial or longitudinal
studies over time.

• Data Set:
A single or coherent set of data or a data file provided by the
repository, as part of a collection, data study, or experiment.
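The relationship between these document types can be sketched as a small data model. This is only an illustration under stated assumptions: the class names, fields, and the `record_count` helper are hypothetical and do not reflect the actual Data Citation Index schema. Microcitations are left out because the slides name them without defining them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataSet:
    """A single or coherent set of data, or a data file, provided by a repository."""
    title: str


@dataclass
class DataStudy:
    """A description of a study or experiment together with its associated data."""
    title: str
    data_sets: List[DataSet] = field(default_factory=list)


@dataclass
class Repository:
    """The resource that stores, presents, and provides access to the data."""
    name: str
    data_studies: List[DataStudy] = field(default_factory=list)
    # Data sets deposited directly, without an enclosing data study (assumption).
    data_sets: List[DataSet] = field(default_factory=list)

    def record_count(self) -> int:
        """Count one record for the repository, each data study, and each data set."""
        nested = sum(len(study.data_sets) for study in self.data_studies)
        return 1 + len(self.data_studies) + nested + len(self.data_sets)


# Hypothetical example content, not taken from any real repository.
repo = Repository(
    name="Example Repository",
    data_studies=[DataStudy("Example study", [DataSet("Table A"), DataSet("Table B")])],
    data_sets=[DataSet("Standalone file")],
)
print(repo.record_count())  # 1 repository + 1 study + 2 nested sets + 1 standalone set = 5
```

Counting the repository itself as a record mirrors the slide's statement that a repository is one of the document types, alongside the studies and sets it contains.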

DATA CITATION INDEX
The Data Citation Index will expose important research
data and drive access to it through the Web of Science
platform.
• In combination with Web of Science resources that provide
critical coverage of scholarly journals, books, and conference
proceedings, the Data Citation Index works to provide a
comprehensive view of scholarly research, bringing research
data into the same arena as the published literature that it
supports.
DATA CITATION INDEX:
HOW IT LOOKS AND WORKS
Exposing this Data to the General Research Community
The full record presents fundamental
information about this data study –
an abstract, data type,
miscellaneous descriptors, and basic
taxonomic data.

By recommending a standard format for citing research data,
we hope to influence the research community’s citing
practices and so facilitate the capture and unification of
cites to research data going forward.
The full record serves as a central point
from which to collect information around
this data study, and link to related
information – such as the articles that have
referenced this Data Study.
Above all though – the Data
Citation Index is about getting
users to research data itself.
Link to the Data Set information
within the repository.
Remaining within the Data
Citation Index, link to all
records associated with this
data study -- or link out directly
to associated data sets.
Information may of course be
printed, e-mailed, or archived
within EndNote Web, EndNote, or
added to one’s ResearcherID
publications list.
Research Data – Opportunities and
Challenges
As Research Data continues to grow exponentially, it is critical
to make this source available to the International Research
Community.
As policies change and universities adapt standards for
research storage and sharing of research results, it becomes
more critical for this data to be part of the continuum of
research.

Thank you

Editor's Notes

  1. In July of this year an article in Nature News detailed one example of the great value of re-use of available data. Imagine the cost savings through data re-use that are associated with this example. The point here is realizing the value that exists within research data – part of the “Why?” with regard to the DCI initiative.
  2. We’ve seen that virtually all research universities have sections of their web sites devoted to best practices with regard to Data Management Plans, and there are typically dedicated staff focused on assisting faculty with this.
  3. Potential data types may also be Image Collections, Algorithms, Maps, Poll Data, and many more. New Data Types will be created as necessary.
  4. Continue for product detail via slides, or – live demo.
  5. Data Citation Index “Times Cited” and links to Citing Articles – at launch, these links are actually made through information gathered from the repositories themselves, not through information found in the citing articles that indicate a reference to the data study (in this case) as one might expect. Cites in journal literature to research data studies, data sets, etc. are not at this time found in significant volume, and those that are found for the most part lack the standard structure, format, and consistency that would enable clear interpretation and accurate indexing. Therefore, the best way of making these connections today is through the valuable metadata resident in repositories that connects research data to published works that have utilized and cited this data. The Data Citation Index does provide recommended formats for citing research data – and as better citation formats are adopted and utilized we will begin to capture full cited references to research data and make them available in our resources. Only then will one be able to see cites to research data within the Cited References of an article within Web of Science. Again – today we make these connections through information tracked and compiled by the data repositories themselves. This is one of the challenges that come with being on the cutting edge of a new initiative where user adoption is key to complete success. Our goal is that data citation becomes standardized much in the same way as citations to primary literature and that this adoption feeds back into our products, creating a cycle of continuous improvement and enhancement in cooperation with our users.