From Data Sharing to Data
Stewardship: Meeting Federal Data
Sharing Requirements Now and
into the Future
ICPSR – University of Michigan
Session Outline
• History (brief!) of federal data sharing
requirements
• What is good data sharing? How do you achieve
data stewardship?
• Good news: sustainable data sharing exists
• Public data sharing services – tours & tips
• Resources for creating data management plans
and funding quotes
You should leave this session with:
• Keen understanding of several sustainable
data sharing models
• Ability to critique data sharing services
– Through review of several services
– Walk-away tips for evaluating
• Knowledge (a portal) of resources for creating
data management plans for grant applications
Prologue – Why ICPSR is Here
• ICPSR has been in the data stewardship business for
over 50 years – since 1962
• Center located within the Institute for Social Research
at the University of Michigan
• ICPSR exists to preserve and share research data to
support researchers who:
– Write research articles, books, and papers
– Teach or utilize quantitative methods
– Write grant/contract proposals (which require data
management plans)
• Data stewardship = data curation = our purpose
Two ‘Recent’ Moments in Federal Data
Sharing History
• NSF: January 2011 – requirement of data
management plans
• OSTP: February 2013 – Memo with subject
“Increasing Access to the Results of Federally
Funded Scientific Research”
The Statement Heard Round the
Research World:
• In January 2011, the National Science Foundation released a new
requirement for proposal submissions regarding the
management of data generated using NSF support. All proposals
must now include a data management plan (DMP). (NIH has
similar DMP requirements.)
• The plan is to be short, no more than two pages, and is
submitted as a supplementary document. The plan needs to
address two main topics:
– What data are generated by your research?
– What is your plan for managing the data?
The OSTP Memo
• Released February 22, 2013
• This memo directed funding
agencies with an annual R&D
budget over $100 million to
develop a public access plan
for disseminating the results
of their research
The OSTP Memo – A Review
• A concern for investment: “Policies that mobilize these
publications and data for re-use through preservation
and broader public access also maximize the impact and
accountability of the Federal research investment.”
• Federal agencies with over $100 M annually in R&D
expenditures to develop plans to support increased
public access to the results of research funded by the
Federal Government
The details are still developing but the
focus for research data sharing includes:
1. Maximize public access (includes discoverability)
2. Protect confidentiality and privacy
3. Allow for inclusion of costs in proposals for federal funding of
scientific research
4. Appropriate evaluation of submitted data plans
5. Compliance mechanisms
6. Cooperation with the private sector
7. Appropriate attribution
8. Long term preservation and sustainability
What is good data sharing - the basis of
data stewardship?
The goals are simple:
• Data gets used (maximizes taxpayer
investment)
• Available today and into the future
• Research respondent protection
Data Stewardship = Getting Data Used
1. Data must be discoverable
a) Ability to discover data online requires proper tagging and
exposure for search engine indexing
– Concept of a ‘data catalog’
b) Data citation – data used in research articles should have a
DOI and citation just like research articles
2. Data must be accessible
a) On-demand (available for download)
b) Well-documented (survey scope, sample population,
questionnaire, study & data nuances, etc.)
c) Available in usable/popular formats (SPSS, Stata, online
analysis)
Data Stewardship = Future Availability
1. Data in preservation format (ASCII)
2. File migration to current software
versions
3. Well-documented (survey scope, sample
population, questionnaire, study & data
nuances, etc.)
4. Stored in an ever-present archive
(location) – available today and XX+
years from today!
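As a rough illustration of items 1 and 3 above, the sketch below writes records to plain ASCII text and pairs them with a simple text codebook. It is a minimal sketch using only the Python standard library, not ICPSR's actual curation workflow; the function names `to_preservation_csv` and `codebook`, and the sample variables, are hypothetical.

```python
import csv
import io


def to_preservation_csv(rows, fieldnames):
    """Serialize records to CSV text and verify that it is pure ASCII.

    Raises UnicodeEncodeError if any value contains a non-ASCII character,
    so problems surface at deposit time rather than years later.
    """
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    text = buffer.getvalue()
    text.encode("ascii")  # validation only; the encoded bytes are discarded
    return text


def codebook(variables):
    """Render a minimal plain-text codebook: one 'name: description' line per variable."""
    return "\n".join(f"{name}: {desc}" for name, desc in variables.items()) + "\n"


if __name__ == "__main__":
    # Hypothetical survey records, for illustration only.
    records = [{"id": 1, "age": 34}, {"id": 2, "age": 51}]
    print(to_preservation_csv(records, ["id", "age"]), end="")
    print(codebook({"id": "Respondent identifier", "age": "Age in years"}), end="")
```

Keeping the data and its documentation in plain text means neither depends on a particular software version to remain readable.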
Data Stewardship =
Respondent Confidentiality
• It is critically important to protect the
identities of research subjects
• Disclosure risk is a term that is often used for
the possibility that a data record from a study
could be linked to a specific person
• Data with these risks can be shared:
– Data anonymized for public access
– Data distributed via secured virtual
environment
• Data concerning very sensitive topics can also
be shared via a secured environment
The Concept of Data Curation
• Curation, from the Latin "to care," is the process used to add value to
data, maximize access, and ensure long-term preservation
• Data curation is akin to work performed by an art or museum curator.
– Data are organized, described, cleaned, enhanced, and preserved for
public use, much like the work done on paintings or rare books to make
the works accessible to the public now and in the future
• Curation provides meaningful and enduring access to data = Data
Stewardship!
The Status of Data Sharing
• The Good News
– Good data sharing exists!
• The Bad News
– Good data sharing requires funding -
sustainable funding!
– Sustainable funding for free public access
remains a challenge
Sustainable Data Sharing Models –
Three to Explore
• Fee for access model (subscription model)
• Agency model (agency or foundation funds
public access)
• Fee for deposit model (researcher writes fee
into grant and pays at deposit to fund public
access)
I. Fee-for-Access Data Sharing
• Funding is maintained by annual subscription fees charged to
institutions; individuals at subscribing institutions have free
(open) access to data
• Pooled (ongoing) subscriber fees are used to acquire, curate,
and maintain the service
• The service, open to everyone, is thus sustained by subscribers,
but agencies indicate these models are not ‘open enough’
because of the access fees
II. Agency-funded Data Sharing
• Agency sponsors/funds (ongoing) data curation & sharing enabling the
public to access without charge
• The archive is hosted with a curation entity like ICPSR where the public
can easily discover and access data and restricted-use data can also be
securely shared
• Agency directs data selection and compliance policies
More Agency/Foundation-Funded
Collections Hosted by ICPSR
III. Fee-for-Deposit Data Sharing
• Depositor (individual or entity) pays for data to be
curated and stored – a fee at deposit
• Deposit fees should be written into the grant
application
• Incoming deposit fees sustain the service and the
professionals behind it
• Sustainability risk is fairly high in this model as it
depends upon:
– Continuous influx of deposit fees
– Depositors to put allocated fees towards curation &
sharing
Fee for Deposit Services Arriving Daily!
(tips for evaluating coming shortly)
First: A Side-Note on Sharing
Restricted-Use Data
What is Restricted-Use Data?
• Data with disclosure risk –
potential to identify a research
subject
• Data with highly sensitive
personal information
Common Objection/Misperception:
“My data are too sensitive to share. . .”
• ICPSR has been sharing restricted-use data for
over a decade via three methods:
– Secure Download
– Virtual Data Enclave
– Physical Enclave
• ICPSR stores & shares over 6,400 restricted-
use datasets associated with over 2,000
‘active’ restricted-use data contracts
Reality: Restricted-use data can be
effectively shared with the public
• Through the use of a virtual data enclave where
the data never leave the server
• Where there is a process (and understanding!)
to garner IRB approval from the requesting
scientist’s university
• Where there is a system, technology, data
professionals, and collaboration space in place
to disseminate (expensive to build!)
• Because agencies do allow for an incremental
charge to the data requestor to offset marginal
costs
Review of Public Data Sharing Services
• Overview of public data sharing services we have
reviewed
– Some key strengths of each
• Disclaimer: ICPSR has recently launched a public access
service (hosted)
– You’ll likely notice some bias when we talk about the
strengths of openICPSR
– And because we built the service, we know much more
about it
– Still, ICPSR’s public access service isn’t for everyone –
more on that shortly
Public Data Sharing Services
openICPSR – currently in its beta launch
How is openICPSR unique?
openICPSR is a public data-sharing service:
• Where the deposit is reviewed by professional data
curators who are experts in developing metadata (tags) for
the social and behavioral sciences
• With an immediate distribution network of over 750
institutions looking for research data, powerful
search tools, and a data catalog indexed by major search
engines
• Sustained by a respected organization with over 50 years of
experience in reliably protecting research data
• Prepared to accept and disseminate sensitive and/or
restricted-use data in the public-access environment
Why should openICPSR’s unique attributes matter
to depositors?
While openICPSR is a new data-sharing service, it is backed by ICPSR
• Discoverable: Posting data online isn’t enough. To maximize usage,
data must be easily discovered. ICPSR is an expert in tagging scientific
data for discovery by potential users
• Usage: ICPSR’s data catalog is searched by thousands of individuals
keenly interested in downloading and analyzing data; the catalog is
also indexed by search engines connecting still more potential
analysts to the data
• Sustainable for the long term: ICPSR has existed as a data archive for
over 50 years; depositors need not worry that their data will suddenly
disappear due to a loss, for example, of funding
• Secure dissemination of sensitive data: ICPSR is prepared to accept
restricted-use data as it has the infrastructure and working
knowledge in place to store and disseminate it securely to the public
What types of deposit packages does
openICPSR offer?
There are two openICPSR package types:
1. Self Deposit: Enables research scientists to deposit data &
documentation on demand and provide immediate public
access. Depositors receive a DOI and data citation upon
publishing and a metadata review shortly after publishing.
The cost is $600 per project.
2. Professional Curation: Enables a research scientist to tap
all aspects of ICPSR’s curation services. The fee depends on
the complexity of the data and the curation services
desired. Scientists must call for a quote, preferably during
the time the grant proposal (specifically the data
management plan) is being prepared.
It is important to emphasize that these fees should
be written into the grant application!
How will openICPSR disseminate sensitive
data to the public?
• The deposit of sensitive (restricted-use) data is similar to the
deposit of non-sensitive data except that the depositor will
indicate that the data should be for restricted-use only
• Dissemination of sensitive data will be through ICPSR’s
virtual data enclave; in this environment, data never leave
the secure server and analysis takes place in the virtual
space
• Scientists desiring to access the data will need to apply for
the data, secure IRB approval, and pay an access fee
• openICPSR already accepts sensitive (restricted-use) data;
dissemination of sensitive data is expected to take place in late
2014
A final note: openICPSR accepts research data from
a wide array of disciplines/fields, but not all
Tips for Evaluating a Data Sharing Service
Questions to consider when selecting a data sharing service:
• How will the service sustain itself? Does it have a long term funding
stream?
• How will the service care for my data in the long term should the service
fail? Is there a plan? A safety net?
• Can the service quickly maximize discoverability of my data? Does it
explain how it will do so?
• Does the service have a network of interested researchers & students
seeking data? Will my data get used?
• Does the service have knowledge of international archiving standards?
• Does the service provide a DOI, data citation, and version control should I
need to update my files?
• I have sensitive data to deposit. Does the service understand how to
secure it upon intake and when sharing? Does it have experience in this
area?
Resources for Creating Data
Management Plans for Grant
Applications
ICPSR’s Data Management & Curation Site
http://www.icpsr.umich.edu/datamanagement/
Purpose of Data Management Plans
• Data management plans describe how researchers
will provide for long-term preservation of, and
access to, scientific data in digital formats.
• Data management plans provide opportunities for
researchers to manage and curate their data more
actively from project inception to completion.
Data Management
Plan Resources
Guidelines for Download
And still more guidelines after the
project is awarded:
• Guide emphasizes
preparation for data
sharing throughout
the project
• Available online and
via download (pdf)
Copies of these Slides & Use
• Feel free to share it; present
it; cite it!
• Find copies of these slides
on Slideshare.net
– Several notes and
additional links are found in
the notes view
Get More Information
• Visit ICPSR’s Data Management &
Curation site:
http://www.icpsr.umich.edu/datamanagement/index.jsp
• Contact us:
– netmail@icpsr.umich.edu
– (734) 647-2200
• More on Assuring Access to
Scientific Data: white paper –
“Sustaining Domain Repositories
for Digital Data”
