POLICIES FOR DATA SHARING, ACCESS AND REUSE
MacKenzie Smith, MIT, ARL, CC

NSF DMP GUIDELINES WANT
  • Policies for access and sharing, including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements
  • Policies and provisions for re-use, re-distribution, and the production of derivatives
RDAP Summit ©2011, MacKenzie Smith

WHAT IS DRIVING THIS?
  • Scientific progress requires international, interdisciplinary interoperability, including frictionless data integration at large scale (e.g. the Web)
  • Data interoperability includes
      • technical issues (data integration, protocols)
      • social issues (scientific norms, credit mechanisms or lack thereof)
      • legal issues (incompatible laws and policies for data and databases)

DATA USE/REUSE/REDISTRIBUTION
  • Data use: Using research data for the current research purpose/activity to infer new knowledge about the research subject.
  • Data re-use: Using research data for a research purpose/activity other than that for which it was intended.
  Howard, T., Darlington, M., Ball, A., Culley, S., McMahon, C., 2010. Understanding and Characterizing Engineering Research Data for its Better Management. Project Report. Bath, UK: University of Bath, ERIM Project Document erim2rep100420mjd10.

DATA USE/REUSE/REDISTRIBUTION
  • Data purposing: Making research data available and fit for the current research activity.
  • Data re-purposing: Making research data available and fit for a future known research activity.
  • Data re-use: Managing research data such that it will be available for a future unknown research activity.

SUPPORTING DATA REUSE
  • Future users unknown, potentially interdisciplinary
  • You don’t know them and they don’t know you (or what you/your discipline expects)
  • Data documentation and policies need to be clear and must not require contact or ad hoc negotiations (what if you’ve moved or you’re dead?)

INTERNATIONAL COLLABORATIONS
  If I participate in a collaborative international research project, do I need to be concerned with data management policies established by institutions outside the United States?
  Yes. There may be cases where data management plans are affected by formal data protocols established by large international research consortia or set forth in formal science and technology agreements signed by the United States Government and foreign counterparts. Be sure to discuss this issue with your sponsored projects office (or equivalent) and your international research partner when first planning your collaboration.

DATA LICENSING IN THE US
  • US Gov data in the Public Domain
      • explicit rights statement rare
  • Factual data not copyrightable in the US
      • creativity matters, ‘sweat of the brow’ does not
      • not much legal precedent in science
      • generally not known by users
  • EULAs in place for many data archives
      • all different, varying practicality, hard to enforce

CREATIVE COMMONS
  • Tools for data sharing towards Web-scale interoperability (e.g. Linked Open Data)
      • CC0 or CC-BY
      • Public Domain mark
  • Best practice for URI-based attribution (e.g. to avoid attribution stacking)

CREATIVE COMMONS
  • CC0 waives copyright and associated rights (e.g. data rights) where applicable
      • Important for interoperability with legal jurisdictions that have sui generis data rights (e.g. Europe)
  • CC-BY-SA bad for interoperability
  • CC-BY with attribution via URI (Aus and NZ examples)
  • Attribution stacking

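As a rough illustration of attribution stacking and of URI-based attribution, the Python sketch below merges datasets and carries forward one attribution URI per source. The `Dataset` class, the `merge` helper, and the example.org URIs are hypothetical and not part of any Creative Commons tool. The licensing rule inside `merge` is a deliberate simplification for the example, not legal guidance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical record for a shared dataset (illustration only)."""
    uri: str                  # persistent URI used for attribution
    license: str              # "CC0" or "CC-BY" in this sketch
    attributions: tuple = ()  # attribution URIs inherited from sources

    def required_attributions(self):
        """URIs a reuser must cite: inherited ones, plus this dataset's own if CC-BY."""
        own = (self.uri,) if self.license == "CC-BY" else ()
        return self.attributions + own


def merge(new_uri, sources):
    """Combine sources into a derivative dataset, stacking attribution URIs.

    Simplifying assumption: the result is CC0 only if every source is CC0,
    otherwise CC-BY. Real licensing decisions need legal review.
    """
    stacked = []
    for src in sources:
        for uri in src.required_attributions():
            if uri not in stacked:  # de-duplicate, keep first-seen order
                stacked.append(uri)
    result_license = "CC0" if all(s.license == "CC0" for s in sources) else "CC-BY"
    return Dataset(new_uri, result_license, tuple(stacked))


a = Dataset("https://example.org/data/a", "CC-BY")
b = Dataset("https://example.org/data/b", "CC-BY")
c = Dataset("https://example.org/data/c", "CC0")

ab = merge("https://example.org/data/ab", [a, b])
abc = merge("https://example.org/data/abc", [ab, c])

# Each CC-BY layer adds one more URI to cite; the CC0 source adds none.
for uri in abc.required_attributions():
    print(uri)
```

In this sketch the attribution list grows with every CC-BY layer, which is the stacking problem the slide names. Citing by URI keeps each entry short and machine-readable, and CC0 sources add nothing to the list.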
ISSUES
  • Licenses
  • Attribution
  • Persistent IDs
  • Provenance
  • Metadata
  • Registries

WHAT DO RESEARCHERS WANT? SUPPLY SIDE
  • CREDIT
  • CONTROL
  • CONFIDENCE (in appropriate use of their data)
  and sometimes…
  • IP
  but always…
  • FUNDING

WHAT DO RESEARCHERS WANT? DEMAND SIDE
  • Easy reuse of their own data
  • Easy discovery of and access to outside data
  • Easier integration/interoperability of their own and other data (i.e. “re-purposing”)

HOW CAN RESEARCHERS ACHIEVE THAT?
  • Standard copyright licenses or waivers
  • Standard terms & conditions (EULA)
  • … via their institutional repository!
  • Researchers want good advice, have zero interest in complex legal issues
  • IRs can establish practices that help researchers achieve their goals with low effort

DMP BOILERPLATE
  Sharing. Project data will be made publicly accessible/downloadable from the university’s data archive website (via a standard Web UI) as … Once located on the archive website, image sets will be downloadable via standard Web protocols (i.e. HTTP). Included in the associated metadata for each image set will be rights information such as copyright and licensing terms for use and reuse of the data. Each image set will be assigned a unique, persistent URI (web identifier, resolvable as a URL) for use in citations. The university’s data archive uses Handles for persistent URIs.

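The sharing paragraph above can be sketched as a metadata record that carries both a citable persistent URI and explicit rights fields. This is a minimal Python sketch under stated assumptions: the field names, the handle `12345/imageset-001`, and the choice of CC BY version 4.0 are made up for illustration and do not describe any particular archive. Only the `https://hdl.handle.net/` proxy address is a real service.

```python
import json

HANDLE_PROXY = "https://hdl.handle.net/"  # public Handle System proxy resolver


def handle_to_url(handle):
    """Turn a handle of the form 'prefix/suffix' into a resolvable URL."""
    prefix, sep, suffix = handle.partition("/")
    if not (prefix and sep and suffix):
        raise ValueError(f"not a prefix/suffix handle: {handle!r}")
    return HANDLE_PROXY + handle


def make_record(title, handle, license_name, license_url, rights_holder):
    """Build a metadata record with citation and rights fields (field names are assumptions)."""
    return {
        "title": title,
        "identifier": handle_to_url(handle),  # persistent URI for citations
        "rights": {
            "license": license_name,
            "license_url": license_url,
            "rights_holder": rights_holder,
        },
    }


record = make_record(
    title="Example image set",
    handle="12345/imageset-001",  # made-up handle, for illustration only
    license_name="CC-BY",
    license_url="https://creativecommons.org/licenses/by/4.0/",  # version is an assumption
    rights_holder="Principal Investigator",
)
print(json.dumps(record, indent=2))
```

Keeping the license as a URL alongside the human-readable name lets both people and harvesting software read the terms without contacting the data producer.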
DMP BOILERPLATE
  Licensing. Images, even scientific research images generated by scanners, may be subject to copyright in the U.S., so images produced by the project will be collected and shared using a Creative Commons license, specifically CC-BY (i.e. with attribution to the copyright owner, who is the Principal Investigator for this project, with the approval of the university’s IP counsel). By using the CC-BY license, we are authorizing all interested researchers to use the image data produced by this project in whatever manner they choose, as long as they cite the Principal Investigator as the source of the data.

DMP BOILERPLATE
  Licensing, cont. Metadata associated with the image sets will be released under CC0 (public domain dedication), since metadata is normally not copyrightable and we want it to be reusable in new contexts (e.g. Google indexes). With these licensing terms, future researchers will be able to combine the image data and associated metadata produced by this project with data produced from their own or other projects, to create super- or sub-sets of images needed for their own research (i.e. “derivative” datasets).

DMP BOILERPLATE
  In the university’s central data archive, researchers will be able to determine the rights assigned to the project’s data via the metadata displayed in the UI for the dataset (i.e. in the rights fields of the relevant catalog record for the dataset). The archive’s search interface supports filtering searches by rights category (e.g. Public Domain, CC-BY, embargoed) so that researchers can search for only data that they may reuse in their own research.

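Filtering by rights category, as described above, might look like the following Python sketch. The record layout, the `rights_category` field name, and the set of categories treated as reusable are assumptions for illustration, not the behavior of any specific archive.

```python
# Assumed set of categories that permit reuse; a real archive defines its own.
REUSABLE = frozenset({"Public Domain", "CC0", "CC-BY"})


def filter_by_rights(records, allowed=REUSABLE):
    """Return only the records whose rights category is in `allowed`."""
    return [r for r in records if r.get("rights_category") in allowed]


catalog = [
    {"id": "ds-1", "rights_category": "CC-BY"},
    {"id": "ds-2", "rights_category": "embargoed"},
    {"id": "ds-3", "rights_category": "Public Domain"},
    {"id": "ds-4"},  # no rights statement, so it is excluded
]

reusable_ids = [r["id"] for r in filter_by_rights(catalog)]
print(reusable_ids)  # ['ds-1', 'ds-3']
```

Records with no rights statement are excluded here on purpose: without explicit terms a reuser cannot tell what is permitted, which is why the boilerplate puts rights information in every catalog record.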
CONCLUSION
  • IRs serving as data archives can
      • Standardize institutional data policies
      • Encourage OA
      • Lower barriers for researchers to comply with NSF intent
  • DMPs encourage use of IR over time, reassure NSF of consistent practice

Smith RDAP11 NSF Data Management Plan Case Studies
