Introduction to ADA: The
Australian Data Archive
as a Trusted Repository
for Research Data
Dr. Steve McEachern
Director, ADA
2017 Research Support Community Day
Colombo Theatre, UNSW
13 February, 2017
ADA in Brief
• The Social Science Data Archive (now ADA) was set up
in 1981, housed in the Research School of Social
Sciences, with a mission to collect and preserve
Australian social science data on behalf of the social
science research community
• The Archive holds over 5000 datasets from around
1500 studies, including national election studies, public
opinion polls, social attitudes surveys, censuses,
aggregate statistics, administrative data and many
other sources.
• Data holdings are sourced from academic, government
and private sectors.
So what is a data archive?
• ‘A “trusted system” that provides... an accessible and
comprehensive service empowering researchers to
locate, request, retrieve and use data resources
in a simple, seamless and cost effective way,
while at the same time protecting the privacy,
confidentiality and intellectual property rights of
those involved.’
Social Sciences and Humanities Research Council of Canada. “National Data Archive Consultation Final
Report: Building Infrastructure for Access to and Preservation of Research Data in Canada” URL:
http://www.sshrc.ca/web/whatsnew/initiatives/da_finalreport_e.pdf [20 November 2003].
ADA Subarchives
• Social Science – predominantly survey or polling based
quantitative social science data
• Historical – an archive of Australian census data tables
from 1834 to the present day
• Indigenous – A thematic archive bringing together
research data about Aboriginal and Torres Strait Islanders
• Longitudinal – major longitudinal cohort and panel surveys
of the Australian population
• Qualitative – a new collection which provides specialist
data archiving and access services to qualitative
researchers
• Crime & Justice – major collections of data in crime, law
and justice, including criminal justice administrative data
• International – a central point of access for links to
international data sources around the world
ADA Data Holdings
ADA data holdings cover a wide variety of subject areas:
• Ageing
• Business and management
• Census data
• Culture
• Demography
• Drugs, alcohol and tobacco
• Economics
• Education, employment and work
• Environment, Conservation, Land use
• Family studies
• Foreign affairs
• Gambling
• Health
• Housing
• Law, Crime, Courts
• Mass media, communication and language
• Migration, immigration and multiculturalism
• Politics and elections
• Public opinion and social attitudes
• Psychology
• Quality of life
• Science, Technology
• Social welfare
• Sociology
• Tourism, recreation and leisure
• Travel and transport
Example studies
• Australian Survey of Social Attitudes (ANU, UWA, UQ, …)
• Longitudinal Surveys of Australian Youth (NCVER)
• Australian Election Studies (ANU, QUT)
• ANUPolls, Morgan Gallup Polls, Age Polls, Lowy polls
(1947 – Present)
• Colonial census tables and images, 1838-1901 (ABS)
• Census tabulations, 1966 – Present (ABS)
• National Drug Strategy Household Survey, 1994 –
Present (AIHW)
• Australian Workplace Relations Survey, 1990, 1995, 2014
(forthcoming) - Dept of Employment
• Negotiating the Life Course (ANU, AIFS, UQ)
Forthcoming
• Longitudinal studies
– Department of Social Services
• HILDA, LSAC, LSIC, BNLA
– National Centre for Vocational Education Research
• LSAY new wave
– Department of Health
• Australian Longitudinal Studies on Women's Health (ALSWH)
and Men's Health (Ten to Men)
– Bruce: Child Support study
• Exercise, Recreation and Sport Survey 2001-2010
(Australian Sports Commission)
• Giving Australia survey (DSS)
The ADA website
The ADA Study Page
Dataset study pages
Study information is based on the DDI-C (Data
Documentation Initiative) standard, and includes:
• Study: information including the investigators, abstract,
sample, data collection methods, and access
requirements.
• Variables: a list of variables available in a quantitative
dataset
• Related Materials: additional documentation (reports,
questionnaires, technical information), links and other
related studies (e.g. others in the series) that may interest
you
Who uses ADA?
• 2016
– 12000 online analyses (usually cross-tabulations)
– 1100 data file downloads
• Registrations:
– Approx. 1000 new users each year
• User types:
– Undergraduates: 41% of analyses, 16% of downloads
– Postgraduates: 33% of analyses, 40% of downloads
– Researchers: 11% of analyses, 40% of downloads
– Others (media, government, NGO, etc.): 15% of analyses, 4% of downloads
• Institution types: (approx.)
– Australian universities: 70%
– International universities: 15%
– Government departments and agencies: 10%
– Other: 5%
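As a rough worked example, the user-type percentages can be turned into approximate counts. This is a sketch only: it assumes the shares apply to the 2016 totals quoted on the same slide, which the slide does not state explicitly, and all names in the code are illustrative.

```python
# Approximate 2016 activity by user type, derived from the slide's figures.
# Assumption: the percentage shares apply to the 2016 totals.
TOTAL_ANALYSES = 12_000
TOTAL_DOWNLOADS = 1_100

# (share of online analyses, share of downloads), in percent
SHARES = {
    "Undergraduates": (41, 16),
    "Postgraduates": (33, 40),
    "Researchers": (11, 40),
    "Others": (15, 4),
}


def estimated_counts(shares, total_analyses, total_downloads):
    """Return {user_type: (analyses, downloads)} as whole-number estimates."""
    return {
        user_type: (
            round(total_analyses * analysis_pct / 100),
            round(total_downloads * download_pct / 100),
        )
        for user_type, (analysis_pct, download_pct) in shares.items()
    }


counts = estimated_counts(SHARES, TOTAL_ANALYSES, TOTAL_DOWNLOADS)
for user_type, (analyses, downloads) in counts.items():
    print(f"{user_type}: ~{analyses} analyses, ~{downloads} downloads")
```

On these assumptions, postgraduates and researchers together account for roughly 880 of the 1100 downloads, while undergraduates mainly use online analysis.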
Data dissemination options
The ADA study page
Study information is available through the tabs at the top of the
study:
• Study: information including the investigators, abstract,
sample, data collection methods, and access requirements.
• Variables: a list of variables available in a quantitative dataset
• Related Materials: additional documentation, links and other
related studies (e.g. others in the series) that may interest you
The study page is also the access point for the ADA Nesstar
system, for:
• Analysis of quantitative data online,
• Download of data to your own computer.
Note: you will need to log in to your ADA user account in order to
access the Nesstar system.
Types of access
• Browse (viewing metadata):
– Open access
• Analyse (Online analysis): free user registration
– General access studies: Free access for registered users
– Restricted studies: User still requires approval to access
• Data download:
– For unrestricted data: submit a user request, and sign ADA
general user undertaking (reviewed by ADA staff)
– For restricted data: restricted access request form and specific
user undertaking (reviewed by ADA and depositor of data)
– Special access: depends on the particular access requirements
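The access rules above can be sketched as a small lookup function. This is an illustration of the slide's rules, not ADA's actual implementation: the `Action` enum, the function name and the wording of each requirement are assumptions, and "special access" is left out because its conditions vary by study.

```python
from enum import Enum


class Action(Enum):
    BROWSE = "browse"      # viewing metadata
    ANALYSE = "analyse"    # online analysis
    DOWNLOAD = "download"  # data file download


def access_requirements(action, restricted=False):
    """List what a user needs before performing `action` on a study."""
    if action is Action.BROWSE:
        return []  # metadata is open access
    if action is Action.ANALYSE:
        needs = ["free user registration"]
        if restricted:
            needs.append("approval to access the study")
        return needs
    if action is Action.DOWNLOAD:
        if restricted:
            return [
                "restricted access request form",
                "specific user undertaking",
                "review by ADA and the data depositor",
            ]
        return [
            "user request",
            "ADA general user undertaking",
            "review by ADA staff",
        ]
    raise ValueError(f"unknown action: {action!r}")


print(access_requirements(Action.DOWNLOAD, restricted=True))
```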
Browsing: The ADA Study Page
Exploring data in Nesstar
• The information about the study (from the ADA study
page) is also available in Nesstar. Click on the
Dataset icon to explore the study.
• For quantitative analysis, you can also view basic
statistics and charts for individual variables in this
section, by exploring the Variables tab
Exploring variables in Nesstar
Creating a cross-tabulation
Downloading data
• Nesstar is also used as the ADA data download system, to
export the data files for the study to your own computer.
• To download data, you need to have been approved for
download access for the study you are interested in.
• This can be done by submitting a Request for Data Access:
– a) via the “Request Analysis and Download access” link on a
study page, OR
– b) from your personal User page (http://users.ada.edu.au)
• This request then goes to the ADA User Services team for
approval.
• Once your download access has been approved, you will
receive an email notification from ADA, and a link to the study
will be added to your User Page.
Managing and Depositing Data:
ADA and DDI
Data deposit: ADAPT
Archival processing
Manual system with some automation tools
1. Deposit:
– Review of ADAPT submission
– Storage via ADAPT to file store
2. Data processing:
– File format conversion (usually to SPSS for processing)
– Privacy/confidentiality review
– Data cleaning (in consultation with depositor)
3. Metadata processing:
– DDI-C metadata creation in Nesstar Publisher
4. Publishing:
– Archival storage and access format creation
– Data publication to Nesstar server
– Metadata publication to Nesstar and ADA CMS
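The four stages above can be written down as an ordered checklist, which is one simple way to track where a deposit sits in a largely manual workflow. The stage and step names come from the slide; the data structure and the `remaining_steps` helper are hypothetical and do not describe ADA's tooling.

```python
# Stage and step names are taken from the slide; the structure is illustrative.
ARCHIVAL_WORKFLOW = [
    ("Deposit", [
        "Review of ADAPT submission",
        "Storage via ADAPT to file store",
    ]),
    ("Data processing", [
        "File format conversion",
        "Privacy/confidentiality review",
        "Data cleaning",
    ]),
    ("Metadata processing", [
        "DDI-C metadata creation in Nesstar Publisher",
    ]),
    ("Publishing", [
        "Archival storage and access format creation",
        "Data publication to Nesstar server",
        "Metadata publication to Nesstar and ADA CMS",
    ]),
]


def remaining_steps(completed):
    """Return (stage, step) pairs not yet completed, in workflow order."""
    done = set(completed)
    return [
        (stage, step)
        for stage, steps in ARCHIVAL_WORKFLOW
        for step in steps
        if step not in done
    ]


todo = remaining_steps(["Review of ADAPT submission"])
print(f"{len(todo)} steps remaining; next: {todo[0][1]}")
```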
Future directions
Future trends
• Mandated rather than recommended data archiving
– How do we scale?
– Looking at self-deposit systems
• Open access to data as the default
– Government: PM&C Open Data Policy, data.gov(.au/.uk)
– Research: Horizon2020, ESRC, NSF, ARC/NHMRC??
• Broader range of data types available
– Qualitative data: YES
– Social media data:
• Raw feed (firehose): NO
• Processed data: ??? (how to support access)
– Administrative data: ???
• Broader range of users of that data
– Different disciplines: health, environment, comp. sci.
– Different users: public/media/government
– Different geographies: internationally
Core needs for social
science data
• Collection
• Preservation
• Integration
• Analysis
• Dissemination
ADA trusted digital
repository project
• Funded by ANDS 2016-17
• Aims:
– Completion of the Data Seal of Approval self-assessment
and certification process
• http://www.datasealofapproval.org/en/
• 16 requirements:
• Assessment on 0-4 scale:
• All requirements must be at least a 1
– Implementation of improvements to ADA systems and
procedures to improve certification assessment
– Review of the DSA certification process and criteria to
assess suitability for the Australian research data
environment
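The scoring rule mentioned above (16 requirements, each assessed on a 0-4 scale, none below 1) can be expressed as a short check. This is a minimal sketch: the function name and the example scores are hypothetical, and it encodes only the threshold stated on the slide, not the full DSA assessment procedure.

```python
def meets_minimum(scores, n_requirements=16, minimum=1):
    """Check a self-assessment against the threshold stated on the slide.

    `scores` maps requirement number -> score on the 0-4 scale.
    Returns True only if every requirement scores at least `minimum`.
    """
    if len(scores) != n_requirements:
        raise ValueError(f"expected {n_requirements} scores, got {len(scores)}")
    if any(not 0 <= score <= 4 for score in scores.values()):
        raise ValueError("scores must be on the 0-4 scale")
    return all(score >= minimum for score in scores.values())


# Hypothetical self-assessment: every requirement scored 3, except one at 0.
example = {requirement: 3 for requirement in range(1, 17)}
print(meets_minimum(example))   # all requirements at 3
example[7] = 0
print(meets_minimum(example))   # requirement 7 falls below the minimum
```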
DSA requirements
• “Fundamental to the following guidelines are five
criteria, that together determine whether or not the
digital research data may be qualified as sustainably
archived:
– The research data can be found on the Internet.
– The research data are accessible, while taking into account
relevant legislation with regard to personal information and
intellectual property of the data.
– The research data are available in a usable format.
– The research data are reliable.
– The research data can be referred to.”
• http://www.datasealofapproval.org/media/filer_public/2013/09/27/dsa-booklet_1_june2010.pdf
The guidelines
• “The associated guidelines relate to the implementation of
these criteria and focus on three stakeholders: the data
producer, the data repository and the data consumer.
1. The data producer is responsible for the quality of the digital
research data.
2. The data repository is responsible for the quality of storage
and availability of the data and data management.
3. The data consumer is responsible for the quality of use of the
digital research data.”
– http://www.datasealofapproval.org/media/filer_public/2013/09/27/dsa-booklet_1_june2010.pdf
• Guidelines:
https://drive.google.com/file/d/0B4qnUFYMgSc-eDRSTE53bDUwd28/view
Repositories and archives
project
• With UNSW Library (Maude Frances)
• Exploring mechanisms for deposit and preservation
of data through repository to the data archive
• Questions we are exploring:
– Where should we deposit the data?
– Who should store the data?
– What metadata should we collect?
– Who should manage the metadata?
– How to transfer content (data and metadata) between
repository and archive?
– How to determine the “source of truth”? (e.g. who should
mint the DOI?)
ADA Dataverse
• Redevelopment of our database and website
infrastructure
– New website
– New data catalogue
• New functionality:
– Self-deposit of data
– Open data access
– API access (both for deposit and access, e.g. through R)
– Shibboleth authentication
• Currently in early testing
– For completion in 2017 (probably Q3)
• Functionality intended to support additional DSA
requirements
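As an illustration of what API access might look like, the sketch below builds a query for the Search API that the Dataverse software documents at `/api/search`. The slide mentions R; Python is used here only for illustration. The server address is a placeholder, not the ADA Dataverse URL, and the example makes no claim about how ADA will configure its installation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder server: replace with a real Dataverse installation.
BASE_URL = "https://dataverse.example.org"


def search_url(query, base_url=BASE_URL, per_page=10):
    """Build a Dataverse Search API URL that returns datasets matching `query`."""
    params = urlencode({"q": query, "type": "dataset", "per_page": per_page})
    return f"{base_url}/api/search?{params}"


def search(query, **kwargs):
    """Run the search and return the parsed JSON response.

    Requires network access and a real server; not called in this sketch.
    """
    with urlopen(search_url(query, **kwargs)) as response:
        return json.load(response)


print(search_url("election study"))
```

Separating URL construction from the network call keeps the example testable offline; deposit via the API would additionally need an API token, which is not shown here.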
ADA Dataverse
Questions?
Steven McEachern
steven.mceachern@anu.edu.au
ada@anu.edu.au
http://ada.edu.au
Data documentation standards
DDI-Codebook
• Two flavours of DDI – Codebook and Lifecycle
• Focus on DDI-C, four sections:
1. Document description: characteristics of the DDI XML
document itself
2. Study description: characteristics of the Study (project) that
the DDI is describing (including Related Materials:
documents associated with the project, such as
questionnaires, codebooks, etc.)
3. File description: characteristics of the physical data files
4. Variable description: characteristics of the variables in the
data file
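To make the four sections concrete, the sketch below reads a tiny DDI-Codebook document with Python's standard XML parser. The element names (`docDscr`, `stdyDscr`, `fileDscr`, `dataDscr`) and the `ddi:codebook:2_5` namespace follow DDI-Codebook 2.5; the sample study, file name and variables are invented for illustration and are far simpler than a real ADA record.

```python
import xml.etree.ElementTree as ET

NS = {"ddi": "ddi:codebook:2_5"}

# Invented, heavily simplified sample: one element per DDI-C section.
SAMPLE = """<codeBook xmlns="ddi:codebook:2_5">
  <docDscr><citation><titlStmt>
    <titl>Metadata record for Example Survey</titl>
  </titlStmt></citation></docDscr>
  <stdyDscr><citation><titlStmt>
    <titl>Example Survey, 2016</titl>
  </titlStmt></citation></stdyDscr>
  <fileDscr><fileTxt><fileName>example_survey.sav</fileName></fileTxt></fileDscr>
  <dataDscr>
    <var name="age"><labl>Age of respondent</labl></var>
    <var name="state"><labl>State or territory</labl></var>
  </dataDscr>
</codeBook>"""


def summarise(ddi_xml):
    """Pull one representative field from each of the four DDI-C sections."""
    root = ET.fromstring(ddi_xml)
    title_path = "ddi:citation/ddi:titlStmt/ddi:titl"
    return {
        "document_title": root.findtext(f"ddi:docDscr/{title_path}", namespaces=NS),
        "study_title": root.findtext(f"ddi:stdyDscr/{title_path}", namespaces=NS),
        "file_name": root.findtext("ddi:fileDscr/ddi:fileTxt/ddi:fileName", namespaces=NS),
        "variables": {
            var.get("name"): var.findtext("ddi:labl", namespaces=NS)
            for var in root.findall("ddi:dataDscr/ddi:var", NS)
        },
    }


summary = summarise(SAMPLE)
for key, value in summary.items():
    print(f"{key}: {value}")
```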
Dublin Core
• Type
• Format
• Identifier
• Source
• Language
• Relation
• Coverage
• Rights
• Title
• Creator
• Subject
• Description
• Publisher
• Contributor
• Date
DCAT (W3C)
DCAT standard is relatively simple, and includes four basic
objects:
• Dataset: “a collection of data, published or curated by a
single agent, and available for access or download in one
or more formats”
• Data catalog(ue): “a curated collection of metadata about
datasets”
• Catalog(ue) record: “a record in a data catalog, describing
a single dataset”
• Distribution: “represents a specific available form of a
dataset”
• Key object for SRC is the Dataset
– others are distribution-related
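The relationship between the four DCAT objects can be sketched with plain data classes. This is a simplified model for illustration only: the field names are assumptions rather than the DCAT vocabulary's property names, the catalogue reaches its datasets only through its records, and the example values are invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Distribution:
    """A specific available form of a dataset (e.g. one file format)."""
    media_type: str
    download_url: str


@dataclass
class Dataset:
    """A collection of data published or curated by a single agent."""
    title: str
    publisher: str
    distributions: List[Distribution] = field(default_factory=list)


@dataclass
class CatalogRecord:
    """A record in a data catalogue, describing a single dataset."""
    dataset: Dataset
    issued: str  # date the record was added to the catalogue


@dataclass
class Catalog:
    """A curated collection of metadata about datasets."""
    title: str
    records: List[CatalogRecord] = field(default_factory=list)

    def datasets(self):
        return [record.dataset for record in self.records]


# Invented example values.
survey = Dataset(
    title="Example Survey, 2016",
    publisher="Example Archive",
    distributions=[Distribution("text/csv", "https://example.org/survey.csv")],
)
catalogue = Catalog("Example catalogue", [CatalogRecord(survey, issued="2017-02-13")])
print([dataset.title for dataset in catalogue.datasets()])
```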
ADA systems architecture
Approach
• Core archive website:
– http://www.ada.edu.au
• Sub-archives focussed on specialised thematic or
methodological areas
– e.g. http://www.ada.edu.au/indigenous/home
• “Add-on” systems for complex analysis or
visualisation tasks:
– Nesstar
– GIS: http://gis-test.ada.edu.au
– Longitudinal visualisation: Panemalia
– Historical census data: http://hccda.ada.edu.au
OAIS architecture
Data sharing policies in Australia
Policy trends in data access
• Mandated rather than recommended data archiving
• Open access to data as the default (NSF, Office of
the President, data.gov(.au,.uk))
• Broader range of data types available
• Broader range of users of that data
Policy drivers
• Funders: Return on investment:
– Government data: Treasury, PM&C
– Research data: ARC/NHMRC, Horizon 2020
• Journal publishers: Reputation:
– Open access journals (e.g. PLOS One) and
– For-profit publishers (e.g. Nature, Science, Elsevier)
concerned about loss of credibility from fraudulent research
• Learned societies and disciplines: Good science
AND reputation:
– American Political Science Association: DART initiative
– American Economic Association:
Government data
• Australia: Australian Government Public Data Policy
Statement
– The Australian Government commits to optimise the use and
reuse of public data; to release non-sensitive data as open by
default; and to collaborate with the private and research sectors
to extend the value of public data for the benefit of the
Australian public.
– Public data includes all data collected by government entities for
any purposes including; government administration, research or
service delivery.
– Non-sensitive data is anonymised data that does not identify an
individual or breach privacy or security requirements.
– https://www.dpmc.gov.au/sites/default/files/publications/aust_govt_public_data_policy_statement_1.pdf
Research data
• Australian Code for the Responsible Conduct of
Research
• https://www.nhmrc.gov.au/guidelines-publications/r39
(Joint ARC/NHMRC publication)
• Section 2: Management of research data and
primary materials
• Then provides related links to ethics statements and
similar
ACRCR Section 2:
Responsibilities of Institutions
Section 2.1.1: In general, the minimum recommended period for
retention of research data is 5 years from the date of publication.
However, in any particular case, the period for which data should
be retained should be determined by the specific type of research.
For example:
• for short-term research projects that are for assessment
purposes only, such as research projects completed by
students, retaining research data for 12 months after the
completion of the project may be sufficient
• for most clinical trials, retaining research data for 15 years or
more may be necessary
• for areas such as gene therapy, research data must be retained
permanently (eg patient records)
• if the work has community or heritage value, research data
should be kept permanently at this stage, preferably within a
national collection.
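The retention periods quoted above can be summarised as a lookup table. This is a sketch of the examples in Section 2.1.1 only: the category labels are invented shorthand, the Code says the period should be determined case by case, and the values are minimums rather than fixed terms.

```python
# Minimum retention in years, per the examples quoted from Section 2.1.1.
# None means the data should be kept permanently.
# Category labels are illustrative shorthand, not terms from the Code.
RETENTION_YEARS = {
    "default": 5,                   # from the date of publication
    "student_assessment": 1,        # 12 months after project completion
    "clinical_trial": 15,           # "15 years or more"
    "gene_therapy": None,           # permanent (e.g. patient records)
    "community_or_heritage": None,  # permanent, preferably in a national collection
}


def minimum_retention(category="default"):
    """Return the minimum retention period in years, or None if permanent."""
    try:
        return RETENTION_YEARS[category]
    except KeyError:
        raise ValueError(f"unknown research category: {category!r}") from None


for category in RETENTION_YEARS:
    years = minimum_retention(category)
    period = "permanent" if years is None else f"at least {years} year(s)"
    print(f"{category}: {period}")
```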
ARC statement
"Researchers and institutions have an obligation to care
for and maintain research data in accordance with the
Australian Code for the Responsible Conduct of
Research (2007). The ARC considers data
management planning an important part of the
responsible conduct of research and strongly
encourages the depositing of data arising from a
Project in an appropriate publicly accessible subject
and/or institutional repository"
ANDS suggest three questions
1. Where will your research data be stored at
completion of the project?
2. What access will you provide to the data set on
completion of the project?
3. How will you enable others to reuse your
research data?
Horizon 2020
• http://ec.europa.eu/research/participants/docs/h2020-funding-guide/cross-cutting-issues/open-access-data-management/open-access_en.htm
• (All grants): Develop a data management plan (DMP)
within 6 months of commencement of project
• Pilot program (2014-17):
– Deposit research data described in DMP, preferably in a
research data repository
– As far as possible, projects must then take measures to
enable third parties to access, mine, exploit, reproduce and
disseminate (free of charge for any user) this research data.
– Guidelines recommend FAIR principles
FAIR principles
• Findable
• Accessible
• Interoperable
• Reusable
• Wilkinson, M. D. et al. The FAIR Guiding Principles
for scientific data management and stewardship. Sci.
Data 3:160018 doi: 10.1038/sdata.2016.18 (2016).
