33

Many researchers have unpublished data, and some of it may never be published as a manuscript. I would still like to make a scholarly contribution with data that I do not intend to write up as a manuscript, e.g. by publishing a "data paper".

The term "data paper" may be too new to be familiar, so here is a description from the Ecological Archives website:

Data Papers are compilations and syntheses of data sets and associated metadata deemed to be of significant interest to the ESA membership and the scholarly community. Data papers are peer reviewed and are announced in abstract form in the appropriate print journal as a Data Paper. Data papers differ from review or synthesis papers published in other ESA journals in that data papers normally will not test or refine ecological theory. Data Papers can facilitate the rapid advancement of ecological knowledge and theory at the same time that they disseminate information. In addition, Ecological Archives provides a reward mechanism (in the form of peer-reviewed, citable objects) for the substantial effort required to compile and adequately document large data sets of ecological interest

This brings up the following questions:

What makes a good data repository?

Which data repositories provide a DOI for raw data?

Should published data be separate from articles on a CV?

  • When you say data, do you mean rows and columns of numbers (which is the obvious assumption), or all data pertinent to the research, such as equations and figures?
    – dearN
    Commented Apr 3, 2012 at 4:51
  • @dna Perhaps data papers can include descriptive statistics but not more than that.
    – Abe
    Commented Apr 3, 2012 at 5:42
  • @david do you mean something like DataOne?
    – Abe
    Commented Apr 3, 2012 at 5:48
  • @David: You need to provide a more complete description of what you're looking to publish. Is it just raw unformatted data; is it post-processed data of the type that fits an existing archive? What resources do you have available to you at your institution? Is there something field-specific already available?
    – aeismail
    Commented Apr 3, 2012 at 8:31
  • It is sad that non-reproducible results can be published at all (happens a lot in computer science).
    – Raphael
    Commented Aug 16, 2012 at 9:42

5 Answers

20

There are a few things that I would consider when choosing a data repository:

  • Does it let you release your data under a license you're happy with?
    • Applying too restrictive a license can prevent anyone from doing anything useful with the data, so think about what you're prepared to allow. In particular, remember that most of the research done in academia could be considered "commercial" from a legal perspective. On the other hand, you may wish to choose a license that ensures you get credit for your work. You may or may not agree with them, but reading the Panton Principles will give you some idea of the issues here. Also take a look at this list of licenses written with data in mind.
  • How easy will your data be to find?
    • People will only use your data if they can find it. I recommend Googling (other search engines are available) for some datasets you know of in your field and seeing whether they come up. Repositories that are indexed by the major search engines will put you at a big advantage when it comes to attracting citations.
  • What repositories are well known in your field?
    • Your institution may have a repository which you can easily deposit in, but it won't be the first place colleagues in your field will think of to look. If there are well-established repositories I would prefer those, or make sure your data is indexed by a well-established aggregator (I know ANDS runs a national aggregator in Australia).
  • What does your institution allow?
    • In many cases, your institution will own (or otherwise have a claim to) the data you generate as part of your research, so check what your local policies are and if need be ask your supervisor, head of department, legal team, etc. This will particularly affect your choice of license.

The other parts of your question can probably be answered better by others here (or maybe the question should be split into several?).

7

Figshare provides online hosting and a permalink to your dataset, though it does not provide a DOI. I've been posting some figures there, but not data, and I quite like the service. It also gives you the option of keeping the data private, so you can use it to store the data and release it later when you're done.

3

I think the best place for data is in a subject-focused data repository, but in the absence of that, there are repositories such as Dryad.

BioMed Central just announced a partnership with a site called LabArchives to host data from BMC authors, including DOIs for the data and the CC0 license, which promotes reuse, but I don't have any experience with the site.

3

If you have a website with free preprints of your work (which you probably should have), put your data (and code) there. Alternatively, I know people who use GitHub (or similar) for (distributed) storage. This has the charm of persistence and immediate potential for collaboration.

For a (hopefully) persistent approach to citability, DataCite looks legit. In particular, they issue DOIs and are funded by libraries and research facilities from around the globe.

2

Sounds like it might be appropriate for Pangaea: http://www.pangaea.de/submit/
