Suppose I'm doing some experimental work in an academic context. The essence of the results will likely be captured by documents such as conference/journal articles, technical reports, an M.Sc. or Ph.D. thesis, a monograph, etc. For these kinds of artifacts, we have all sorts of avenues for long-term archiving and Internet availability: journals have their own archives, universities make internal publications available (well, sometimes), and you can put copies on arXiv and/or sites like ResearchGate or Academia.edu (although the latter have their problems).

If you've produced software code, you can again use university-specific facilities, or platforms like Bitbucket, GitHub or SourceForge (which recently revamped itself into relevance).

Where would you put raw data, though? Especially tabular data? You don't publish it (potentially in large amounts) alongside your papers. You could put it as a file on your website, but this is limited-availability archiving/publication: it's like putting a link to a software source archive on your website. It's there, but people are much less likely to find it than if it were a repository on one of the platforms mentioned above.

So, my question is: Are there platforms for storing public data, in particular data obtained during academic/scientific work?

Notes:

  • It doesn't much matter whether the data is accessible directly, as though in an SQL database, or whether you can browse it in tabular fashion through some web interface. Those are nice options, but even something as "primitive" as a CSV file at a standard-format URL is already passable.
  • Same goes for versioning or revision-control support: Nice to have, not a deal-breaker for a potential answer here.
  • Publications don't need to have a perma-link to the data, nor will it necessarily be archived before such publications. But again - it's a nice feature such a platform could have.
  • This question is highly related - almost a dupe - if you look at the title, but the body asks different questions than I do. I don't want/need the data to count as a "paper" or a CV-worthy publication; I don't need/want peer-reviewing of the data as a condition for it being available to the public etc.

2 Answers

Do you know about Zenodo, which is a CERN initiative? It may be of interest to you: https://about.zenodo.org/

  • I don't. Please describe that for us in your answer... StackExchange policy calls for answers to be more than just named links.
    – einpoklum
    Commented Apr 25, 2020 at 21:47
  • I assign DOIs to data sets with Zenodo and find it useful
    – user120011
    Commented Apr 26, 2020 at 18:02

One solution that seems to fit your criteria is the Open Science Framework (OSF).

Here is a description of the project (source):

OSF is a free, open source web application that connects and supports the research workflow, enabling scientists to increase the efficiency and effectiveness of their research. Researchers use OSF to collaborate, document, archive, share, and register research projects, materials, and data. OSF is the flagship product of the non-profit Center for Open Science.

Regarding the specific criteria that you mentioned:

  • "OSF has built-in version control for all files stored in your project, can render hundreds of different file types, and allows you to directly edit plain text files (including R and Python scripts) directly in the browser." (source)
  • "Each project and component can have its own set of files, allowing you to organize your files into categorial or hierarchical groups, like datasets or studies. Each file has a unique, persistent URL so that it can be cited or linked to individually." (source)
  • "OSF is maintained and developed by the Center for Open Science (COS), a 501(c)3 non-profit organization. COS is supported through grants from a variety of supporters, including federal agencies, private foundations, and commercial entities... COS established a $250,000 preservation fund for hosted data in the event that COS had to curtail or close its offices. If activated, the preservation fund will preserve and maintain read access to hosted data. This fund is sufficient for 50+ years of read access hosting at present costs. COS will incorporate growth of the preservation fund as part of its funding model as data storage scales..." (source)
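
Since each file has its own persistent URL, a reader of your paper could pull a tabular data set straight into an analysis script. Here is a minimal sketch of that, using only the Python standard library. The helper name `read_table` is my own, not part of any OSF API, and it assumes the URL serves the raw CSV file (UTF-8 encoded, with a header row) rather than an HTML landing page.

```python
import csv
import io
import urllib.request


def read_table(url, encoding="utf-8"):
    """Download a CSV file from `url` and return its rows as a list of dicts.

    Hypothetical helper for illustration: it assumes the URL returns the raw
    CSV content and that the first line of the file is a header row.
    """
    # Fetch the whole file; fine for small tables, not for very large ones.
    with urllib.request.urlopen(url) as response:
        text = response.read().decode(encoding)
    # DictReader uses the header row as the keys of each returned row.
    return list(csv.DictReader(io.StringIO(text)))
```

Calling `read_table` with the persistent URL of a hosted CSV file would return one dictionary per data row, keyed by column name, with all values as strings.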
