I am writing a paper with my adviser. Some of the key contributions of our paper include:

  1. A computer simulation model of an inventory system
  2. A set of simulation experiments and results to validate the simulation model
  3. An implementation of an inventory policy which we propose
  4. Simulation results of our proposed inventory policy and inventory policies which are currently used in practice

We want to make our computer code freely available to others. The simulation model was written in the Java programming language so other researchers should be able to run the code on their own computers fairly easily. By sharing our computer code, we hope that other researchers will develop inventory policies which they can test using our simulation model. (Of course, it would be beneficial for us if they would do so and cite us in their paper!)

Question: What is a good way to go about sharing our computer code?

  1. Where do I host the code?
  2. Which software license should we "publish" the code under?
  3. How do I make it easy for other people to run the code?

  • On "choosing a software license": I invite all to take a look at tldrlegal.com
    – user7112
    Commented Feb 11, 2014 at 12:32
  • And screenshots or (even better) working examples on the web (without any installation) work miracles.
    Commented Feb 11, 2014 at 15:14
  • CC licenses should not be used for software. This is unfortunate, as there's no similarly clear pick-your-license-rules for software. choosealicense.com or tldrlegal.com should help you choose.
    – Tim S.
    Commented Feb 11, 2014 at 19:34
  • @TimS. There's one exception: The CC0 license can be used for software. It's a simple permissive license.
    – TRiG
    Commented Feb 12, 2014 at 1:16
  • Not relevant to you, but IPOL is a journal in image processing that asks you to publish an implementation along with the paper, in a way that enables anyone to test it live. Check it out at ipol.im (go to any article to test its code in minutes).
    Commented Feb 12, 2014 at 19:23

8 Answers

39

Answer to 1: Where should I host my code?

Depending on what your university offers, you could host the code with the university or on a public code-hosting service such as GitHub, Bitbucket, SourceForge, or similar.

Many of these services have a "paid" subscription option for private repositories if those are required.

Answer to 2: What open-source license should I choose?

This question is relevant because we're having this discussion right now within one of our own research projects. I happen to know a little about open source software, having researched it in the past and having taught a few courses on it.

Though there are a lot of open-source licenses out there, they really end up coming in two main families. They're either permissive open licenses (e.g. MIT, BSD, Apache) or they are Free, copyleft licenses (the GNU General Public License, GPLv2 or GPLv3). Here's a brief lowdown by the Open Source Initiative.

Permissive open licenses: These licenses generally allow you to release your code so that anyone can do anything with it that they want, as long as they retain certain copyright information with the code. In reality, this has a number of implications.

  1. Someone could take your entire code base, create a product with it, and sell it.

  2. Someone could take parts of your code and put them in their own project (commercial or not).

  3. Because the license is more permissive, you yourself could take the code, close it, and then keep under wraps any future releases so you can make money off of the code or hide it from the public.

  4. Because the license is more permissive, you might generate more interest as a result. People may take code from other projects and use it to improve yours. On the flip-side, they could also make improvements to your source code and never share them back with you.

On the flip-side, the GNU GPL is a Free Software License that disallows you from doing certain things. In that sense, it's more restrictive, but does so for a number of ideological reasons.

  1. If you release software under the GPL, you can't close-source it. Ever. It's going to remain in the open, and if someone asks you for the source code you are obligated by the terms of the license to provide it (if you host it on Github or another public repository, then you have already satisfied this requirement).

  2. A company could take the code, make products with it, and sell them (it's their right to do so), but only on the condition that any source code they write for the project is also released under the GPL when they distribute it. Many companies that make a lot of money writing software dislike this, because they would have to keep releasing their code to the public. On the flip-side, any cool stuff that they do gets released under the GPL, so you could fold it back into your project and improve it. They can't take your code, improve it, and distribute the result without sharing the source.

  3. If you happen to have used any GPL code in your project (let's say you took a few lines out of the Linux kernel or Git or whatever), then you'll have to release your code under the GPL as well.

In the end, the choice of license is mostly about how you want the software to be used (and the eventual community it might bring in). If you plan to commercialize the software (and implicitly allow others to do the same), then you might want to lean BSD. If you don't want people to take your hard work and profit off of it without showing you the results, then you want to go GPL. If you don't care either way, then you could probably just choose one. I think BSD is popular in academia precisely because of the commercialization aspect (for example, LLVM is gaining a lot of traction because of its permissive license).

Answer to 3: How do I make it easy for others to run the code?

You make it easy to run code by engineering it to be easy to run and by being extremely detailed with your documentation.

Packaging/distribution can actually be pretty hard and usually takes more effort than most people would think. A good way to make the software easy to run is to test it on multiple machines. Make sure that you're not forgetting any of the libraries that you're using in your software project, for example, and when possible, try to use software libraries that are common and well-maintained. Use mainstream languages with easy-to-manage package repositories.

When appropriate, use installers, installer scripts, Makefiles (autotools, i.e. automake/autoconf, is better), etc. Even shell scripts are better than nothing. If you can provide binaries and/or an installer, that will make things even easier. The problem is that this is a LOT of work!
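To illustrate the "even a script is better than nothing" point, here is a minimal sketch of a launcher, written in Python purely for illustration. The jar name `inventory-sim.jar` and the function names are assumptions, not anything from the question; the idea is only that a user gets one command to run and a readable message when the build artifact is missing.

```python
import subprocess
import sys
from pathlib import Path

# Assumed artifact name; replace it with whatever your build produces.
JAR_NAME = "inventory-sim.jar"


def build_command(base_dir, args=()):
    """Return the command that runs the jar in base_dir, forwarding args."""
    jar = Path(base_dir) / JAR_NAME
    if not jar.is_file():
        raise FileNotFoundError(f"{jar} not found; build the project first.")
    return ["java", "-jar", str(jar), *args]


def main(argv):
    """Run the simulation and return its exit code (1 if the jar is missing)."""
    try:
        command = build_command(Path.cwd(), argv)
    except FileNotFoundError as err:
        print(err, file=sys.stderr)
        return 1
    return subprocess.call(command)
```

Calling `main(sys.argv[1:])` from a small script in the project root would pass any command-line options straight through to the simulation.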

Accompany it with documentation. Ideally, the documentation will contain a description of how to set it up and run it, with descriptions of necessary packages/libraries, data that you might have to get, and what to type or click on. Usually, something called README or INSTALL will attract attention. Put the instructions on the web page as well; most of the hosting solutions also allow you to have web pages.

Hope this all helps. The hardest part of the process is by far Step #3, and most people don't get as far as using good techniques like installers, automake/autoconf, and so forth, because it's a LOT of work and development often moves faster than you can write documentation. However, no one is grading you on your style, so it's often easier to get the code out than it is to clean it up and prettify it first.

  • Re your part about paid subscriptions: Bitbucket has a free plan for academics only that gives you unlimited free private repos.
    Commented Feb 11, 2014 at 6:40
  • Whatever license you choose, you can always use it yourself in any way you like, because you are the copyright holder. This means you can use it in a closed source project, even if you have chosen GPL as the license.
    Commented Feb 11, 2014 at 13:09
  • Keep in mind that GPL is a non-exclusive license. So if you are the owner of the copyright (careful: if you're paid for your work, in many jurisdictions the employer owns the copyright!) you can in addition to the free & open GPLd version provide non-free, closed versions. I.e. GPLing your code still allows you to sell non-free/closed source licenses e.g. to a company who wants to use your code in their non-free/closed product. (Of course only if your code doesn't include third party software with a license that prevents this.)
    – cbeleites
    Commented Feb 11, 2014 at 14:02
  • "If you release software under the GPL, you can't close-source it. Ever. It's going to remain in the open, and if someone asks you for the source code you are obligated by the terms of the license to provide it." This is false. If you release your own software under the GPL, you can relicense it at any time. Others are still allowed to redistribute your GPL-published software and its source, but you have no obligations, and you can also decide that from a certain point onwards new versions are no longer GPL.
    Commented Apr 9, 2017 at 7:07
  • Under point 1, I would add a note about the importance of using a system that archives your code. For example, Zenodo integrates with GitHub to provide proper archiving of GitHub releases. This is essential for academic work. guides.github.com/activities/citable-code
    Commented Apr 10, 2017 at 1:40
23

To some extent, the answer will depend on what you wish to accomplish with this release. There was a fantastic blog post recently on that precise topic.

If the code is in great shape, and you hope others will build on it, then the choice of license is going to reflect your philosophy: a BSD-style license if you just want the algorithm and code out there, or perhaps a copyleft (GPL-style) license if you want to make sure improvements return to the commons.

If the code isn't in such great shape, but for transparency's sake needs to be out there, consider something along the lines of the CRAPL, which acknowledges the messy nature of modern computational sciences. I think the preamble is worth quoting:

I. Preamble

Science thrives on openness.

In modern science, it is often infeasible to replicate claims without
access to the software underlying those claims.

Let's all be honest: when scientists write code, aesthetics and
software engineering principles take a back seat to having running,
working code before a deadline.

So, let's release the ugly.  And, let's be proud of that.

As far as the actual mechanics of putting the code up, use GitHub or Bitbucket. These services are going to give you code hosting, a home for the project, the ability to manage contribution, and the ability to track bugs and issues.

  • As an addendum, both GitHub and Bitbucket offer academic accounts if you register with an institutional e-mail address. That way, you don't even have to mix it up with your regular account if you already have one.
    – sansuiso
    Commented Feb 11, 2014 at 7:43
  • Something to maybe use in conjunction with GitHub and Bitbucket is readthedocs (readthedocs.org). It would allow you to generate a nice site with code documentation, installation instructions, etc. Very useful site.
    – cc7768
    Commented Feb 11, 2014 at 14:15
  • I would warn against copyleft licenses (unless someone has a very good reason to go with one). Copyleft is de facto restrictive, as not every product can use it.
    Commented Feb 11, 2014 at 15:20
  • @PiotrMigdal: as long as it is a non-exclusive license (such as the GPL) and the original copyright is in one hand, copyleft is no problem: you (or probably rather: your employer/university) can license it in addition under different terms to anyone who needs special terms. It often becomes a problem once you have several owners of copyright for parts of the code. I'd put more emphasis on having a similar license to "surrounding" software (e.g. for an R package, copyleft is perfectly sensible even if that is not already required due to dependencies on copyleft packages).
    – cbeleites
    Commented Feb 11, 2014 at 22:50
  • @cbeleites It is non-exclusive, but as soon as other people start contributing (and that is one of the goals of open licenses), putting another license on the new content, AFAIK, requires every contributor to agree. If the project is successful to any degree, this effectively becomes impossible. I don't claim that people should not use GPL (certainly, there are use cases); just that its consequences are less trivial than a newbie in open software can imagine (most importantly, that copyleft is not the most permissive).
    Commented Feb 12, 2014 at 10:01
13

Matthew G. and Irwin have given great answers, but I'd like to provide some additional resources and references for those interested.

First, take a look at answers to this similar question on scicomp.SE:

What material should I include with a journal article (or post online) in order to make my computational research reproducible?

Reproducibility was the subject of a 2012 workshop at ICERM; you'll find a lot of useful material on the wiki and in the final report (see especially appendices D, E, and F).

Archival/hosting

Update: You can get a DOI and permanent hosting for a snapshot of your code via Figshare or Zenodo.

Licensing

See this section of the wiki for an extensive list of resources.

Making it easy to run the code

There are some sites and tools out there aimed specifically at this. These also solve the hosting issue:

  • ActivePapers: An ActivePaper is a single file containing all the software and datasets related to a research project.
  • RunMyCode: This service is based on the innovative concept of a companion website associated with a scientific publication.

A major hurdle is often re-creating the correct environment (including libraries and such) necessary to run the code. To overcome this, you could

It can be useful to put your code in a worksheet format, where you can intersperse comments and even mathematical formulas (for instance, using the IPython notebook or a Sage worksheet). Here is an example.

Examples

Finally, here are some examples of my own efforts. They're far from perfect, but may still be helpful.

  • +1 for mentioning virtual machines. I think that this is a big deal for a number of disciplines where reproduction has historically been difficult.
    – Matthew G.
    Commented Feb 11, 2014 at 23:09
9

I recommend GitHub.

The other answers are well detailed and include it, but give a bunch of other choices. Choice is obviously good, but since they don't list specific advantages, I am suggesting you just go with GitHub.

My rationale is that GitHub has become a clear leader in the field of code storage and sharing. Its underlying technology, Git, is a modern DVCS that has largely replaced older technologies such as svn. GitHub now has over 2.8 million users, which is quite impressive.

GitHub is also great for code collaboration, allowing multiple people to edit and merge their changes in a controlled but decentralized fashion.

GitHub allows you to have both public repositories (anyone can view) and private repositories whose view access you control. To grant update access, you add the relevant users as collaborators; they authenticate with their own SSH keys.

6

All the answers above are great. I would just like to add that, if you plan to publish it on your lab's website or any personal website, you should also copy it somewhere else.

In many fields, it appears that data (including original programs) is disappearing all the time. When a lab moves to another university, closes, or undergoes any kind of restructuring, its website is likely to change, and data stored there can be lost. So, unless your university has a centralised repository, you should put your code where it can stay for decades, for instance on GitHub.

4

In my opinion, it depends on the specific code you want to share.

To give just one example: JavaScript code can be shared on http://jsfiddle.net/, where you can test JavaScript, CSS, HTML or CoffeeScript online in the browser. There is also an option for inviting people to collaborate on developing the code.

Good luck.

  • Nice point; welcome to academia.SE!
    Commented Feb 11, 2014 at 0:39
2

For my graduation paper I shared my code by printing it all out as a companion volume to the main research paper.
Mind you, this was pre-internet, and the paper (and code) was classified, so very few people would ever read it.
I did put it all on floppy and included copies of that with the printed paper as well.

These days, you'd likely put it on a server somewhere where all those who have access to the paper can access the code, and nobody else.
Of course you will need to figure out who that will be. Most research papers are considered company secrets (or department secrets) and not for public distribution. Your code would fall under the same restrictions.
Just throwing it out there on GitHub, Google Code, or SourceForge without prior permission from your employers/coordinators isn't going to make you many friends; in fact, you could end up with a rather hefty claim for damages and/or prison time for doing so.

  • Perhaps. Please be aware that with GitHub you can have a public repository, which is effectively 'throwing your code out there'. GitHub does, however, also have private repositories that you control access to via people's SSH keys.
    Commented Feb 11, 2014 at 22:35
  • I strongly disagree with your view on access to the code. If it is a scientific research paper, you would by definition want as many people as possible to be able to read it. As scientists and researchers I believe we have an obligation to ensure that our published research results are reproducible. If software that we develop is responsible for those results, we should publish that software along with the paper. Otherwise, the results are not worth much.
    Commented Feb 12, 2014 at 22:23
  • @ThomasArildsen that's nice in theory. In practice much scientific research falls under NDAs and security regimes that make public disclosure impossible for a long time after completion of the project, and/or without prior permission from the organisation funding it.
    – jwenting
    Commented Mar 4, 2014 at 7:38
  • @jwenting It is very unfortunate when funders' requirements restrict research in this way and I think we should try to avoid it as much as we can. In the last two research projects I have been involved in, our receiving the grants has been positively influenced by the fact that we promised to publish all of the code.
    Commented Apr 8, 2014 at 7:17
  • @ThomasArildsen yes and no. As there are often major commercial incentives involved in the results of your research (and/or major political consequences), security is often needed for the sake of those funding the program. Yes, you not being able to talk about it to just anyone without getting that person to sign an NDA as well can delay or prevent ideas from entering the team, but without that process there'd often be no team at all.
    – jwenting
    Commented Apr 8, 2014 at 7:24
1

There's lots of good advice in the various answers; I'm going to address only point 3, "How do I make it easy for other people to run the code?"

The answer here is to automate as much as possible. This will have the added benefit of making your life easier, too, as you'll spend less time typing (and retyping) magic incantations and checking output.

Start, as early as possible, with a top-level script (I usually call it Test) that builds and tests all your code. (This is always the first thing I write.) In your case it sounds like it's too late to start with it, but add it now and grow it in the same way.

Every time you do a new checkout or clone of your repository, start by running the Test script. When it reaches the first error of any kind, consider how you could tweak the script to get rid of that error (if that's easy to do) or detect the error condition and give some informative message to the user. For example, if you depend on libfrozzit and its header files being present for you to compile, you may not be able to install it, but you can at least try to check for its presence and, if absent, fail with, "libfrozzit not found. Install with apt-get install frozzit-dev or yum install frozzit-devel?"
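As a rough sketch of that "check for its presence and fail with a hint" step, here is one way it could look in Python. The tool names and hint texts are assumptions chosen for a Java project like the one in the question; checking for a C library such as libfrozzit would need a different probe than a PATH lookup.

```python
import shutil

# Hypothetical requirements: tool name -> hint shown when it is missing.
REQUIRED_TOOLS = {
    "java": "Install a Java runtime and make sure it is on your PATH.",
    "javac": "Install a JDK; a runtime alone cannot compile the sources.",
}


def missing_tools(required=REQUIRED_TOOLS):
    """Return one informative message per required tool not found on PATH."""
    return [
        f"{tool} not found. {hint}"
        for tool, hint in required.items()
        if shutil.which(tool) is None
    ]


def check_or_exit(required=REQUIRED_TOOLS):
    """Stop early with readable messages instead of a cryptic build error."""
    problems = missing_tools(required)
    if problems:
        raise SystemExit("\n".join(problems))
```

Calling `check_or_exit()` as the first step of the Test script means a newcomer sees what to install before any compiler output scrolls past.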

Write tests of any kind, whether basic unit tests or functional tests, for your code. Even picking the simplest function and sending one value through it, or running myprogram --help and ensuring it prints any message whatsoever, means you've started a test framework and makes it much, much easier for someone else to come along and add a test. If you can get up to, say, 5% test coverage of your code that's a significant benefit because even that much will be a great help to someone who's wondering if the code was built properly.
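The "run it with --help and make sure it prints something" idea can be sketched in a few lines. This is only an illustration: the helper name is made up, and the Python interpreter's own `--help` stands in for your program so the sketch runs anywhere.

```python
import subprocess
import sys


def prints_something(command, timeout=30):
    """Run command and report whether it wrote anything to stdout or stderr."""
    result = subprocess.run(
        command, capture_output=True, text=True, timeout=timeout
    )
    return bool((result.stdout + result.stderr).strip())


def test_help_prints_a_message():
    # Stand-in command: the Python interpreter's own --help. Swap in your
    # program, e.g. ["java", "-jar", "myprogram.jar", "--help"] (hypothetical).
    assert prints_something([sys.executable, "--help"])
```

A test this small says nothing about correctness, but it does tell the next person whether the build produced something that starts at all, and it gives them a place to add the second test.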

Making code easy for others to build and run isn't magic, and can't be done by waving a wand or running a special tool. It's a matter of asking yourself, every time you find yourself doing a manual tweak to get things to work, no matter how simple: "How would I automate away the need for that manual tweak?"
