61

While researching a topic area I have come across a number of papers that claim to improve on the state of the art and have been published at respected outlets (e.g. CVPR, ICIP). These papers are often written in a way that obscures some of the details, and their method descriptions can be incomplete. When I contact the authors for more information and ask if they would kindly make their source code available, they stop replying or decline the offer.

Why are computer science researchers reluctant to share their code?

I would have expected that disseminating your source code would have positive effects for the author, e.g., greater recognition and visibility within the community and more citations. What am I missing?

For the future, what are some better ways to approach fellow researchers that will result in greater success at getting a copy of their source code?

  • An important issue, but could you split it into two questions? (On SE sites there should be one question per one... well, question.) That is, could you make another post out of the second question? Commented May 27, 2013 at 9:12
  • I considered separating the questions but thought that the second would not stand on its own. Commented May 27, 2013 at 9:45
  • You may want to take a look at the Collective Mind initiative: hipeac.net/system/files/grigori.pdf and the corresponding publication model ctuning.org/cm-journal Commented May 28, 2013 at 9:06
  • I could post an answer, but it would be something like: how could we change things so that more people will publish their source code? Would that be acceptable, or does it belong in a different question? Commented May 28, 2013 at 16:06
  • @DikranMarsupial I'm not convinced that throwing more money at researchers will improve the situation, as they have limited time. I don't think that distributing high-quality code is the issue; it's just an excuse people use. Releasing bad code is better than releasing none. Commented May 29, 2013 at 11:13

5 Answers

42

Why researchers might be reluctant to share their code: In my experience, there are two common reasons why some/many researchers do not share their code.

First, the code may give the researchers an important advantage for follow-on work. It may help them get a step ahead of other researchers and publish follow-on research faster. If the researchers have plans to do follow-on research, keeping their code secret gives them a competitive advantage and helps them avoid getting scooped by someone else. (This may be good, or it may be bad; I'm not taking a position on that.)

Second, a lot of research code is, well, research-quality. The researchers probably thought it was good enough to test the paper's hypotheses, but that's all. It may have many known problems; it may not have any documentation; it might be tricky to use; it might compile on only one platform; and so forth. All of these may make it hard for someone else to use. Or, it may take a lot of work to explain to someone else how to use the code. Also, the code might be a prototype, but not production-quality. It's not unusual to take shortcuts while coding: shortcuts that don't affect the research results and are fine in the context of a research paper, but that would be unacceptable for deployed production-quality code. Some people are perfectionists, and don't like the idea of sharing code with known weaknesses or where they took shortcuts; they don't want to be embarrassed when others see the code.

The second reason is probably the more important one; it is very common.

How to approach researchers: My suggestion is to re-focus your interactions with those researchers. What is your real goal? Your real goal is to understand their algorithms better. So start from that perspective, and act accordingly. If there are parts of the paper that are hard to follow or ambiguous, start by reading and re-reading the paper to see if there are details you might have missed. Think hard about how to fill in any gaps. Make a serious effort on your own, first.

If you are at a research level, and you've put in a serious effort to understand, and you still don't understand ... email the authors and ask them for clarification on the specific point(s) that you think are unclear. Don't bother authors unnecessarily -- but if you show interest in their work and have a good question, many authors are happy to respond. They're grateful that someone is reading their papers and interested enough to study their work carefully and ask insightful questions.

But do make sure you are asking good questions. Don't be lazy and ask the authors to clear up something that you could have figured out on your own with more thought. Authors can sense that, and will write you off as a pest, not a valued colleague.

Very important: Please understand that my answer explaining why researchers might not share their code is intended as a descriptive answer, not a prescriptive answer. I am emphatically not making any judgements about whether their reasons are good ones, or whether researchers are right (or wrong) to think this way. I'm not taking a position on whether researchers should share their code or not; I'm just describing how some researchers do behave. What they ought to do is an entirely different ball of wax.

The original poster asked for help understanding why many researchers do not share their code, and that's what I'm responding to. Arguments about whether these reasons are good ones are subjective and off-topic for this question; if you want to have that debate, post a separate question.

And please, I urge you to use some empathy here. Regardless of whether you think researchers are right or wrong not to share their code in these circumstances, please understand that many researchers do have reasons that feel valid and appropriate to them. Try to understand their mindset before reflexively criticizing them. I'm not trying to say that their reasons are necessarily right and good for the field. I'm just saying that, if you want to persuade people to change their practices, it's important to first understand the motivations and structural forces that have influenced their current actions, before you launch into trying to browbeat them into acting differently.


Appendix: I definitely second Jan Gorzny's recommendation to read the article in SIAM News that he cites. It is informative.

  • +1 Most of the code is written to a deadline, be it a PhD student trying to finish up or a postdoc getting a deliverable done in time. I know many people (myself firmly included) who would be embarrassed to be judged on their code quality rather than the actual research it supports.
    – ThomasH
    Commented May 31, 2013 at 0:21
24

Stephen, I have exactly the same experience as you, and my explanation is that the benefit/cost ratio is too low.

Packaging a piece of software so that it can be used by another person is difficult - often even more difficult than writing it in the first place. It requires, among other things:

  • writing documentation and installation instructions,
  • making sure the code is runnable on a variety of computers and operating systems (I code on Ubuntu, but you may code on Windows, so I have to get a Windows virtual machine to make sure it works there too),
  • answering maintenance questions of the form "why do I get this and that compilation error when I compile your program on the new version of Ubuntu?" (go figure - maybe the new version of Ubuntu dropped some library required by the code? who knows),
  • taking care of 3rd-party dependencies (my code may work fine, but it depends on some 3rd-party jar file that its author decided to remove from the web).

Additionally, I should be available to answer questions and fix bugs, several years after I graduate, when I already work full-time in another place, and have small kids.
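A cheap partial mitigation for the "works on my Ubuntu, breaks on yours" maintenance problem described above is to record the exact environment the experiments actually ran on and ship that record alongside the results. A minimal sketch (my own illustration, not from the answer; names are illustrative, standard library only):

```python
# Record the interpreter and OS a result was produced on, so that
# "it works on my machine" is at least documented for readers.
import json
import platform
import sys


def capture_environment():
    """Return a snapshot of the interpreter and OS the code ran on."""
    return {
        "python_version": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "machine": platform.machine(),
    }


if __name__ == "__main__":
    # Ship this JSON alongside experimental results so readers know
    # what configuration the numbers were actually obtained on.
    print(json.dumps(capture_environment(), indent=2))
```

This does not answer support questions for you, but it narrows them: a reader hitting an error can compare their environment against the recorded one before emailing.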

And all this, without getting any special payment or academic credit for all that effort.

One possible solution I recently thought of is to create a new journal, the Journal of Reproducible Computer Science, that will accept only publications whose experiments can be repeated easily. Here are some of my thoughts about such a journal:

Submitted papers must have a detailed reproduction section, with (at least) the following sub-sections:

  • pre-requisites - what systems, 3rd-party software, etc., are required to repeat the experiment;
  • instructions - detailed instructions on how to repeat the experiment;
  • licenses - either an open-source or closed-source license, but it must allow free usage for research purposes.
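Such a reproduction section could even be checked mechanically at submission time. A toy sketch (my own illustration; the function and field names are hypothetical, not part of any real journal's tooling):

```python
# Hypothetical validator for the reproduction section proposed above:
# a submission must declare pre-requisites, instructions, and a
# license that permits free use for research purposes.
REQUIRED_SUBSECTIONS = {"pre-requisites", "instructions", "licenses"}


def validate_reproduction_section(section):
    """Return a list of problems; an empty list means the section passes."""
    problems = [f"missing sub-section: {name}"
                for name in sorted(REQUIRED_SUBSECTIONS - set(section))]
    license_info = section.get("licenses", {})
    if "licenses" in section and not license_info.get("free_for_research", False):
        problems.append("license must allow free usage for research purposes")
    return problems


if __name__ == "__main__":
    submission = {
        "pre-requisites": ["Ubuntu 12.04", "gcc 4.6"],
        "instructions": "run ./reproduce.sh",
        "licenses": {"type": "closed-source", "free_for_research": True},
    }
    print(validate_reproduction_section(submission))  # prints []
```

Note that a closed-source license passes here, matching the proposal: the requirement is free usage for research, not open source.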

The review process requires each of 3 different reviewers, from different backgrounds, to go through this section, using different computers and operating systems.

After the review process, if the paper is accepted for publication, there will be another pre-publication step, which will last for a year. During this step, the paper will be available to all readers, who will have the option to repeat the experiment and contact the author in case there are any problems. Only after this year will the paper finally be published.

This journal will enable researchers to get credit for the difficult and important work of making their code usable to others.

EDIT: I now see that someone already thought about this! https://www.scienceexchange.com/reproducibility

"Science Exchange, PLOS ONE, figshare, and Mendeley have launched the Reproducibility Initiative to address this problem. It’s time to start rewarding the people who take the extra time to do the most careful and reproducible work. Current academic incentives place an emphasis on novelty, which comes at the expense of rigor. Studies submitted to the Initiative join a pool of research, which will be selectively replicated as funding becomes available. The Initiative operates on an opt-in basis because we believe that the scientific consensus on the most robust, as opposed to simply the most cited, work is a valuable signal to help identify high quality reproducible findings that can be reliably built upon to advance scientific understanding."

  • I don't think that's a valid answer. You can just document your setup or even just distribute the code. Even if I can't run it, I can learn a lot just by inspecting it.
    – Spidey
    Commented May 27, 2013 at 13:03
  • I also thought this way, but over time I learned that when you release your code to the public, you inevitably have some responsibility for it. If the code isn't well-documented, if it doesn't compile, if it doesn't run - people will hold you responsible, and it might be bad for your reputation. Commented May 27, 2013 at 13:15
  • I agree that if your code doesn't compile or doesn't run it might be bad for your reputation, but I think it's reasonable to turn down requests to change/fix your code if you are simply publishing it. If you're managing an open source project that's different, but why would publishing source code require more than simply answering questions (as would any publication)?
    – earthling
    Commented May 27, 2013 at 14:16
  • @ErelSegalHalevi It's much worse for one's reputation to hide the implementation details. IMHO, it basically means "believe me". That's not how science works. Hiding the code violates the most important principle of doing science: falsifiability. You can't invalidate a work if you don't have access to it. The author can hide behind this black curtain, denying any attempt to reproduce/invalidate his paper by saying it's not identical to his method.
    – Spidey
    Commented May 27, 2013 at 16:28
  • @Spidey, Erel's answer is completely accurate. He does describe a mindset that many researchers have. That mindset might be a good one, or might be bad for the field - but regardless, what matters is that many researchers do share that mindset, and act accordingly. The original poster asked for an explanation of why many researchers have decided not to share their code; Erel has given an accurate description of why some/many researchers have decided to do so. You can agree or disagree with whether they've made the best choice, but that's not the question here.
    – D.W.
    Commented May 28, 2013 at 4:49
14

This article in SIAM News sheds some light on the first question, so it might be worth a look. It argues, for a mathematical audience, why researchers ought to publish their source code, and lists many of the reasons you might hear why researchers do not share their source code. It does so by a clever analogy, one that compares the sharing of mathematical proofs to the sharing of source code. Take a look; it has quite an extensive list of reasons why researchers might prefer not to share their source code (as well as some responses arguing that those reasons are not good ones).

Here's a citation:

Top Ten Reasons To Not Share Your Code (and why you should anyway). Randall J. LeVeque. SIAM News, April 1, 2013.

  • I suggest that you give more information about the article you link to. For example, state the title and perhaps a sentence describing the main idea. Note that links expire, and your answer would be more useful if it provided the information even if the link fails.
    – JRN
    Commented May 28, 2013 at 2:14
7

In sharing code there are several issues:

  • The first issue is copyright: some CS research projects are funded by industrial partners or funding organizations that discourage sharing sensitive information such as algorithms, code, or software when publishing in public periodicals.

  • Indeed, there are papers based on data (collected from code execution) that have unfortunately been manually modified by the authors. If they share the code, catching their mistakes/errors/modifications becomes very easy, leading to the failure of their MS/PhD or research project, which is undesirable for them.

  • In CS research, and especially in publication, developing code - particularly lengthy, complex code - is a non-trivial task, and in most cases the code is considered a money-making and paper-generating asset. By sharing the code with the public, the authors unveil their work in great detail, which may diminish their contribution to future research. They may also no longer be the only ones who can build on the work and take credit for that particular research and code. In many cases, master's students pick an algorithm or method, slightly change it, and submit a thesis and paper based on it, which may contradict the findings and claims of the original author. Remember Thomas Herndon, a graduate student who criticized the findings of two eminent Harvard economists (here is the link). If code in CS were revealed, such consequences could be catastrophic (there might not be too many such cases, but when they happen they are catastrophic).

  • Code is vital property for most researchers in conducting experiments and research. If you have the code, you can simply play with it and modify it to generate a new set of findings that might be more valuable than the initial ones. Without involving the initial author, no credit goes to them.

However, Elsevier recently introduced a new feature called Executable Papers, built on COLLAGE, that is currently available for the Computers & Graphics journal, by which code and data are made available and researchers can modify the code and input values to experiment with them.

Hope it helps.


  • If the codes in CS are revealed the consequences are likely catastrophic. — So you're accusing an entire intellectual discipline of fraud? Really?
    – JeffE
    Commented May 27, 2013 at 17:23
  • @JeffE I wouldn't be so harsh as to call it all fraud, but it would definitely improve the overall quality of research papers.
    – Spidey
    Commented May 27, 2013 at 18:09
  • Sounds like an accusation of fraud to me, or at least criminal incompetence. The only reason publishing data/code would be "catastrophic" is if that data/code did not support the published conclusions, as they didn't in the Reinhart-Rogoff paper referenced one sentence earlier.
    – JeffE
    Commented May 27, 2013 at 19:12
  • I don't get the second point and the latter half of the third point. Since when does CS stand for dishonest junk pseudo-science where authors manipulate data and hide the details because otherwise what they call "results" would be falsified? If it results in a catastrophe for authors to be honest and make things verifiable, your field should have collapsed already and gone forever. Like JeffE said, you're accusing CS if you're suggesting these are valid answers to the OP's question. You must present evidence. Oh, you collected evidence with your code and manipulated it? That's how CS works, huh? Commented May 28, 2013 at 3:14
  • @Espanta, it's a huge leap from "you think it would be logical if funders prevented researchers from sharing code" to what you actually wrote. Just because you think something would be logical doesn't mean it is actually so. What you actually wrote in the answer is almost certainly wrong. If you care about accuracy, you will edit your answer to fix what you wrote and remove the claim that "large number of CS researches are funded by certain organization that does not allow people to share their codes".
    – D.W.
    Commented May 28, 2013 at 7:41
3

I am not a CS researcher per se, but I write Android code for my research in Atmospheric Physics, so my view is somewhat limited. However, I can say from my own experience that much of the code I am developing and testing is part of a larger project that my team is developing. It is a mix of the rules I am bound by and the need to keep a portion of the code under wraps for the time being.
