175

I just started reading some papers (Computer Science, specifically Computer Vision) and thought "now let's look at the source code", and was quite astonished that most of the papers don't provide any source code implementing the described methods, while claiming some level of performance or superiority over other papers.

How do these papers get accepted in journals / conferences? Do people at least have to submit their source code privately to the reviewers, so that they can reproduce the experiment if possible?

Do most journals / conferences just "trust" that people who submit the paper really implemented the theory and got those exact results?

I always had the idea that any experiment should be reproducible by others, or else it is not scientifically justified. I've been wondering about this for the last few days.

25
  • 51
    Folks in my research area do not really care so much about the actual code used to obtain the results; the concepts behind a coded implementation are more important. It is assumed that anybody with knowledge in the field can produce code in a language of their choosing so long as the concepts (algorithm descriptions, etc.) are explained very explicitly and clearly.
    – Mad Jack
    Commented Jun 11, 2014 at 14:30
  • 15
    Note that the problem is not specific to computer code; it applies analogously to lab work as well. Even if the procedures/protocols are described in detail, there is no way of knowing whether or not the person who did the work followed the protocols meticulously.
    – posdef
    Commented Jun 11, 2014 at 15:08
  • 26
    "Do most journals / conferences just "trust" that people who submit the paper really implemented the theory and got those exact results?" This is my impression and I don't like it at all, specially when the funding to produce that implementation comes from public money. If the funding is public, the code should be public. If you want to publish a paper about a system, the code should be published. Very unfortunately this is rarely the case.
    – Trylks
    Commented Jun 11, 2014 at 17:03
  • 27
    Re-running the program wouldn't be a proper verification anyway. If the implementation is buggy, re-running it still gives wrong results. Re-implementing is a proper verification. Reproducibility means that it's explained clearly enough to allow reimplementation. A clear description is much more helpful for reproducing results than code. Code is meant for computers to run, not for people to read. (Remember: in the vast majority of cases this code is implemented by a single person and it's not meant to be maintained, this is not "software engineering"). The paper is meant for people to read.
    – Szabolcs
    Commented Jun 12, 2014 at 19:26

12 Answers

112

To me it seems that there are two reasons:

  • the belief that code is only a tool, a particular implementation being secondary to the idea or algorithm,
  • historical residue (it was impractical to print a lot of pages, especially as no one could copy-paste them).

Additionally, there are things related to the current incentives in academia (where publications, not code, determine one's career prospects):

  • sharing code may mean risk of being scooped (instead of milking the same code for years),
  • cleaning up code takes time, which could instead be used for writing another publication.

Do people have to submit their code privately to the reviewers at least, so that they can reproduce the experiment if possible?

Typically not. If the code is not made public, it is almost certain that no reviewer has checked its existence, much less its correctness.

However, many scientists are starting to notice the problem (and they see how the open-source culture flourishes). So there are new initiatives addressing this issue, like the Science Code Manifesto:

Software is a cornerstone of science. Without software, twenty-first century science would be impossible. Without better software, science cannot progress.

Or e.g. this manifesto. Try searching for reproducible research, look at things such as knitr for R and this intro to IPython Notebook, or read about using GitHub for science. And it seems to be taking off.

12
  • 7
    +1. For me, the time consumed by cleaning up code is a particularly major point, even though I already do most analyses as knitr .Rnw files. There still is a major difference between what is basically a script with a few headings and something that can go into the supplementary material. Commented Jun 11, 2014 at 18:45
  • 3
    @Zindarod They don't, unfortunately. Commented Jun 12, 2014 at 13:23
  • 2
    @Zindarod: They rewrite it. If they get the same results, that's evidence that both of them had no (or identical) bugs. If the source was provided, a verification that the source produced the results would be easy, but that does not show that the theory produces those results, because there could be a bug in the code. Commented Jun 12, 2014 at 22:57
  • 3
    @MooingDuck You mean for every paper, they rewrite the source code from scratch? I am implementing a paper in my day job and it has taken me the better part of a month just to understand the paper. The reviewers go through this process for every paper?
    – user12973
    Commented Jun 13, 2014 at 5:21
  • 12
    @Zindarod Pre-publication peer review is not intended to weed out errors like what you're describing. Pre-publication peer review just checks that the experimental method is sound, that the conclusion matches the claimed results, that appropriate decisions were made regarding datasets, comparisons, and statistical tests. Reproduction happens post-publication, just like in any other field.
    – user15623
    Commented Jun 13, 2014 at 5:49
31

What field are you talking about? A CS paper describing the design and performance of a computer vision algorithm is different from a sociology paper that used a spreadsheet to crunch demographic data.

Do most journals / conferences just "trust" that people who submit the paper really implemented the theory and got those exact results?

Yes. The presumption is always that there is no scientific fraud involved.

I always had this idea that any experiment should be reproducible by others or else it's not scientifically justified.

If the algorithms are fully described in the paper, then the result is reproducible. To reproduce it, you have to reimplement the algorithm.

I just started reading some papers and thought now let's look at the code and was quite astonished that most of the papers don't have any code to look at, while claiming some performance or being better than other papers.

Presumably the better performance is because the algorithm described in the paper is a more efficient algorithm. For example, when sorting a large amount of data, quicksort is a better sorting algorithm than bubble sort. Quicksort has O(n log n) performance on average, while bubble sort has O(n^2), and this is true regardless of the details of the implementation.
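
To make this concrete, here is a minimal sketch in Python (my own illustration, not code from any particular paper): whatever the constant factors of these particular implementations, doubling the input size roughly quadruples the bubble-sort time but only a bit more than doubles the quicksort time.

    import random
    import time

    def bubble_sort(a):
        # O(n^2): repeatedly swap adjacent out-of-order elements.
        a = list(a)
        n = len(a)
        for i in range(n):
            for j in range(n - 1 - i):
                if a[j] > a[j + 1]:
                    a[j], a[j + 1] = a[j + 1], a[j]
        return a

    def quicksort(a):
        # O(n log n) on average: partition around a pivot and recurse.
        if len(a) <= 1:
            return list(a)
        pivot = a[len(a) // 2]
        return (quicksort([x for x in a if x < pivot])
                + [x for x in a if x == pivot]
                + quicksort([x for x in a if x > pivot]))

    for n in (1_000, 2_000, 4_000):
        data = [random.random() for _ in range(n)]
        t0 = time.perf_counter(); bubble_sort(data)
        t1 = time.perf_counter(); quicksort(data)
        t2 = time.perf_counter()
        # Bubble sort time grows ~4x per doubling of n; quicksort only ~2x.
        print(f"n={n:5d}  bubble={t1 - t0:.3f}s  quick={t2 - t1:.3f}s")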

9
  • 8
    Details in the implementation, however, can affect the constants that are left out of Big-O notation, so directly comparing performance results between two algorithms without being privy to the actual code may be problematic.
    – Mike A.
    Commented Jun 11, 2014 at 15:10
  • 8
    To add to this: In some cases it may even be easier to verify the results by reimplementing the algorithm than by understanding the author’s implementation.
    – Wrzlprmft
    Commented Jun 11, 2014 at 15:12
  • 29
    If the algorithms are fully described in the paper, then the result is reproducible. To reproduce it, you have to reimplement the algorithm. But, knowing people who have tried to reimplement things in economics or physics, descriptions are rarely complete enough to reproduce exact results. Code does not lie. Text may (even if everything is in good faith and with a high level of scrutiny, you don't compile text). Commented Jun 11, 2014 at 15:37
  • 3
    I second Piotr's comment, and would add that not only do you need code, but you need a test suite to make sure the code is actually working. And yes, a description of an algorithm in a paper in my experience is rarely complete enough to reimplement from scratch, unless it is a very simple algorithm. Also, of course, it is much preferable to run some code as opposed to reimplementing an algorithm. Commented Jun 11, 2014 at 16:58
  • 3
    Details in the implementation, however, can affect the constants that are left out of Big-O notation. Sure. Also, the hardware used to run the code will affect the constant. If the paper was written n years ago, then the code will run faster on today's hardware by some factor that comes from Moore's law. This is why computer scientists typically aren't very interested in judging the efficiency of an algorithm based on wall-clock time, and it's why they're typically interested in properties of the algorithm, not properties of a particular implementation.
    – user1482
    Commented Jun 11, 2014 at 17:47
18

I think a related issue to the one raised by Piotr (+1) is that research funding is not generally available to cover the costs of producing highly reliable, portable code or the costs of maintaining/supporting code produced to "research quality". I have found this to be a significant issue when trying to use code released by other researchers in my field; all too often I can't get their code to work because it uses some third-party library that is no longer available, or that only works on a Windows PC, or which no longer works on my version of the software because it uses some deprecated feature of the language/environment. The only way to get around this is to re-implement the routines from the third-party library so that all of the code is provided as a single monolithic program. But who has the time to do that in an underfunded "publish or perish" environment?

If society wants high quality code to accompany every paper, then society needs to make funds available so that good software engineers can write it and maintain it. I agree this would be a good thing, but it doesn't come at zero cost.

5
  • 2
    What you say actually speaks against the publishing of code, since it's likely to become obsolete or to face portability issues. In my opinion, what matters is the theory behind the implementation and I don't think research funding should go into developing robust, platform-independent, frequently updated software products. That's the software industry's job.
    – Cape Code
    Commented Jun 13, 2014 at 15:44
  • 2
    I don't fully agree with that. The reason we publish papers is so that other researchers take up our ideas and run with them. A good way of making sure that happens is by making sure that the tools required are available. Thus there is often a good reason to provide tools that are adequate for that purpose, but that still has a cost that is not currently met. We don't need to produce production quality code, but we do need funding to produce code of adequate quality (from the perspective of portability and reasonable longevity). Commented Jun 13, 2014 at 15:48
  • @DikranMarsupial: "so that other researchers take up our ideas and run with them" - that sounds very easy, as if it were enough to publish one's code and that would directly allow other researchers to build on top of it. However, that is linked to many ifs in reality - it only works if the other researchers know the technologies used for the code, if anything else they want to combine with that code is compatible, if the code runs at all on their platform, etc. Commented Feb 28, 2015 at 23:53
  • @O.R.Mapper you are missing the point: if you provide code, it is easier for others to build on your work. How much easier depends on a number of factors, but if you read my answer you will find I had raised those same points already! I also did not say that it was enough to publish the code - it isn't; the code supports the publication, nothing more. The more you can do to alleviate the "if"s, the better, but at the end of the day you are working to a time and effort budget, so there will always be "if"s left, but that doesn't mean you shouldn't give away research code. Commented Mar 2, 2015 at 8:12
  • 3
    The code does NOT need to meet any code quality standards, as long as it produces the results promised by the paper. Let software engineers optimize or rewrite the code. Commented Mar 4, 2015 at 5:59
13

Because some researchers do not like to think about the real world and reviewers do not want the hassle.

(What's next is a bit of a rant)

I've recently done a survey of a specific type of geometry-related algorithms. In all the papers the program was described as working perfectly, but once I requested the source code from about a dozen authors, things got ugly.

50% of the software was missing important advertised features. For example, the software would only work in 2D while the paper showed 3D examples (which in my case really makes things a lot more difficult). When I inquired why these features were missing, it turned out they had usually never been implemented, or had been implemented but proved unstable/non-working. To be clear: it was 100% impossible to generate the results shown in the paper with the software, which was often even improved after the paper was released.

75% of the software wouldn't work perfectly in normal cases. In my case this was usually because the algorithm was designed with exact ("perfect") numbers in mind but was implemented using ordinary floating-point numbers, and thus had representation problems which resulted in large errors. Only a few papers mentioned these problems, and only two tried to (partially) address them.
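
A tiny Python sketch of the kind of representation problem I mean (my own hypothetical example, not taken from any of the surveyed papers): a collinearity test that is exact on paper silently misfires once the coordinates pass through floating point, whereas exact rational arithmetic gives the intended answer.

    from fractions import Fraction

    def orient(ax, ay, bx, by, cx, cy):
        # Sign of the cross product (b - a) x (c - a):
        # > 0 left turn, < 0 right turn, 0 collinear.
        return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)

    # Three points that lie exactly on the line y = 3x.
    pts = [("0.1", "0.3"), ("0.2", "0.6"), ("0.3", "0.9")]

    # With doubles, the decimals are not exactly representable, so the
    # cross product comes out as a tiny nonzero value (around 1e-17 on a
    # typical IEEE 754 system) and a sign test says "not collinear".
    f = [(float(x), float(y)) for x, y in pts]
    print(orient(*f[0], *f[1], *f[2]))

    # With exact rationals, the same test returns exactly 0.
    e = [(Fraction(x), Fraction(y)) for x, y in pts]
    print(orient(*e[0], *e[1], *e[2]))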

85% of the software wouldn't work in scenarios specifically designed to find problem cases. Let's be honest: if a 'mere' student can find a scenario in a few weeks that totally breaks your algorithm, you probably already knew about it.

Not supplying code makes it possible to lie, and to my disgust (I'm new to the academic world) this is done extremely often. My supervisor wasn't even surprised. However, testing code is a lot of work, so this behavior will probably go unchecked for a while longer.

12

You seem to think that we should request code because, without code, any crazy result, be it fraud or honest mistake, can be slipped into a journal. But this is not so. Including code is a nice-to-have feature, not a must-have feature. The other answers silently assume this and explain the (good and not-so-good) reasons which lead to the current situation, in which code is rarely included. I think I can complement them by explaining why it is not a must-have feature.

For theoretical results, you don't need any empirical tools like code to reproduce them, as others mentioned (e.g. proving that an algorithm has a better big O behavior than another). Of course, there are also empirical results, which cannot be replicated that way.

But your reviewers will have an expectation of what your idea will result in. If the current best performance for wugging zums is 3 zums/s, and you add a minor tweak and report 300 zums/s, your reviewers are supposed to notice that your claim is unusual, and do something (possibly demand to resubmit with the code). This is not foolproof, but with multiple reviewers per paper, it is effective, because the magnitude of most empirical results is predictable once the reviewer sees the idea and understands how it works.

For this class of paper, both honest and dishonest mistakes have a good chance of being caught, with bad consequences for honest scientists (reputation loss, especially if caught after publication) and worse consequences for dishonest scientists (end of career if proven!). Moreover, the graver the mistake (as measured by the size of the error), the higher the chance of being caught. It is less likely that you will get caught if your algorithm manages 4 zums/s and you report 5 zums/s than if you report 300 zums/s. So scientists are disincentivized from submitting incorrect papers, leaving fewer incorrect ones in the submitted pool, and the reviewers catch lots of the remainder.

There are cases where it is totally unknown why an observation is the way it is, and in these cases it is very important to describe the exact test setup perfectly. But I have never seen this kind of paper in computer science; it is associated with the natural sciences. So no code there. Even if you got such results in computer science (e.g. you observed that users are capable of reading a 12000-word EULA in less than 30 seconds, which contradicts common reading-speed observations, and you have no explanation for it), it is unlikely that including the code you used to obtain the result will be pertinent to replication.

To put it together: among a large mass of computer science papers, the theoretical ones and the natural-phenomenon-observation ones don't need code inclusion for replication, and the remaining ones will contain only a low percentage of incorrect-but-uncaught papers. Aggregated, this leads to an acceptably low level of incorrect papers being submitted. Requesting the code to go with them would increase quality for one class of paper, but it would be an increase on an already high quality level. It is not the case that the absence of this feature makes the current quality too low.

2
  • Very good point about the difference between nice-to-have and must-have features. I recall some paper (possibly an editorial?; unfortunately right now I cannot find it) discussing a similar problem for standards of reporting statistical analyses. The bottom line was that a checklist for good practice is basically known, and it was suggested that everyone who knows and obeys it include a single sentence stating this (or a more detailed description). The idea is that this nice-to-have strategy will push quality without the need for consensus about must-have policies. Commented Jun 12, 2014 at 10:52
  • +1 for essentially casting this in signal detection terms. Nice! Commented Jun 12, 2014 at 19:48
9

As an associate editor of a journal (bridging statistics and psychology), I requested that authors submit their code when they proposed new algorithms and procedures, and then sent the code to experts in the statistical package to check (a) that the code does what the paper describes, and (b, secondarily) that it is good code (robust to bad inputs, computationally efficient, etc.). I was also asked to review some papers for the Stata Journal, whose focus is the code, and did the same. There were times when (a) failed, so I had to return the paper and say that the authors had to align the methodology and the code. There were times when (b) would fail, and in the case of the Stata Journal, this would also mean returning the paper. There were times when the code never came.

Most of the time, I would be happy to share my code, but it is complicated enough (with internal meta-data-based checks, customized output, etc.) that a researcher less proficient with the packages I use won't be able to edit it to make it work on their computer.

Going back to your main question: reviewers are pressed for time (not to say lazy), and have their own research to push to their journals, so few of them go to the effort of fully verifying the results. This is just how the world is. Maybe these full professors could request the code and give it to their graduate students to play with, break, and debug, as this would be a good educational opportunity for the latter. But again, this does not happen very often, as the confidentiality clauses that come with accepting the reviewer role usually preclude one from sharing the paper with anybody else.

7

I'd like to add a slightly different point of view from an experimental field (chemistry/ spectroscopy/ chemometric data analysis).
Here, the study starts in the lab (or maybe in the field), with an old-fashioned type of notebook; old-fashioned typically still being paper + pen. Data analysis is often done with GUI programs in an interactive fashion. Records are kept just like in the lab: paper + pen, maybe with some saved and/or printed figures. As the part in the lab was already recorded this way, not having a log file or even a script of the data analysis is not seen as a problem. Anyway, asking for the code to be published is only one part of what you need to re-run the analysis: you'd also need the actual data.
Even the suggestion to script the data analysis, or at least save a log of the Matlab/R session, is still kind of new (though people love the knitr-generated reports I produce...). But IMHO things are moving quite fast now. I'd say that with tools like git and knitr, the largest practical obstacles are almost solved, at least for the type of person who prefers code over clicking. However, not everything works smoothly yet (consider large binary raw data and git; and I frankly admit that I have no idea how to practically set up a "real" database server in an efficient way so that it keeps track of changes). This is from my perspective as a scientist who just needs tools for reproducibility as a user - and thus I understand my non-programming colleagues who nevertheless need to analyse their data: they just do not have (or know of) the tools that would enable them to log their analyses with reasonable effort.

The traditional estimation of where the big difficulties lurk also focuses on the lab part. I think many researchers are just not aware of the reproducibility issues with the calculations/data analysis. To be honest, I usually share that point of view: IMHO, in biospectroscopy one of the big important problems is the far too low number of individuals in the studies. If you have only 4 mice in the study, the precise handling of the data cannot affect the practical conclusions too much. There is a gray zone where not doing a proper validation may affect the conclusions, but again: everyone I know who does the validation according to best-known practice spells this out very explicitly - so again (and accepting some risk of falsely discarding a few papers as "probably not reproducible") I tend to think that the practical conclusions are hardly affected.


On the other hand, looking at the requirements that e.g. Chemical Communications sets out if a new chemical substance is to be published, I don't see why there cannot be computer science journals that require the code in a similar fashion.
Like e.g. the Journal of Statistical Software does. (I'm quite sure other such journals exist, but I don't know them.)

To me this falls into a much larger field of reproducibility issues. Of course, if the data analysis cannot be reproduced there's big trouble.


Yet another point: although publications about software are still very rare in my field, I recently had such a paper for review. Unfortunately, the software was proposed to be distributed by contacting one of the authors - which, as an anonymous reviewer, I obviously could not do.
Thus, the actual software may be even less accessible for reviewers than for normal readers!

2
  • For your last paragraph: surely you can ask the editor to ask the authors for the code? Commented Jun 12, 2014 at 6:41
  • @NateEldredge: sure. I merely put it as an example that attaching the code is still really far from being among the default things to do when submitting a manuscript, if it is forgotten even for a manuscript that explicitly deals with releasing that code. Commented Jun 12, 2014 at 10:32
6

I would think that most readers/reviewers would find a sufficiently detailed algorithm enough. You write your paper showing, oh, C++ code, and I use SPSS in my shop -- your code is useless to me. Not that most readers would enjoy reimplementing the code (especially for non-CS papers), but with specific code that runs on a specific platform, there's bound to be a lot of clutter to wade through. An algorithm reduces it to its bare essentials.

If my paper is showing the improvement in speed of my new quicksort method over the standard bubble sort, showing algorithms for the two methods would make it easier to support my claims of an O(n log n) vs O(n^2) speedup. If my paper is on population age distributions in wealthy vs developing economies, unless there's a really neat trick I used to process the data, most readers probably wouldn't even care about the algorithm, except in very broad brushstrokes.

Whether the algorithm is necessary depends on the subject area (say, Computer Science in general), the specific subarea (say, sorting methods), and how heavy an impact the algorithm used has on the results. If I'm showing compiler differences in Fortran code, then it would be good to include actual code. Otherwise, the code itself is rarely of interest.

1
  • 1
    "I would think that most readers/reviewers would find a sufficiently detailed algorithm enough." For an algorithm of sufficient complexity, a description is not enough. "your code is useless to me". Provided it is working code (and maybe even if it isn't) code by definition is a implementation of an algorithm, and therefore useful. Commented Mar 2, 2015 at 0:47
3

This is a view from Computer Science/Theoretical Computer Science/Mathematics.

Ask yourself: who is the target audience of an academic paper?

It is not end-users. It is reviewers! Do reviewers want code? Depends on the situation. Sometimes they do. Often they don't.

Think about this: why do mathematicians not provide formal proofs, but instead use informal arguments?

It is a matter of costs vs. benefits. Providing a formally verified proof is possible but usually requires too much work, work that authors are not trained for and don't have much experience with. On the other hand, what do authors gain from it? Does it help convince the reviewers of the correctness of the results? No; usually reviewers prefer a short informal explanation that allows them to understand and see why the result is true. A formal proof usually will not help much. There are people who don't like computer-assisted proofs, which cannot be verified and understood directly by humans.

The same costs-vs.-benefits thinking applies to programs. If providing code will not help convince the reviewer of the correctness of the paper, then why waste resources (time/money/pages/...) doing so? Do reviewers have time to read code with thousands of lines to check that there are no bugs in it?

On the other hand, sometimes the software resulting from the paper is of primary interest. Having the code is helpful in verifying the claims. E.g. you claim you have a faster algorithm for SAT; then it is helpful to provide the code. In such cases authors do provide their code. This happens mainly in the more experimental areas. We don't care about the correctness of the code so much as about obtaining results better than existing algorithms. In such situations there are typically standard benchmarks for comparing algorithms (see for example the SAT competitions). If there aren't established benchmarks, then why publish code? If it is a theoretical result where the asymptotic benefits kick in on instances which are too large to test, what is the benefit of having the code? More so considering the fact that large code developed by non-professional programmers is highly likely to be buggy. Employing professional software developers to write quality code is costly (the median annual income for a person with a bachelor's in CS is around $100K in the US, except possibly for graduate students ;) and typically yields no profit afterwards.

But does code need to be included in papers? Of course not! There are better ways to publish code, e.g. having a link in the paper to an online copy (on the authors' website or a public repository like GitHub). Why would one prefer to include code with thousands of lines inside a document which is supposed to be read by humans?

6
  • 8
    Possibly the target audience of an academic submission is the reviewer, but the target audience of a paper is definitely the community at large.
    – Suresh
    Commented Jun 13, 2014 at 8:07
  • 2
    @Suresh, obviously papers are written for the readers in the academic community at large. However, I think authors pay more attention to what reviewers expect. Of course reviewers are intended to represent the community, but in their role as reviewers they often have priorities which can be a bit different from those of the community at large.
    – Kaveh
    Commented Jun 13, 2014 at 9:02
  • E.g. they might want more details than other readers to make sure the results are correct, they may need less details because they are experts in the topic, or sometimes they may object to a paper because they think the presentation contradicts their view of the topic, etc. It seems to me we typically consider the community at large much less than the potential reviewers as our audiences when writing papers in practice. Anyway, I don't think the issue has a big effect on what I am trying to say in the answer:
    – Kaveh
    Commented Jun 13, 2014 at 9:05
  • 1
    code is published when the target audience wants it from the authors, and end-users who want ready-to-use code are not the target audience of academic papers.
    – Kaveh
    Commented Jun 13, 2014 at 9:11
  • this last point I agree with.
    – Suresh
    Commented Jun 13, 2014 at 9:28
0

Usually "performance" of code relates to "how well it scales with the size of the problem". This means, in my opinion, that a paper claiming "algorithm A is faster than algorithm B" needs to show timing for both A and B _ for different size problems_. While there may still be inefficiencies (and errors) in the implementations, this will at least demonstrate whether the underlying algorithm is more efficient (lower big-oh).

Where software is the key to the paper (the product, not the tool), I would expect it to be available (GitHub or other). So a paper that says "I can sort in order 1/n by kerfugling the dargibold number" needs to show how that was done; the paper that claims "when I sorted these two data sets, the one from Whoville had bigger flubbits" does not need to show how the data was sorted - it needs to focus on explaining what is significant about the flubbits from Whoville.
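
As a rough sketch of what "timings for different problem sizes" might look like in practice, here is a hedged Python example (the choice of the built-in sorted as the routine under test is just a hypothetical stand-in): it times the routine on random inputs of increasing size and estimates the empirical growth exponent from a log-log fit, roughly 1 for (near-)linear behaviour, roughly 2 for quadratic.

    import math
    import random
    import time

    def timed(algorithm, data):
        t0 = time.perf_counter()
        algorithm(data)
        return time.perf_counter() - t0

    def measure(algorithm, sizes, trials=3):
        # Best of a few trials on fresh random input for each size.
        return [(n, min(timed(algorithm, [random.random() for _ in range(n)])
                        for _ in range(trials)))
                for n in sizes]

    def empirical_exponent(timings):
        # Least-squares slope of log(time) vs log(n).
        xs = [math.log(n) for n, _ in timings]
        ys = [math.log(t) for _, t in timings]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
                / sum((x - mx) ** 2 for x in xs))

    timings = measure(sorted, [50_000, 100_000, 200_000, 400_000])
    for n, t in timings:
        print(f"n={n:7d}  time={t:.4f}s")
    print("empirical exponent ~", round(empirical_exponent(timings), 2))

The exponent is only a crude summary (constants, caches and hardware still matter), but a table like this makes the scaling claim checkable even without the full source.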

0

Taken from the book Skiena, S., The Algorithm Design Manual, Springer, 2008:

After a long search, Chazelle [Cha91] discovered a linear-time algorithm for triangulating a simple polygon. This algorithm is sufficiently hopeless to implement that it qualifies more as an existence proof.

[Cha91] Chazelle, Bernard. Triangulating a simple polygon in linear time. Discrete & Computational Geometry 6.3 (1991): 485-524.

I think that the authors consider costs and benefits and avoid the submission of code when the costs outweigh the benefits.

According to David W. Hogg, the costs are:

  1. you might get scooped
  2. you can be embarrassed by your ugly code, or by your failure to comment or document
  3. you will have to answer questions (and often stupid or irrelevant ones) about the code, and spend time documenting
  4. you will have to consider pull requests or otherwise maintain the code, possibly far into the future
  5. you might have to beat down bad and incorrect results generated with your code
  6. your reputation could suffer if the code is used wrongly
  7. there could be legal (including military) and license issues

Benefits are:

  1. more science gets done, more papers get written; all measures of impact increase
  2. you get motivation to clean and document the code
  3. results become (more) reproducible and (more) trustworthy
  4. outsiders find bugs or make improvements to your code and deliver pull requests
  5. you get cred and visibility and build community
  6. citation rates might go up
  7. code is preserved
  8. code becomes searchable (including by you!) and backed up
  9. there are good sites for long-term archiving and interface
  10. can establish priority on an idea or prior art on a method
0

Because in truth most academic researchers in technical fields can't code to save their lives. In computer vision, they just hack something together in a single Matlab file. They like to think that coding is for undergrads. They believe it's beneath them to waste their time on such trivialities. Most of them never learned good software engineering practices and don't have an understanding of the complexity and skill it takes to write good code.

The major problem is that this is supported by researchers because all academia cares about is publishing, quantity over quality. Success is measured by how many citations you have, not by how good your contributions are. At the end of the day, when the only people citing you themselves don't produce anything useful, it doesn't matter. It's not science anymore as much as it is "research for the sake of doing research".
