43

Say I have written code that performs a physics calculation, and I get a paper published based on its results. In the interest of advancing science, I upload the code used for the paper to, say, GitHub, and I link to it from my website so that anyone interested in my results can find and use it. Imagine that someone, in the process of reproducing my results with this code, discovers a flaw in its logic. Correction of this logic flaw leads to invalidation of the central idea of the paper.

Will this lead to retraction?
Will there be any positive gain to me as a result of publishing of the code?

  • 9
    FWIW there has been a movement in Computer Science towards releasing source code, so results can be reproduced. See artifact-eval.org/motivation.html
    – soegaard
    Commented Aug 7, 2019 at 17:16
  • 19
    Positive gain? Sure, you will go to heaven instead of hell. Seriously, behaving ethically is always a positive gain. Commented Aug 7, 2019 at 19:22
  • 41
    If the paper is wrong, learning that you need to retract it is the positive gain.
    – Ray
    Commented Aug 7, 2019 at 19:55
  • 1
    What if someone builds a machine assuming your findings were correct, and it malfunctions and kills a lot of people because they turn out to be incorrect?
    – vsz
    Commented Aug 9, 2019 at 6:13
  • 1
    Shouldn't you be validating the result to make sure both the logic and program are sound before publishing?
    – Mast
    Commented Aug 9, 2019 at 8:58

7 Answers

106

If the main idea in the paper has been invalidated by the correction in the code, you would do well to try to retract the paper yourself. This is just a point of professional ethics. It also protects you in a way from future claims if people don't examine everything thoroughly.

The journal may not be able to actually retract the paper, but might be able to post a note (printed or online) that the paper has a flaw (noted by the author, hopefully).

But others, relying on the original thesis of the paper, might be misled in their own work. You really don't want that to happen.

Honesty in science is assumed. Make it so.

You might also be able to publish a better paper, based on the corrected code. Work toward that end.

  • 17
    Is honesty really assumed? I’ve encountered or heard of so much dishonesty that I am slow to believe anything.
    – WGroleau
    Commented Aug 7, 2019 at 14:38
  • 11
    @WGroleau it's likely very field-dependent. There are undoubtedly areas of science where there is pressure to produce specific results, not necessarily the most correct ones.
    – Dan M.
    Commented Aug 7, 2019 at 16:04
  • 3
    It's not only conflict of interest. Sometimes it is (or seems to be), "that guy's conclusion says I was wrong, and therefore I must fight it."
    – WGroleau
    Commented Aug 7, 2019 at 16:08
  • 3
    I think in many areas people take a "trust but verify" attitude. Commented Aug 8, 2019 at 6:44
  • 4
    It is true that there is dishonesty in science, as there is in anything else. But without assuming honesty, science cannot proceed. Each paper I read contains years and years' worth of experiments. There is no way every reader could repeat the experiments. I have to assume that the author isn't making up their results. Commented Aug 8, 2019 at 11:07
35

Will there be any positive gain thanks to the publishing of the code to me?

Publishing the code is necessary to make the calculation reproducible and the results verifiable. If I were the referee of your paper, I would likely insist that you publish the code. So the “positive gain” would be that your paper will not be rejected outright. It will also help your reputation and build up other researchers’ impression of you as a serious, careful scientist who understands what it means to do good science.

Besides, what you are asking is essentially “is there a positive gain to behaving honestly”. I’m not going to enter a philosophical discussion about honesty and its benefits here, but just think for a second about what you’re saying. Even in a specific context of academic research, your question can be rephrased as “I am thinking of hiding information about the way I did my research that would be essential for other researchers to verify my results. Is there a positive gain from not hiding this information?” Again, think about what you’re asking.

It’s clear from the question that you are in fact a person who is motivated by a desire to advance science and wants to do the right thing. That’s great, and the conclusion is that it is your duty to disclose the relevant information about your research that would enable other researchers to check your results. If the results later turn out to be invalid, then you and the journal you published in would need to deal with it in an appropriate and responsible way, either by issuing a note pointing out the error, or (which typically would happen only in really extreme, egregious circumstances) by retracting the article. Honestly I don’t think this is something to worry about too much. As long as you’re acting in good faith and doing your best to do good science, you are adding to the sum total of human knowledge and your work has value. That’s what matters, and that’s what you will ultimately be judged on by your peers in the community.


Edit: my opinion about requiring authors to make code available as a condition for publication generated some controversy in the comments, but I find the arguments for allowing authors to withhold code to be quite weak. I suggest that people think more about this issue, and consider in particular the fact that the Nature Research family of 148 journals has exactly the requirement I suggested as part of its official policy:

Reporting standards and availability of data, materials, code and protocols

An inherent principle of publication is that others should be able to replicate and build upon the authors' published claims. A condition of publication in a Nature Research journal is that authors are required to make materials, data, code, and associated protocols promptly available to readers without undue qualifications. Any restrictions on the availability of materials or information must be disclosed to the editors at the time of submission. Any restrictions must also be disclosed in the submitted manuscript.

After publication, readers who encounter refusal by the authors to comply with these policies should contact the chief editor of the journal. In cases where editors are unable to resolve a complaint, the journal may refer the matter to the authors' funding institution and/or publish a formal statement of correction, attached online to the publication, stating that readers have been unable to obtain necessary materials to replicate the findings.

  • 4
    I think it is a valid question to ask if honesty or sharing information will be detrimental to a scientific career. Commented Aug 6, 2019 at 18:00
  • 6
    @ASimpleAlgorithm anyone is free to do research in private and not publish the results to protect any technological or commercial secrets. When you publish the results, by definition you are entering them into the public domain of scientific knowledge. It’s a two sided deal: you get credit for your work, and everybody else gets to use, verify and build on the knowledge you created. So, within reasonable limits, any self-respecting journal should require you to make any data needed for verification of your results available to their readers, including code. Don’t like it? Don’t publish.
    – Dan Romik
    Commented Aug 7, 2019 at 2:13
  • 27
    @tpg2114 I understand that the norm in some sub-fields does not make release of code a requirement for publication. That doesn’t mean the norm makes sense. On a philosophical level, withholding code has exactly the same status as withholding experimental data or deliberately obfuscating your description of your research methods to prevent other researchers from building on your work. Yeah, people do those things too and get away with it. It doesn’t make it right.
    – Dan Romik
    Commented Aug 7, 2019 at 2:18
  • 10
    If you don't publish your code, preferably thoroughly documented, you haven't described your actual methods. Refusing to provide code, that is infinitely copiable practically gratis, because you wouldn't provide experimental equipment is frankly ludicrous. "You wouldn't download a wind tunnel." Commented Aug 7, 2019 at 12:59
  • 8
    @tpg2114: "It would take months to reproduce results on supercomputers, plus months of training to be able to do all the steps needed to compile/run the code." -> Months is better than years. Publish the code :-)
    – einpoklum
    Commented Aug 7, 2019 at 19:26
13

@Buffy is certainly right that Science itself gains a lot if people publish their code. Papers without code (the norm in many scientific areas) are hard to reproduce or build upon.

But you ask what you gain from this, or if it might harm your career.

First of all, it is unlikely that somebody finds a major flaw in your program, and it is even less likely that a journal will retract the paper because somebody else (not you) requested it. Most wrong or doubtful results just stay in the literature.

What is much more likely: somebody will actually use or extend your results and help you improve them, so they will cite you or work with you on a future paper. This is definitely something you want.

  • 5
    it is unlikely that somebody finds a major flaw in your program, yet the OP hypothesises that such a flaw has been found.
    – user2768
    Commented Aug 6, 2019 at 15:38
  • 4
    This was more an answer to "Will there be any positive gain thanks to the publishing of the code to me?" Commented Aug 6, 2019 at 16:28
12

Will this lead to retraction?

With due respect - that is the wrong question. You've said that, in your scenario:

correction of this logic flaw leads to invalidation of the central idea of the paper.

That's not possible. Either the central idea is valid, or it isn't (let's not quibble about a logical "excluded middle" or semi-validity etc). Publication doesn't validate or invalidate it (and again, let's not quibble about Schroedinger's-cat considerations). If it's valid, then a flaw in the code only means that the code doesn't prove/establish the idea. If it isn't valid - then it is imperative to, well, humanity, that an article claiming its validity not be published as though the idea were valid. Wouldn't you agree?

Will there be any positive gain to me as a result of publishing of the code?

This phrasing of the question comes off a bit selfish. There are obvious positive gains, generally, from publishing the code. Why does it have to be about the benefit for you personally? You're a scientist, my friend - put your ego a bit to the side here.

But, yes, several gains (not by order of significance):

  1. Your result/finding will be better and more widely accepted.
  2. The potential for future collaboration with you will increase somewhat.
  3. Working on the code and getting it to a releasable state may yield additional results, or perspectives on the same result.
  4. Other scientists would be better able to conduct research based on your results (yes, that is a positive gain, despite the potential for others "stealing your thunder")
  5. People will think somewhat more highly of you as a researcher - you can "put your code where your mouth is".
  6. Someone might figure out a flaw, allowing you to retract your paper (yes, this too is positive - you certainly don't want to have a baseless paper on your record, do you? Retraction is better than living in infamy, so to speak.)
  • 1
    Could you explain why? I was always under the impression that hiring committees would probably not notice or care about a single invalid paper (provided there are plenty of other good papers), while a retracted paper does smell strange because retractions are rare. Do you have other experiences with this, or references? Moreover, why would anyone not retract an invalid paper if doing so were beneficial for their career?
    – user111388
    Commented Aug 7, 2019 at 17:02
  • 4
    @user111388: Because misleading the community about an invalid paper to maintain the credit for it is a grave offense. It's the retraction that doesn't/shouldn't matter. If you're saying that one can just misrepresent the paper to the hiring committee, I suppose that might be possible, but it's immoral and detrimental to the academic community in general.
    – einpoklum
    Commented Aug 7, 2019 at 17:47
  • 2
    I agree that this is how it should be. Absolutely, no question. I doubt that this happens in practice. I would like to see references on that to become more optimistic!
    – user111388
    Commented Aug 7, 2019 at 18:20
  • 1
    @user111388: TBH, I don't know. But there's more to academia than hiring committees.
    – einpoklum
    Commented Aug 7, 2019 at 19:21
  • 1
    I agree. I ask because I understand your answer as "it is not only good for academia, but also for your career".
    – user111388
    Commented Aug 7, 2019 at 19:27
9

Let me give you an answer that does not appeal to the grand ideals of science, like reproducibility and advancement. You are asking "what is in it for me", which I - as a person who spends most of my time polishing and publishing simulation code - think is a very fair question, as this is something that will take up a lot of your time. After all, when you publish your code you will have to deal with:

  • Making your code readable - you don't want other people to look at your spaghetti, so a fair amount of time must be invested.

  • Writing documentation - no code is good without it.

  • Tech-support - should you be so lucky that someone will use it, they will require support.

  • Update schedules - you will certainly update your code, and you must now also update your public code without breaking existing features, while staying backwards compatible.

As you can see, this is something which can take an awful lot of your time. So is it worth it? Yes, I think so. Otherwise I would not do it. Publishing your code allows other people to actually use your work. In my case, I am doing theoretical work. Publishing code allows people doing experiments to download my code and compare my theory to their data, without needing me. That gathers citations and reputation - something a young scientist needs. It also means that, since they can toy with the code themselves, I do not need to understand their specific setup in order to provide a calculation suited to them. This makes the calculation more precise than anything I could ever do myself - again without me having to do the work.

Now, what happens if someone discovers a mistake? This is of course unfortunate, and it happens. Usually it does not lead to retractions, as few 'code bug' type mistakes invalidate a whole paper - if one did, you did not do a thorough enough job of checking your calculations against common sense before submitting! I have grown to like bug submissions. It means that someone else did the painstaking work of going through your code, and actually found something you did not find yourself. In the end it makes your results better, without you needing to put in the effort.

All in all, it is a large investment to publish your code - but the returns can also be grand.

  • 2
    For any non-trivial code you should be doing the first two bullet points for your future self. They don't count as effort needed to publish the code. To that same category I would add writing at least a basic test suite. Commented Aug 8, 2019 at 21:50
  • 1
    Sure. But "should" does not mean that it is always done. There are heaps of semi-abandoned PhD-ware out there, which have formed the basis of papers.
    – nabla
    Commented Aug 9, 2019 at 19:38
5

If they really did find a bug that completely invalidates your results, then indeed the paper is invalid. You might as well retract it. In a perfect world you could then fix your code, analyze the correct results, and write a new paper - but in reality the journal may not be interested in another submission on this topic.

This is no different than if you measured a tower as 123 m tall, dropped some objects and concluded that g ≈ 32 m/s^2. If it was later discovered that you mixed up your rulers, and the tower is actually 123 ft tall, what do you suppose should happen to Experimental Evidence of Extreme Gravitational Anomaly in the Vicinity of Tall Manmade Structures?
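The arithmetic behind this analogy can be made concrete. The sketch below assumes an idealized drop with no air resistance, so h = g t^2 / 2 and therefore g = 2h / t^2, and uses an illustrative true value of g = 9.81 m/s^2; the function and variable names are hypothetical. Because the fall time is measured correctly and only the assumed height is wrong, the inferred g is off by exactly the ratio of a meter to a foot (1 / 0.3048 ≈ 3.28).

```python
FT_TO_M = 0.3048  # one international foot in meters (exact by definition)


def g_from_drop(height_m: float, fall_time_s: float) -> float:
    """Infer g from an ideal free-fall drop: h = g t^2 / 2, so g = 2h / t^2."""
    return 2.0 * height_m / fall_time_s**2


G_TRUE = 9.81                  # illustrative true value, m/s^2 (an assumption)
true_height_m = 123 * FT_TO_M  # the tower is really 123 ft (about 37.5 m)

# The fall time you would actually observe for the real tower.
fall_time_s = (2.0 * true_height_m / G_TRUE) ** 0.5

g_reported = g_from_drop(123.0, fall_time_s)           # analysis assumed 123 m
g_corrected = g_from_drop(true_height_m, fall_time_s)  # after fixing the units

print(f"reported g:  {g_reported:.1f} m/s^2")   # reported g:  32.2 m/s^2
print(f"corrected g: {g_corrected:.2f} m/s^2")  # corrected g: 9.81 m/s^2
```

Nothing about the drop itself was wrong; a single mistaken input was enough to manufacture the "anomaly".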

It may seem like posting code on GitHub is bad because it creates a risk of such a disaster, with no corresponding upside. This is false. We can follow the analogy above - what if your experimental paper omitted the Methods section because "somebody might realize my methods are ill-suited and don't work"? What if you just didn't publish at all, because "there might be errors in the paper"?

  • If your code really is that wrong, it is better to retract ASAP. The longer your paper is out there, the longer Gravitational Anomalogists will read and debate it. It will gain visibility. Even if you try to hide the flaws, they will eventually be found, at least when somebody tries to reproduce your work. If you retract much later, there will be many more people who care about the paper and maybe even relied on it, who will now be pissed off. Your reputation could suffer much more from a day-500 retraction than from a day-5 retraction. Showing code is your friend here.
  • Without code, your paper is arguably irreproducible, hence it is not even Science. Showing code is your friend here also.
  • If you put your code up even before you publish, maybe after submitting a preprint, you can detect the errors early enough to prevent the retraction in the first place. Again, showing code is your friend.
  • If you have put your code up and it doesn't have a fatal flaw, people can be very confident that it is correct, because they can read it themselves. But if the code is not up, the only logical conclusion is that it may or may not be flawed. Nobody can trust your paper fully because they haven't seen the code. They may even assume you must have some flaw, because why else wouldn't you show your code? Showing code prevents all this.
  • If you make your code available, other people can write their own code based on yours, or even analyze their own data using your code directly. Then showing code is very good, because you don't just get a publication, you get citations.

Furthermore, there are ways of verifying your code, such as tests against problems with known answers. They're not perfect, but they can help catch a lot of bugs. Generally, you should aim to be careful and meticulous in your work so that there aren't huge flaws in it. You shouldn't publish things that have countless buried "surprises" just waiting for your fellow scholars to discover. So in reality, it shouldn't be that likely that your code is completely wrong. Therefore, by putting it on GitHub, you are risking very little (well, there's a long tail), but you stand to gain a great deal.
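One common way to verify simulation code is to run it on a problem whose exact answer is known, and to check both that the answer agrees and that the numerical error shrinks at the rate the method predicts. The sketch below illustrates that idea under stated assumptions: a harmonic oscillator x'' = -ω²x with exact solution x(t) = cos(ωt) for x(0) = 1, v(0) = 0, integrated with velocity Verlet, a second-order method, so halving the step size should cut the error by roughly a factor of four. The function names and tolerances are hypothetical choices for illustration, not taken from any particular codebase.

```python
import math


def simulate_oscillator(x0: float, v0: float, omega: float,
                        dt: float, steps: int) -> tuple[float, float]:
    """Integrate x'' = -omega^2 x with velocity Verlet; return (x, v) at the end."""
    x, v = x0, v0
    a = -omega**2 * x
    for _ in range(steps):
        x += v * dt + 0.5 * a * dt**2   # position update using current acceleration
        a_new = -omega**2 * x           # acceleration at the new position
        v += 0.5 * (a + a_new) * dt     # velocity update with averaged acceleration
        a = a_new
    return x, v


def error_at(t_end: float, dt: float) -> float:
    """Absolute error of x(t_end) against the exact solution x(t) = cos(omega t)."""
    omega = 1.0
    steps = round(t_end / dt)
    x, _ = simulate_oscillator(1.0, 0.0, omega, dt, steps)
    return abs(x - math.cos(omega * steps * dt))


# Check 1: the numerical result agrees with the analytic solution.
assert error_at(10.0, 1e-3) < 1e-5

# Check 2: halving dt cuts the error by about 4x, as a second-order method should.
ratio = error_at(10.0, 0.01) / error_at(10.0, 0.005)
assert 3.5 < ratio < 4.5, ratio
print(f"convergence ratio: {ratio:.2f}")
```

A sign error in the force, for example, would make the first check fail, and a change that silently degrades the integrator to first order would typically push the ratio toward 2 rather than 4.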

  • 2
    I don’t understand how code alone can prove anything. An algorithm or mathematical formula can be stated and analyzed on its own merits. Code that correctly implements the algorithm can’t reveal anything the algorithm doesn’t reveal, and code that incorrectly implements an algorithm cannot invalidate the algorithm.
    – WGroleau
    Commented Aug 7, 2019 at 14:34
  • 2
    @WGroleau here’s an example: in a recent paper, my coauthor and I proved a numerical bound for a certain geometric optimization problem. We developed a computer-assisted proof scheme with an algorithm that produced a numerical bound that was rigorously certified as correct. Our paper gives a description of the algorithm that in principle could be implemented by anyone, but in practice it took us several months and nontrivial programming expertise to code. We put our code on GitHub so anyone can check it’s bug-free and works as advertised.
    – Dan Romik
    Commented Aug 8, 2019 at 1:07
  • 2
    ... If this had been someone else’s paper, without access to the code I would have close to zero confidence that the authors proved what they say they proved. The code really is essential, at least in this specific (pure math) context, and, I very strongly suspect, in many other contexts.
    – Dan Romik
    Commented Aug 8, 2019 at 1:10
  • @DanRomik Aren't most physics papers published without their code? See, for example, org.ch.tum.de/glaser/94(GRAPE_JMR_05).pdf Commented Aug 24, 2019 at 9:25
  • 1
    @TejasShetty maybe - I have no idea if what you said is true or not (the algorithms described in the paper you linked to look like they can be implemented in a few lines of Matlab code, so I hardly think that’s a good example of what we’re talking about here). Anyway, assuming it‘s true, what is your point exactly?
    – Dan Romik
    Commented Aug 24, 2019 at 17:08
-1

I do not know much about your field. In computer science (e.g., machine learning), I think no one can ensure the code is 100% correct.

In my view, the two most important things in research are (1) the idea and (2) its presentation. If you can clearly convey your idea to your audience, I think that makes it a good article, which has its own value. As such, that paper should not be retracted.

However, you need to try your best to ensure that all the reported results are correct (to the best of your knowledge) before submission.

