
In computer science, we are sometimes assigned to reproduce the results presented in a paper that has already been published. In many cases (most of them) the code is not available, which makes the task more difficult.

But this question concerns the cases where the code is available, yet many things the authors do in their code are not described in the published paper.

  • Is this ethical?

  • How to proceed when many things presented in the code are not explained in the original manuscript?

  • What if the code disappears from the author's page? Will the results ever exist in the universe again?

Disclaimer

  • This is not about minor details in the code, but huge holes of dark matter.

  • Let's assume that we're talking about Journal papers with more than 10 pages.

  • Blame journal page and word count limits, which make describing everything that the software does completely impossible.
    – Ben Voigt
    Commented Dec 6, 2015 at 20:30
  • Let's blame the journals and make them full of things that people will never be able to reproduce.
    – Eliezer
    Commented Dec 6, 2015 at 20:32
  • How can one as the reader of the paper be certain that the "huge holes of dark matter" aren't, in fact, perceived as trivial details by the authors, or details that are largely outweighed by other contributions? I think there is a matter of perspective that one should take into account. I often completely disagree with an author about the main contribution of her/his paper.
    – user38309
    Commented Dec 7, 2015 at 0:01
  • Maybe CS is different, but I thought the paper is the primary contribution, and the code only exists to justify the results in the paper. So IMO the important question should be, does the code produce the results that the paper claims it does? It seems like you're asking something different, namely whether the paper completely explains the code.
    – David Z
    Commented Dec 7, 2015 at 7:08
  • @SylvainPeyronnet Because very often the code specifies something slightly different than the authors thought it did or intended it to (logical bugs are a thing). That means the binary may be a result that was unexpected and generally, but not always, meets some expectation that is different than expected. This is pretty much every logical bug ever. Code documents intent whereas the binary is only a raw implementation of one outcome of one compilation on one platform. This doesn't even touch on incidental differences in binaries, such as differences between compilers.
    – zxq9
    Commented Dec 22, 2015 at 18:23

5 Answers


As a researcher, I'm generally very keen on including as many implementation details as possible. However:

  1. There is rarely the room available in a paper to describe every detail of an implementation.
  2. Describing every detail of an implementation in prose often takes substantially more work than writing the program in the first place. The meaning of code (which is written in a formal language) is generally unambiguous; the meaning of prose (which is written in natural language) can often be ambiguous. Resolving the ambiguities in the prose takes time, effort and care -- it's often harder than writing the code.
  3. Researchers are generally both very busy, and under a lot of pressure from above to produce quick results and move onto new projects. Resisting this pressure to bring existing work to a tidy conclusion is a good thing, but there are limits to how long you can spend doing this, and sadly the reward structure in academia provides little incentive to do it.

There is a reasonable case to be made that implementation is not valued in academia to the extent that it should be. However, even if it were appropriately valued, and even if there were no page limits for papers, there would still be an opportunity cost for individual researchers when it came to producing copious documentation describing the implementation details of their code. Human lifetimes are finite: the time spent describing every detail of old code could alternatively be spent coming up with and implementing new ideas. Some old code is so useful and valuable that it's worth spending a lot of time describing, but that's certainly not true in all cases. A lot of code (particularly hacked-together research code) just isn't that valuable in the long run: time spent describing it could be better spent on something else.

To directly answer your questions, then:

  • Yes, it's ethical to not describe every detail of your implementation -- most of the time, you can't, and of the times when you can, a decent proportion of the time it's not the best use of your time. Having said that, I'm of the school of thought that says you should at least make a reasonable effort to write good, clearly-documented code, to publish it on the web for all time (e.g. on GitHub) and to at least describe how the method works in a way that will enable someone reading the code to know what's going on.

  • If things in the code aren't in the original manuscript, start by reading and running the code to see if you can figure out what's going on. If it's not clear from that, do some reading around to see if other sources can give you any hints. If all else fails, and it's crucial that you understand the details, email the authors. If they ignore you, give up on their method and use something else.

  • The code can disappear from the author's page for a variety of reasons -- maybe they stopped paying for hosting, for example. If you need the code in that situation, email them. As mentioned above, good practice involves putting your code in a public repository so that it will hang around, but it's not unethical if you don't (just unhelpful).
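The "read and run the code" step in the second bullet can be sketched as a minimal comparison script. Everything here is hypothetical: the file names and the printed metric are stand-ins for the authors' released code and your own reimplementation.

```shell
#!/bin/sh
# Minimal sketch of a reproduction check (all names are hypothetical).
# Stand-in for running the authors' released code and capturing its key metric:
printf 'accuracy=0.93\n' > released_output.txt
# Stand-in for running your own reimplementation on the same inputs and seed:
printf 'accuracy=0.93\n' > my_output.txt
# Compare the two outputs; any difference points at an undocumented detail.
if diff released_output.txt my_output.txt >/dev/null; then
  echo "outputs match"
else
  echo "outputs differ -- inspect the code for undocumented steps"
fi
```

In practice you would replace the `printf` lines with real invocations, pin any random seeds, and also diff against the numbers reported in the paper itself.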

  • I couldn't agree more in the area of implementation details. I couldn't tell you the number of times that I have read an academic paper (or part of a textbook, etc.) where it seems like the actual algorithm is left as "an exercise for the reader".
    – CobaltHex
    Commented Dec 8, 2015 at 3:42
  • Thank you for the answer! I'll mark it as the best answer. I'm very happy with all the insights given by this topic.
    – Eliezer
    Commented Dec 8, 2015 at 15:22

Is this ethical?

I don't see this as a question of ethics. Is it unethical to write a bad or not useful paper? No, it's simply the best that some people are capable of (or interested in) achieving. By not including implementation details, the paper's authors are clearly limiting the usefulness and future impact of their papers. Nonetheless, the paper may still have some usefulness, so I don't see why anyone would think publishing such a paper is unethical. The only exception is if the authors are dishonest about their claims of what the algorithm can do or how well it works, and are deliberately hiding implementation details to prevent their dishonesty from being exposed. Obviously that would be unethical.

How to proceed when many things presented in the code are not explained in the original manuscript?

Do your best to fill in the holes. If it seems too difficult or too much work to reconstruct the missing implementation details yourself, talk to your professor -- they may not realize that they've given you an unreasonably difficult assignment, and it would be their job to figure out how you should proceed.

What if the code disappears from the author's page? Will the results ever exist in the universe again?

Again, it's up to the author to decide how much work they want to invest in ensuring their paper has a long life and makes a meaningful contribution. It's also up to the journal editors and reviewers to enforce some minimal standards. But at the end of the day, if the author doesn't care enough about their work making an impact to deposit their code in a repository that would outlast their personal web page, that's really their decision, but it doesn't reflect well on them, and potentially limits the cumulative usefulness of their work to the community.

To summarize, what we are seeing here is an example of short-termism, which is something you see in all walks of life. Some authors write a paper just with the goal of getting the short-term reward of getting their paper published (and the attendant professional rewards the academic environment will give them for such a publication: jobs, promotions etc.) and will just put in the minimum amount of work to achieve that immediate goal. Others care much more about the long-term impact their work will have on the research community, which will also eventually translate to a personal benefit to them since they will develop a reputation as better researchers. It is the researchers who belong to the latter group whom you usually hear about as the famous, super-successful authorities in the field whom everyone admires, so I certainly recommend trying to follow that approach yourself.

  • " Is it ethical to write a bad or not useful paper? No..." I think you want one more, or one less, negative there somewhere. Commented Dec 6, 2015 at 22:27
  • Got it, and fixed, sorry for the carelessness.
    – Dan Romik
    Commented Dec 6, 2015 at 22:28

Short answer: You are at the crossroads of blaming the game, the player, or neither. Show the paper/journal to your supervisor or someone in a senior position if you can, to help you with your decision.

Longer answer: It all comes down to one thing: whether writing a piece of code is a contribution or not. It is actually a grey area. I would judge a computer science paper, with code, as follows:

  1. Know the Context and the Promises: What is the context? Is it just a tool that demonstrates something? Or a language that promises that it does wonders? If it is heavily code oriented, then the code should be presented clearly in the publication.

  2. Semantic Backup: Do the author(s) back up their promises with semantics? How deep do they go into their semantics?

  3. Clean Links Between Semantics and Programs: Do they clearly demonstrate the relation between the semantics and the program? If not, it is a red flag in my opinion. Something is lacking here and the authors are hiding something.

  4. Track Records of the Authors: You can also simply look at their records, see what they have achieved so far, and base some judgement on that as well.

  5. Some Extra Work: You could also look at their code, if they provided a link. It is not hard to compile code written in any mainstream language; most likely a one-line command will build it, and you can see for yourself. You can even copy/paste part of their code into a search engine and take it from there.

Conclusion: As you can see, these points are not hard-and-fast rules for judging the implementation part of a paper. It is best to get a helping hand from a senior academic (e.g., your supervisor) to look into it and help you with your decision.

  • It can be extremely hard to build code, depending on things like the language, what dependencies are needed, and how the build has been set up. Research code is actually often quite hard to build, because little effort is put into setting the build up properly (often it's only easy to build on the machine of the original researcher). Commented Dec 6, 2015 at 22:13
  • @StuartGolodetz I know that; however, this should not be an excuse for the authors of a paper to provide crappy code that cannot be run. They can easily package all their code and dependencies into one zip and write a bash script for it, or use third-party containers. Again, this is a problem in the CS domain, and some people unfortunately take advantage of it. This is a grey area where you need to make a decision, for the fate of the paper, with a senior lecturer.
    – o-0
    Commented Dec 6, 2015 at 23:21
  • @StuartGolodetz Also... look at the big mainstream languages/operating systems that are open-sourced, which you can build from source with all their dependencies. In my opinion (which might be extreme to some academics), if the code cannot be run or demonstrated, then the researchers did not do their job. If they want to publish a paper on just theory, then that's another ball game.
    – o-0
    Commented Dec 6, 2015 at 23:26
  • Most computer scientists can't write quality software, because they don't have the training or the experience for it. Even worse, research code is usually written by the least experienced members of the team. By the time people start learning best practices, they usually advance to higher-level stuff, and the programming tasks are assigned to new students. Commented Dec 7, 2015 at 20:57
  • @DaveRose: I was primarily arguing that your assertion that "it is not hard to compile any mainstream languages" is by no means always true. In general, setting up cross-platform builds that work reliably on all setups (including those to which you may not have access) is not trivial, even if you have a reasonable amount of experience. Many researchers (particularly early-stage ones) don't have that experience, and their code can be hard to build as a result. Commented Dec 7, 2015 at 23:51

No. It's not ethical.

If we stick to Eliezer's scenario, I cannot trust a paper that posits wonderful things but fails to at least outline how they were accomplished. I've read several papers extensively detailing mathematical models and the enormously successful results achieved with them, but leaving the code implementation out. Obviously I can't confirm the non-existence of associated code. It might have existed once, somewhere.

There is a very well known web site in my sad sphere of interest that is held in high esteem by all. Except me. It features detailed information about what it does, and published analysis of its output (even some in real time). It doesn't publish any actual code of how it does it. Hmmm.

Facetious example: I've just invented true Artificial Intelligence. Huzzah! Here's the kit I did it on. Here's the conversation I had with it that passed the Turing Test. The (easily understood Java) code I wrote is on my web site for you to read. Including the bit in (opaque) compiled machine code that I happened to use. Do you believe me? People believed Volkswagen.

  • Do you believe me? I'll try to believe you if you delete your other answer, which is absolute nonsense.
    – Nobody
    Commented Dec 7, 2015 at 13:45
  • Paul, I think you are conflating not ethical with not useful. Ironically, that makes your answer not useful (although it is not unethical), illustrating exactly the same distinction.
    – Dan Romik
    Commented Dec 7, 2015 at 17:32
  • Clarification to my previous comment: your "facetious example" does actually represent unethical behavior, since in that case I assume (based on your reference to Volkswagen) that you are referring to a researcher who fraudulently claims to have developed AI code that passes the Turing test when in fact he hasn't. I agree that is clearly unethical, and said the same thing in my own answer. However, if you did in fact invent true AI and didn't provide full human-readable code or documentation (which seems closer to the OP's scenario), then again that is not unethical, merely unhelpful.
    – Dan Romik
    Commented Dec 7, 2015 at 22:02

I think it's entirely understandable that details of a code example will be undocumented. Let's say, for example, that the paper is about exception propagation in an instruction stack. In the example code, the author does some tricky little obfuscated sorting of an array, which will be iterated over to produce an example exception. Is it relevant to explain that little sorting routine? No...it's just some example code that gets you to the meat of the problem.

  • This answer isn't really consistent with the OP's description of "huge holes of dark matter".
    – Dan Romik
    Commented Dec 6, 2015 at 22:27
  • @DanRomik...let me introduce you to my dear friend, "rampant hyperbole." I've never met a new coder looking to refactor existing code that didn't declare the thing to be a "rat's nest of ungodly spaghetti."
    – dwoz
    Commented Dec 9, 2015 at 1:07
