6

I'm doing my PhD in computer science and am working in computer vision. I have come up with an algorithm that my supervisors consider to be promising and publishable. It took me a long time to get these results.

Now I have to compare my code to other recent papers tackling the same issue, which I understand is necessary to show how my work compares to previous work. My supervisors want me to compare against at least 4 or 5 other recent papers.

The problem is that the database I am using is very recent and no journal papers have used it yet. So the only option left for me is to read the journal papers, understand them, and try to implement their methods myself.

This will definitely take way too long and in my opinion waste a lot of time. These journal papers are very advanced (obviously) and implementing their results on my own will take up a lot of my time whose only sole purpose is to get a result.

One solution would be to email the authors and ask politely for their source code, but I found out that many authors don't reply.

This question, "Can I request the code behind a research paper from the author?", stated that people were more likely to get a response if they promised to add that author as a co-author on their paper. I do not want to do that, as it would be dishonest: I see no sense in adding an author to my paper just because I compared my work against theirs, and if I am to compare against 5 papers, that will be a very long list of authors.

Maybe I am asking for something that is frowned upon. Or maybe not asking in the right manner?

I emailed the authors and asked for their code solely to test their work on my database for comparison purposes, but I got no replies. Am I doing something wrong?

10
  • 4
    "These journal papers are very advanced (obviously) and implementing their results on my own will take up a lot of my time whose only sole purpose is to get a result." - And you're saying that while you may be getting a publication out of it. While I do see ethical concerns with refusing to share any source code required to reproduce experiments, and not replying at all certainly goes into rude territory, assuming that you asked with a reasonable level of politeness, from the point of view of those other authors, checking whether they can give you such source code will take up ... Commented Apr 27, 2015 at 16:44
  • 7
    I assume that each of those 5 papers reported the database they tested with, and the results they obtained. Can you get those databases? If so, perhaps you could run your code against each database, and compare your results to the results they reported. It might be easier than trying to replicate their code or even get their code running. The ideal situation would be if all 5 papers used the same database; then you only need to do one run.
    – mhwombat
    Commented Apr 27, 2015 at 16:45
  • 2
    Side comment: this kind of issue is one reason why journals should follow the lead taken by Ipol (ipol.im), a journal where papers are published alongside their code, implemented in a unified language, enabling readers to test and compare extensively. Commented Apr 28, 2015 at 7:13
  • 1
    @O.R.Mapper: I meant that all code is required to be written in a given, fixed programming language, so that it is implemented in the same way for all papers. This is probably more for the convenience of the publisher, who runs all published algorithms on its platform (you can go now and test any algorithm from any paper published by Ipol on any of the images on your hard drive), but it also makes comparisons more legitimate. Commented Apr 28, 2015 at 12:21
  • 1
    I've had similar experiences, so I can relate to your frustration. One thing to bear in mind is that the authors may not have the source code available in any usable shape. One common scenario is that a junior member of the team did all the actual coding. And it's also relatively likely that that person now no longer works in academia, no longer has access to his code, and has little memory of or interest in his former life. Depending on how desperate you are, you could try writing paper letters and sending them registered - they're a bit less likely to be ignored. Commented Feb 25, 2018 at 3:46

2 Answers

15

I'm doing my PhD in computer science [...] This will definitely take way too long and in my opinion waste a lot of time

Well, that is what a PhD in CS in experimental algorithms and related areas is all about. You must prepare your algorithm, implement it, implement previous work, and compare your work with it. How much time it takes you is of no interest to anyone but you and your supervisor, so this line of argument is naive.

One solution would be to email the authors and ask politely for their source code

Yes, but it is not the only one. You could ask for their datasets and run your algorithm on their datasets, instead of making yet another dataset. Moreover, experimental algorithm communities have well-known benchmark instances, and all related papers work on them for easy comparison of results. Why do you need to build yet another dataset? It is OK to use this extra dataset AFTER you have tested your algorithm on those community benchmark instances. You could also send your datasets to your "competitors" and ask them to run their experiments on your datasets and just give you their results. Prepare your experiments relative to the older papers, tell them your PC specs (and provide alternative PC configurations; you probably have different PCs in your lab), and ask them to repeat all your suggested experiments on a PC close to your suggested specs.

If you like sharing code so much, you can also send them your own source code, explain how to compile and use it, and then let them run experiments on their PCs and give you the results relative to their work. I know that this thought probably never crossed your mind. Why? "They might steal my work, how do I know they will give me correct results, it is too much work to do so, I don't trust them with my src code". And now you know why people do not want to share their code.

But you have also probably overlooked the easiest way out of your problem. Let your supervisor contact the first author AND the rest of the authors. Unless you are an exceptional PhD student with many amazing papers, you are practically Mr Nobody, and people will easily brush off your requests. It is harder to do so to your supervisor (unless he is Mr Nobody as well). Usually people do not want to say NO to future reviewers, collaborators and respected members of the community. It is also important to CC all the paper's authors. The first author (a PhD student) might be protective of his code and hide your request from his supervisor. If you CC the supervisor, the first author might be forced to share his code or at least reply.

Last but not least: Be nice when asking.

3
  • Thank you for your excellent answer. Yes, the waste-of-time comment was from my point of view, as I will have to implement their work, which might take weeks or months, just to get a result for comparison. This I wanted to avoid, and that is why I emailed the authors... Asking to send my dataset to them to test is a great idea; I will add this as an option in my future emails... Yes, I am paranoid about sharing my code as mine is unpublished, whereas theirs is published so I cannot steal their work which is already published... And yes, I will ask my supervisor. Thanks again for your detailed reply. Commented Apr 27, 2015 at 18:38
  • 3
    Note that sometimes implementing other people's ideas is beneficial, because you can then easily combine their approach with yours. In this sense, this initial "waste" of time can provide several benefits later on. Being the person who actually knows how to implement all (or the most important) competing methods gives you a serious head start later on, when other people try to catch up with you.
    – Alexandros
    Commented Apr 27, 2015 at 18:50
  • <<"I don't trust them with my src code". And now you know why people do not want to share their code.>> That's kind of a weird concern given their papers are already published. Commented Jan 28, 2017 at 10:06
3

Reproducibility in computer science is an important area for improvement. It is reasonable to expect authors to make source code available for algorithms that they present. I've heard that some journals and conferences are putting pressure on authors to make source code publicly available (although I haven't personally encountered that).

In my experience, most scientists do make their source code available. Not doing so both casts serious doubt on the legitimacy of their results and runs counter to the normal premise of public-sphere science. It's also pretty tough to justify, since sharing source code has no cost associated with it.

Doing a PhD in computer science is most assuredly not about re-implementing existing algorithms, except maybe for pedagogical reasons. You're right that it's a waste of time. There is simply no good reason to do it when the code could be hosted on GitHub for free!

A good strategy is as follows.

  • Start with the corresponding author, or first author. Tell them what you're doing, that you intend to cite their work, and that you'd like to benchmark your method against theirs. Do be cordial, and feel free to tell them what you like about their work. Most likely they're a cool person.
  • If they don't respond after a reasonable delay, say a week, send essentially the same email to the entire author list, explain that you're having some trouble getting in contact with the corresponding author, and inquire about who is in a position to speak on behalf of the group. This has yet to fail for me.
  • They should provide their source code at this point. If they don't, it's fair game to inquire as to why not. Since it's published work, there's sort of a premise that they should make it available except if there's something really making that a problem. It's pretty fishy if they don't. You can also remind them that you wouldn't want to make any mistake while implementing their work, so that the comparison is legitimate.

Don't worry about being a "nobody". The idea that a researcher should only respond to high-status people is just ridiculous. What you do need to be aware of, however, is that a prominent researcher can receive a lot of incoming communication. If they're professional, they'll have set up triaging for that. The best thing to do is make it easy for them to respond, and they will probably be pleased to help you, a student, which they once were.

1
  • Sometimes they are not happy to give source code (patenting/copyright reasons, or because of "dirty laundry code"), but ok with providing a binary. Offer that as an option. Commented Jan 28, 2017 at 10:44
