I just realized that my published papers are based on a buggy version of my dataset (due to a SQL error). How to proceed? How to tell my advisor?

Question

I am in an advanced stage of my PhD and will finish next year. I have two first-author publications in which I used data from my group. My research group stores different datasets on a MySQL server, from which each researcher accesses their relevant dataset.

In my 2 previous publications, I exported my dataset into a csv file and worked with it. During this data export, I had to perform a couple of table join operations, involving filtered data in a kind of "summary" table, joining it with a table containing original raw data.

Last week, I was working on my third review paper and needed to get some statistics from the MySQL server, when I discovered that the dataset I used in my previous research was actually incorrect. It was incorrect in the sense that the SQL filter query I used for the export contained an error, so the exported dataset was incomplete, a kind of "subset" of the actual data.

All my conclusions in previous work were drawn from this incorrect data. I am panicking and don't know what to do. I am not sure how my supervisor would react if I explained this problem to him. He is the kind of person who expects "near perfect" from his students.

I am not sure how this will affect my PhD. The years and effort invested are likely in vain. What is the best course of action?

Highly related: D.V.M. Bishop (2018), Fallibility in Science: Responding to Errors in the Work of Oneself and Others — Eike P., Commented Jul 5, 2022 at 11:24
"I am not sure what the reactions of my supervisor" Don't worry: the longer you wait to discover the reaction, the worse it will be. It is human to make mistakes. Don't worry about your supervisor's reaction, but discuss the next urgent steps with them. — EarlGrey, Commented Jul 5, 2022 at 12:37
Bad news doesn't get better with age. Neither you nor your advisor can change what happened, but you can control what steps you take next. Don't beat yourself up. These things happen. Be honest and up front, fix it, and move on. — Andrew, Commented Jul 5, 2022 at 12:39
Are there significant differences in the results when you re-run your analysis on the correct data? — Issel, Commented Jul 6, 2022 at 12:14

cag51 · Accepted Answer · 2022-07-04 22:24:43Z

59

Oh no! This is bad news for sure, but try not to panic.

The main thing is to rerun your analysis with the corrected dataset. Hopefully, you have been organizing your work in such a way that you can just replace the csv file, push the button, and have results quickly. If not, this is a good opportunity to improve your old code such that you can quickly reproduce the old numbers and then produce the new numbers. If possible, I would try to do this quietly (i.e., before telling others of the potential problem).
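To make the "replace the csv file, push the button" idea concrete, here is a minimal Python sketch. It assumes the analysis can be wrapped in a function that takes the export as input; `summarize`, the `score` column, and the toy data are hypothetical stand-ins, not anything from the actual papers. The point is only that the same code runs on the old and the corrected export, so the old numbers can be reproduced and the new ones placed next to them.

```python
import csv
import io
import statistics


def load_column(csv_file, column):
    """Read one numeric column from an open CSV file object."""
    return [float(row[column]) for row in csv.DictReader(csv_file)]


def summarize(values):
    """Placeholder for the real analysis: returns a few headline numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


def compare_exports(old_file, new_file, column):
    """Run the same analysis on both exports and pair up the results."""
    old = summarize(load_column(old_file, column))
    new = summarize(load_column(new_file, column))
    return {key: (old[key], new[key]) for key in old}


# Toy stand-ins for the buggy and corrected exports (hypothetical data).
OLD_CSV = "id,score\n1,2.0\n2,4.0\n"
NEW_CSV = "id,score\n1,2.0\n2,4.0\n3,6.0\n4,8.0\n"

report = compare_exports(io.StringIO(OLD_CSV), io.StringIO(NEW_CSV), "score")
for key, (old, new) in report.items():
    print(f"{key}: old={old:.3f}  new={new:.3f}")
```

With real exports, the `io.StringIO(...)` arguments would be replaced by files opened with `open(path, newline="")`.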

Once you have these results, you'll be in a better position to talk to your advisor. If you are lucky, your results will be the same (or even better, since you have more data) and it is just a matter of updating the paper without changing your conclusions. This shouldn't be too much of a problem. You should still tell your advisor, as a corrigendum may be required, but his anger should be limited.

You should also consider approaching this from the other side -- what claims did you make in the paper about your dataset? If you claim to have used, say, ImageNet, then there is not much wiggle room. But if you claim to have used "500 original datapoints from a medical database," then you might be able to simply replace this with "300 original datapoints from a medical database" and there is no need to change the papers' results or conclusions.

Of course, it's also possible that your bug biased the dataset in such a way that your conclusions are no longer valid. This would put you in a much more serious situation for which there is not much advice to give -- all you can really do is tell your advisor, show him the before/after numbers, and hope that his reaction is reasonable. The way forward will depend heavily on the technical details of your paper, so only your advisor's advice is really valuable in this case. Good luck.
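One rough way to get a first impression of whether the bug biased the dataset is to compare the rows that made it into the buggy export with the rows that were dropped. The Python sketch below assumes each row has a unique `id` and a numeric `score` column; both names and the toy data are hypothetical. A large gap between the two groups hints that the loss was not random, but it is only a first look, not a replacement for a proper statistical test or for rerunning the full analysis.

```python
import statistics


def split_by_export(full_rows, exported_ids):
    """Separate the full dataset into rows the buggy export kept and rows it dropped."""
    kept = [row for row in full_rows if row["id"] in exported_ids]
    dropped = [row for row in full_rows if row["id"] not in exported_ids]
    return kept, dropped


def mean_gap(kept, dropped, column):
    """Difference in means (dropped minus kept) for one numeric column."""
    kept_mean = statistics.mean([row[column] for row in kept])
    dropped_mean = statistics.mean([row[column] for row in dropped])
    return dropped_mean - kept_mean


# Hypothetical toy data: six rows in the full dataset, three in the buggy export.
full_rows = [{"id": i, "score": float(i)} for i in range(1, 7)]
exported_ids = {1, 2, 3}

kept, dropped = split_by_export(full_rows, exported_ids)
gap = mean_gap(kept, dropped, "score")
print(f"kept={len(kept)} dropped={len(dropped)} mean gap={gap:.2f}")
```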


22

This is a good answer, but I would suggest involving the supervisor at an early stage for two reasons: (1) Their name is on the paper, so they deserve to know as soon as you find out, and (2) One of the reasons the mistake got published is that you were working alone. Having someone to work with may help prevent this kind of mistake. Most likely your supervisor will be very disappointed, but they will understand: everybody makes mistakes.
– Louic
Commented Jul 5, 2022 at 8:58
34

@Louic - I'm not sure I agree with that as long as getting results from the new data is fast (less than a day or two). Simply because going to someone with solutions/complete information is far nicer for both parties than going to them with a problem. If it's going to take longer than that, then I agree with you wholeheartedly
– ScottishTapWater
Commented Jul 5, 2022 at 14:05
4

@ScottishTapWater agreed
– Louic
Commented Jul 5, 2022 at 14:33
10

Agreed: unless (s)he's the type with explosive personality disorder who will physically kick you out of their office before you finish the sentence, the next thing (s)he is guaranteed to say is "so, how does this error affect the outcome or conclusion?" If you already have that answer, you can proceed and get their advice more quickly and, probably, less annoyance on their part.
– Jeffiekins
Commented Jul 5, 2022 at 19:29
8

If the data can be corrected within a week, I agree with not telling your advisor until it's fixed. That's a short time scale, and the risk of your PI catastrophizing the situation is non-negligible, when it could all end up being super-fixable and not a big deal. Head down, fix the data quickly, then you are better able to judge what to do. If there are still issues a week from now, then go to your advisor.
– Jerome
Commented Jul 5, 2022 at 20:36


High GPA · Accepted Answer · 2023-08-03 08:33:44Z

Don't worry, everyone makes mistakes. Identifying and correcting mistakes is a crucial component of academic research and developing new theories.

From my experience, as long as you correct the mistake in an honest way, the journal will not retract your paper and your PhD will be unaffected.

Most papers that are forcibly retracted are retracted because of ethical or political problems. To my knowledge, no paper has been forcibly retracted because of an honest mistake (unless you request it yourself).
If mistakes are identified, the best scenario is that you correct them yourself. The second-best scenario is that someone else finds them first and corrects them for you.
Even if someone else finds your mistakes, your paper will usually not be retracted. For example, see this example in the most rigorous journal in economic statistics: https://www.econjobrumors.com/topic/kasy-and-sautmann-econometrica-2021-proven-wrong/page/16

Here is what I would do:

1. I would use the corrected dataset to rerun all the analyses and start writing a corrigendum or addendum.
2. For the results that still hold, I would write that those conclusions are verified with the new dataset.
3. If some new results differ from the old ones, I would update the results in the paper, explain why the new results are correct, and write a short discussion of the scenarios under which the old results would likely hold and those under which the new results would likely hold.
4. Finally, you can post your addendum/corrigendum online and send it to the same journal.

In my opinion, (4) should read "Next, you send the editor of the journal your corrigendum and await their decision. Publishing the corrigendum online is not enough – the correction needs to be as available as the publication that contains the faulty analysis." Or perhaps there should be a (0): "Contact the editor, and tell them about the issue. Suggest to write a corrigendum." — Schmuddi, Commented Jul 5, 2022 at 7:43

asdfafasdff · Accepted Answer · 2022-07-06 04:58:09Z

First of all, don't panic.

This is not necessarily a fatal flaw. You said that the analysis was done on a subset of the data. Rerun the analysis on the full data set. If the conclusions still hold and follow from the hypothesis, your work will not be substantially affected. You can revise the paper even after it's been published. Mistakes happen in science. This happens particularly in the more difficult fields like theoretical math, where the results are very difficult to understand (even for experts).

The only way you are in trouble is if the conclusion is changed.

In any case, it's better to find the mistake yourself than to have someone else find it and rub your face in it.

Regardless, you are going to have to tell your advisor. These things happen though. You aren't the first person to have this happen to them. At least you are still in your PhD program. You can at least fix everything before publishing your thesis.

You need to go about this the right way. The worst thing you could do right now is to go into a full-blown panic and start calling everyone you know, including the journal, and telling them you made this terrible mistake. Take some time and understand the mistake. Try to understand if it can easily be fixed or if the entire paper is going to change. Once you do that, go to your advisor first with the complete analysis of the situation. Due to the highly political and sensitive nature of the academic world, you need to handle this correctly and carefully. If this is a minor error, it can likely be corrected with a simple revision and it should not be blown out of proportion. I would recommend going through the appropriate channels and deferring to those who have more research experience on how to correct this mistake properly.
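As one way to "understand the mistake", it can help to check how many rows the export query returns against how many rows the raw table holds, and to list exactly which rows went missing. The question does not say what the SQL error was, so the Python sketch below only illustrates one common case: an inner join against a "summary" table that silently drops raw rows with no matching summary entry. It uses an in-memory SQLite database in place of the MySQL server, and the table and column names (`raw_data`, `summary`, `raw_id`) are made up for the example.

```python
import sqlite3

# In-memory SQLite stand-in for the real server (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_data (id INTEGER PRIMARY KEY, value REAL);
    CREATE TABLE summary (raw_id INTEGER, label TEXT);
    INSERT INTO raw_data VALUES (1, 0.5), (2, 1.5), (3, 2.5), (4, 3.5);
    INSERT INTO summary VALUES (1, 'a'), (2, 'b');
""")


def count(query):
    """Return the single integer produced by a COUNT(*) query."""
    return conn.execute(query).fetchone()[0]


# Rows that should have been exported versus rows the join actually returns.
expected = count("SELECT COUNT(*) FROM raw_data")
exported = count(
    "SELECT COUNT(*) FROM raw_data r JOIN summary s ON s.raw_id = r.id"
)

# A LEFT JOIN with an IS NULL filter lists the raw rows the inner join dropped.
missing_ids = [
    row[0]
    for row in conn.execute(
        "SELECT r.id FROM raw_data r "
        "LEFT JOIN summary s ON s.raw_id = r.id "
        "WHERE s.raw_id IS NULL ORDER BY r.id"
    )
]

print(f"expected={expected} exported={exported} missing={missing_ids}")
```

Knowing which rows were lost, and why, makes it much easier to explain the problem to an advisor and to judge whether the loss could have skewed the results.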

This is really good advice. Just to amplify something said here: in addition to "don't beat yourself up" (which others have said), make sure you don't try and convict yourself in front of your peers/mentors/colleagues. Often, people will respond to the energy with which you present a piece of information. If you present the information as a terrible mistake that you should have seen if you were competent, people will tend to believe you. If you present it as a minor oversight that does not affect the main conclusions, people will also tend to believe you. Don't sell yourself short to others. — Andrew, Commented Jul 6, 2022 at 23:00
