23

During my PhD (in machine learning) I worked on collecting a very large dataset. The whole effort took about a year. I had the idea, collected most of the data, and wrote all the software for data collection and annotation. I acknowledge that other people helped me set up sensors and collect data, but that was mostly technical work.

Recently I found out that my supervisor has been giving this dataset to his other students without my consent and has even submitted 2 papers (before the submission of my own paper, so I don't even get a citation).

In those other papers, my name appears in the acknowledgements. I didn't find that fair, and I asked for at least co-authorship on them. But my supervisor says that he purchased the equipment and the funding came from his project, so he owns the dataset and decides what to do with it.

I spent a year doing everything, the other students spent 0 hours, and I just get an acknowledgement? I really don't find that fair.

I'm trying to resolve it internally with my supervisor, but it's not working. Since the papers have already been submitted, I'm not sure I will be able to change the authorship. I have plenty of evidence that this dataset was mainly developed by me.

What's the best way to resolve this problem? Report it to the department? Raise it with the conference chairs (the papers are still under review)? Or just forget about it and finish my PhD without complaining?

7
  • 4
    I don't think what your supervisor did is smart (to be polite here), but legally they are probably correct. Thus, I don't think reporting this will get you far. I would try to negotiate with the supervisor, i.e., there might be something these other students could do for you. If that doesn't work out, I would keep my mouth shut until I've finished my PhD. But I would certainly not forget about it ...
    – user9482
    Commented Jun 25, 2020 at 6:57
  • 26
    Is there a possibility of making the data public? If yes, write a paper on just the "Data" and publish it with your supervisor. Then all papers can cite that data paper. There is no fun in fighting with your supervisor during your PhD.
    – Coder
    Commented Jun 25, 2020 at 7:33
  • Possibly a duplicate or at least relevant: academia.stackexchange.com/q/116307/72855
    – Solar Mike
    Commented Jun 25, 2020 at 8:01
  • 3
    @Coder Half of my paper is a description of the dataset, and I'm trying to release the dataset. We need approval from our sponsor before I release it. My paper is under review at the moment; hopefully it will get accepted and the other papers can at least cite it before they are actually published.
    Commented Jun 25, 2020 at 9:37
  • 8
    I kind of disagree with all the answers so far. If a student put a year of work into something and then there is a publication based on that work, then obviously the student should be an author on that paper. There just aren't any ifs or buts about it. To submit a paper without them is clearly unethical.
    – N. Virgo
    Commented Jun 26, 2020 at 10:14

4 Answers

17

Fighting with your supervisor is probably going to negatively affect your own career. Don't let your future hang on this one thing whether it is fair or not. Forcing punishment on the advisor will not get you the letters and recommendations you need to get out and on your own.

Get your own paper published. "Make nice" enough that you get good recommendations. Get away. Build your career.

If you focus just on the "justice" of it you could easily be the one to suffer blowback. Let the past be the past and optimize for your own future. This paper and this dataset aren't going to define your future, nor, hopefully, be the best work of your career.

And, no, I don't make this recommendation happily nor lightly.

1
  • 1
    "... to negatively affect your own career": It depends on the career. It will certainly make their life miserable/complicated/stressful until they can hang the PhD diploma on the wall. If after that they plan to leave academia, nobody cares (including about recommendations).
    – WoJ
    Commented Jun 25, 2020 at 17:24
24

I don't see anything problematic here, unless the papers that your advisor wrote with the other students take credit for developing the dataset (which would be a clear ethical violation).

Many papers in computer science use one or several datasets developed by other authors, and it's not a standard practice that the dataset developers are invited to contribute as authors. In fact, doing so may lead to vastly inflated author lists for papers that use many datasets.

7
  • 1
    Agreed, but they probably should cite his paper about the dataset, even if this is only to explain how the data was obtained.
    – Louic
    Commented Jun 25, 2020 at 13:30
  • 5
    @Louic Agreed. Of course, for this, the paper first has to exist in a citable form. OP should push the sponsor for the OK to release the dataset (which seems to be needed), and then publish the paper on arXiv, so that it becomes citable.
    Commented Jun 25, 2020 at 13:38
  • 1
    It is normal to use many other datasets without the developer as author, agreed. But this happens when the dataset is publicly available, no? My dataset is not publicly available and I didn't even get a citation. In their paper they say that they applied the method to their "own dataset", but they acknowledge me for the data collection. At the very least I would expect my supervisor to ask me whether I agree to other people using it.
    Commented Jun 25, 2020 at 14:15
  • 7
    @user1998012 There are also cases where the dataset is not publicly available, for example when working with an industry partner. I would agree that your supervisor's behavior lacks courtesy, but it's not unfair, since the creation of a dataset usually does not lead to authorship (unless the dataset is a contribution of the paper).
    Commented Jun 25, 2020 at 14:22
  • 3
    I am not in computer science (so comment and no vote), but in my field, the OP would have deserved authorship for at least the first paper (or set of papers). An FYI for future readers.
    Commented Jun 25, 2020 at 20:10
9

It should be seen as a positive that the dataset is being used by somebody else entirely. As I recall, the number of acknowledgments can be used as a metric in some grant applications or other types of report.

If you prefer a citation or authorship, then I'd recommend making your dataset citable as soon as possible, e.g. as your own publication or article, or as a pure dataset. A public dataset can be uploaded to services like Zenodo, which generate a citable DOI for it.

Your career won't be affected by one less co-authorship. More important is your attitude to work. Personally, I'd be very happy if some of my datasets were used by my colleagues, although I do not know your exact circumstances and lab relationships.

6

Unpopular opinion: be happy that your dataset was used for, not one, but two publications!

You'll get your paper out, then at some point you may write

"This dataset was used in the following publications \cite{otherstudent1, otherstudent2}."

So that people will know that your contribution was important. That's kind of counterintuitive because you are citing them. But people reading your paper will understand the usefulness of your work. And that's more important than bibliometrics.

I understand that having two more papers on your CV would positively impact your career. But the gain of two more publications is greatly outweighed by the cost of a bad recommendation letter.

My advice is to get the paper out, focus on the fact that your research was useful (that's our goal at the end of the day), and move on.
