13

If another PhD student created a dataset, can I use it for my own PhD thesis, with permission and attribution, and not collect any data myself? Or is it usually expected that a PhD student should collect at least some data?

2
  • 3
    Ask your advisor. He/she will know the answer in your specific case much better than anyone here.
    – GEdgar
    Commented Dec 30, 2022 at 0:59
  • 4
    I'm sure this is field-specific: neither I nor any of my fellow statistics PhD students came anywhere close to collecting our own data :)
    Commented Dec 30, 2022 at 17:03

5 Answers

9

You should be fine using a widely used dataset instead of collecting your own, provided there is no regulation against it (generally there is none). Moreover, using others' data has an upside that might increase the impact of your work.

First, people in the field will know right away what to expect. For example, in digit recognition work, MNIST would be the no-brainer choice, as people in CS know it by heart. Or, if you work with RNA sequences, you might not need to do the sequencing again if the data is already on NCBI.

Second, using already existing data helps you benchmark your method, both its performance and the correctness of your work. Now suppose you are using another Ph.D. student's data that is not yet published or is not a well-known dataset. If your results point in the same direction, that will tremendously increase the impact of both your work and the other Ph.D. student's work.

That said, you might need to look into the purpose of the data you are using and whether it will form a crucial part of your work. In any case, please check with your advisor and university to get a more decisive answer on whether including the data is OK.

3
  • Indeed, using public data is convenient for comparing your result with others'. DNN papers seem to be fond of public data.
    – Michael
    Commented Dec 30, 2022 at 2:14
  • Good answer, though your MNIST digit example is a bit out of date; that challenge problem is pretty much solved.
    – cag51
    Commented Dec 31, 2022 at 0:37
  • @cag51 Look at it another way - it is a timeless classic now :)
    – Lodinn
    Commented Jan 3, 2023 at 16:06
25

Yes, in principle there is no issue with not generating your own data. The general requirement is that a PhD provides an advancement of knowledge in a particular field, not that it necessarily generates novel data.

However, this is likely to be subject-dependent. My PhD was based on developing a new statistical method to analyse data, and I applied it to data other people had collected, not data I had generated. However, if your field were experimental plant sciences, I imagine there may be an expectation (but not necessarily a requirement) to collect your own data.

1
  • 1
    Yes, my field is Stats&ML, and I tend to use readily available datasets as well (other than artificial data)
    – Neuchâtel
    Commented Dec 29, 2022 at 16:24
11

There is no ethical problem with using data from others with permission and attribution. However, what is acceptable in a dissertation is up to your advisor and your institution.

Since one major purpose of a PhD is to give you training and experience in research, some will expect you to collect the data yourself. Problems can arise during collection that you avoid, in some cases, by using the data of others.

Work this out with your advisor if the question isn't purely theoretical. The research question itself might be important enough that the source of the data is less of an issue. There are lots of possibilities.

0

Yes. Also: depends on your PhD.

Whole areas of physics rely on data retrieved by someone else (or layers of "someone elses"). Particle physics is one of them.

I used the experimental data of other people because I was not interested in the experimental part and they were interested in the theoretical / analysis one. We wrote a few papers together.

I wrote a paper after asking someone else for their data (they published an article based on them) because I wanted to analyze them differently. That person did not want to be a co-author but was happy to share their data (and more).

I gave the data I received to another PhD student so that she would not lose time gathering more.

All of this, of course, assumes that the exchanges and relationships are clearly acknowledged (or co-authored).

And then comes the second sentence of my answer: your PhD may require you to actually retrieve your data. I do not think that this is a hard rule but the spectrum of research is so wide that everything is possible.

0

Ph.D. theses, like other publications, can use data other people collected. A meta-analysis or meta-study is a statistical analysis combining the results of pre-existing published scientific studies.

1
  • While this is true, a meta-analysis is not usually an acceptable PhD thesis, at least in fields I'm familiar with.
    Commented Jan 1, 2023 at 22:45
