In the end, what matters is what the data sampled from nature tells us about the questions we have about nature.


When the data supplies us with novel questions that we want that same data to answer, there isn't necessarily a problem.

The data still provides information about our questions about nature.

Only people obsessed with precise statistical significance cut-off values would have an issue here; they would need to consider whether re-using data to answer questions that came from the very same data is a problem for their significance computations.

Whether this is a problem depends on how strongly the answer to the question suggested by the data depends on that suggestion itself.
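
To make that dependence concrete, here is a minimal simulation sketch (the number of groups, sample sizes, and the "compare the two most extreme-looking groups" rule are my own illustrative choices, not part of the argument above). It lets the data suggest a question and then answers that question with the very same data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_groups, n_per_group = 2000, 6, 20

rejections = 0
for _ in range(n_sims):
    # The null is true everywhere: all groups come from the same distribution.
    groups = rng.normal(loc=0, scale=1, size=(n_groups, n_per_group))
    means = groups.mean(axis=1)
    lo, hi = means.argmin(), means.argmax()  # the question the data suggests
    # Answer the suggested question with the same data that suggested it.
    p = stats.ttest_ind(groups[lo], groups[hi]).pvalue
    rejections += p < 0.05

print(f"rejection rate under the null: {rejections / n_sims:.3f}")
```

Here the dependence is as strong as it gets (the comparison is chosen precisely because it looks extreme), so the rejection rate lands far above the nominal 5%; when the data-suggested question is nearly independent of what made it stand out, the distortion shrinks accordingly.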


In many schemes the influence is not so large. Testing model assumptions and switching to a different model is not going to change your conclusion much.

The cases where this becomes a problem are those where the model assumptions increase the power. Switching to a non-parametric test often does not do this, but a parametric test does improve the power. So first performing a test of the model assumptions can amount to taking a lucky shot: researchers who are unsure about the assumptions test them (just to see whether they can use the nice, powerful parametric test), sometimes get lucky, and as a consequence perform tests that they should not perform. The result is a higher rate of false rejections of the null hypothesis.
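
As a rough sketch of that two-stage scheme (the exponential distribution, sample size, Shapiro-Wilk pre-test, and Mann-Whitney fallback are my own illustrative choices), one can simulate how often the conditional procedure falsely rejects:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, n = 2000, 15

two_stage = always_t = 0
for _ in range(n_sims):
    # The null is true (identical distributions), but the normality
    # assumption behind the t-test is wrong: the data are skewed.
    x = rng.exponential(scale=1, size=n)
    y = rng.exponential(scale=1, size=n)
    if stats.shapiro(np.concatenate([x, y])).pvalue > 0.05:
        # The 'lucky shot': the pre-test passed, so the researcher
        # goes ahead with the powerful parametric test.
        p = stats.ttest_ind(x, y).pvalue
    else:
        p = stats.mannwhitneyu(x, y).pvalue
    two_stage += p < 0.05
    always_t += stats.ttest_ind(x, y).pvalue < 0.05

print(f"two-stage (pre-test, then choose): {two_stage / n_sims:.3f}")
print(f"always t-test:                     {always_t / n_sims:.3f}")
```

The difference between the two printed rates measures how much the pre-testing scheme shifts the false-rejection rate; as the next paragraph argues, this is typically a second-order effect.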

I believe that these situations are less problematic than the regular type I errors, which occur at a much higher rate (the significance level, often 5%). This problem of testing assumptions is a second-order influence on the occurrence of type I errors and less of a concern.
