I strongly agree with Stephan Kolassa, but I do think there is an exception:

Suppose you carefully consider in advance what test or model you want to use. You justify your choice of assumptions and are fairly confident that the model is a reasonable approximation to the underlying process.

Fast forward to the analysis and you find out that despite your best efforts to choose a model in advance, its assumptions simply don't hold up in the sample you have. Hence you decide on a model with more relaxed assumptions.$^\dagger$

Has the false positive rate gone up? In some cases, the literature suggests it has. But you can account for this by correcting for the number of models considered.
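
If you do want to account for it, the simplest (if conservative) option is a Bonferroni-style adjustment over the number of candidate models entertained for the same hypothesis. Below is a minimal sketch in Python, purely illustrative (the function name and the numbers are made up):

```python
# Minimal sketch: Bonferroni-style correction for the number of
# candidate models considered for a single hypothesis.
def adjust_for_models_considered(p_value: float, n_models: int, alpha: float = 0.05) -> bool:
    """Reject H0 only if the p-value clears alpha divided by the
    number of models that were entertained for this hypothesis."""
    return p_value < alpha / n_models

# E.g. the reported p-value came from the second of two candidate models
# (say, a Poisson GLM that was later replaced by an overdispersed one):
print(adjust_for_models_considered(p_value=0.03, n_models=2))  # False: 0.03 >= 0.025
```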

In practice, correcting for the number of methods/models considered rarely happens, because preregistration of hypotheses is very uncommon outside clinical science. Unfortunately, people largely still chase significance at arbitrary thresholds.

In many cases though, I would argue that it should not meaningfully affect the false positive rate. You decide to use an overdispersed model because a Poisson GLM did not provide a reasonable approximation? I don't think this is harmful, and I have never seen any studies demonstrate that it is.
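
To make the Poisson example concrete, here is a minimal sketch of that workflow in Python/statsmodels (illustrative only; the simulated data and the dispersion cut-off are arbitrary choices): fit the pre-specified Poisson GLM, check the Pearson dispersion statistic, and relax the variance assumption only if the counts are clearly overdispersed.

```python
# Sketch of "switch only if the assumption clearly fails":
# simulate overdispersed counts, fit the planned Poisson GLM,
# check dispersion, and relax the variance assumption if needed.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
X = sm.add_constant(x)

# Negative binomial counts: mean mu, variance mu + mu^2 (overdispersed).
mu = np.exp(1.0 + 0.5 * x)
y = rng.negative_binomial(1.0, 1.0 / (1.0 + mu))

# Step 1: the model chosen in advance.
poisson_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
dispersion = poisson_fit.pearson_chi2 / poisson_fit.df_resid
print(f"Pearson dispersion: {dispersion:.2f}")  # roughly 1 if Poisson is adequate

# Step 2: relax the variance assumption only if clearly violated.
if dispersion > 2:  # arbitrary cut-off for this sketch
    # alpha is fixed here for simplicity; in practice it would be estimated.
    relaxed_fit = sm.GLM(y, X, family=sm.families.NegativeBinomial(alpha=1.0)).fit()
    print(relaxed_fit.summary())
else:
    print(poisson_fit.summary())
```

The cut-off of 2 is just a placeholder for whatever diagnostic you would have trusted; the point is that the fallback is decided by a pre-statable rule rather than by which model gives the smaller $p$-value.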

As usual, many of these problems arise from null-hypothesis significance testing with $p$-values. Bayesian analyses suffer less from issues related to hypothesis testing in general, but depending on your field, you may not be able to convince others to use them instead of a frequentist test or model.

Bayesian analyses don't solve a lack of thinking in advance though. I suppose you could ask the same question regarding empirical Bayes, and indeed, objective Bayesians oppose this approach.


$^\dagger$: A non-parametric test is one such option, but there are often better alternatives that provide more interesting output than just a $p$-value.
