Conveniently, some answers and comments already link to material I have written.

I will try to give a message in a nutshell here.

Yes, you are right. There is a problem. One nice way to show this is to state it as a paradox, as I have done here.

I also find it important to acknowledge that model assumptions are never perfectly fulfilled, and that we routinely apply methods derived from assumptions that are in reality violated. The difficult problem is that some violations of model assumptions are hardly problematic, whereas others are very problematic. In practice, data-dependent decision making about which method (test) to use does violate model assumptions, as correctly stated in the question, but the tricky issue is whether this is worse than failing to detect a critical violation because a test was fixed in advance and the assumptions were never checked on the data at hand. There are cases in which the bigger problem is avoided by looking at the data and keeping one's hands off an inappropriate test that seemed appropriate before the data were seen.

It is correct that we should take into account as much as we can in advance and make provisional decisions before collecting the data, but in many situations researchers simply cannot predict enough of the nastiness that the data may later show.

Whether data-dependent method selection does more harm than good is unfortunately very situation-dependent; normally it does some of both, and there are no simple general rules for deciding, in any given practical situation, whether the advantages outweigh the disadvantages.
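
One way to get a feel for this in a concrete setting is a small simulation. The sketch below compares the empirical type I error of a fixed t-test, a fixed Wilcoxon/Mann-Whitney test, and an adaptive procedure that picks between them after a normality pre-test on the same data; the skewed null distribution, the Shapiro-Wilk pre-test, and the 5% levels are illustrative assumptions, not a prescription.

```python
# Minimal sketch: how much does choosing the test after a pre-test on the same data
# distort the type I error in one particular (assumed) setting?
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps, alpha = 30, 5000, 0.05
reject = {"fixed t-test": 0, "fixed Wilcoxon": 0, "adaptive": 0}

for _ in range(reps):
    # Both groups come from the same skewed distribution, so the null hypothesis holds.
    x = rng.exponential(size=n)
    y = rng.exponential(size=n)
    p_t = stats.ttest_ind(x, y).pvalue
    p_w = stats.mannwhitneyu(x, y, alternative="two-sided").pvalue
    # Adaptive procedure: pre-test normality on the pooled sample, then pick a test.
    normal_ok = stats.shapiro(np.concatenate([x, y])).pvalue > 0.05
    p_a = p_t if normal_ok else p_w
    reject["fixed t-test"] += p_t < alpha
    reject["fixed Wilcoxon"] += p_w < alpha
    reject["adaptive"] += p_a < alpha

for name, count in reject.items():
    print(f"{name}: empirical type I error = {count / reps:.3f}")
```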

With enough data, one can split the data, run model diagnostics on one part, and use the results to decide which method to run on the other part. (This is of course somewhat harder if cross-validation, or even something like double cross-validation for model selection, is done on top of it; also, if there is dependence, for example a time series structure, suitable data splitting is not trivial.)
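
As a minimal sketch of such a split, assuming a simple two-sample comparison: the normality check on the diagnostic half and the choice between a t-test and a Wilcoxon test are illustrative assumptions, not a recommendation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.standard_t(df=3, size=200)        # hypothetical group 1
y = rng.standard_t(df=3, size=200) + 0.3  # hypothetical group 2

# Split each group: first half for diagnostics, second half for the actual test.
x_diag, x_inf = x[:100], x[100:]
y_diag, y_inf = y[:100], y[100:]

# Model diagnostics on the diagnostic half only (here: a simple normality check).
normal_ok = (stats.shapiro(x_diag).pvalue > 0.10 and
             stats.shapiro(y_diag).pvalue > 0.10)

# The method chosen from the diagnostic half is applied to the untouched half,
# so the test itself is not conditioned on the data it uses.
if normal_ok:
    result = stats.ttest_ind(x_inf, y_inf)
else:
    result = stats.mannwhitneyu(x_inf, y_inf, alternative="two-sided")
print("chosen test:", "t-test" if normal_ok else "Wilcoxon/Mann-Whitney")
print("p-value:", result.pvalue)
```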

I end by saying that, as a curious person, I tend to do all kinds of things with the data: visualisations, parametric and non-parametric tests (and admittedly some things may happen conditionally on results of earlier ones, because I may want to understand a specific aspect better). But I will not take my message from just one test selected based on the data, and maybe even selected as "a nice result that my client likes". Rather, I try to understand how exactly it comes about, and what it means, if different methods give results that do not seem in line with each other. If all methods support the same interpretation of the data, I will be pretty confident about it; if they disagree, this usually indicates bigger uncertainty than a single method conveys, and that bigger uncertainty deserves reporting. Also, good visualisation may show in detail why different methods, such as tests, seem at first sight to disagree, potentially revealing interesting and unexpected aspects of the data.
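
A minimal sketch of this habit of reporting several methods side by side; the particular tests and the plot are chosen purely for illustration.

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
x = rng.lognormal(size=40)         # hypothetical group 1
y = rng.lognormal(size=40) * 1.4   # hypothetical group 2

# Several tests of the same question, reported together rather than cherry-picked.
results = {
    "Welch t-test": stats.ttest_ind(x, y, equal_var=False).pvalue,
    "Mann-Whitney": stats.mannwhitneyu(x, y, alternative="two-sided").pvalue,
    "Kolmogorov-Smirnov": stats.ks_2samp(x, y).pvalue,
}
for name, p in results.items():
    print(f"{name}: p = {p:.3f}")

# A plot often shows why the tests (dis)agree, e.g. skewness or outliers
# that affect the t-test more than the rank-based tests.
plt.boxplot([x, y])
plt.xticks([1, 2], ["group 1", "group 2"])
plt.show()
```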

The problem with this is that it isn't formalised, there is subjectivity in it, and nobody can derive Type I or Type II "error probabilities" for the overall procedure. But then remember model assumptions don't hold in reality anyway, so any "guarantee" needs to be taken with a grain of salt.
