  • 2
    $\begingroup$ Interesting, I had never heard of testing hypotheses suggested by the data, but often of hypothesizing after the results are known. I guess the difference is subtle. $\endgroup$ Commented Jun 26 at 11:47
  • $\begingroup$ I think they're more different takes on more or less the same phenomenon, or at least very similar varieties of phenomena. $\endgroup$
    – Glen_b
    Commented Jun 26 at 13:00
  • 2
    $\begingroup$ +1 for allowing for nuance. The situation is not cut and dried, especially when the concern lies with assessing some kind of difference (usually in the location of distributions) before having the opportunity to collect data, in a setting where it's known that the distribution can be effectively modeled using a small finite collection of standard families rather than a single such family. A common instance is environmental sampling data, where the distribution family is often assumed to be a union of the Normal, Lognormal, and Gamma families. $\endgroup$
    – whuber
    Commented Jun 26 at 14:49
    $\begingroup$ I don't see that it's problematic in principle -- you can have some form of indicator variable joining members of such a super-family. Alternatively, you can form a union of the lognormal and normal (approximately) by placing them both within the shifted lognormal family; it's probably possible to do something that contains all three. The issue then lies in the estimation of the additional parameters (such as the index in the first instance); if they're based on the same data as used in the test, you would want to look at how that estimation impacts the properties of the procedure overall. ... $\endgroup$
    – Glen_b
    Commented Jun 26 at 23:12
    $\begingroup$ ... the concern would only arise if the selection was done based on the same data as was used in the subsequent test, but the inference proceeded conditionally on that choice without accounting for its impact on the properties of the inference (how would you know the impact was small enough to ignore in each case?). I'd probably be inclined to put that in a Bayesian framework where, if the inference is on a common population parameter, it's not so hard to avoid conditioning on the choice of components by using multi-model inference. $\endgroup$
    – Glen_b
    Commented Jun 26 at 23:18
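
As a rough illustration of the multi-model idea in the last two comments: instead of picking one family (Normal, Lognormal, or Gamma) and conditioning the inference on that choice, one can average the estimate of the common population parameter over the candidate families. The Python below is a minimal sketch, not anyone's recommended procedure: it assumes scipy is available, uses simulated data as a stand-in for a real sample, and uses BIC-based weights as a crude approximation to posterior model probabilities (a full Bayesian treatment would put a prior on the family indicator and integrate over it).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=3.0, size=80)   # hypothetical positive-valued sample
n = len(x)

candidates = {}

# Normal: two free parameters (mu, sigma); implied population mean is mu.
mu, sigma = stats.norm.fit(x)
candidates["normal"] = dict(loglik=stats.norm.logpdf(x, mu, sigma).sum(),
                            k=2, mean=mu)

# Lognormal with location fixed at 0: two free parameters (s, scale).
s, loc, scale = stats.lognorm.fit(x, floc=0)
candidates["lognormal"] = dict(loglik=stats.lognorm.logpdf(x, s, loc, scale).sum(),
                               k=2, mean=scale * np.exp(s**2 / 2))

# Gamma with location fixed at 0: two free parameters (a, scale).
a, loc, scale = stats.gamma.fit(x, floc=0)
candidates["gamma"] = dict(loglik=stats.gamma.logpdf(x, a, loc, scale).sum(),
                           k=2, mean=a * scale)

# BIC per family, then weights proportional to exp(-BIC/2) -- a rough stand-in
# for posterior model probabilities under equal prior weight on each family.
bic = {name: m["k"] * np.log(n) - 2 * m["loglik"] for name, m in candidates.items()}
bic_min = min(bic.values())
raw = {name: np.exp(-0.5 * (b - bic_min)) for name, b in bic.items()}
total = sum(raw.values())
weights = {name: w / total for name, w in raw.items()}

# Model-averaged estimate of the common parameter (here the population mean),
# rather than an estimate that conditions on a single selected family.
avg_mean = sum(weights[name] * candidates[name]["mean"] for name in candidates)

for name in candidates:
    print(f"{name:9s}  weight = {weights[name]:.3f}  implied mean = {candidates[name]['mean']:.2f}")
print(f"model-averaged mean estimate: {avg_mean:.2f}")
```

Because the family weights and the parameter estimate come from the same fit, this sketch sidesteps the "select a family, then test as if it were known" issue the comments describe, at the cost of having to justify the approximation of posterior model probabilities by BIC weights.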