
I’m curious to read about the importance of normality testing in AB tests and clinical trials.

It seems that there are many mixed (and strong) opinions about whether normality testing is necessary, how normality should be assessed (formal quantitative tests vs. visual inspection), and even what should be normally distributed (i.e., the data itself or the residuals).

Can anyone recommend a reading or textbook on normality, how to evaluate it, and why it matters? Something at an intermediate level would be ideal, but any response is welcome.

  • Not all opinions are equal. When reading contradictory opinions, one way to cut down the noise is to consider what evidence you have that the source has expertise, e.g. "Does this person have actual statistics training? Or does their opinion/'knowledge' originally come from other people in their own area who also don't have a statistics background?" It's not a guaranteed way to get the right answer, of course, but it's usually the way to bet (when my plumber's advice differs from my doctor's on a medical issue, I'll listen to my doctor).
    – Glen_b
    Commented Mar 17, 2023 at 1:08
  • Better still is to consider the arguments they give for their position, if you're able to evaluate them; but if you can't, considering their level of expertise is at least something. Your question is too general without more specific details, but many questions already on the site address the aspects of the issues you raise.
    – Glen_b
    Commented Mar 17, 2023 at 1:09

1 Answer

The normality requirement is widely misunderstood. First, distributional assumptions concern the conditional distribution of the outcome given the predictors, not its marginal distribution (in your words, "the data itself"). For OLS, this amounts to the errors, which the residuals estimate, being roughly normal. However, OLS itself makes no assumption about the likelihood (see the Gauss-Markov theorem): the estimates remain consistent and unbiased whenever the Gauss-Markov assumptions are satisfied (in particular, when the conditional mean is correctly specified), whether or not the errors are normal. Normality is what gives exact small-sample t and F tests; as the sample grows, the central limit theorem makes the sampling distribution of the coefficients approximately normal anyway, provided the errors have finite variance. A good resource on this is Introductory Econometrics by Wooldridge. It's an accessible book written for undergraduates with a minimal background in statistics.
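To make the conditional-vs-marginal point and the unbiasedness point concrete, here is a small simulation using only the Python standard library. It is a sketch under assumed settings, not anything from the answer: a hypothetical two-arm test with outcome = 10 + 5*arm + error, where `excess_kurtosis` and `ols_binary` are illustrative helpers. With normal errors, the raw outcome is a bimodal mixture (strongly negative excess kurtosis) while the OLS residuals look normal; with skewed, clearly non-normal errors, the average OLS treatment estimate over many replications still lands on the true effect.

```python
import random
import statistics


def excess_kurtosis(values):
    """Sample excess kurtosis, m4 / m2**2 - 3; close to 0 for normal data."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m4 = statistics.fmean([(v - mean) ** 4 for v in values])
    return m4 / m2 ** 2 - 3.0


def ols_binary(arm, y):
    """OLS of y on an intercept and a 0/1 arm indicator.

    With one binary regressor the fit reduces to group means:
    intercept = mean(y | arm=0), slope = mean(y | arm=1) - mean(y | arm=0).
    """
    control = [yi for a, yi in zip(arm, y) if a == 0]
    treated = [yi for a, yi in zip(arm, y) if a == 1]
    intercept = statistics.fmean(control)
    slope = statistics.fmean(treated) - intercept
    residuals = [yi - (intercept + slope * a) for a, yi in zip(arm, y)]
    return intercept, slope, residuals


rng = random.Random(42)
n = 10_000
arm = [rng.randint(0, 1) for _ in range(n)]

# 1) Normal errors: y given arm is normal, but the marginal distribution of y
#    is a two-component mixture (bimodal), so a normality check on raw y misleads.
y = [10.0 + 5.0 * a + rng.gauss(0.0, 1.0) for a in arm]
_, _, resid = ols_binary(arm, y)
print(f"excess kurtosis of raw y:     {excess_kurtosis(y):+.2f}")
print(f"excess kurtosis of residuals: {excess_kurtosis(resid):+.2f}")

# 2) Skewed errors (exponential shifted to mean 0): clearly non-normal, yet
#    E[error | arm] = 0, so the OLS slope is still unbiased for the true 5.0.
slopes = []
for _ in range(500):
    arm_r = [rng.randint(0, 1) for _ in range(100)]
    y_r = [10.0 + 5.0 * a + (rng.expovariate(1.0) - 1.0) for a in arm_r]
    slopes.append(ols_binary(arm_r, y_r)[1])
print(f"average slope estimate: {statistics.fmean(slopes):.3f} (true effect 5.0)")
```

Excess kurtosis is used here only as a crude, dependency-free normality summary; a Q-Q plot of the residuals is the more usual visual check.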

In all honesty, and in my (humble) opinion, normality is perhaps the least important assumption; one should pay more attention to endogeneity and potential omitted variables. Other violations can be rectified (a non-linear conditional mean can be modeled with splines, and heteroskedasticity, i.e. non-constant variance, can be handled with robust covariance estimates), but you can't fix something you didn't measure.
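The robust-covariance remedy can be sketched for a single regressor with the HC0 (White) sandwich formula, Var(b) ≈ Σ(x_i - x̄)² e_i² / Sxx², next to the classical s²/Sxx. The simulated design (error standard deviation equal to x²/10) and every constant are assumptions chosen to make the heteroskedasticity obvious, and `fit_with_ses` and `coverage` are hypothetical helper names. The point is only that the classical 95% interval under-covers while the sandwich interval stays much closer to nominal.

```python
import math
import random
import statistics


def fit_with_ses(x, y):
    """Simple OLS slope with classical and HC0 (White) standard errors."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Classical SE assumes one common error variance, estimated by SSR / (n - 2).
    s2 = sum(e * e for e in resid) / (n - 2)
    se_classical = math.sqrt(s2 / sxx)
    # HC0 "sandwich" SE lets each observation keep its own squared residual.
    se_hc0 = math.sqrt(sum(d * d * e * e for d, e in zip(dx, resid)) / sxx ** 2)
    return slope, se_classical, se_hc0


def coverage(n_reps=2000, n=200, beta1=2.0, seed=1):
    """Share of nominal 95% intervals (slope +/- 1.96 SE) that contain beta1."""
    rng = random.Random(seed)
    hits_classical = hits_hc0 = 0
    for _ in range(n_reps):
        x = [rng.uniform(0.0, 10.0) for _ in range(n)]
        # Heteroskedastic errors: the standard deviation x**2 / 10 grows with x.
        y = [1.0 + beta1 * xi + rng.gauss(0.0, xi * xi / 10.0) for xi in x]
        slope, se_c, se_h = fit_with_ses(x, y)
        hits_classical += abs(slope - beta1) <= 1.96 * se_c
        hits_hc0 += abs(slope - beta1) <= 1.96 * se_h
    return hits_classical / n_reps, hits_hc0 / n_reps


cov_classical, cov_hc0 = coverage()
print(f"95% CI coverage, classical SE: {cov_classical:.3f}")
print(f"95% CI coverage, HC0 SE:       {cov_hc0:.3f}")
```

In a real analysis you would use a library implementation rather than hand-rolling this; statsmodels, for instance, accepts `cov_type="HC3"` in `OLS(...).fit()`.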

In the context of AB tests, you have to be a little more careful. Often, the marginal distribution of a metric such as revenue may not have finite variance, in which case OLS shouldn't be applied. Even when the variance is finite, the distribution may be so long-tailed that the sampling distribution of the coefficients does not resemble a normal distribution for any sample size you can collect in a reasonable amount of time.
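The finite-but-long-tailed case is easy to see in a simulation. The model below is purely hypothetical (0.2% of visitors buy, purchase amounts are lognormal with mu=3 and sigma=1, 1,000 visitors per arm), and the helper names are illustrative. It checks how often the textbook normal-approximation 95% interval for mean revenue per visitor actually contains the true mean. Whenever nobody in the sample buys, the interval collapses to [0, 0] and must miss, so coverage falls well short of 95% even though the variance is finite.

```python
import math
import random

# Hypothetical revenue-per-visitor model (all numbers are assumptions for illustration):
# 0.2% of visitors buy; purchase amounts are lognormal(mu=3, sigma=1).
P_BUY, MU, SIGMA = 0.002, 3.0, 1.0
TRUE_MEAN = P_BUY * math.exp(MU + SIGMA ** 2 / 2)  # E[revenue per visitor]


def revenue_sample(rng, n):
    """n visitors: mostly zeros, with rare long-tailed purchase amounts."""
    return [rng.lognormvariate(MU, SIGMA) if rng.random() < P_BUY else 0.0
            for _ in range(n)]


def normal_ci_covers(values, target, z=1.96):
    """Does the normal-approximation interval mean +/- z * SE contain target?"""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    se = math.sqrt(var / n)
    return mean - z * se <= target <= mean + z * se


def ci_coverage(n_visitors=1000, n_reps=1000, seed=7):
    """Fraction of simulated samples whose 95% interval covers TRUE_MEAN."""
    rng = random.Random(seed)
    hits = sum(normal_ci_covers(revenue_sample(rng, n_visitors), TRUE_MEAN)
               for _ in range(n_reps))
    return hits / n_reps


coverage_1k = ci_coverage()
print(f"coverage of nominal 95% CI with 1,000 visitors per arm: {coverage_1k:.3f}")
```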
