
I have recently entered the life sciences (from physics), and I am concerned about the use of p values in the life sciences literature. For example, in this article they test 9-12 rats in a control group, compare them to an experimental group, and use p values to claim that their results are statistically significant. This type of use of p values seems to be very common in the literature.

So here are my concerns:

  1. Why is it so often assumed that biological measurements follow a normal distribution? To my knowledge, this isn't known a priori.

  2. From my physical intuition, it seems quite challenging to claim "statistical significance" when using such small sample sizes.

  • Yes, if/when you can apply the central limit theorem. Things like body mass/size etc. are influenced by many relatively independent atomic factors, hence you can apply the theorem. But don't fool yourself into thinking that the normal distribution is all we've got in biology. Binomial, Poisson, beta and gamma distributions are highly common in genetics and bioinformatics. Commented Aug 25, 2015 at 14:31
  • 1) This is really an empirical question; sometimes it is, sometimes it isn't. Even if the "real" process is Poisson, a normal distribution can still be a fairly accurate approximation. 2) Intuition-based "significance" and the technical concept of "statistical significance" need to be clearly separated. If the assumptions are met, an analysis might produce statistically significant results while still being weak, with rather shaky conclusions. Commented Aug 25, 2015 at 14:49
  • Healthy skepticism is a great thing, especially in biological research. Also remember that in biology, models are often imperfect approximations of the questions studied. That doesn't mean they aren't the best we can do given the current limits of technology, but it is something to always keep in the back of your mind when you read biological publications. There are countless examples of studies that produced significant results in rodents but collapsed in human trials. Also, in vitro tissue cultures can be fraught with assumptions that cannot be extrapolated to the whole organism.
    – AMR
    Commented Aug 26, 2015 at 17:21
  • The answers to this question are right, but I miss someone mentioning that there are tests to assess whether your data follow a normal distribution, although those tests aren't powerful if your sample is small.
    – Pere
    Commented Sep 18, 2016 at 21:09

3 Answers


kmm's answer is correct; I just want to add a few points on what kind of data should follow a Gaussian distribution.


Unless you know from observation that a process doesn't follow a Gaussian distribution (e.g., Poisson, binomial, etc.), then it probably does at least well enough for statistical purposes.

I won't fault kmm for this statement, because it describes what happens prevalently. This is practically what all biologists do, but it is not a sound approach.

A Gaussian should not be treated as the default distribution; doing so can lead to incorrect inferences. Usually the experimenter has some idea of what kind of data they are measuring and what distribution the data are likely to follow. If you are unsure of the underlying distribution, then you should use non-parametric statistical tests.


What kind of data follow a Gaussian distribution?

According to the Central Limit Theorem, the mean (or sum) of a large number of independent and identically distributed (IID) random variables is approximately Gaussian. The random variable itself can follow (almost) any distribution with finite variance, but if you estimate the mean several times through repeated experimentation, then the distribution of those sample means approaches a Gaussian.

From the Wolfram site:

Let $X_1,X_2,...,X_N$ be a set of N independent random variates and each $X_i$ have an arbitrary probability distribution $P(x_1,...,x_N)$ with mean $\mu_i$ and a finite variance $\sigma_i^2$. Then the normal form variate:

$$X_{norm}=\frac{\displaystyle\sum_{i=1}^N x_i-\sum_{i=1}^N \mu_i}{\sqrt{\displaystyle\sum_{i=1}^N \sigma_i^2}}$$

has a limiting cumulative distribution function which approaches a normal distribution.

The Wikipedia page on the CLT is also quite good and worth a look.
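To make this concrete, here is a minimal simulation sketch in Python (NumPy/SciPy; the exponential distribution, the sample sizes, and the seed are arbitrary choices for illustration, not anything from the original answer). Even though each individual measurement is drawn from a strongly skewed distribution, the distribution of the sample mean moves toward a Gaussian as the number of averaged measurements grows.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Each "measurement" is drawn from a strongly right-skewed exponential
# distribution (skewness 2, excess kurtosis 6) -- clearly not Gaussian.
# The CLT says the *mean* of n such measurements approaches a Gaussian.
for n in (2, 10, 50):
    # 5000 repeated "experiments", each averaging n measurements
    means = rng.exponential(scale=1.0, size=(5000, n)).mean(axis=1)
    print(f"n = {n:3d}: skewness = {stats.skew(means):+.2f}, "
          f"excess kurtosis = {stats.kurtosis(means):+.2f}")

# Both statistics shrink toward 0 (their Gaussian values) as n grows.
```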

Usually in biological experiments we measure some property, let's say the expression of some gene. When you do several replicates and there is no specific underlying mechanism generating the variation (i.e. the errors are purely random), you will get normally distributed values. Note that this applies only to the sample means. In certain cases we assume that the variation in the value of a variable itself is due to random fluctuation, and therefore consider the variable to be normally distributed (not just its mean but the values themselves); for example, the weights of mice that are fed and raised identically. This is just an assumption on your part, and it constitutes the null hypothesis.

Another point to note is that a variable expected to follow a normal distribution should essentially be continuous in nature. Some discrete variables can be approximated as continuous, but one should have a good reason for doing so. For example, population sizes, though discrete, can be treated as continuous if the sizes are large.


The Poisson distribution is a discrete distribution. Certain kinds of phenomena, essentially Poisson processes, result in Poisson-distributed random variables; see this post for details. The Poisson distribution models the probability of observing N events in a given time interval for a given rate of events ($\lambda$). This rate is also called the intensity of the distribution.
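As a tiny numerical sketch of that description (Python/SciPy; the rate of 3 events per interval is purely illustrative):

```python
from scipy import stats

lam = 3.0  # assumed rate (intensity): average number of events per interval

# P(N = k events in one interval) for a Poisson-distributed count
for k in range(6):
    print(f"P(N = {k}) = {stats.poisson.pmf(k, mu=lam):.3f}")

# A defining property of the Poisson: mean and variance both equal the rate
print("mean =", stats.poisson.mean(mu=lam), " variance =", stats.poisson.var(mu=lam))
```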


The binomial is another discrete distribution. Genotypes resulting from Mendelian segregation of genes, for example, follow this distribution. It models the probability of observing N successes in M trials, where each trial has only two possible outcomes. The multinomial distribution is a generalization of the binomial to more than two outcomes.
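For instance, here is a minimal sketch of the Mendelian case (assuming a monohybrid cross in which each offspring is homozygous recessive with probability 1/4; the litter size of 12 is an invented number):

```python
from scipy import stats

n_offspring = 12    # number of "trials": offspring scored (invented number)
p_recessive = 0.25  # Mendelian 3:1 ratio -> P(homozygous recessive) = 1/4

# Probability of observing exactly k recessive offspring out of 12
for k in (0, 3, 6):
    print(f"P({k} recessive offspring) = "
          f"{stats.binom.pmf(k, n_offspring, p_recessive):.3f}")
```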


Since both the Poisson and the binomial are discrete distributions, they should not be confused with the normal distribution. However, under certain conditions, especially when the number of trials in a binomial distribution is large (the approximation is best when the success probability is near 0.5), it can be approximated by a Gaussian with the same mean and variance. Similarly, if the intensity (rate) of a Poisson distribution is high, or the time interval is large, the Poisson random variable can be approximated by a Gaussian with the same mean and variance. In these cases the mean becomes large, which makes a continuous approximation reasonable.
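The quality of these approximations is easy to check numerically. A sketch with illustrative parameter values (large n with p = 0.5 for the binomial, large rate for the Poisson):

```python
import numpy as np
from scipy import stats

# Binomial(n, p) vs. a normal curve with the same mean and variance
n, p = 1000, 0.5
k = np.arange(450, 551)
binom_pmf = stats.binom.pmf(k, n, p)
norm_pdf = stats.norm.pdf(k, loc=n * p, scale=np.sqrt(n * p * (1 - p)))
print("max |binomial - normal|:", np.max(np.abs(binom_pmf - norm_pdf)))

# Poisson(lam) vs. a normal curve with the same mean and variance
lam = 400
k = np.arange(340, 461)
pois_pmf = stats.poisson.pmf(k, mu=lam)
norm_pdf = stats.norm.pdf(k, loc=lam, scale=np.sqrt(lam))
print("max |poisson  - normal|:", np.max(np.abs(pois_pmf - norm_pdf)))
```

Both differences come out tiny, which is exactly why the continuous approximation is acceptable in these regimes.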

Many datasets show power-law-like or heavily skewed distributions, and people often make the mistake of assuming them to be normal. An example (from my experience) is the expression of all the genes in a cell: very few genes have high expression and many genes have low expression. The same applies to the degree distribution of nodes in some real networks, such as gene regulatory networks.
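A quick defence against this mistake (related to the normality tests mentioned in the comments above) is to look at the skewness, a histogram, or a formal test before assuming normality. A sketch with simulated log-normal "expression" values (purely illustrative data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated "expression" values: log-normal, i.e. heavily right-skewed,
# with many small values and a few very large ones
expression = rng.lognormal(mean=1.0, sigma=1.5, size=500)

print("skewness of raw values:", round(stats.skew(expression), 2))
_, p_raw = stats.shapiro(expression)          # normality clearly rejected
_, p_log = stats.shapiro(np.log(expression))  # log transform removes the skew here
print(f"Shapiro-Wilk p: raw = {p_raw:.2g}, log-transformed = {p_log:.2g}")
```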


In summary, you should assume a Gaussian distribution when:

  • The variable is a measurement repeated several times on identical samples (i.e. you are working with sample means)
  • The variability is expected to be purely random in the control case (in a t-test, rejecting the null hypothesis amounts to saying that the variable does not follow the normal distribution assumed under the null hypothesis)
  • The variable is continuous, or discrete with a large sample size

You raise two issues, both of which might be better suited for stats.SE, but I think the questions are suitably biological to warrant an answer here.

Do most biological processes follow a Gaussian distribution?

Unless you know from observation that a process doesn't follow a Gaussian distribution (e.g., Poisson, binomial, etc.), then it probably does, at least well enough for statistical purposes. While ~10 observations isn't enough to test the distribution accurately (and those tests are pretty flawed anyway), as long as the values are approximately normally distributed, you probably meet the assumptions of most general linear model-type statistical tests (t-test, ANOVA, linear regression). These tests are fairly robust to deviations from normality, so in a sense, as long as the values are close enough to normal, the test is fine (which says nothing about the interpretation of the results).
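To put a number on "~10 observations isn't enough to test the distribution accurately", here is a small simulation sketch (Python/SciPy; the exponential alternative and the 2000 repetitions are arbitrary illustrative choices): with n = 10, a Shapiro-Wilk test frequently fails to flag data drawn from a clearly skewed distribution, because the test has little power at such sample sizes.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, n_sim = 10, 2000

# Draw n = 10 observations from a clearly non-normal (exponential) distribution
# and count how often Shapiro-Wilk detects the non-normality at alpha = 0.05.
rejections = 0
for _ in range(n_sim):
    sample = rng.exponential(scale=1.0, size=n)
    _, p = stats.shapiro(sample)
    rejections += p < 0.05

print(f"Shapiro-Wilk rejected normality in {rejections / n_sim:.0%} of "
      f"{n_sim} simulated samples of size {n}, despite exponential data.")
```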

Is the pervasive use of p values warranted? Is there a problem with small sample sizes?

Although certainly not the first to raise the alarm about p values, Ioannidis's (2005) paper sounded it most loudly. The central idea is that, in science, there is a strong tendency to publish only "significant" results (by whatever definition of significance you use). Thus the literature is rife with false significant results. For example, if only 1 in 20 experiments yields a significant result, the other 19 are unlikely to be published; yet that 5% might represent 95% of the literature, and thus we have a strong bias in the literature. All of those "significant" results can't possibly be correct.

Statistical inference from small sample sizes is also quite problematic (e.g., in neuroscience; Button et al., 2013). There has been a recent push toward including effect sizes for estimated parameters and simply reporting confidence intervals (which will be suitably wide for small sample sizes).
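To illustrate what reporting an effect size and a confidence interval can look like for a simple two-group comparison, here is a minimal sketch (the measurements are invented, and Cohen's d with a pooled standard deviation is just one common choice of effect size):

```python
import numpy as np
from scipy import stats

# Hypothetical control vs. treatment measurements (n = 10 per group)
control = np.array([4.1, 3.8, 4.5, 4.0, 3.9, 4.3, 4.2, 3.7, 4.4, 4.0])
treated = np.array([4.8, 4.6, 5.1, 4.4, 4.9, 5.0, 4.5, 4.7, 5.2, 4.6])

diff = treated.mean() - control.mean()
n1, n2 = len(control), len(treated)

# Pooled standard deviation and Cohen's d (a standardised effect size)
sp = np.sqrt(((n1 - 1) * control.var(ddof=1) + (n2 - 1) * treated.var(ddof=1))
             / (n1 + n2 - 2))
cohens_d = diff / sp

# 95% confidence interval for the difference in means (equal-variance t)
se = sp * np.sqrt(1 / n1 + 1 / n2)
t_crit = stats.t.ppf(0.975, df=n1 + n2 - 2)
ci_low, ci_high = diff - t_crit * se, diff + t_crit * se

print(f"difference = {diff:.2f}, Cohen's d = {cohens_d:.2f}, "
      f"95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```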

Many of the failings of statistics are summarized in Statistics Done Wrong: The Woefully Complete Guide, with which I have no affiliation except that I enjoyed reading it.

The paper you link is pretty deficient in what you might call modern statistical analysis. What they could improve:

  • Run (and show the results of) an a priori power analysis to establish that their sample sizes are adequate (see the sketch after this list)
  • Include effect sizes of the estimated parameters
  • Include confidence intervals for estimated parameters
  • Use one of the many available multiple comparisons procedures to control familywise type I error rates.
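Here is a rough sketch of what the first and last items might look like in practice, using statsmodels (the effect size, alpha, power, and p-values below are placeholders, not values taken from the paper):

```python
from statsmodels.stats.power import TTestIndPower
from statsmodels.stats.multitest import multipletests

# A priori power analysis: sample size per group needed to detect a "large"
# standardised effect (Cohen's d = 0.8) with 80% power at alpha = 0.05
n_per_group = TTestIndPower().solve_power(effect_size=0.8, alpha=0.05, power=0.8)
print(f"required n per group: {n_per_group:.1f}")

# Familywise error control over several comparisons (Holm's procedure)
raw_p = [0.012, 0.034, 0.002, 0.210]  # hypothetical raw p-values
reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method="holm")
print("adjusted p-values:", adj_p.round(3))
print("still significant:", reject)
```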

It's incumbent on reviewers to ask for these things if the authors do not supply them willingly.

There is often no way around a small sample size, so there may not be much the authors could do to increase it; those who work with humans or animals are under pressure to keep sample sizes as small as possible while maintaining adequate power. However, they could at least show that the sample they have is sufficiently powerful.

Button, K. S., J. P. A. Ioannidis, C. Mokrysz, B. A. Nosek, J. Flint, E. S. J. Robinson, and M. R. Munafò. 2013. Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience 14:1–12.

Ioannidis, J. P. A. 2005. Why most published research findings are false. PLoS Medicine 2:e124.

  • I really don't agree with the first part of your answer, "Unless you know from observation that a process doesn't follow a Gaussian distribution ... then it probably does". The burden of proof should be the other way around: unless you can clearly motivate why your data should be normally distributed, you should concede that they might not be, and use a nonparametric method.
    – Roland
    Commented Oct 21, 2016 at 11:49

You are right to be suspicious. I would contend that, in most situations, hypothesis tests based on the normal distribution are not appropriate. If hypothesis testing is needed, a permutation test should almost always be used.

As WYSIWYG points out, there is no reason to assume a measurement is normally distributed without strong a priori knowledge. The central limit theorem is the standard argument for assuming that the mean is approximately normally distributed, but I would say it is not very helpful in practice, because the convergence can be very slow: if your data distribution is far from normal, you need a large number of samples for the mean to be approximately normal. How many? Impossible to tell, since we don't know the data distribution! So in practice the approximation can be very bad, and then the test will be completely off. This applies not only to the t-test but to many parametric tests that rely on the normal approximation, such as chi-square tests.

Fortunately, there are better tools nowadays. The permutation test does not require the normal distribution assumption; the results are always valid, regardless of the data distribution. It is easy to carry out with today's computers, and simple to understand. This is a good book on permutation tests (and other resampling methods).
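To show how simple the idea is, here is a minimal permutation-test sketch in plain Python/NumPy (the two groups are invented data, and the test statistic, a difference in group means, is just one common choice; recent versions of SciPy also ship a ready-made scipy.stats.permutation_test):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical measurements for a control and a treatment group
control = np.array([12.1, 10.8, 11.5, 13.0, 12.4, 11.1, 10.9, 12.7])
treated = np.array([13.2, 14.0, 12.9, 13.8, 14.5, 13.1, 12.8, 14.2])

observed = treated.mean() - control.mean()
pooled = np.concatenate([control, treated])
n_treated = len(treated)

# Re-label the observations at random many times; under the null hypothesis
# (no group difference) every relabelling is equally likely.
n_perm = 10_000
count = 0
for _ in range(n_perm):
    shuffled = rng.permutation(pooled)
    diff = shuffled[:n_treated].mean() - shuffled[n_treated:].mean()
    if abs(diff) >= abs(observed):
        count += 1

p_value = (count + 1) / (n_perm + 1)  # add-one correction, standard for permutation p
print(f"observed difference = {observed:.2f}, permutation p = {p_value:.4f}")
```

No distributional assumption enters anywhere; the only requirement is that the group labels are exchangeable under the null hypothesis.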

Ronald Fisher and his contemporaries, who developed normal theory in the early 1900s, were perfectly aware that the permutation test was a much better solution, but it requires extensive calculation, which was simply not possible back then. So the normal test was developed as a poor man's approximation to the exact permutation test. Today we no longer need this approximation, as our computers can perform even large permutation tests in the blink of an eye.

Then why do people still stick to the approximate normal tests? Sadly, I think they are routinely used only because most biologists know of no other tools and simply follow tradition. A historical reason behind the normal assumption in biology is a classic argument by Fisher concerning population genetics: if a phenotype is affected by a large number of genes, and their effects are additive, then the phenotype is a sum of many random variables and, by the central limit theorem, should be approximately normal. The classic example is height, which is indeed closely normally distributed in the population. But this reasoning applies to the genetics of natural populations, not to laboratory experiments.

Then there is the whole debate about whether hypothesis testing (and p-values in particular) should be used at all, regardless of which test you use. Other answers touched upon this. I won't go into it, but it is a very important topic, and I would recommend this excellent article from Nature, and references therein.

http://www.nature.com/news/scientific-method-statistical-errors-1.14700

  • The willingness to publish only significant results has a second consequence, in addition to publication bias. Since the power of parametric tests is usually larger than the power of equivalent non-parametric tests, results are often significant only if parametric tests are used, which usually requires assuming a Gaussian distribution. Therefore, researchers who want to publish are strongly incentivized to assume normality.
    – Pere
    Commented Sep 18, 2016 at 21:14
  • Can you briefly justify the benefit of permutation tests? What do you want to answer with them? How can they help to identify the possible distributions in biological data? Commented Oct 17, 2016 at 11:35
  • Permutation tests are commonly used to test differences in mean / median / etc. between two sample groups. They require only very mild assumptions (e.g. independent samples) and are applicable to virtually all data distributions. Estimating the actual underlying distribution is not necessary (and rarely feasible).
    – Roland
    Commented Oct 18, 2016 at 22:09
