57
$\begingroup$

I noticed on Math Stack Exchange a terrific thread which highlighted a number of very visually interesting math concepts. I would be curious to see graphics/gifs which anyone has that very clearly illustrate a statistics concept (particularly those that might serve as motivation for students just starting to learn statistics).

I am thinking of things along the lines of how videos of a Galton board make the CLT instantly relatable.

$\endgroup$
0

13 Answers

42
$\begingroup$

I like images illustrating how different patterns can have similar correlation. The ones below are from Wikipedia articles on correlation and dependence:

enter image description here

and Anscombe's quartet with correlations of about $0.816$

enter image description here
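
For anyone who wants to check the number rather than take the picture on trust, here is a minimal sketch that recomputes the correlation for each of the four datasets. The values are Anscombe's published 1973 numbers; the helper name `pearson` is just my own choice for illustration.

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Anscombe's quartet: datasets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

correlations = {name: pearson(xs, ys) for name, (xs, ys) in quartet.items()}
for name, r in correlations.items():
    print(f"dataset {name}: r = {r:.3f}")
```

All four print a value that rounds to 0.816, even though the scatter plots look nothing alike.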

$\endgroup$
5
  • 3
    $\begingroup$ Excellent comment! I have seen Anscombe's quartet before and think it may be one of the best "beware of correlation" graphics I have ever seen. $\endgroup$ Commented Mar 2, 2020 at 13:51
  • $\begingroup$ @David Then see these posts for more on the subject. $\endgroup$
    – whuber
    Commented Mar 2, 2020 at 16:51
  • $\begingroup$ At the risk of being too obvious I would add that "think about what that means on a scatter plot" can illuminate many questions, and not just for learners. In several fields (no names here) there is a tendency to dismiss what was taught in an introductory course as baby stuff and/or to start teaching assuming that all the students did and also remember an introductory course. $\endgroup$
    – Nick Cox
    Commented Mar 3, 2020 at 12:36
  • $\begingroup$ @Alexis. Which one? The December 2011 Science article is paywalled so I cannot see what you mean but both the images are older than that. Anscombe's numbers have been around since he published them in 1973 $\endgroup$
    – Henry
    Commented Mar 4, 2020 at 23:41
  • $\begingroup$ Oh! Not the Anscombe, the non-functional association images. Also: I just re-read the Reshef article, and there are similar images, but they are actually different. So I was mis-remembering/mis-attributing. Apologies for any alarm. :) $\endgroup$
    – Alexis
    Commented Mar 5, 2020 at 1:47
35
$\begingroup$

Simpson's Paradox

A phenomenon that appears when a key variable is omitted from the analysis of a relationship between one or more independent variables and a dependent variable. For instance, the plot below suggests that the more bedrooms houses have, the lower the home price:

Average Home Price vs. Avg Number of Bedrooms
(source: ba762researchmethods at sites.google.com)

which seems counter-intuitive, and is easily resolved by plotting all the data points that make up the average for each area on the same graph. Here, a greater number of bedrooms correctly indicates pricier homes once the neighborhood variable is also taken into account:

Home Price vs. Number of Bedrooms
(source: ba762researchmethods at sites.google.com)

If you'd like to read more about the above example and get a far better explanation than I was able to provide, click here.
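
The same reversal can be reproduced numerically. This is only a sketch with made-up numbers (four hypothetical neighborhoods "A" to "D", prices in thousands), not the data behind the plots above: within each neighborhood the fitted slope is positive, while the slope fitted to the pooled data is negative.

```python
def ols_slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    return sxy / sxx

# Fictitious data: neighborhood -> list of (bedrooms, price in $1000s).
# Pricier neighborhoods happen to have smaller homes.
neighborhoods = {
    "A": [(7, 100), (8, 110), (9, 120)],
    "B": [(5, 200), (6, 210), (7, 220)],
    "C": [(3, 300), (4, 310), (5, 320)],
    "D": [(1, 400), (2, 410), (3, 420)],
}

within_slopes = {name: ols_slope(pts) for name, pts in neighborhoods.items()}
pooled = [p for pts in neighborhoods.values() for p in pts]
pooled_slope = ols_slope(pooled)

print("within-neighborhood slopes:", within_slopes)
print("pooled slope:", round(pooled_slope, 2))
```

Ignoring the neighborhood variable flips the sign of the estimated relationship, which is the whole paradox.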

$\endgroup$
4
  • 8
    $\begingroup$ Note that you don't have to look at averages for Simpson's paradox to occur--just don't tell the model there are four groups. In addition, although it might be nitpicking, the fit in the lower plot is not very convincing, since it looks like it assumes all slopes to be equal, which you can clearly tell isn't the case. $\endgroup$ Commented Mar 3, 2020 at 5:39
  • 2
    $\begingroup$ As elsewhere in this thread, mixing red and green is problematic for many readers. For anyone challenged by this colour choice, the graph shows four slightly overlapping clusters each summarized by upward sloping lines, whereas the whole dataset shows a negative relation. $\endgroup$
    – Nick Cox
    Commented Mar 3, 2020 at 12:04
  • 2
    $\begingroup$ Agreed Frans, taking averages is an over-simplification, as is the slope on the bottom graph. In fact, I believe both graphs are purely fictitious representations of the concept. They came from the last link in my answer, which was linked to from a different article I was reading that illustrated Simpson's paradox in an econometric setting: Tax Burdens, Per Capita Income, and Simpson’s Paradox $\endgroup$
    – TH58PZ700U
    Commented Mar 3, 2020 at 21:41
  • 1
    $\begingroup$ That's either very few or very many bedrooms! :) $\endgroup$
    – smcs
    Commented Mar 4, 2020 at 15:39
29
$\begingroup$

One of the most interesting concepts, very important today and very easy to visualize, is "overfitting". The green classifier below presents a clear example of overfitting [Edit: "the green classifier is given by the very wiggly line separating red and blue data points" - Nick Cox].

From Wikipedia:

enter image description here
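
To see the effect in numbers, here is a rough simulation. It is not how the Wikipedia figure was made: I am swapping in a 1-nearest-neighbour rule as a stand-in for the wiggly green boundary, and a fixed straight line (which I happen to know is the ideal boundary for this simulated data) as the smooth alternative. The class centres, spread and seed are arbitrary assumptions.

```python
import random

rng = random.Random(42)

def sample(n):
    """Draw n labelled points: class 0 centred at (0, 0), class 1 at (2, 2)."""
    data = []
    for i in range(n):
        label = i % 2
        centre = 2.0 * label
        data.append(((rng.gauss(centre, 1.5), rng.gauss(centre, 1.5)), label))
    return data

def nearest_neighbour_predict(train, point):
    """Label of the single closest training point (a very wiggly boundary)."""
    closest = min(
        train,
        key=lambda item: (item[0][0] - point[0]) ** 2 + (item[0][1] - point[1]) ** 2,
    )
    return closest[1]

def linear_predict(point):
    """Straight-line boundary x + y = 2, halfway between the two class centres."""
    return 1 if point[0] + point[1] > 2.0 else 0

def accuracy(predict, data):
    """Fraction of points in data that predict labels correctly."""
    return sum(predict(p) == label for p, label in data) / len(data)

train, test = sample(100), sample(1000)

nn_train = accuracy(lambda p: nearest_neighbour_predict(train, p), train)
nn_test = accuracy(lambda p: nearest_neighbour_predict(train, p), test)
lin_train = accuracy(linear_predict, train)
lin_test = accuracy(linear_predict, test)

print(f"1-NN:   train {nn_train:.2f}, test {nn_test:.2f}")
print(f"linear: train {lin_train:.2f}, test {lin_test:.2f}")
```

The nearest-neighbour rule classifies its own training points perfectly, because each point is its own nearest neighbour, but it cannot keep that up on fresh data. That gap between training and test performance is the signature of overfitting.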

$\endgroup$
16
  • 14
    $\begingroup$ For those who have difficulty distinguishing red and green: the green classifier is given by the very wiggly line separating red and blue data points. $\endgroup$
    – Nick Cox
    Commented Mar 2, 2020 at 7:07
  • 1
    $\begingroup$ @NickCox The image is perfectly understandable even in black and white. $\endgroup$
    – user76284
    Commented Mar 4, 2020 at 17:15
  • 4
    $\begingroup$ @user76284 Sure, if and only if you are told, or you take it on trust, that the wiggly line is a perfect classifier and the smooth line is not. The point is that OP chose red and green when there's a politer and more inclusive way to use colours. Gee, this thread is supposed to be about "very clear" examples but deficient examples qualify? I would be happy with e.g. circles and pluses in black and white, but that is not what is on offer. $\endgroup$
    – Nick Cox
    Commented Mar 4, 2020 at 17:58
  • 3
    $\begingroup$ For all those reflexively assuming that this won't work for people with colour blindness, perhaps try examining it in a colour blindness simulator first, such as at color-blindness.com/coblis-color-blindness-simulator That shows that this image actually works pretty well under most forms of colour blindness - there are other dimensions of colour perception beyond hue which allow the colours in this image to be distinguished easily, even when the subjective appearance differs substantially. $\endgroup$ Commented Mar 5, 2020 at 3:52
  • 1
    $\begingroup$ @MichaelMacAskill You're right, and for example I am not assuming that it "won't work" for any group of people. I am just saying that the design could be improved in small but helpful and inclusive ways. The point is about graphical etiquette as much as anything else. I don't mind people regarding this as a small point, as it is, but I think it's still worth making. Your saying that it works "pretty well" I take to mean that your view is close to mine; you didn't say "excellently". $\endgroup$
    – Nick Cox
    Commented Mar 5, 2020 at 15:49
25
$\begingroup$

What does a 2D dataset look like when the mean of X is 54 with an SD of 17, the mean of Y is 48 with an SD of 27, and the correlation between the two is -0.06?

Introducing the Anscombosaurus:

enter image description here

And its companion, the Datasaurus Dozen:

enter image description here

$\endgroup$
1
  • 1
    $\begingroup$ To get students interested, these are fantastic examples! $\endgroup$ Commented Mar 4, 2020 at 18:46
20
$\begingroup$

I think spurious correlations also deserve their own post, i.e. correlation does not equal causation. This is perhaps one of the things used most often when trying to bend the truth using statistics. Tyler Vigen has a famous website with lots of examples. To illustrate, see the plot below, where the number of polio cases and ice cream sales are clearly correlated. But to assume that polio causes ice cream sales, or the other way around, is clearly nonsensical.

Polio causes ice cream

P.S: Relevant xkcd 1 and relevant xkcd 2

$\endgroup$
1
  • 4
    $\begingroup$ For everyone actually doing this, I would advise to be very careful not to mix "correlation does not equal causation" and "sample correlation does not equal correlation". The polio vs ice cream graph is good, but many examples claimed to be demonstrations of "correlation does not equal causation" are actually just artefacts of small sample size and do not even demonstrate real correlation. $\endgroup$
    – JiK
    Commented Mar 4, 2020 at 12:49
18
$\begingroup$

Bias can be good

An $\color{orangered}{\text{unbiased estimator}}$ is on average correct. A $\color{steelblue}{\text{biased estimator}}$ is on average not correct.

Why then, would you ever want to use a biased estimator (e.g. ridge regression)?

biased_estimator

The answer is that introducing bias can reduce variance.

In the picture, for a given sample, the $\color{orangered}{\text{unbiased estimator}}$ has a $68\%$ chance of being within $1$ arbitrary unit of the true parameter, while the $\color{steelblue}{\text{biased estimator}}$ has a much larger $84\%$ chance.

If the bias you have introduced reduces the variance of the estimator sufficiently, your one sample has a better chance of yielding an estimate close to the population parameter.

"On average correct" sounds great, but does not give any guarantees of how far individual estimates can deviate from the population parameter. If you would draw many samples, the $\color{steelblue}{\text{biased estimator}}$ would on average be wrong by $0.5$ arbitrary units. However, we rarely have many samples from the same population to observe this 'average estimate', so we would rather have a good chance of being close to the true parameter.

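The two percentages can be reproduced under one plausible reading of the picture. I am assuming (this is not stated above) that both sampling distributions are normal, with the unbiased estimator at mean 0 and SD 1, the biased one at mean 0.5 and SD 0.5, and the true parameter at 0.

```python
from statistics import NormalDist

# Assumed sampling distributions; the true parameter is taken to be 0.
unbiased = NormalDist(mu=0.0, sigma=1.0)
biased = NormalDist(mu=0.5, sigma=0.5)

def prob_within(dist, radius=1.0, truth=0.0):
    """Probability that an estimate lands within `radius` of the truth."""
    return dist.cdf(truth + radius) - dist.cdf(truth - radius)

def mean_squared_error(dist, truth=0.0):
    """MSE decomposed as squared bias plus variance."""
    return (dist.mean - truth) ** 2 + dist.variance

for name, dist in (("unbiased", unbiased), ("biased", biased)):
    print(
        f"{name}: P(within 1 unit) = {prob_within(dist):.2f}, "
        f"MSE = {mean_squared_error(dist):.2f}"
    )
```

Under these assumptions the biased estimator also has half the mean squared error (0.5 versus 1.0), which is another way of saying the variance reduction more than pays for the bias.
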
$\endgroup$
17
$\begingroup$

When first understanding estimators and their error, it's useful to understand two sources of error: bias and variance. The below image does a great job illustrating this while highlighting tradeoffs between these two sources of error.

enter image description here

The bullseye is the true value the estimator is trying to estimate, and each dot represents an estimate of that value. Ideally you have low bias and low variance, but the other dart boards represent less than ideal estimators.

$\endgroup$
2
  • 2
    $\begingroup$ This is a great classic example, but it would be nice to also perhaps add the term "precision" as an equivalent (if inverse) term to variance here, as this is also how this is often communicated. i.e. low variance = precise, high variance = imprecise. I guess variance might be more relevant to the data itself, whereas precision is more relevant to estimates based upon the data, while bias is a term that works for both. $\endgroup$ Commented Mar 5, 2020 at 4:01
  • 2
    $\begingroup$ I've seen this often phrased in terms of accuracy and precision. Many social or behavioural scientists might want to talk about validity and reliability. $\endgroup$
    – Nick Cox
    Commented Mar 5, 2020 at 15:53
14
$\begingroup$

Principal Component Analysis (PCA)

PCA is a method for dimension reduction. It projects the original variables onto the directions that maximize the variance.

In our figure, the red points come from a bivariate normal distribution. The vectors are the eigenvectors of the covariance matrix, and their lengths are proportional to the respective eigenvalues. Principal component analysis provides new directions that are orthogonal and point in the directions of high variance.

enter image description here
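
For readers who want to see what "direction of high variance" means in practice, here is a small sketch in plain Python. The simulated data, the seed and the helper `eigen_2x2_symmetric` are my own assumptions, not the code behind the figure. It builds the 2x2 sample covariance matrix, computes its eigenpairs in closed form, and checks that the variance of the data projected onto the first eigenvector equals the largest eigenvalue.

```python
import math
import random

rng = random.Random(0)

# Correlated bivariate normal sample (coefficients chosen arbitrarily).
points = []
for _ in range(500):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    points.append((2.0 * z1, z1 + 0.5 * z2))

n = len(points)
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n
centred = [(x - mean_x, y - mean_y) for x, y in points]

# Entries of the 2x2 sample covariance matrix [[sxx, sxy], [sxy, syy]].
sxx = sum(x * x for x, _ in centred) / (n - 1)
syy = sum(y * y for _, y in centred) / (n - 1)
sxy = sum(x * y for x, y in centred) / (n - 1)

def eigen_2x2_symmetric(a, b, c):
    """Eigenpairs of [[a, b], [b, c]], largest eigenvalue first (assumes b != 0)."""
    mid = (a + c) / 2
    half_gap = math.hypot((a - c) / 2, b)
    pairs = []
    for lam in (mid + half_gap, mid - half_gap):
        vx, vy = b, lam - a
        norm = math.hypot(vx, vy)
        pairs.append((lam, (vx / norm, vy / norm)))
    return pairs

(lam1, pc1), (lam2, pc2) = eigen_2x2_symmetric(sxx, sxy, syy)

# Variance of the data after projecting onto the first principal direction.
scores = [x * pc1[0] + y * pc1[1] for x, y in centred]
projected_variance = sum(s * s for s in scores) / (n - 1)

print("eigenvalues:", round(lam1, 3), round(lam2, 3))
print("variance along first direction:", round(projected_variance, 3))
```

No other direction gives a larger projected variance than the first eigenvector, which is why PCA keeps that direction first when reducing dimension.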

$\endgroup$
1
  • 11
    $\begingroup$ This could do with a less technical / more layman's explanation. I've taken a few statistics courses and I've done PCA and I still can't understand much of the explanation. What are the eigenvectors/values? I know what variance is, but what does it mean for a direction to have high variance? And why do we care about that? $\endgroup$
    – NotThatGuy
    Commented Mar 2, 2020 at 16:39
11
$\begingroup$

Eigenvectors & Eigenvalues

Eigenvectors and eigenvalues are the basis for principal component analysis (PCA). As explained on Wikipedia:

In essence, an eigenvector $v$ of a linear transformation $T$ is a nonzero vector that, when $T$ is applied to it, does not change direction. Applying $T$ to the eigenvector only scales the eigenvector by the scalar value $\lambda$, called an eigenvalue. This condition can be written as the equation: $T(v) = \lambda v$.

The above statement is very elegantly explained using this gif:

enter image description here

Vectors denoted in blue $\begin{bmatrix}1 \\1 \\ \end{bmatrix}$ and magenta $\begin{bmatrix}1 \\-1 \\ \end{bmatrix}$ are eigenvectors for the linear transformation, $T = \begin{bmatrix}2 & 1 \\1 & 2 \\ \end{bmatrix}$. The points that lie on the line through the origin, parallel to the eigenvectors, remain on the line after the transformation. The vectors in red are not eigenvectors, therefore their direction is altered by the transformation. Blue vectors are scaled by a factor of 3 -- which is the eigenvalue for the blue eigenvector, whereas the magenta vectors are not scaled, since their eigenvalue is 1.
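
The claims about this particular matrix are easy to verify by hand or in a few lines of code. In this sketch the red vector `[1, 0]` is just one example of a non-eigenvector that I picked; the helper names are mine.

```python
def apply(matrix, vector):
    """Matrix-vector product for small lists of numbers."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def keeps_direction(matrix, vector):
    """True if the transformed 2D vector is parallel to the original one."""
    w = apply(matrix, vector)
    return vector[0] * w[1] - vector[1] * w[0] == 0

T = [[2, 1], [1, 2]]
blue = [1, 1]      # eigenvector with eigenvalue 3
magenta = [1, -1]  # eigenvector with eigenvalue 1
red = [1, 0]       # hypothetical non-eigenvector

for name, v in (("blue", blue), ("magenta", magenta), ("red", red)):
    print(name, v, "->", apply(T, v), "same direction:", keeps_direction(T, v))
```

The blue vector comes back three times as long, the magenta one unchanged, and the red one is rotated off its original line.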


Link to Wikipedia article.

$\endgroup$
9
$\begingroup$

The bias-variance trade-off is another very important concept in Statistics/Machine Learning.

The data points in blue come from $y(x)=\sin(x)+\epsilon$, where $\epsilon$ has a normal distribution. The red curves are estimated using different samples. The figure "Large Variance and Small Bias" presents the original model, which is a radial basis function network with 24 Gaussian bases.

The figure "Small Variance and Large Bias" presents the same model regularized.

Note that in the figure "Small Variance and Large Bias" the red curves are very close to each other (small variance). The same does not happen in the figure "Large Variance and Small Bias" (large variance).

Small Variance and Large Bias enter image description here

Large Variance and Small Bias enter image description here

From my computer methods and machine learning course.
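
Since the figures are unlabeled, a numerical version may help. This is a deliberately simplified sketch and not the radial basis function network from my course: I am swapping in two extreme models, a 1-nearest-neighbour fit (very flexible) and a constant fit (very rigid), and looking at their predictions at the single point $x_0=\pi/2$ over many resampled datasets. The noise SD, grid and seed are arbitrary assumptions.

```python
import math
import random
from statistics import mean, pvariance

rng = random.Random(1)
NOISE_SD = 0.3

xs = [2 * math.pi * i / 20 for i in range(20)]  # fixed design points
x0_index = 5                                    # xs[5] = pi/2, where sin(x) = 1
truth = math.sin(xs[x0_index])

flexible_preds, rigid_preds = [], []
for _ in range(500):
    # A fresh noisy sample from y = sin(x) + eps at the same design points.
    ys = [math.sin(x) + rng.gauss(0, NOISE_SD) for x in xs]
    flexible_preds.append(ys[x0_index])  # 1-NN: copies the noisy observation at x0
    rigid_preds.append(mean(ys))         # constant model: ignores x entirely

def bias_and_variance(preds):
    """Bias and variance of the predictions at x0 across resampled datasets."""
    return mean(preds) - truth, pvariance(preds)

flex_bias, flex_var = bias_and_variance(flexible_preds)
rigid_bias, rigid_var = bias_and_variance(rigid_preds)

print(f"flexible: bias {flex_bias:+.3f}, variance {flex_var:.4f}")
print(f"rigid:    bias {rigid_bias:+.3f}, variance {rigid_var:.4f}")
```

The flexible model is right on average but jumps around from sample to sample, while the rigid model barely moves but is consistently far from the truth. Regularization moves a model from the first regime towards the second.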

$\endgroup$
2
  • 14
    $\begingroup$ Because these plots are unlabeled and unexplained, they are so inherently ambiguous that they show nothing at all. Maybe you could elaborate on your answer? $\endgroup$
    – whuber
    Commented Mar 2, 2020 at 16:52
  • 3
    $\begingroup$ Another common visualization of the bias-variance trade off $\operatorname{MSE}=\operatorname{Bias}^2+\operatorname{Var}+\sigma^2$ that is usually encountered is like this $\endgroup$
    – Dabed
    Commented Mar 3, 2020 at 16:38
9
$\begingroup$

Here is a very basic one, but in my opinion a very powerful one, because it's not only a visual explanation of a concept but also asks you to visualise or imagine a real object depicting the concept:

Neophytes sometimes have a hard time understanding very basic concepts like mean, median and mode.

enter image description here

So, for helping them to better grasp the idea of mean:

Take this skewed distribution and do a 3D print of it, in plastic, or carve it in wood, so now you have a real object in your hands. Try to balance it using just one finger... the mean is the only point where you can do that.

enter image description here
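
The balancing claim is just the statement that deviations from the mean sum to zero. Here is a quick check on a made-up right-skewed distribution (the positions and weights are hypothetical): the net torque about the mean vanishes, while about the median it does not.

```python
from statistics import median

# Hypothetical right-skewed distribution: positions and the mass at each one.
values = [0, 1, 2, 3, 4, 10]
weights = [5, 4, 3, 2, 1, 1]

total = sum(weights)
mean_value = sum(v * w for v, w in zip(values, weights)) / total

# Expand into individual observations to get the median.
observations = [v for v, w in zip(values, weights) for _ in range(w)]
median_value = median(observations)

def net_torque(pivot):
    """Sum of mass times signed distance from the pivot (positive tips right)."""
    return sum(w * (v - pivot) for v, w in zip(values, weights))

print("mean:", mean_value, "torque about mean:", net_torque(mean_value))
print("median:", median_value, "torque about median:", net_torque(median_value))
```

With your finger under the median, the long right tail would tip the object to the right.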

$\endgroup$
4
  • 6
    $\begingroup$ I like the principle. In the example given I don't think the position of the finger would work: the right tail is not long enough. Also, although many, many complications are possible, it's not likely that mean, median, and mode are equally spaced for many distributions, even approximately. $\endgroup$
    – Nick Cox
    Commented Mar 3, 2020 at 12:30
  • $\begingroup$ @NickCox That's just an image I got from wikipedia (en.wikipedia.org/wiki/Skewness), and despite being quite bad it's way better than most other images of a skewed distribution (online + copyright free). The important thing here is the principle, which remains true. $\endgroup$ Commented Mar 4, 2020 at 2:55
  • $\begingroup$ The question asks for "graphics/gifs which anyone has that very clearly illustrate a statistics concept" and I don't buy "very clearly" in this case. $\endgroup$
    – Nick Cox
    Commented Mar 4, 2020 at 7:34
  • $\begingroup$ Your favourite software should let you draw e.g. an exponential with mean 1, median $\ln 2$ and mode 0, which would be one of many much better examples. $\endgroup$
    – Nick Cox
    Commented Mar 4, 2020 at 9:39
4
$\begingroup$

The figure below shows the importance of precisely defining the objectives and assumptions of a clustering problem (and of a statistical problem in general). Different models may provide very different results:

enter image description here

Source: scikit-learn

$\endgroup$
1
  • 2
    $\begingroup$ I think this may be a bit clearer if you just chose two clustering methods. Are there 2 with a good statistical interpretation which we could narrow it down to? $\endgroup$ Commented Mar 5, 2020 at 22:01
1
$\begingroup$

Okay, so this one is less about illustrating a basic concept, but it is very interesting both visually and in terms of applications. I think showing people what they can ultimately accomplish with what they are learning is a great form of motivation, so you can pitch it as an example of developing and applying statistical models, which depends on all the more fundamental statistical concepts they are learning. With that, I present to you...

Species Distribution Modelling

It's actually a very broad topic with a lot of nuance in terms of types of data, data collection, model setup, assumptions, applications, interpretations, etc. But very simply put, you take sample information about where a species occurs, use those locations to sample potentially relevant environmental variables (e.g., climate data, soil data, habitat data, elevation, light pollution, noise pollution), develop a model using the data (e.g., a GLM or a point process model), then use that model to predict across a landscape using your environmental variables. Depending on how the model was set up, what's predicted might be potential suitable habitat, likely areas of occurrence, species distribution, etc. You can also change the environmental variables to see how they affect these results.

People have used SDMs to find previously unknown populations of a species and to discover new species. With historical climate data, they've used them to predict backwards in time where a species used to occur and how it got to where it is today (even all the way back through glaciation periods). And with things like future climate predictions and habitat loss, SDMs are used to predict how human activities will affect the species in the future. These are just a few examples, and if I have time later I'll find and link interesting papers. In the meantime here's a quick image I found illustrating the basics:

Source: https://www.natureserve.org/conservation-tools/species-distribution-modeling

$\endgroup$
4
  • 2
    $\begingroup$ I can't see that this answers the question at all. $\endgroup$
    – Nick Cox
    Commented Mar 3, 2020 at 12:05
  • $\begingroup$ @NickCox The OP asked for an image showing a statistics concept (no mention of actual topic), and preferred something that would be motivating. Is building a model not a statistics concept? Perhaps not a basic one, like a t-test or the central limit theorem, but I would certainly consider it one. And as a more big picture concept, it might be more motivating for students just starting statistics by showing them what they will eventually be able to accomplish. I'm literally saying they can use statistical models to discover new species; try making a t-test that interesting for students. $\endgroup$
    – anjama
    Commented Mar 3, 2020 at 13:00
  • 5
    $\begingroup$ The question is asking for "graphics/gifs which anyone has that very clearly illustrate a statistics concept". Your graphic doesn't do anything for me but illustrate that data on species occurrence and environmental predictors allow predictions of suitability, which is fine by me (I've done analyses of this kind myself). The graphic is pleasant but no more, so sorry, but you've not shifted my view (or as yet got any upvotes either). $\endgroup$
    – Nick Cox
    Commented Mar 3, 2020 at 13:07
  • 1
    $\begingroup$ I don't expect my example to be the picked answer by any means, and I personally do think the other answers are interesting, and certainly address what the OP was probably more expecting. With that said, OP was also asking for motivational stuff as well. Having given college freshmen their first intro to stats in the past, I know how hard it is to get them engaged in it, and I'm hoping that people seeing my answer will be encouraged to be more creative about providing big picture ideas and applications that help make the statistics more engaging to students. $\endgroup$
    – anjama
    Commented Mar 3, 2020 at 13:14
