286
$\begingroup$

...assuming that I'm able to augment their knowledge about variance in an intuitive fashion ( Understanding "variance" intuitively ) or by saying: It's the average squared distance of the data values from the 'mean' - and since variance is in squared units, we take the square root to keep the units the same, and that is called the standard deviation.

Let's assume this much is articulated and (hopefully) understood by the 'receiver'. Now what is covariance and how would one explain it in simple English without the use of any mathematical terms/formulae? (I.e., intuitive explanation. ;)

Please note: I do know the formulae and the math behind the concept. I want to be able to 'explain' the same in an easy to understand fashion, without including the math; i.e., what does 'covariance' even mean?

$\endgroup$
4
  • 2
    $\begingroup$ @Xi'an - 'how' exactly would you define it via simple linear regression? I'd really like to know... $\endgroup$
    – PhD
    Commented Nov 8, 2011 at 2:08
  • 4
    $\begingroup$ Assuming you already have a scatterplot of your two variables, x vs. y, with origin at (0,0), simply draw two lines at x=mean(x) (vertical) and y=mean(y) (horizontal): using this new system of coordinates (the origin is at (mean(x),mean(y))), put a "+" sign in the top-right and bottom-left quadrants and a "-" sign in the two other quadrants; you get the sign of the covariance, which is basically what @Peter said. Scaling the x- and y-units (by SD) leads to a more interpretable summary, as discussed in the ensuing thread. $\endgroup$
    – chl
    Commented Nov 9, 2011 at 22:54
  • 2
    $\begingroup$ @chl - could you please post that as an answer and maybe use graphics to depict it! $\endgroup$
    – PhD
    Commented Nov 10, 2011 at 4:03
  • 1
    $\begingroup$ I found the video on this website to help me as I prefer images over abstract explanations. Website with video Specifically this image: ![enter image description here](i.sstatic.net/xGZFv.png) $\endgroup$ Commented Jul 29, 2015 at 21:13

11 Answers

517
$\begingroup$

Sometimes we can "augment knowledge" with an unusual or different approach. I would like this reply to be accessible to kindergartners and also have some fun, so everybody get out your crayons!

Given paired $(x,y)$ data, draw their scatterplot. (The younger students may need a teacher to produce this for them. :-) Each pair of points $(x_i,y_i)$, $(x_j,y_j)$ in that plot determines a rectangle: it is the smallest rectangle with sides parallel to the axes that contains both points. Thus the points are either at the upper-right and lower-left corners (a "positive" relationship) or at the upper-left and lower-right corners (a "negative" relationship).

Draw all possible such rectangles. Color them transparently, making the positive rectangles red (say) and the negative rectangles "anti-red" (blue). In this fashion, wherever rectangles overlap, their colors are either enhanced when they are the same (blue and blue or red and red) or cancel out when they are different.

Positive and negative rectangles

(In this illustration of a positive (red) and negative (blue) rectangle, the overlap ought to be white; unfortunately, this software does not have a true "anti-red" color. The overlap is gray, so it will darken the plot, but on the whole the net amount of red is correct.)

Now we're ready for the explanation of covariance.

The covariance is the net amount of red in the plot (treating blue as negative values).

Here are some examples with 32 binormal points drawn from distributions with the given covariances, ordered from most negative (bluest) to most positive (reddest).

Covariance plots, updated 2019

They are drawn on common axes to make them comparable. The rectangles are lightly outlined to help you see them. This is an updated (2019) version of the original: it uses software that properly cancels the red and cyan colors in overlapping rectangles.

Let's deduce some properties of covariance. Understanding of these properties will be accessible to anyone who has actually drawn a few of the rectangles. :-)

  • Bilinearity. Because the amount of red depends on the size of the plot, covariance is directly proportional to the scale on the x-axis and to the scale on the y-axis.

  • Correlation. Covariance increases as the points approximate an upward sloping line and decreases as the points approximate a downward sloping line. This is because in the former case most of the rectangles are positive and in the latter case, most are negative.

  • Relationship to linear associations. Because non-linear associations can create mixtures of positive and negative rectangles, they lead to unpredictable (and not very useful) covariances. Linear associations can be fully interpreted by means of the preceding two characterizations.

  • Sensitivity to outliers. A geometric outlier (one point standing away from the mass) will create many large rectangles in association with all the other points. It alone can create a net positive or negative amount of red in the overall picture.

Incidentally, this definition of covariance differs from the usual one only by a constant of proportionality. The mathematically inclined will have no trouble performing the algebraic demonstration that the formula given here is always twice the usual covariance. For a full explanation, see the follow-up thread at https://stats.stackexchange.com/a/222091/919.
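For readers who want to check the proportionality claim numerically, here is a small sketch (Python/NumPy used for illustration; the simulated data are my own, not from the figures above). It sums the signed areas of all pairwise rectangles and compares the per-pair average to the usual sample covariance:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.8 * x + rng.normal(size=200)
n = len(x)

# Signed area of the rectangle spanned by each unordered pair of points:
# positive for upper-right/lower-left pairs ("red"), negative otherwise ("blue").
net_red = sum((x[i] - x[j]) * (y[i] - y[j])
              for i in range(n) for j in range(i + 1, n))

# Averaging over the n*(n-1)/2 pairs gives exactly twice the usual
# (ddof=1) sample covariance.
mean_area = net_red / (n * (n - 1) / 2)
assert np.isclose(mean_area, 2 * np.cov(x, y)[0, 1])
```

The identity behind the assertion is $\sum_{i<j}(x_i-x_j)(y_i-y_j) = n\sum_i(x_i-\bar x)(y_i-\bar y)$, which makes the per-pair average equal to $\frac{2}{n-1}\sum_i(x_i-\bar x)(y_i-\bar y)$.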

$\endgroup$
37
  • 40
    $\begingroup$ Now if only all introductory statistical concepts could be presented to students in this lucid manner … $\endgroup$
    – MannyG
    Commented Nov 10, 2011 at 18:26
  • 8
    $\begingroup$ This is beautiful. And very very clear. $\endgroup$
    – mako
    Commented Jun 2, 2012 at 15:37
  • 7
    $\begingroup$ Having done the algebra, I wonder if "universal constant of proportionality (independent of the data set size)" may be misleading, so I want to check if I understood the procedure correctly. For {(0,0),(1,1),(2,2)} there are $3\choose{2}$ = 3 possible rectangles of areas 1, 1 and 4. They're all red so the "covariance" is 6. And {(0,0),(1,1),(1,1),(2,2)} has $4\choose{2}$ = 6 rectangles, all red or zero, of areas 0, 1, 1, 1, 1 and 4 so "covariance" is 8. Is this right? If so it's $\sum_{i<j}(x_i-x_j)(y_i-y_j)$. $\endgroup$
    – Silverfish
    Commented Nov 7, 2013 at 10:42
  • 9
    $\begingroup$ Thanks, this is as I suspected. I realised that an extra factor of 2 comes out if the sum is taken over all $i, j$ rather than $i<j$. I think the only other ambiguity is whether the "pair" $(i,i)$ counts - the area of the rectangle is zero, but if averaging rather than summing it clearly makes a difference! Incidentally, when I teach covariance I also use the "positive and negative rectangles" approach, but pairing each data point with the mean point. I find this makes some of the standard formulae more accessible, but on the whole I prefer your method. $\endgroup$
    – Silverfish
    Commented Nov 8, 2013 at 17:00
  • 11
    $\begingroup$ @fcoppens Indeed, there is a traditional explanation that proceeds as you suggest. I thought of this one because I did not want to introduce an idea that is unnecessary--namely, constructing the centroid $(\bar x, \bar y)$. That would make the explanation inaccessible to the five-year-old with a box of crayons. Some of the conclusions I drew at the end would not be immediate, either. For example, it would no longer be quite so obvious that the covariance is sensitive to certain kinds of outliers. $\endgroup$
    – whuber
    Commented Aug 17, 2015 at 14:33
92
$\begingroup$

To elaborate on my comment, I used to teach the covariance as a measure of the (average) co-variation between two variables, say $x$ and $y$.

It is useful to recall the basic formula (simple to explain; no need to talk about mathematical expectations in an introductory course):

$$ \text{cov}(x,y)=\frac{1}{n}\sum_{i=1}^n(x_i-\bar x)(y_i-\bar y) $$

so that we clearly see that each observation, $(x_i,y_i)$, might contribute positively or negatively to the covariance, depending on the product of its deviations from the means of the two variables, $\bar x$ and $\bar y$. Note that I do not speak of magnitude here, but simply of the sign of the contribution of the ith observation.

This is what I've depicted in the following diagrams. Artificial data were generated using a linear model (left, $y = 1.2x + \varepsilon$; right, $y = 0.1x + \varepsilon$), where the $\varepsilon$ were drawn from a Gaussian distribution with zero mean and $\text{SD}=2$, and $x$ from a uniform distribution on the interval $[0,20]$.

enter image description here

The vertical and horizontal bars represent the means of $x$ and $y$, respectively. That means that instead of "looking at individual observations" from the origin $(0,0)$, we can do so from $(\bar x, \bar y)$. This just amounts to a translation of the x- and y-axes. In this new coordinate system, every observation located in the upper-right or lower-left quadrant contributes positively to the covariance, whereas observations located in the two other quadrants contribute negatively to it. In the first case (left), the covariance equals 30.11 and the distribution across the four quadrants is given below:

   +  -
+ 30  2
-  0 28

Clearly, when the $x_i$'s are above their mean, the corresponding $y_i$'s tend to be above theirs (wrt. $\bar y$). Eyeballing the shape of the 2D cloud of points, when $x$ values increase, $y$ values tend to increase too. (But remember we could also use the fact that there is a clear relationship between the covariance and the slope of the regression line, i.e. $b=\text{Cov}(x,y)/\text{Var}(x)$.)

In the second case (right, same $x_i$), the covariance equals 3.54 and the distribution across quadrants is more "homogeneous" as shown below:

   +  -
+ 18 14
- 12 16

In other words, there is an increased number of cases where the $x_i$'s and $y_i$'s do not covary in the same direction wrt. their means.

Note that we could reduce the covariance by scaling either $x$ or $y$. In the left panel, the covariance of $(x/10,y)$ (or $(x,y/10)$) is reduced tenfold (3.01). Since the units of measurement and the spread of $x$ and $y$ (relative to their means) make it difficult to interpret the value of the covariance in absolute terms, we generally scale both variables by their standard deviations and get the correlation coefficient. This means that in addition to re-centering our $(x,y)$ scatterplot at $(\bar x, \bar y)$, we also scale the x- and y-units in terms of standard deviations, which leads to a more interpretable measure of the linear covariation between $x$ and $y$.
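To make the quadrant bookkeeping concrete, here is a small numerical sketch (Python/NumPy for illustration; the simulated data only mimic the left-panel model, so the counts and values will differ from those above). It tallies the quadrant signs and verifies the scaling and standardization claims:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 60
x = rng.uniform(0, 20, n)            # x ~ Uniform[0, 20]
y = 1.2 * x + rng.normal(0, 2, n)    # left-panel model: y = 1.2x + eps

dx, dy = x - x.mean(), y - y.mean()

# Points in the upper-right/lower-left quadrants (relative to the means)
# contribute positively; the other two quadrants contribute negatively.
pos = np.sum((dx > 0) == (dy > 0))
assert pos > n - pos                 # most points lie in the "+" quadrants

cov = np.mean(dx * dy)
# Rescaling x by 1/10 rescales the covariance by 1/10:
assert np.isclose(np.mean((dx / 10) * dy), cov / 10)
# Standardizing both variables turns the covariance into the correlation:
assert np.isclose(np.mean((dx / x.std()) * (dy / y.std())),
                  np.corrcoef(x, y)[0, 1])
```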

$\endgroup$
51
$\begingroup$

I loved @whuber 's answer - before I only had a vague idea in my mind of how covariance could be visualised, but those rectangle plots are genius.

However, since the formula for covariance involves the mean, and the OP's original question did state that the 'receiver' does understand the concept of the mean, I thought I would have a crack at adapting @whuber's rectangle plots to compare each data point to the means of x and y, as this more closely reflects what's going on in the covariance formula. I thought it actually ended up looking fairly intuitive: "Covariance graphs for variables with different correlations"

The blue dot in the middle of each plot is the mean of x (x_mean) and mean of y (y_mean).

The rectangles are comparing the value of x - x_mean and y - y_mean for each data point.

The rectangle is green when either:

  • both x and y are greater than their respective means
  • both x and y are less than their respective means

The rectangle is red when either:

  • x is greater than x_mean but y is less than y_mean
  • x is less than x_mean but y is greater than y_mean

Covariance (and correlation) can range from strongly negative to strongly positive. When the graph is dominated by one colour more than the other, it means that the data mostly follow a consistent pattern.

  • If the graph has lots more green than red, it means that y generally increases when x increases.
  • If the graph has lots more red than green, it means that y generally decreases when x increases.
  • If the graph isn't dominated by one colour or the other, it means that there isn't much of a pattern to how x and y relate to each other.

The actual value of the covariance for two different variables x and y is basically the sum of all the green area minus all the red area, divided by the total number of data points - effectively the average greenness-vs-redness of the graph.
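As a sanity check on this green-minus-red description, here is a short sketch (Python/NumPy, with illustrative simulated data, not part of the original answer): each point contributes one signed rectangle area, and averaging them recovers the covariance exactly.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=100)
y = 0.6 * x + rng.normal(size=100)

dx, dy = x - x.mean(), y - y.mean()
areas = dx * dy                 # signed rectangle area for each point
green = areas[areas > 0].sum()  # same-side-of-the-means rectangles
red = -areas[areas < 0].sum()   # opposite-side rectangles (as positive area)

# "All the green area minus all the red area, divided by the number of
# data points" is the (population) covariance:
assert np.isclose((green - red) / len(x), np.cov(x, y, bias=True)[0, 1])
```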

How does that sound/look?

$\endgroup$
8
  • $\begingroup$ Just to make sure, the blue dot in the middle is the average of ALL the points? $\endgroup$
    – FafaDog
    Commented Jan 6, 2020 at 17:17
  • $\begingroup$ Yes that's correct. It's different in each of the 4 graphs as they are of different sets of data. $\endgroup$
    – capohugo
    Commented Jan 7, 2020 at 22:18
  • $\begingroup$ Thanks for this! One confusing thing about this is that you have sort of flipped (although not exactly) the color scheme from @whuber's answer. In trying to wrap my head around the concept, that gave some cognitive whiplash. $\endgroup$ Commented May 31, 2020 at 5:40
  • $\begingroup$ +1 I prefer your explanation, especially the second-to-last sentence, thank you so much $\endgroup$ Commented Jun 11, 2021 at 4:25
  • 2
    $\begingroup$ @Kirsten it's likely just a cultural background thing, but I found the idea of blue=negative / red=positive deeply unintuitive and jarring. I am much more used to red=negative in the context of displaying the sign of numbers, while blue & green are natural positive colours (e.g. traffic lights). I'm a native english speaker from Australia. I'm guessing people from different backgrounds would have different colour/number associations. $\endgroup$
    – capohugo
    Commented Mar 7, 2023 at 6:36
45
$\begingroup$

Covariance is a measure of how much one variable goes up when the other goes up.

$\endgroup$
9
  • 4
    $\begingroup$ Is it always in the 'same' direction? Also, does it apply for inverse relations too (i.e., as one goes up the other goes down)? $\endgroup$
    – PhD
    Commented Nov 8, 2011 at 2:07
  • 7
    $\begingroup$ @nupul Well, the opposite of "up" is "down" and the opposite of "positive" is "negative". I tried to give a one sentence answer. Yours is much more complete. Even your "how two variables change together" is more complete, but, I think, a little harder to understand. $\endgroup$
    – Peter Flom
    Commented Nov 8, 2011 at 11:37
  • 7
    $\begingroup$ That's right, Peter, which is why @naught101 made that comment: you description sounds like a rate of change, whose units will therefore be [units of one variable] / [units of the other variable] (if we interpret it like a derivative) or will just be [units of one variable] (if we interpret as a pure difference). Those are neither covariance (whose unit of measure is the product of the units for the two variables) nor correlation (which is unitless). $\endgroup$
    – whuber
    Commented Aug 13, 2013 at 20:15
  • 3
    $\begingroup$ @nbro Consider any concrete example: suppose you know the covariance of variables $X$ and $Y$ is $1,$ for instance. Even with the most generous understanding of "variable" and "go up," could you tell from that information alone how much $Y$ goes up when $X$ goes up by a given amount? The answer is no: the only information it gives you is that $Y$ would tend to increase. In this post Peter has confused the covariance with a regression coefficient (of which there are two, by the way, and they usually are different). $\endgroup$
    – whuber
    Commented Jun 25, 2019 at 19:47
  • 4
    $\begingroup$ @nbro Covariance is the second central moment of a bivariate random variable. And that returns us to the beginning: how would one convey this precise definition to the proverbial five-year-old? As always, there's a trade-off between economy of expression and accuracy: when the audience doesn't have the concepts or language needed to understand something immediately, somehow you have to weave in an explanation of that background along with your description. Doing it right requires some elaboration. Usually there's no shortcut. $\endgroup$
    – whuber
    Commented Jun 25, 2019 at 20:48
19
$\begingroup$

I am answering my own question, but I thought it'd be great for people coming across this post to check out some of the explanations on this page.

I'm paraphrasing one of the very well articulated answers (by a user 'Zhop'). I'm doing so in case that site shuts down or the page gets taken down when someone eons from now accesses this post ;)

Covariance is a measure of how much two variables change together. Compare this to Variance, which is just the range over which one measure (or variable) varies.

In studying social patterns, you might hypothesize that wealthier people are likely to be more educated, so you'd try to see how closely measures of wealth and education stay together. You would use a measure of covariance to determine this.

...

I'm not sure what you mean when you ask how does it apply to statistics. It is one measure taught in many stats classes. Did you mean, when should you use it?

You use it when you want to see how much two or more variables change in relation to each other.

Think of people on a team. Look at how they vary in geographic location compared to each other. When the team is playing or practicing, the distance between individual members is very small and we would say they are in the same location. And when their location changes, it changes for all individuals together (say, travelling on a bus to a game). In this situation, we would say they have a high level of covariance. But when they aren't playing, then the covariance rate is likely to be pretty low, because they are all going to different places at different rates of speed.

So you can predict one team member's location, based on another team member's location when they are practicing or playing a game with a high degree of accuracy. The covariance measurement would be close to 1, I believe. But when they are not practicing or playing, you would have a much smaller chance of predicting one person's location, based on a team member's location. It would be close to zero, probably, although not zero, since sometimes team members will be friends, and might go places together on their own time.

However, if you randomly selected individuals in the United States, and tried to use one of them to predict the other's locations, you'd probably find the covariance was zero. In other words, there is absolutely no relation between one randomly selected person's location in the US, and another's.

Adding another one (by 'CatofGrey') that helps augment the intuition:

In probability theory and statistics, covariance is the measure of how much two random variables vary together (as distinct from variance, which measures how much a single variable varies).

If two variables tend to vary together (that is, when one of them is above its expected value, then the other variable tends to be above its expected value too), then the covariance between the two variables will be positive. On the other hand, if one of them is above its expected value and the other variable tends to be below its expected value, then the covariance between the two variables will be negative.

These two together have made me understand covariance as I've never understood it before! Simply amazing!!

$\endgroup$
2
  • 19
    $\begingroup$ Although these descriptions are qualitatively suggestive, sadly they are incomplete: they neither distinguish covariance from correlation (the first description appears to confuse the two, in fact), nor do they bring out the fundamental assumption of linear co-variation. Also, neither addresses the important aspect that covariance depends (linearly) on the scale of each variable. $\endgroup$
    – whuber
    Commented Nov 8, 2011 at 14:35
  • 1
    $\begingroup$ @whuber - agreed! And hence haven't marked mine as the answer :) (not as yet ;) $\endgroup$
    – PhD
    Commented Nov 9, 2011 at 0:45
17
$\begingroup$

I really like Whuber's answer, so I gathered some more resources. Covariance describes both how far the variables are spread out, and the nature of their relationship.

Covariance uses rectangles to describe how far away an observation is from the mean on a scatter graph:

  • If a rectangle is both tall and wide, or both short and narrow (its height and width are comparably large or comparably small), it provides evidence that the two variables move together.

  • If a rectangle has two sides that are relatively long for one variable and two sides that are relatively short for the other variable, this observation provides evidence that the variables do not move together very well.

  • If the rectangle is in the 2nd or 4th quadrant, then when one variable is greater than the mean, the other is less than the mean. An increase in one variable is associated with a decrease in the other.

I found a cool visualization of this at http://sciguides.com/guides/covariance/. It explains what covariance is if you just know the mean. link via the wayback machine

$\endgroup$
8
  • 8
    $\begingroup$ +1 Nice explanation (especially that introductory one-sentence summary). The link is interesting. Since it has no archive on the Wayback machine it likely is new. Because it so closely parallels my (three-year-old) answer, right down to the choice of red for positive and blue for negative relationships, I suspect it is an (unattributed) derivative of the material on this site. $\endgroup$
    – whuber
    Commented Aug 9, 2014 at 17:54
  • 7
    $\begingroup$ The "cool visualization" link has died... . $\endgroup$
    – whuber
    Commented Jan 4, 2017 at 18:02
  • 2
    $\begingroup$ @MSIS That's not possible to figure out, because there are a very great number of possible distributions on the circle. But if you are referring to the uniform distribution, there's nothing to calculate, because (as I recall remarking in your thread at stats.stackexchange.com/q/414365/919) the correlation coefficient must equal its own negative, QED. $\endgroup$
    – whuber
    Commented Jun 25, 2019 at 19:14
  • 2
    $\begingroup$ @MSIS If "method" means "an appeal to symmetry," the answer is that it will work but the result depends on how $X$ is distributed. As an example, if $X$ is a random variable with a distribution symmetric about $0$ with finite fourth moment, then $X$ and $X^2$ must be uncorrelated. As a non-example, if $X$ has a distribution symmetric about $1,$ then nothing general can be said about the correlation of $X$ and $X^2:$ indeed, it could be any value between $-1$ and $1$ inclusive. $\endgroup$
    – whuber
    Commented Jun 25, 2019 at 19:23
  • 2
    $\begingroup$ @MSIS Usually, in the absence of an explicit distribution, and almost always in a purely mathematical context, one assumes that a uniform distribution is meant. In the case of a geometrical circle parameterized by an angle $\alpha,$ the basic events are of the form $a\lt\alpha\le b$ and their probabilities equal $((b-a) \operatorname{mod} 2\pi)/(2\pi).$ $\endgroup$
    – whuber
    Commented Jun 25, 2019 at 20:53
17
$\begingroup$

Here's another attempt to explain covariance with a picture. Every panel in the picture below contains 50 points simulated from a bivariate Normal distribution with correlation between x & y of 0.8 and variances as shown in the row and column labels. The covariance is shown in the lower-right corner of each panel.

Different covariances, all with correlation = 0.8

Anyone interested in improving this...here's the R code:

library(mvtnorm)

rowvars <- colvars <- c(10,20,30,40,50)

all <- NULL
for(i in 1:length(colvars)){
  colvar <- colvars[i]
  for(j in 1:length(rowvars)){
    set.seed(303)  # Put seed here to show same data in each panel
    rowvar <- rowvars[j]
    # Simulate 50 points, corr=0.8
    sig <- matrix(c(rowvar, .8*sqrt(rowvar)*sqrt(colvar), .8*sqrt(rowvar)*sqrt(colvar), colvar), nrow=2)
    yy <- rmvnorm(50, mean=c(0,0), sig)
    dati <- data.frame(i=i, j=j, colvar=colvar, rowvar=rowvar, covar=.8*sqrt(rowvar)*sqrt(colvar), yy)
    all <- rbind(all, dati)
  }
}
names(all) <- c('i','j','colvar','rowvar','covar','x','y')
all <- transform(all, colvar=factor(colvar), rowvar=factor(rowvar))
library(latticeExtra)
useOuterStrips(xyplot(y~x|colvar*rowvar, all, cov=all$covar,
                      panel=function(x,y,subscripts, cov,...){
                        panel.xyplot(x,y,...)
                        print(cor(x,y))
                        ltext(14,-12, round(cov[subscripts][1],0))
                      }))
$\endgroup$
7
$\begingroup$

I would simply explain correlation, which is pretty intuitive. I would say: "Correlation measures the strength of the relationship between two variables X and Y. Correlation is between -1 and 1 and will be close to 1 in absolute value when the relationship is strong. Covariance is just the correlation multiplied by the standard deviations of the two variables. So while correlation is dimensionless, covariance is in the product of the units for variable X and variable Y."
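A quick numerical check of this relationship (a Python/NumPy sketch on simulated data, not part of the original answer): the covariance equals the correlation times the product of the two standard deviations, provided the same ddof is used throughout.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 2 * x + rng.normal(size=500)

r = np.corrcoef(x, y)[0, 1]
# cov(X, Y) = corr(X, Y) * sd(X) * sd(Y), using ddof=1 on both sides:
assert np.isclose(np.cov(x, y)[0, 1], r * x.std(ddof=1) * y.std(ddof=1))
```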

$\endgroup$
1
  • 11
    $\begingroup$ This seems inadequate because there is no mention of linearity. X and Y could have a strong quadratic relationship but have a correlation of zero. $\endgroup$
    – mark999
    Commented May 7, 2012 at 0:46
5
$\begingroup$

Variance is the degree to which a random variable changes with respect to its expected value, owing to the stochastic nature of the underlying process the random variable represents.

Covariance is the degree to which two different random variables change with respect to each other. This could happen when the random variables are driven by the same underlying process, or derivatives thereof. Either the processes represented by these random variables are affecting each other, or it's the same process and one of the random variables is derived from the other.

$\endgroup$
2
$\begingroup$

Two variables that would have a high positive covariance (correlation) would be the number of people in a room, and the number of fingers that are in the room. (As the number of people increases, we expect the number of fingers to increase as well.)

Something that might have a negative covariance (correlation) would be a person's age and the number of hair follicles on their head. Or the number of zits on a person's face (in a certain age group) and how many dates they have in a week. We expect people with more years to have less hair, and people with more acne to have fewer dates. These are negatively correlated.

$\endgroup$
2
  • 5
    $\begingroup$ Covariance is not necessarily interchangeable with correlation - the former is very unit dependent. Correlation is a number between -1 and 1 a unit-less scalar representing the 'strength' of the covariance IMO and that's not clear from your answer $\endgroup$
    – PhD
    Commented Nov 9, 2011 at 18:24
  • 2
    $\begingroup$ Downvoted as the answer implies that covariance and correlation can be used interchangeably. $\endgroup$ Commented Apr 7, 2018 at 16:20
-1
$\begingroup$

Covariance is a statistical measure that describes the relationship between two variables. If two variables have a positive covariance, it means that they tend to increase or decrease together. If they have a negative covariance, it means that they tend to move in opposite directions. If they have a covariance of zero, it means that there is no linear relationship between them (note that this does not imply they are independent).

To explain covariance to someone who understands only the mean, you could start by explaining that the mean is a measure of the central tendency of a distribution. The mean tells you the average value of a set of numbers.

Covariance, on the other hand, measures how two variables vary together. It tells you whether they tend to increase or decrease together, or whether they move in opposite directions.

For example, suppose you have two sets of numbers, X and Y. The mean of X tells you the average value of X, and the mean of Y tells you the average value of Y. If the covariance between X and Y is positive, it means that when X is above its mean, Y tends to be above its mean as well, and when X is below its mean, Y tends to be below its mean as well. If the covariance is negative, it means that when X is above its mean, Y tends to be below its mean, and vice versa. If the covariance is zero, it means that there is no linear relationship between X and Y.

So, in summary, covariance measures the tendency of two variables to vary together, and can be positive, negative, or zero.

$\endgroup$
1
  • $\begingroup$ Many readers will leave this post more confused than ever, because zero covariance does not imply independence; distributions are not mere "sets of numbers," and the explanation in terms of "tends to"--although tempting--is not always correct, because covariance concerns arithmetic means rather than proportions. $\endgroup$
    – whuber
    Commented May 12 at 17:53
