6
$\begingroup$

I'm having trouble imagining what variance and deviation mean with a series of die rolls. That is, a fair die will fall with a flat distribution on all its values 1-6 in 6 bins (1, 2, 3, 4, 5, 6) over time (as n goes towards infinity).

Firstly, does the concept of variance really make sense on such a question? [Edit: only if I provide some data on bin outcomes. Say n=36, and the die lands as follows: 1 (6 times), 2 (5x), 3 (5x), 4 (7x), 5 (7x), 6 (6x).]

The average outcome will be n/6 over time for each of the six bins [Edit: My prior writeup was confusing, as I had said the mean was 3.5 -- but this mean face-value is irrelevant to the question.]

Is this question even valid? It seems a perfectly flat distribution (as n-> infinity), with no other hidden variables, has no variance (or shouldn't have any), but then what should one make of the results when n is finite?

$\endgroup$
10
  • 2
    $\begingroup$ This is a common routine-textbook-style question asked of students; as such it should probably be marked self-study; please see its tag wiki $\endgroup$
    – Glen_b
    Commented Feb 23, 2016 at 5:26
  • 4
    $\begingroup$ "Dice" is plural; "die" is singular. From the wikipedia page on the discrete uniform you can see that the variance for a discrete uniform on $1,...,k$ is $(k^2-1)/12 = (k+1)(k-1)/12$. When $k=6$, that's $35/12$. $\endgroup$
    – Glen_b
    Commented Feb 23, 2016 at 5:29
  • 1
    $\begingroup$ @Glen_b actually Glen, I think it shows the weakness of the equations for variance--that there's no real model to make it less arbitrary. To say that the variance is 2.916 when it's a fair die whose mean will always center around 3.5, whose range is 1-6, and whose probability distribution is totally flat makes the result seem to come out of NOWHERE. $\endgroup$
    – Marcos
    Commented Feb 23, 2016 at 15:24
  • 5
    $\begingroup$ I'm sorry Marcos, I'm missing your point there; you may need to clarify what you see as problematic. i) Of course there's a model; the discrete uniform on 1,2,...,6. The calculation comes directly from the definition of variance of a random variable: $\text{Var}(X) = \sum_{i=1}^6 (i-\mu)^2 \cdot p(i)$ (where here $\mu$ is 3.5). The result doesn't "come out of nowhere", it's direct calculation from the definition. But the result for the general case (any number of faces, not just 6) is so simple that we can compute the general case... $\endgroup$
    – Glen_b
    Commented Feb 23, 2016 at 20:37
  • 2
    $\begingroup$ ... and indeed, the even more general case (faces labeled not from 1 up, but from $a$ up) is also very easy, and is already available to be looked up on wikipedia. If you think it's "from nowhere" that can be solved by calculating it from the definition. $\endgroup$
    – Glen_b
    Commented Feb 23, 2016 at 20:39

4 Answers

16
$\begingroup$

While @dsaxton's answer is correct, I think it makes it more difficult for beginners in statistics to grasp the concept of variance, so I'll offer another answer that helps you get a better "feel" for what the variance is actually "doing." An equivalent expression for the variance in this case is:

$Var(X) = \frac{\sum_{i=1}^6 (X_i-\bar{X})^2}{6}$.

Now, you know the mean, $\bar{X}=3.5$, so for each face value $X_i$, $i=1, 2, \ldots, 6$, you simply subtract the mean and square the difference, then add up the six squared differences and divide by 6. In effect this gives you the average squared distance of the die's values from the mean. So $Var(X)$ is given by:

$\frac{(1-3.5)^2+(2-3.5)^2+(3-3.5)^2+(4-3.5)^2+(5-3.5)^2+(6-3.5)^2}{6} = \frac{17.5}{6} = \frac{105}{36}$, the same answer @dsaxton provided.

We square the values of $X_i-\bar{X}$ because if we don't, the negative differences cancel out the positive ones and the sum is always zero.
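
If it helps to see the numbers, here is a minimal Python sketch of the calculation above. The variable names are my own, and it uses exact fractions from the standard library. It also applies the same recipe to the 36 rolls listed in the question; dividing by $n$ rather than $n-1$ there is my assumption about which convention to use.

```python
from fractions import Fraction

# Theoretical variance of a fair six-sided die, straight from the definition.
faces = [1, 2, 3, 4, 5, 6]
mean = Fraction(sum(faces), len(faces))                  # 7/2
deviations = [x - mean for x in faces]
sum_of_deviations = sum(deviations)                      # unsquared deviations cancel
variance = sum(d ** 2 for d in deviations) / len(faces)  # average squared deviation

# Same recipe on the 36 rolls from the question (face value -> count).
counts = {1: 6, 2: 5, 3: 5, 4: 7, 5: 7, 6: 6}
n = sum(counts.values())
sample_mean = Fraction(sum(x * c for x, c in counts.items()), n)
sample_variance = sum(c * (x - sample_mean) ** 2 for x, c in counts.items()) / n

print(sum_of_deviations)                        # 0
print(variance, float(variance))                # 35/12, about 2.917
print(sample_variance, float(sample_variance))  # 941/324, about 2.904
```

The figure for the 36 rolls lands near, but not exactly on, $35/12$, which is what you would expect from a finite number of rolls.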

$\endgroup$
13
  • 1
    $\begingroup$ Interesting. Something seems arbitrary to me about it. The result itself doesn't make much sense: I know that with 100 rolls, the results will be centered around the mean and in a flat curve. So to say that the results vary by 2.916 appears without any explanation of theoretical merit. $\endgroup$
    – Marcos
    Commented Feb 23, 2016 at 15:22
  • 2
    $\begingroup$ @Marcos What doesn't make sense about it? It's just the definition of variance. The sample mean of $100$ rolls will tend to be close to the distribution mean, but the variance of that sample mean is also reduced by a factor of $100$ so there's no issue. $\endgroup$
    – dsaxton
    Commented Feb 23, 2016 at 15:40
  • 3
    $\begingroup$ @Marcos, the result is basically saying that, on average, the squared distance between each value and the "center" of the values is about 2.916. There are other measures of variability that are used too, such as the mean absolute deviation, which is calculated the same way, only instead of squaring the difference between each value and the mean, its absolute value is taken. The variance has some important properties too. For example, knowing it, you can calculate bounds on the probability that a random variable is within a given distance of the mean, regardless of the shape of the distribution. $\endgroup$ Commented Feb 23, 2016 at 17:57
  • 1
    $\begingroup$ You're confused -- the height of the probability function is flat within its bounds, but it's not the probability at each outcome we calculate the variability of, it's the way the outcomes themselves vary. You're seeing (1/6,1/6,1/6...) and thinking those don't vary, but that's not the thing we're looking at varying. It's the distribution of outcomes --- the values (1,2...,6) all coming up equally often. $\endgroup$
    – Glen_b
    Commented Dec 6, 2016 at 6:18
  • 2
    $\begingroup$ 1. don't confuse the outcomes (1 doesn't become more nearly equal to 6 as you increase the number of tosses -- 6-1=5 every time) with their frequency. 2. You're still misunderstanding what variance is. It isn't about comparing the heights of the pmf - i.e. probabilities (or even counts). That is something other than the variance of the original random variable, the variation in the outcomes the variable can take. 3. You're now also mistaken in thinking that in the limit as you toss a fair die repeatedly that the counts will tend to become more equal. ... $\endgroup$
    – Glen_b
    Commented Dec 8, 2016 at 1:49
7
$\begingroup$

If $X$ is the value of the die we already know $\text{E}(X) = 21 / 6$ so we only need to find $\text{E}(X^2)$ since $\text{Var}(X) = \text{E}(X^2) - \text{E}(X)^2$. We can just directly calculate

\begin{align} \text{E}(X^2) &= \sum_{k=1}^{6} \frac{k^2}{6} \\ &= \frac{1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2}{6} \\ &= \frac{91}{6} \end{align}

which after some arithmetic gives us $\text{Var}(X) = 105 / 36$.
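
For anyone who wants the arithmetic spelled out, it is just the two moments above put over a common denominator:

\begin{align} \text{Var}(X) &= \text{E}(X^2) - \text{E}(X)^2 \\ &= \frac{91}{6} - \left(\frac{21}{6}\right)^2 \\ &= \frac{546}{36} - \frac{441}{36} \\ &= \frac{105}{36} = \frac{35}{12} \approx 2.917 \end{align}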

$\endgroup$
3
  • $\begingroup$ Thanks for the answer, but see my comment to StatsStudent below. $\endgroup$
    – Marcos
    Commented Feb 23, 2016 at 15:22
  • $\begingroup$ That doesn't make sense: if the die starts at zero-based numbering, then the variance changes. But, variance shouldn't depend on the numbering of the die as all faces are equally probable. $\endgroup$
    – Marcos
    Commented Dec 7, 2016 at 20:38
  • 1
    $\begingroup$ My answer assumes we're talking about the kind of die we see in the real world, and I don't see how it doesn't make sense. If we encounter one that is zero based then the first and second moments change accordingly so that the variance will stay the same. If it makes you feel better apply the standard definition for the variance of a discrete uniform distribution. $\endgroup$
    – dsaxton
    Commented Dec 7, 2016 at 22:27
6
$\begingroup$

This is a discrete uniform distribution. So we can use $\frac{(b-a+1)^2-1}{12}$ to solve for the variance. $\frac{(6-1+1)^2-1}{12}$ = $\frac{6^2-1}{12}$ = $\frac{35}{12}$
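
As a quick sanity check, here is a small Python sketch that compares this formula with a brute-force calculation from the definition of variance. The function names are hypothetical, chosen just for illustration. It also shows that relabelling the faces 0 to 5 instead of 1 to 6 leaves the variance unchanged, since only $b-a$ enters the formula.

```python
from fractions import Fraction

def variance_by_formula(a, b):
    """Variance of a discrete uniform on the integers a..b, via ((b-a+1)^2 - 1)/12."""
    n = b - a + 1
    return Fraction(n * n - 1, 12)

def variance_by_definition(a, b):
    """Same quantity computed as the average squared deviation from the mean."""
    values = range(a, b + 1)
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((v - mean) ** 2 for v in values) / n

print(variance_by_formula(1, 6), variance_by_definition(1, 6))  # 35/12 35/12
print(variance_by_formula(0, 5), variance_by_definition(0, 5))  # 35/12 35/12
```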

$\endgroup$
6
$\begingroup$

There are already several good answers posted (as well as one in the comments). My goal here is not to replicate those answers, but rather to try and address an apparent confusion about the "definition of variance".

In your question you say

It seems the variance and standard deviation tacitly ASSUME an a priori normal distribution around an unspecified or unknown order -- but a flat "curve" with no other hidden variables has no variance.

And in the answer you posted, you say

The answer should be (ahem: is) 0. Apparently the equations for variance assume another unknown variable (another dimension) affecting results.

If we call the value of a die roll $x$, then the random variable $x$ will have a discrete uniform distribution. That is, if we denote the probability mass function (PMF) of $x$ by $p[k]\equiv\Pr[x=k]$, then we have $p[k]=\frac{1}{K}$, where $K$ is the number of distinct values $k$ can take (i.e. here $K=6$).

Independent of the form of the probability distribution, the mean $\mu$ and variance $\sigma^2$ are always defined in terms of expectations. These definitions are $$ \mu_x\equiv\mathbb{E}[x] \,,\, \sigma^2_x\equiv\mathbb{E}\left[(x-\mu_x)^2\right] $$ (e.g. see Wikipedia).

For a discrete random variable such as $x\in\{X_1,\ldots,X_K\}$ with PMF $p[X_k]\equiv\Pr[x=X_k]$, the expectation operator $\mathbb{E}[\,]$ is defined by $$ \mathbb{E}\big[f[x]\big]\equiv\sum_{k=1}^Kf[X_k]p[X_k] $$ where $f[\,]$ is any deterministic function.

Your confusion appears to be related to this last part. For the mean $\mu$ you appear to be correctly using $f[x]=x$. However, for the variance you appear to be using $f[x]=p[x]$, i.e. the PMF of $x$.

Perhaps the following summary will make things clearer

\begin{array} {c|c|c} \text{object }(f) & \text{mean }(\mu_f) & \text{variance }(\sigma_f^2) \\ \hline x & \frac{7}{2} & \frac{105}{36} \\ p[x] = \frac{1}{6} & \frac{1}{6} & 0 \end{array}

In other words, the probability distribution $p[x]$ has zero variance, but the die value $x$ certainly has non-zero variance.
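
The two rows of the table can be reproduced with a short Python sketch. The helper names `expectation` and `variance` are my own, and the code is only meant to illustrate the definitions above: the same variance definition is applied once with $f[x]=x$ and once with $f[x]=p[x]$.

```python
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]
pmf = {k: Fraction(1, 6) for k in faces}   # p[k] = 1/K with K = 6

def expectation(f):
    """E[f[x]] = sum over k of f[X_k] * p[X_k]."""
    return sum(f(k) * pmf[k] for k in faces)

def variance(f):
    """E[(f[x] - mu_f)^2], where mu_f = E[f[x]]."""
    mu = expectation(f)
    return expectation(lambda k: (f(k) - mu) ** 2)

# Row 1: the die value itself, f[x] = x.
print(expectation(lambda k: k), variance(lambda k: k))            # 7/2 35/12
# Row 2: the (constant) probability, f[x] = p[x].
print(expectation(lambda k: pmf[k]), variance(lambda k: pmf[k]))  # 1/6 0
```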

$\endgroup$
