$\begingroup$

I have a problem which seems very simple, but for some reason I cannot figure out exactly what I have to do.

Let's say I have a set of derived values, where each of them has an individual error: $$X_{all}=(x_1 \pm \sigma_{x_1}, x_2 \pm \sigma_{x_2}, ..., x_n \pm \sigma_{x_n})$$ (where $\sigma_{x_i}$ stands for the standard deviation of $x_i$).

Now I want the average value of $X_{all}$ and some measure of confidence in that value. The average is of course: $X_{avg}=\frac{1}{N}\sum_i x_i$

But for the standard deviation of $X_{avg}$, I don't know what I should use. There are two natural possibilities:

1) From error-propagation: $$\sigma_{X_{avg}} = \sqrt{\frac{1}{N} \sum_i \sigma_{x_i}^2}$$

2) The ordinary standard deviation of the set of values about their mean: $$\sigma_{X_{avg}} = \sqrt{\frac{1}{N} \sum_i (x_i-X_{avg})^2}$$
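To make the two candidates concrete, here is a minimal Python/NumPy sketch evaluating both on made-up numbers (all values of $x_i$ and $\sigma_{x_i}$ are purely illustrative):

```python
import numpy as np

# Illustrative data: five derived values with individual uncertainties.
x = np.array([10.1, 9.8, 10.4, 9.9, 10.2])
sigma_x = np.array([0.3, 0.2, 0.4, 0.3, 0.2])
N = len(x)

x_avg = x.mean()

# Possibility 1: propagate the individual errors, as written above.
sigma_prop = np.sqrt(np.sum(sigma_x**2) / N)

# Possibility 2: ordinary standard deviation of the values about their mean.
sigma_stat = np.sqrt(np.sum((x - x_avg)**2) / N)

print(f"X_avg = {x_avg:.3f}, propagation: {sigma_prop:.3f}, scatter: {sigma_stat:.3f}")
```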


Concrete Example:

I perform a measurement to get the value $X$. In order to get statistically significant knowledge about $X$ and its standard deviation, I perform the measurement $n$ times, leading to the results $x_i$.

However, I cannot measure $x_i$ directly, but only $y_i=x_i+BG$, where $BG$ is a background value. For each measurement of $y_i$, I automatically get 1000 readings of $BG$, which gives me $BG_{avg}$ and $\sigma_{BG}$ for each $y_i$ ($BG$ is Gaussian distributed). Now I have $x_i = y_i - BG_{avg}$, and thus $\sigma_{x_i}=\sigma_{BG}$.
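A minimal simulation sketch of this setup (the true signal, background level, and background spread are assumed, purely illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)

true_x, true_bg, bg_spread = 5.0, 2.0, 0.5  # illustrative, not from the question
n = 10                                      # number of measurements of y_i

x_est, sigma_x = np.empty(n), np.empty(n)
for i in range(n):
    # One reading of y_i = x_i + BG, with one background realization.
    y_i = true_x + rng.normal(true_bg, bg_spread)
    # 1000 simultaneous background readings accompanying this y_i.
    bg = rng.normal(true_bg, bg_spread, size=1000)
    x_est[i] = y_i - bg.mean()       # x_i = y_i - BG_avg
    sigma_x[i] = bg.std(ddof=1)      # sigma_{x_i} = sigma_BG, as described above

print(x_est.mean(), sigma_x.round(3))
```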


Other Concrete Example:

I want to know the average lap time of a racing car. I measure 100 laps. However, I know my clock has an uncertainty $\sigma_{clock}$. Moreover, for some reason I use a different clock for each lap, each with its own uncertainty $\sigma_{clock_i}$.

So I get 100 times $t_i$ for the time in lap $i$, each with an uncertainty $\sigma_{clock_i}$ corresponding to the error I introduce due to the clock itself.

What is the uncertainty of the average lap time of the racing car?
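The same kind of sketch for the lap-time setup, again with assumed numbers for the true lap-to-lap spread and the per-clock errors:

```python
import numpy as np

rng = np.random.default_rng(1)

n_laps = 100
true_times = rng.normal(90.0, 1.5, size=n_laps)     # true lap times in seconds
sigma_clock = rng.uniform(0.05, 0.5, size=n_laps)   # a different clock per lap
t = true_times + rng.normal(0.0, sigma_clock)       # measured times with clock noise

print(f"average lap time: {t.mean():.2f} s")
# The open question: which formula gives the uncertainty of this average?
```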

$\endgroup$
  • $\begingroup$ Are all the $x_i$s from the same distribution? It sounds like they are not, in which case the mean would involve the $\sigma_i$, i.e. some form of weighted average. $\endgroup$
    – user121049
    Commented Mar 28, 2015 at 20:16
  • $\begingroup$ @user121049 Thanks for the reply. I added a concrete example, as I don't fully understand your question. As you see in the example, the $x_i$ are the same kind of quantity, just $n$ different independent measurement results. $\endgroup$ Commented Mar 29, 2015 at 10:57
  • $\begingroup$ So you have $n \times 1000$ measurements. What's the rationale for dividing the measurements up into $n$ groups? Could you have done this differently, or was it forced upon you? $\endgroup$
    – user121049
    Commented Mar 30, 2015 at 8:27
  • $\begingroup$ @user121049 I measure the $x_i$ themselves only $n$ times, but the background is measured simultaneously: 1000 times for each $x_i$ measurement. That was a technical "limitation" or "feature", however you want to call it. So, in fact, I only have $n$ measurements of $x_i$. $\endgroup$ Commented Mar 30, 2015 at 12:12

3 Answers

$\begingroup$

You have defined two distinct standard deviations, describing different things.

The first one describes the standard deviation of the average itself - that is, a measure of how accurately we know the average, ignoring the spread of the set of measurements.

The second one describes the standard deviation of the set of measurements, ignoring the confidence of each measurement.

I assume that what you seek is the standard deviation of the set of all possible sets of measurements, where each measurement has a distribution described by the individual mean and standard deviation. That's a slightly more complicated problem.

Recall that the variance is defined as $$ \text{Var}(X) = E(X^2)-E(X)^2 $$ Now, let $X$ be a measurement drawn uniformly at random from the set $\{X_1,\dots,X_N\}$, where each $X_i\sim \mathcal{N}(x_i,\sigma_i^2)$. Then $E(X)$ is just the average of the $x_i$ values. However, $E(X^2)$ is the average of the expected values of $X_i^2$, that is, $E(X^2)=\frac1N\sum_i E(X_i^2)$. And so, we have $$ E(X_i^2) = \text{Var}(X_i)+E(X_i)^2 = \sigma_i^2+x_i^2 $$ and the average value is the sum of the average variance and the average of the $x_i^2$ values.

From here, it is easy to see that the final variance is quite simply the sum of the average of the measurement variances and the variance in the measurement values. That is, taking the square root to get the final standard deviation, $$ \sigma = \sqrt{\frac1N\left(\sum_i \left[\sigma_i^2+(x_i-\mu)^2\right]\right)} $$ where $\mu=\frac1N \sum_i x_i$.
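As a quick sanity check (a sketch, not part of the derivation), one can compare this closed form against a Monte Carlo estimate, treating $X$ as a measurement drawn at random from the set; the $x_i$ and $\sigma_i$ values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

x = np.array([10.1, 9.8, 10.4, 9.9])     # measurement means (illustrative)
sigma = np.array([0.3, 0.2, 0.4, 0.3])   # measurement standard deviations
mu = x.mean()

# Closed form from above: average variance plus variance of the means.
sigma_total = np.sqrt(np.mean(sigma**2 + (x - mu)**2))

# Monte Carlo: pick a measurement uniformly at random, then draw from it.
idx = rng.integers(len(x), size=1_000_000)
draws = rng.normal(x[idx], sigma[idx])

print(sigma_total, draws.std())   # the two values should agree closely
```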

$\endgroup$
  • $\begingroup$ Thanks a lot for the answer. Please help me grasp it fully: "However, $E(X^2)$ is the average of the expected values of $X_i^2$": why can you replace $X^2$ by $X_i^2$? Then I don't see how to combine your findings, so I don't understand "From here, it is easy to see that the final variance is quite simply". Furthermore, I'm confused: my uncertainty $\sigma_{x_i}$ actually broadens the distribution of my $x_i$, and thus directly broadens $\sigma_{X_{avg}}$, so why do I have to take $\sigma_{x_i}$ into account explicitly in the final $\sigma$? (I expected it, but don't understand it.) $\endgroup$ Commented Apr 2, 2015 at 4:24
  • $\begingroup$ And could you please tell me the connection to en.wikipedia.org/wiki/… If all $\sigma_{x_i}=0$, then soakley provided the answer. But what happens in your case? Is it still $\sigma_{mean}=\frac{\sigma}{\sqrt{N}}$? Thank you very much for your help and time! $\endgroup$ Commented Apr 2, 2015 at 4:26
  • $\begingroup$ I wasn't saying that that property let me replace $X^2$ with $X_i^2$, I was saying that we have $E(X^2) = \frac1N \sum_i E(X_i^2)$, and so we needed to find $E(X_i^2)$... then provided how to find it. As for standard error of the mean, I'll admit to not knowing with certainty, but I believe what you have written is correct for this situation - the standard deviation of the mean is the standard deviation of the variable itself divided by the square root of the number of observations... here, $N$ is the number of random variables, as they are our observations. But I'm no expert on this. $\endgroup$
    – Glen O
    Commented Apr 2, 2015 at 5:44
  • $\begingroup$ As for the question on why the uncertainty has to be accounted for directly, it's because the "direct" calculation of standard deviation only accounts for the deviations between the $x_i$ "mean" values, not the uncertainty in each of those values. As such, the second part incorporates those uncertainties. Suppose you have two methods of measuring an object's length. One gives a length of 1m with a s.d. of 0.1m, and the other gives a length of 0.95m with a s.d. of 0.08m. To work out the total standard deviation, you have to factor in both the deviation between 1 and 0.95, and the combination of the s.d. values. $\endgroup$
    – Glen O
    Commented Apr 2, 2015 at 5:54
$\begingroup$

I think you want the first solution, but you don't have it quite right. With the definition $$\bar x=X_{avg} ={{1} \over {N}} \sum_{i=1}^N x_i$$ the variance is given by $$\sigma^2_{\bar x}={{1} \over {N^2}} \sum_i \sigma^2_{x_i}$$ So the standard deviation is $$\sigma_{\bar x} = {{1} \over {N}} \sqrt {\sum_i \sigma^2_{x_i}}$$ The reasoning is just the definition of variance and the independence of the random variables being summed.
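In code, with illustrative error values (note that the $1/N$ sits outside the square root, unlike the question's first formula):

```python
import numpy as np

sigma_x = np.array([0.3, 0.2, 0.4, 0.3, 0.2])  # illustrative per-value errors
N = len(sigma_x)

sigma_mean = np.sqrt(np.sum(sigma_x**2)) / N   # Var(mean) = (1/N^2) * sum of variances
print(f"propagated sigma of the mean: {sigma_mean:.4f}")

# Consistency check: if all errors equal s, this reduces to s / sqrt(N).
s = 0.3
print(np.sqrt(N * s**2) / N, s / np.sqrt(N))   # prints the same number twice
```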

$\endgroup$
  • $\begingroup$ I'm not sure whether that can be correct; imagine the case where all $\sigma_{x_i}=0$. With your reasoning, we would get $\sigma_{\bar x}=0$. However, I still have the statistical uncertainty from the scatter of the $x_i$ themselves (en.wikipedia.org/wiki/…). What do you think? $\endgroup$ Commented Mar 30, 2015 at 12:21
  • $\begingroup$ Ah, I think I see your point. You need some way to combine the measurement error and what you might call the statistical uncertainty. $\endgroup$
    – soakley
    Commented Mar 30, 2015 at 17:37
$\begingroup$

The exact answer depends on the situation. If your $x_i$ are supposed to be identically distributed (and the $\sigma_i$ are estimates of the standard deviation of the underlying distribution), then the second equation may be useful (though more precisely with $\frac 1{N-1}$ in place of $\frac 1N$); it determines a better estimate of the standard deviation from the observations. Example: You make repeated measurements of lap times of the same racing car and want to determine the average lap time of that car.

If, on the other hand, the $x_i$ do not follow the same distribution, it makes little sense to treat them that way. Example: In a family, one parent makes $5000\pm 10\,\$$ a month, the other parent makes $2000\pm 50\,\$$, and the three kids make $0\,\$$ a month each. Then the income of a randomly picked family member is $1400\,\$$ with a very large variance (per the second formula); but the average income per family member is quite precisely $1400\,\$$ (per the first formula, error propagation).
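Putting rough numbers to this family example in a short sketch (the propagated error of the average is computed with the $1/N$ outside the root, as in soakley's answer, and the spread with the $1/(N-1)$ variant from above):

```python
import numpy as np

income = np.array([5000.0, 2000.0, 0.0, 0.0, 0.0])  # $/month per family member
sigma = np.array([10.0, 50.0, 0.0, 0.0, 0.0])       # measurement errors
N = len(income)
mean = income.mean()                                 # 1400 $

# Error propagation: how precisely the average itself is known (about 10 $).
sigma_avg = np.sqrt(np.sum(sigma**2)) / N
# Spread: member-to-member variation of the incomes (about 2200 $).
sigma_spread = np.sqrt(np.sum((income - mean)**2) / (N - 1))

print(f"average: {mean:.0f} $, propagated: {sigma_avg:.1f} $, spread: {sigma_spread:.0f} $")
```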

You will notice a difference between the two approaches only in those cases where the $\sigma_i$ are known from other sources (such as measurement or quantification errors) and are much smaller than the spread caused by the random nature of the underlying process ...

$\endgroup$
  • $\begingroup$ That is very instructive, thank you. My example is comparable with the measurement of the average lap time. Let's say I have 100 laps, I measure the time of each, and I calculate the average and the standard deviation with my formula 2; everything is fine. Now, I cannot measure time 100% accurately, so I get an error in the time measurement itself, a $\sigma_{time}$. How would I treat it? A) Still use formula 2, because the uncertainty in the time measurement only increases the uncertainty of the average lap? B) Use the formula provided by Glen O? $\endgroup$ Commented Apr 7, 2015 at 13:07
  • $\begingroup$ Or maybe let's make the example even clearer: I want to measure the average lap time. I measure 100 laps, each lap with a different clock, and I know the error of each clock, $\sigma_{clock_i}$. So I have 100 values $t_{lap_i}$ with 100 corresponding $\sigma_{clock_i}$. What is the standard deviation of my final average? $\endgroup$ Commented Apr 7, 2015 at 23:03
