I'm doing an introductory statistics course (a subject I'm very new to) at university, and the notes in my chapter on simple random sampling give the following statement:
"Definition - If statistic S estimates a population parameter θ, then the bias of S is: b(S,θ) = E[S] - θ"
I'm having trouble understanding what 'bias' means in this statistical context (I return to this at the end of the post), so a clearer explanation than the statement above would really be great; but within that statement, I'm particularly stuck on understanding E[S].
Now, obviously a variable like 'height' can take on a distribution of values, and statistics of that variable like the mean, variance, etc. can be calculated. I've been introduced to the function E[X] (similar to E[S] above), aka the Expected Value, but was told it specifically means the 'mean average' of the many raw sample values. Here, though, it appears to be saying you can calculate a single statistic from a sample and then work out the Expected Value of that statistic? That doesn't seem to make sense to me: how can you calculate the mean of a single value (e.g. a particular sample's variance)? The mean in such a trivial case is just that single value.
The only two things I can think of are: 1) I've missed the point of what it's trying to say entirely. And/or 2) when they calculate E[S] they are actually calculating it from a distribution of S values, i.e. they take many different randomly selected samples from the population, calculate S for each, tabulate those values, and then work out the mean (aka Expected Value) of S from that distribution?
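If interpretation (2) is right, I imagine it could be simulated roughly like this. This is just a sketch of my understanding, using a toy population I invented (not from my notes) and the sample mean as the statistic S:

```python
import random
import statistics

# Toy population invented for illustration (not from my notes).
population = [2, 4, 4, 5, 7, 9, 11, 14]
theta = statistics.mean(population)    # θ: the parameter being estimated

n = 3                                  # sample size
random.seed(0)

# Take many random samples, calculate S (the sample mean) for each,
# then average those S values to approximate E[S].
s_values = [statistics.mean(random.sample(population, n))
            for _ in range(100_000)]
estimated_E_S = statistics.mean(s_values)

print(estimated_E_S, theta)            # these come out close: bias ~0 here
```

With enough repeated samples, the average of the S values should settle near E[S], which for the sample mean lands on θ itself.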
In general, I've tried to make sense of what the authors mean by 'bias' in this context, but I've struggled with their explanation and others, and with the motivation for the formula above. Following on from (2) just above, as I understand it: if I have a population A and take samples of size n, I can calculate a statistic S from each sample, and I can calculate a parameter θ from the whole population at once (if I happen to know all its values). Since each sample is random, S may vary slightly from sample to sample. If, though, I were to take every possible combination of n population members (which would be many, many samples), calculate S for each, and then calculate the Expected Value of all those S's, and E[S] - θ came out to zero, then the statistic and my sampling procedure would be unbiased.

This would be because, when accounting for every possible way a sample's S might differ from θ and then averaging out those differences, the difference between my sample values and the whole population's value is zero; in a sense, my sampling procedure has accurately captured the shape of the whole population, and each slice (sample) can thus be thought of as an accurate reflection of the population?
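To check whether I've got this right, here is the "every possible combination" version written out in code, again with a toy population I made up. It also contrasts an unbiased statistic (the sample mean) with one I believe is biased (the divide-by-n sample variance), though I'm not certain that contrast is what my notes intend:

```python
import itertools
import statistics

# A toy population invented for illustration (not from my notes).
population = [2, 4, 4, 5, 7, 9, 11, 14]
n = 3

theta = statistics.mean(population)        # θ: the true population mean

# Every possible combination of n population members.
all_samples = list(itertools.combinations(population, n))

# Calculate S (here the sample mean) for each sample, then average.
s_values = [statistics.mean(s) for s in all_samples]
E_S = statistics.mean(s_values)            # the exact E[S] over all samples

print(E_S - theta)                         # b(S, θ): zero for the sample mean

# Contrast: the divide-by-n sample variance averages out below the
# population variance, so its bias is nonzero (negative).
theta_var = statistics.pvariance(population)
E_var = statistics.mean(statistics.pvariance(s) for s in all_samples)
print(E_var - theta_var)                   # negative, i.e. biased low
```

If I understand correctly, the first printed bias is exactly zero because every population member appears in equally many samples, while the second shows what a nonzero b(S, θ) looks like.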
Apologies that this was really long for something probably super basic, but the notes I'm using to self-teach this course really don't make things like this clear, especially for a beginner to stats like me.
Many thanks, indeed!