I'm doing an introductory statistics course (a subject I'm very new to) at university, and the notes in my chapter on simple random sampling give the following statement:
"Definition - If statistic S estimates a population parameter θ, then the bias of S is: b(S,θ) = E[S] - θ"
I'm having trouble understanding what 'bias' means in this statistical context (I return to this at the end of the post), so a clearer explanation than the statement above would really be great; but within that statement, I'm particularly stuck on understanding E[S].
Now, obviously a variable like 'height' can take on a distribution of values, and statistics of that variable like the mean, variance, etc. can be calculated. I've been introduced to the function E[X] (similar to E[S] above), aka the Expected Value, but was told it specifically means the 'mean average' of the many raw sample values. Here, though, it appears to be saying you can calculate a single statistic from a sample and then work out the Expected Value of that statistic? That doesn't seem to make sense to me: how can you calculate the mean of a single value (e.g. a particular sample's variance)? The mean in such a trivial case is just that single value.
The only two things I can think of are: 1) I've missed the point of what it's trying to say entirely. And/or 2) when they calculate E[S] they are actually calculating it from a distribution of S values, i.e. they take many different randomly selected samples from the population, calculate S for each, tabulate those values, and then work out the mean (aka Expected Value) of S from that distribution?
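If interpretation (2) is right, I imagine it could be simulated roughly like this. This is just a sketch of my understanding, using a toy population I invented (not from my notes) and the sample mean as the statistic S:

```python
import random
import statistics

# Toy population invented for illustration (not from my notes).
population = [2, 4, 4, 5, 7, 9, 11, 14]
theta = statistics.mean(population)    # θ: the parameter being estimated

n = 3                                  # sample size
random.seed(0)

# Take many random samples, calculate S (the sample mean) for each,
# then average those S values to approximate E[S].
s_values = [statistics.mean(random.sample(population, n))
            for _ in range(100_000)]
estimated_E_S = statistics.mean(s_values)

print(estimated_E_S, theta)            # these come out close: bias ~0 here
```

With enough repeated samples, the average of the S values should settle near E[S], which for the sample mean lands on θ itself.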
In general, I've tried to make sense of what the authors mean by 'bias' in this context, but I've struggled with their explanation and others, and with the motivation for the formula above. Following on from (2) just above, as I understand it: if I have a population A and take samples of size n, I can calculate a statistic S from each sample, and I can calculate a parameter θ from the whole population at once (if I happen to know all its values). Since each sample is random, S may vary slightly from sample to sample. If, though, I were to take every possible combination of n population members (which would be many, many samples), calculate S for each, and then calculate the Expected Value of all those S's, and E[S] - θ came out to zero, then the statistic and my sampling procedure would be unbiased.

This would be because, when accounting for every possible way a sample's S might differ from θ and then averaging out those differences, the difference between my sample values and the whole population's value is zero; in a sense, my sampling procedure has accurately captured the shape of the whole population, and each slice (sample) can thus be thought of as an accurate reflection of the population?
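To check whether I've got this right, here is the "every possible combination" version written out in code, again with a toy population I made up. It also contrasts an unbiased statistic (the sample mean) with one I believe is biased (the divide-by-n sample variance), though I'm not certain that contrast is what my notes intend:

```python
import itertools
import statistics

# A toy population invented for illustration (not from my notes).
population = [2, 4, 4, 5, 7, 9, 11, 14]
n = 3

theta = statistics.mean(population)        # θ: the true population mean

# Every possible combination of n population members.
all_samples = list(itertools.combinations(population, n))

# Calculate S (here the sample mean) for each sample, then average.
s_values = [statistics.mean(s) for s in all_samples]
E_S = statistics.mean(s_values)            # the exact E[S] over all samples

print(E_S - theta)                         # b(S, θ): zero for the sample mean

# Contrast: the divide-by-n sample variance averages out below the
# population variance, so its bias is nonzero (negative).
theta_var = statistics.pvariance(population)
E_var = statistics.mean(statistics.pvariance(s) for s in all_samples)
print(E_var - theta_var)                   # negative, i.e. biased low
```

If I understand correctly, the first printed bias is exactly zero because every population member appears in equally many samples, while the second shows what a nonzero b(S, θ) looks like.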
Apologies that this was really long for something probably super basic, but the notes I'm using to self-teach this course really don't make things like this clear, especially for a beginner to stats like me.
Many thanks, indeed!