
I am trying to understand how to calculate the expected value of a hypergeometric random variable using indicator random variables. The derivation I read in the book (Introduction to Probability Theory by Hoel, Port, and Stone) is as follows:

Assume the population size to be $r$, of which $r_1$ are of type 1 and $r-r_1$ are of type 2. A sample of size $n$ is drawn without replacement from this population. Let $X_1, X_2, \ldots, X_n$ be indicator random variables, where $X_i = 1$ if and only if the $i$th element in the sample is of type 1. Then,

$E[X_i] = P(X_i = 1) = \frac{r_1}{r}$

I don't understand why the expectation of $X_i$ is the same for all $i$. Since sampling is done without replacement in the hypergeometric distribution, shouldn't the probability of the $i$th element in the sample being of type 1 differ across $i$?

Can someone explain why this is true?

Edit: We can write,

$P(X_i=1) = \sum_{x_1}\sum_{x_2}\cdots\sum_{x_{i-1}} P(X_1=x_1, X_2=x_2, \ldots, X_{i-1}=x_{i-1}, X_i=1),$

where each $x_j$ takes the value $0$ or $1$.

Can we compute this sum to show $P(X_i=1) = \frac{r_1}{r}?$
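A quick brute-force check of this claim in Python (a minimal sketch; the small parameters $r=6$, $r_1=2$, $n=4$ are arbitrary choices for illustration): it enumerates every equally likely ordered sample drawn without replacement and confirms $P(X_i=1)=\frac{r_1}{r}$ for every position $i$.

```python
import itertools
from fractions import Fraction

r, r1, n = 6, 2, 4                      # population size, type-1 count, sample size (illustrative)
population = [1] * r1 + [0] * (r - r1)  # 1 = type 1, 0 = type 2

# Every ordered sample without replacement is equally likely; itertools.permutations
# treats items as distinct by position, so each tuple below has equal probability.
samples = list(itertools.permutations(population, n))

for i in range(n):
    # Fraction of ordered samples with a type-1 item at position i
    p = Fraction(sum(s[i] for s in samples), len(samples))
    print(f"P(X_{i+1} = 1) = {p}")      # prints 1/3 = r1/r for every i
```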


3 Answers


Since the types have no preference for positions (the positions in the sample are exchangeable), the probability that a type-$1$ item occupies any given position is the same as the probability that one occupies the first position,

i.e. $\Bbb P(X_i = 1) = \Bbb P(X_1 = 1) = \frac{r_1}{r}$.

Now the expectation of an indicator random variable is just the probability of the event it indicates, thus $\Bbb E[X_i] = \frac{r_1}{r},$

and by linearity of expectation, which holds even when the variables are not independent, we get the final expectation as $\Bbb E[X] = \sum_{i=1}^n \Bbb E[X_i] = \frac{nr_1}{r}$.
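As a sanity check, here is a minimal Python sketch (the parameters $r=10$, $r_1=3$, $n=4$ are assumed purely for illustration) comparing the linearity-of-expectation answer $\frac{nr_1}{r}$ with the mean computed directly from the hypergeometric pmf:

```python
from fractions import Fraction
from math import comb

r, r1, n = 10, 3, 4   # illustrative parameters, not from the question

# Exact mean from the pmf: E[X] = sum_k k * C(r1,k) C(r-r1,n-k) / C(r,n)
mean_from_pmf = sum(
    Fraction(k * comb(r1, k) * comb(r - r1, n - k), comb(r, n))
    for k in range(0, min(n, r1) + 1)
)

# Mean from linearity of expectation: E[X] = sum_i E[X_i] = n * r1 / r
mean_from_linearity = Fraction(n * r1, r)

print(mean_from_pmf, mean_from_linearity)   # both print 6/5
assert mean_from_pmf == mean_from_linearity
```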

  • Suppose the sample of size $n$ is drawn one by one. Then, at the $i$th draw, the population left is $r - i + 1$. Is the probability that the $i$th draw is of type 1 still $r_1/r$? Commented Jun 15, 2022 at 17:23
  • The whole point of using the expectation route is to find the unconditional probability of picking a type $1$, with no a priori information, and then use linearity of expectation to get $E[X]$. Commented Jun 15, 2022 at 18:10
  • I know this is correct, but I don't quite follow the logic. The fact that $X_i$ denotes the $i$th sample implies we already know that $i-1$ samples have been drawn, right? Commented Jun 16, 2022 at 3:30
  • Yes; however, it does not mean that we know what they are. Commented Jun 16, 2022 at 5:04
  • Suppose there is one type $1$ among $10$. Then $P(\text{1st is type }1) = \frac{1}{10}$, $P(\text{2nd is type }1) = \frac{9}{10}\cdot\frac{1}{9} = \frac{1}{10}$, and so on, right up to $P(\text{10th is type }1)$. Can you see that if instead there were $3$ type $1$'s among the $10$, the probability that a type $1$ is found in any given position would be $\frac{3}{10}$? (A numerical check follows these comments.) Commented Jun 16, 2022 at 6:00
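To make the computation in the last comment concrete, here is a small Python sketch (illustrative only): it recomputes $P(\text{$i$th draw is type }1)$ by conditioning on how many type-$1$ items appeared in the first $i-1$ draws, for $r=10$ and $r_1 \in \{1, 3\}$.

```python
from fractions import Fraction
from math import comb

def prob_type1_at(i, r, r1):
    """P(i-th draw is type 1), conditioning on the number j of type-1
    items among the first i-1 draws (law of total probability)."""
    total = Fraction(0)
    for j in range(0, min(i - 1, r1) + 1):
        # P(exactly j type-1 items in the first i-1 draws) ...
        p_history = Fraction(comb(r1, j) * comb(r - r1, i - 1 - j), comb(r, i - 1))
        # ... times P(i-th draw is type 1 | that history)
        total += p_history * Fraction(r1 - j, r - i + 1)
    return total

for r1 in (1, 3):
    print([prob_type1_at(i, 10, r1) for i in range(1, 11)])  # every entry equals r1/10
```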

Take a population of size $52$, of which $13$ are of type heart and $52-13$ are of type not-heart. Draw a sample of size $5$.

$X_k$ is the indicator that the $k$th draw is of type heart, and $\mathsf P(X_k=1)=13/52$.

This marginal probability is the same for each of the five draws.

Now, it is true that the draws are not independent. However, we are not using their joint probability, just the marginals: the probability that a given draw is of type heart, without any information about the other draws. So the dependency has no effect on our calculation.

This is why linearity of expectation has such leverage, and why we use it so often.

$$\mathsf E(X_1+X_2+X_3+X_4+X_5) = \mathsf E(X_1)+\mathsf E(X_2)+\mathsf E(X_3)+\mathsf E(X_4)+\mathsf E(X_5) = \dfrac{5\times 13}{52} = \dfrac{5}{4}$$
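A quick Monte Carlo sanity check of this example in Python (a sketch; the trial count is an arbitrary choice):

```python
import random

deck = [1] * 13 + [0] * 39        # 1 = heart, 0 = non-heart
trials = 200_000
total_hearts = 0
position_hits = [0] * 5           # hearts seen at each of the 5 draw positions

for _ in range(trials):
    hand = random.sample(deck, 5)             # sampling without replacement
    total_hearts += sum(hand)
    for k, card in enumerate(hand):
        position_hits[k] += card

print("E[X] approx:", total_hearts / trials)                    # ~ 5*13/52 = 1.25
print("P(X_k=1) approx:", [h / trials for h in position_hits])  # each ~ 13/52 = 0.25
```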

  • I understand the rest of the answer clearly. The only problem I am facing is understanding your second statement; can you expand a bit on that? Can we use the basics to prove it? For example, using $P(X_2=1) = P(X_1=0, X_2=1) + P(X_1=1, X_2=1)$ I can see that $P(X_2=1)$ is indeed $13/52$ (i.e. $1/4$), but can we compute the corresponding sum for a general $P(X_k=1)$? Commented Jun 16, 2022 at 4:22
  • It's just the law of total probability: $$\mathsf P(X_k=1)=\sum_{i=0}^{n-1}\mathsf P\Big(X_k=1,\ \textstyle\sum_{j=1}^n X_j - X_k = i\Big)$$ Commented Jun 16, 2022 at 4:29
  • Yes, I know that; so can we use the law of total probability to prove the constant probability of the $X_i$'s? (A sketch of this computation follows these comments.) Commented Jun 16, 2022 at 4:32
  • Well, we can, indeed. However, we only need to establish that each item from the population of $52$ is equally likely to be selected as the $k$th draw, and that $13$ of them are type-$\heartsuit$. Commented Jun 16, 2022 at 4:40
  • Thanks for your answer! I understood the idea, and now your answer makes complete sense to me. Commented Jun 17, 2022 at 12:04
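Following up on the comment exchange above, here is a minimal Python sketch of the law-of-total-probability computation (illustrative parameters $r=10$, $r_1=3$, position $k=4$): it sums the joint probability $P(X_1=x_1,\ldots,X_{k-1}=x_{k-1},X_k=1)$ over all $2^{k-1}$ histories, and the total comes out to $\frac{r_1}{r}$.

```python
from fractions import Fraction
from itertools import product

r, r1, k = 10, 3, 4   # illustrative: population size, type-1 count, position to check

def joint(history):
    """P(X_1=x_1, ..., X_{k-1}=x_{k-1}, X_k=1): multiply the sequential
    draw probabilities for the given history, then for a type-1 at draw k."""
    p = Fraction(1)
    remaining, remaining1 = r, r1
    for x in history + (1,):
        p *= Fraction(remaining1 if x == 1 else remaining - remaining1, remaining)
        remaining1 -= x
        remaining -= 1
    return p

# Law of total probability: sum over all 2^(k-1) possible histories x_1..x_{k-1}.
total = sum(joint(h) for h in product((0, 1), repeat=k - 1))
print(total)                      # 3/10, i.e. r1/r
assert total == Fraction(r1, r)
```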

Reading this helped me understand the symmetry in sampling without replacement.

