$\begingroup$

So I'm a bit stuck on deriving the inclusion probability for a random stratified sample.

The question's context is as follows: say we wish to understand household income, but a complete list of households is unavailable. However, we do have a register of the total population. We select a probability sample of $n$ individuals from all $N$ individuals on the register, and the households to which the selected individuals belong are then chosen and evaluated. If a household has $M$ people, what's the probability that this household is selected?

In my head, I define the house with $M$ people as house $h_M$ and the sample as $s$. I know that I want to find $$\Pr(h_M \in s).$$

Since we aren't sampling houses, just people, I reason that in order for house $h_M$ to be in the sample, two things have to happen: (a) an individual $i$ has to be chosen and (b) that individual has to belong to the house. I rewrite this as $$(i \in s) \cap (i \in h_M)$$ and so the problem now becomes \begin{align} \Pr(h_M \in s) &= \Pr\{(i \in s) \cap (i \in h_M)\}. \end{align}

Using the definition of conditional probability (the identity behind Bayes' Theorem), \begin{align} \Pr(A|B) &= \frac{\Pr(A \cap B)}{\Pr(B)} \\ \Pr(A|B)\Pr(B) &= \Pr(A \cap B) \\ \color{blue}{\Pr\{(i \in s)\ | \ (i \in h_M)\}} \ \times \ \color{green}{\Pr(i \in h_M)} &= \Pr\{(i \in s) \cap (i \in h_M)\}. \end{align}

For $\color{blue}{\text{blue}}$, wouldn't this be \begin{align} \Pr\{(i \in s)\ | \ (i \in h_M)\} &= \frac{\binom{M-1}{0}}{\binom{M}{1}} \\ &= \frac{1}{M}? \end{align}

And for $\color{green}{\text{green}}$, would this be \begin{align} \Pr(i \in h_M) &= \frac{\binom{N-1}{M-1}\times \binom{M-1}{0}}{\binom{N}{M} \times \binom{M}{1}} \\ &= \frac{1}{N}? \end{align} I'm kind of grasping at straws at this point.

$\endgroup$

1 Answer

$\begingroup$

Consider choosing the $n$ sampled people one at a time, without replacement. The probability that the first person drawn comes from $h_M$ is $M/N$. If the first person is from $h_M$ then we are done. If not, then $N - 1$ people remain, all $M$ household members among them, so the probability that the second person drawn comes from $h_M$ is $M/(N-1)$. This continues until we have sampled $n$ people.

The probability that no one from $h_M$ is sampled is the probability that the first person drawn does not come from $h_M$, the second does not either (given that the first did not), and so on through the $n$th. Therefore someone from $h_M$ is sampled with probability $$\Pr(h_M \in s) = 1 - \Pr(h_M \notin s) = 1 - \left(\frac{N - M}{N}\right)\left(\frac{N - 1 - M}{N - 1}\right)\cdots\left(\frac{N - (n - 1) - M}{N - (n - 1)}\right).$$
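
As a cross-check, and assuming the $n$ people are a simple random sample drawn without replacement (which is what the sequential argument describes), the same product can be written by counting the samples that avoid the household entirely: $$\Pr(h_M \in s) = 1 - \frac{\binom{N - M}{n}}{\binom{N}{n}}.$$ For a small illustrative case with made-up numbers $N = 10$, $M = 3$, $n = 2$, the product gives $1 - \frac{7}{10}\cdot\frac{6}{9} = \frac{8}{15}$, and the counting form gives $1 - \frac{21}{45} = \frac{8}{15}$.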

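As an edge case, once $n \geq N - M + 1$ the product contains the factor with numerator $N - (N - M) - M = 0$, so $\Pr(h_M \in s) = 1$: there are only $N - M$ people outside the household, so a sample that large must include at least one member.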
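
A minimal sketch for checking the formula numerically, assuming simple random sampling without replacement. The function names `household_inclusion_prob` and `household_inclusion_prob_closed` are just illustrative, and the second one uses the equivalent counting form $1 - \binom{N-M}{n}/\binom{N}{n}$ rather than the sequential product:

```python
from fractions import Fraction
from math import comb


def household_inclusion_prob(N, M, n):
    """Probability that at least one of the M household members is among
    n people drawn without replacement from a population of N.

    Computed with the sequential product: multiply the chance of missing
    the household on each successive draw, then take the complement.
    """
    if not (0 <= M <= N and 0 <= n <= N):
        raise ValueError("need 0 <= M <= N and 0 <= n <= N")
    miss = Fraction(1)
    for k in range(n):
        # After k misses, N - k people remain and N - k - M of them
        # are outside the household.
        miss *= Fraction(N - k - M, N - k)
    return 1 - miss


def household_inclusion_prob_closed(N, M, n):
    """Same probability via counting: samples that avoid the household
    divided by all possible samples, then the complement."""
    if not (0 <= M <= N and 0 <= n <= N):
        raise ValueError("need 0 <= M <= N and 0 <= n <= N")
    return 1 - Fraction(comb(N - M, n), comb(N, n))


if __name__ == "__main__":
    # Illustrative numbers only: population 10, household of 3, sample of 2.
    print(household_inclusion_prob(10, 3, 2))
    print(household_inclusion_prob_closed(10, 3, 2))
```

Exact fractions are used so the two forms can be compared for equality without floating-point noise; for large $N$ a floating-point version would be more practical.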

$\endgroup$
  • 1
    $\begingroup$ This makes sense, thank you! :) $\endgroup$
    – JerBear
    Commented Sep 9, 2022 at 19:59
