$\begingroup$

I'm teaching an undergrad class on survey sampling (not in the stats department) for the first time. This seems like a very basic question that I should know the answer to, but I can't find it anywhere here, nor have I found it in the various survey sampling books I have consulted. If we are using the CLT, a confidence interval will take the form:

$\bar{X} \pm z_{\alpha/2}\sqrt{\mathrm{var}/n}$

In survey sampling, we are typically dealing with binomial data, where, say, a 1 means the respondent approves of Joe Biden's performance in office and a 0 means the respondent disapproves. The variance for binomial data is $np(1-p)$.

So why is it that the confidence interval for a proportion ($n>30$) is

$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$

and not

$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})}$?

The best answer I can come up with is that you're actually treating a dataset with 500 observations as 500 Bernoulli draws, which would make the variance $p(1-p)$, but I just wanted to confirm before I corrupt some young minds.
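One way to check this reasoning empirically is a small simulation. This is only a rough sketch under assumptions that are not part of the question: a true approval rate of `p = 0.4`, surveys of `n = 500` independent respondents, and a hypothetical helper `simulate_p_hat_sd` that repeats the survey many times and reports how much $\hat{p}$ varies from survey to survey. It uses only the Python standard library.

```python
import math
import random
import statistics


def simulate_p_hat_sd(p, n, n_surveys=2000, seed=0):
    """Hypothetical helper: SD of p-hat across many simulated surveys."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(n_surveys):
        # One survey: n independent Bernoulli(p) responses (1 = approve).
        approvals = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(approvals / n)
    return statistics.stdev(p_hats)


# Assumed values, chosen only for illustration.
p, n = 0.4, 500

empirical_sd = simulate_p_hat_sd(p, n)
se_with_n = math.sqrt(p * (1 - p) / n)   # sqrt(p(1-p)/n)
sd_single_draw = math.sqrt(p * (1 - p))  # sqrt(p(1-p)), no division by n

print(f"SD of p-hat across simulated surveys: {empirical_sd:.4f}")
print(f"sqrt(p(1-p)/n):                       {se_with_n:.4f}")
print(f"sqrt(p(1-p)):                         {sd_single_draw:.4f}")
```

If the sketch behaves as expected, the spread of $\hat{p}$ across surveys should sit close to $\sqrt{p(1-p)/n}$ and far below $\sqrt{p(1-p)}$, which is the standard deviation of a single 0/1 response rather than of the sample proportion.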

$\endgroup$
  • 1
    $\begingroup$ $\hat{p}$ doesn't follow the binomial distribution you cited. Rather, that's the distribution of $\hat{x}:=\sum_{i=1}^n b_i$, where $b_i \overset{\mathrm{iid}}{\sim} \mathrm{Bernoulli}(p)$, and it has the variance you cited. But $\hat{p}=\hat{x}/n$, so $\mathbb{V}[\hat{p}]=\mathbb{V}[\frac{\hat{x}}{n}]=\frac{1}{n^2}\mathbb{V}[\hat{x}]=\frac{p(1-p)}{n}$. $\endgroup$ Commented Sep 8, 2022 at 16:52
  • $\begingroup$ With your second formula, confidence would not vary at all with sample size, implying there would be no need to survey more than one person! $\endgroup$
    – whuber
    Commented Sep 8, 2022 at 17:24
