
I have a binomial variable $R$ drawn from $\text{binom}(N, p)$, and I'm interested in the variance of $R$ given that $R \ge Q$. The pmf of this conditional variable $R^*$ is

$$ \phi_{R^*}(l) = \frac{\phi(l, N, p)}{P}, \qquad Q \le l \le N, $$

where $P = \sum_{l=Q}^{N}\phi(l, N, p)$ is the probability that $R \ge Q$, and $\phi$ is the binomial pmf.

Are there any bounds on the variance of $R^*$ compared to the variance of $R$, which is $Np(1-p)$? I'm sure it's less, but can we say how much less as a function of $Q$?

My ultimate goal is to prove that

$$ Var(R^*) \le \frac{N-ER^*}{N-ER}Var(R) $$

where $ER^*$ is the conditional mean of $R$ given that $R \ge Q$, and $ER = Np$ is the unconditional mean.
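In case it helps, here is a quick numerical check of this inequality (a sketch in Python; the particular $N$, $p$, $Q$ are arbitrary choices):

```python
# Numerical check of the conjectured bound Var(R*) <= (N - ER*)/(N - ER) * Var(R).
import numpy as np
from scipy.stats import binom

N, p, Q = 50, 0.3, 20
ls = np.arange(Q, N + 1)
w = binom.pmf(ls, N, p)
w /= w.sum()                          # pmf of R* = (R | R >= Q)

ER_star = (ls * w).sum()
Var_star = ((ls - ER_star) ** 2 * w).sum()

ER, VarR = N * p, N * p * (1 - p)
bound = (N - ER_star) / (N - ER) * VarR
print(Var_star, bound, Var_star <= bound)
```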

  • How did you arrive at this inequality, if you don't know whether it is true or not? – J. Delaney, May 18, 2022 at 15:42
  • In the middle of a proof. I am rather sure it's true, just from plotting many graphs of it. – dash2, May 18, 2022 at 19:02
  • Is $Q$ a random variable or a constant? If it is a random variable, what is its distribution? – R Carnell, May 22, 2022 at 22:25
  • @RCarnell, $Q$ is a constant with $0<Q<N$. – Matt F., May 23, 2022 at 1:12
  • I'd like to award my bounty to both answers, but I see it has expired! I did not expect this. Can I do anything to award it? – dash2, May 25, 2022 at 14:07

2 Answers


I need to change the notation slightly to make it more consistent with standard statistics references and easier to follow. I'll switch back to the original poster's notation at the end.

$$X \sim \text{binomial}(n, p)$$

$$f(x,n,p) = {n \choose x} p^x (1-p)^{n-x}$$

$$F(x,n,p) = P(X \le x) = \sum_{k=0}^{x} f(k,n,p)$$

For the truncated distribution, many references show that

$$f(x,n,p \mid a < X \le b) = \frac{g(x,n,p)}{F(b,n,p) - F(a,n,p)}$$

where $g(x,n,p) = f(x,n,p)$ if $a < x \le b$ and $g(x,n,p) = 0$ otherwise.
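For concreteness, here is a minimal sketch of this pmf (assuming scipy; the function name is my own):

```python
# Minimal sketch of the truncated pmf f(x, n, p | a < X <= b) defined above.
from scipy.stats import binom

def truncated_binom_pmf(x, n, p, a, b):
    """pmf of X ~ binomial(n, p) conditioned on a < X <= b."""
    if not (a < x <= b):
        return 0.0                                  # g(x, n, p) = 0 outside (a, b]
    norm = binom.cdf(b, n, p) - binom.cdf(a, n, p)  # F(b) - F(a)
    return binom.pmf(x, n, p) / norm
```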


$$E(X \mid a < X \le b) = \sum_{x=a+1}^b xf(x,n,p \mid a<X\le b) \\ = \frac{1}{F(b,n,p)-F(a,n,p)} \sum_{x=a+1}^b x {n \choose x} p^x (1-p)^{n-x} \\ = \frac{np}{F(b,n,p)-F(a,n,p)} \sum_{x=a+1}^b \frac{(n-1)!}{(n-x)!(x-1)!} p^{x-1} (1-p)^{n-x} \\ = \frac{np}{F(b,n,p)-F(a,n,p)} \sum_{y=a}^{b-1} \frac{m!}{(m-y)!\,y!} p^{y} (1-p)^{m-y} \quad \text{where } y=x-1 \text{ and } m=n-1 \\ = \frac{np}{F(b,n,p)-F(a,n,p)} [F(b-1,n-1,p) - F(a-1,n-1,p)]$$


$$E(X(X-1) \mid a<X\le b) = \sum_{x=a+1}^b x(x-1)f(x,n,p \mid a<X\le b) \\ = \frac{1}{F(b,n,p)-F(a,n,p)} \sum_{x=a+1}^b x(x-1) {n \choose x} p^x (1-p)^{n-x} \\ = \frac{n(n-1)p^2}{F(b,n,p)-F(a,n,p)} \sum_{x=a+1}^b \frac{(n-2)!}{(n-x)!(x-2)!} p^{x-2} (1-p)^{n-x} \\ = \frac{n(n-1)p^2}{F(b,n,p)-F(a,n,p)} \sum_{y=a-1}^{b-2} \frac{m!}{(m-y)!\,y!} p^{y} (1-p)^{m-y} \quad \text{where } y=x-2 \text{ and } m=n-2 \\ = \frac{n(n-1)p^2}{F(b,n,p)-F(a,n,p)} [F(b-2,n-2,p) - F(a-2,n-2,p)]$$
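Both closed forms are easy to check numerically; here is a sketch comparing them against direct summation (the parameter values are arbitrary):

```python
# Check the two partial-moment identities above against direct sums.
import numpy as np
from scipy.stats import binom

n, p, a, b = 40, 0.35, 10, 30
xs = np.arange(a + 1, b + 1)
D = binom.cdf(b, n, p) - binom.cdf(a, n, p)      # F(b,n,p) - F(a,n,p)
w = binom.pmf(xs, n, p) / D                      # truncated pmf on (a, b]

# E(X | a < X <= b)
direct1 = (xs * w).sum()
closed1 = n * p * (binom.cdf(b - 1, n - 1, p) - binom.cdf(a - 1, n - 1, p)) / D

# E(X(X-1) | a < X <= b)
direct2 = (xs * (xs - 1) * w).sum()
closed2 = n * (n - 1) * p**2 * (binom.cdf(b - 2, n - 2, p) - binom.cdf(a - 2, n - 2, p)) / D

print(np.isclose(direct1, closed1), np.isclose(direct2, closed2))
```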


$$V(X | a<X\le b) = E(X^2| a<X\le b) - E(X| a<X\le b)^2 \\ = E(X(X-1)| a<X\le b) + E(X| a<X\le b) - E(X| a<X\le b)^2$$


$$V(X | a<X\le b) = \frac{n(n-1)p^2[F(b-2,n-2,p) - F(a-2,n-2,p)]}{F(b,n,p)-F(a,n,p)} + \frac{np[F(b-1,n-1,p) - F(a-1,n-1,p)]}{F(b,n,p)-F(a,n,p)} - \left[ \frac{np[F(b-1,n-1,p) - F(a-1,n-1,p)]}{F(b,n,p)-F(a,n,p)} \right]^2$$
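Wrapped up as a function (a sketch; the function name is mine), the whole formula reads:

```python
# Variance of X ~ binomial(n, p) conditioned on a < X <= b, per the formula above.
from scipy.stats import binom

def truncated_binom_var(n, p, a, b):
    D = binom.cdf(b, n, p) - binom.cdf(a, n, p)
    # E(X(X-1) | a < X <= b)
    exx1 = n * (n - 1) * p**2 * (binom.cdf(b - 2, n - 2, p) - binom.cdf(a - 2, n - 2, p)) / D
    # E(X | a < X <= b)
    ex = n * p * (binom.cdf(b - 1, n - 1, p) - binom.cdf(a - 1, n - 1, p)) / D
    return exx1 + ex - ex**2
```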


Now, switching to the original notation and situation, I can get the ratio of variances as a function of $q$, but not quite the nice inequality you were looking for.

$$\frac{V(R|q-1<R\le n)}{V(R)} = \frac{1}{np(1-p)} \left( \frac{n(n-1)p^2[1 - F(q-3,n-2,p)]}{1-F(q-1,n,p)} + \frac{np[1 - F(q-2,n-1,p)]}{1-F(q-1,n,p)} - \left[ \frac{np[1 - F(q-2,n-1,p)]}{1-F(q-1,n,p)} \right]^2 \right)$$

The next step would be to show, by a series of arguments and operations on the right-hand side, that

$$\frac{V(R|q-1<R\le n)}{V(R)} \le \frac{n-\frac{np(1-F(q-2,n-1,p))}{(1-F(q-1,n,p))}}{n-np} = \frac{N-ER^*}{N-ER}$$
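While the inequality itself remains to be proven, it is easy to explore numerically; here is a sketch over a grid of $q$ (arbitrary $n$ and $p$; `binom.sf(x, ...)` $= 1 - F(x,\cdot)$ is used for tail accuracy):

```python
# Compare the truncated/untruncated variance ratio against the conjectured bound.
from scipy.stats import binom

n, p = 60, 0.4
for q in range(5, 45, 5):
    D = binom.sf(q - 1, n, p)                     # 1 - F(q-1, n, p)
    ex = n * p * binom.sf(q - 2, n - 1, p) / D    # E(R | R >= q)
    exx1 = n * (n - 1) * p**2 * binom.sf(q - 3, n - 2, p) / D
    ratio = (exx1 + ex - ex**2) / (n * p * (1 - p))
    bound = (n - ex) / (n - n * p)
    print(q, round(ratio, 4), round(bound, 4), ratio <= bound)
```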

Update on the Normal Approximation

Matt F's post using the normal approximation inspired me to continue, but I'm not sure that path reaches the ultimate goal of proving the inequality under all conditions.

See Wikipedia's Truncated Normal Distribution page for the following facts.

Let $\alpha = (a-\mu)/\sigma$ and $\beta = (b-\mu)/\sigma$, and let $\phi$ be the standard normal pdf and $\Phi$ the standard normal cdf.

$$E(X|a<X<b) = \mu - \sigma \frac{\phi(\beta)-\phi(\alpha)}{\Phi(\beta)-\Phi(\alpha)}$$

$$V(X|a<X<b) = \sigma^2 \left[ 1-\frac{\beta \phi(\beta)-\alpha \phi(\alpha)}{\Phi(\beta)-\Phi(\alpha)} - \left(\frac{\phi(\beta)- \phi(\alpha)}{\Phi(\beta)-\Phi(\alpha)}\right)^2\right]$$
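These two formulas can be cross-checked against `scipy.stats.truncnorm`, which parametrizes the truncation points in standard units (a sketch with arbitrary values):

```python
# Verify the truncated-normal mean and variance formulas above.
import numpy as np
from scipy.stats import norm, truncnorm

mu, sigma, a, b = 10.0, 2.0, 9.0, 15.0
alpha, beta = (a - mu) / sigma, (b - mu) / sigma
Z = norm.cdf(beta) - norm.cdf(alpha)

mean_formula = mu - sigma * (norm.pdf(beta) - norm.pdf(alpha)) / Z
var_formula = sigma**2 * (1 - (beta * norm.pdf(beta) - alpha * norm.pdf(alpha)) / Z
                          - ((norm.pdf(beta) - norm.pdf(alpha)) / Z)**2)

print(np.isclose(mean_formula, truncnorm.mean(alpha, beta, loc=mu, scale=sigma)))
print(np.isclose(var_formula, truncnorm.var(alpha, beta, loc=mu, scale=sigma)))
```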

Now, substituting in for this specific situation, with $b \to \infty$ (so that $\phi(\beta) \to 0$ and $\Phi(\beta) \to 1$), $W = 1 - \Phi(\alpha)$, and $\alpha = (q-np)/\sqrt{np(1-p)}$:

$$\frac{V(R|q<R\le n)}{V(R)} = \frac{\sigma^2 \left[1 + \frac{\alpha \phi(\alpha)}{W} - \frac{\phi(\alpha)^2}{W^2}\right]}{\sigma^2} \le \frac{n-ER^*}{n-ER} = \frac{n-\left( np + \frac{\sqrt{np(1-p)} \phi(\alpha)}{W} \right)}{n - np}$$

$$ 1+\frac{(q-np)\phi(\alpha)}{W \sqrt{np(1-p)}} - \frac{\phi(\alpha)^2}{W^2} \le 1 - \frac{\phi(\alpha)\sqrt{np(1-p)}}{(n-np)W} $$

Cancelling and rearranging terms:

$$\frac{q-np}{\sqrt{n(1-p)}} - \frac{\sqrt{p}\phi(\alpha)}{W} \le -\frac{p}{\sqrt{n(1-p)}}$$

To guarantee the inequality of interest:

$$q \le (n-1)p + \frac{\sqrt{np(1-p)}\ \ \phi\left( \frac{(q-np)}{\sqrt{np(1-p)}}\right) }{1-\Phi\left( \frac{(q-np)}{\sqrt{np(1-p)}} \right)}$$
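Since every step from the variance ratio to this condition only adds constants or multiplies by positive quantities, the two are equivalent, which can be checked directly; here is a sketch (arbitrary $n$, $p$):

```python
# Under the normal approximation, q <= threshold should coincide with the
# variance-ratio inequality itself.
import numpy as np
from scipy.stats import norm

n, p = 100, 0.3
sigma = np.sqrt(n * p * (1 - p))
for q in range(20, 50, 5):
    alpha = (q - n * p) / sigma
    W = 1 - norm.cdf(alpha)
    lam = norm.pdf(alpha) / W                 # inverse Mills ratio phi(alpha)/W
    threshold = (n - 1) * p + sigma * lam
    lhs = 1 + alpha * lam - lam**2            # approx. Var(R*)/Var(R)
    rhs = 1 - sigma * lam / (n - n * p)       # approx. (n - ER*)/(n - ER)
    print(q, q <= threshold, lhs <= rhs)
```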

  • Thank you for this. I think I can follow the proof. You're using facts about $\binom{n}{x}$ along the way, right? I haven't yet figured out whether the right-hand side at the end is related in any way to $(N-ER^*)/(N-ER)$; can you clarify? – dash2, May 23, 2022 at 11:19
  • Yes, I used a change of variables involving the $\binom{n}{x}$; I'll add that step. I know that the right-hand side is not reducible to $(N-ER^*)/(N-ER)$, but that doesn't mean the inequality isn't possible. I can't yet find the right set of modifications that get you to the inequality you want. – R Carnell, May 23, 2022 at 11:34
  • I get it. If you can do anything that helps to signpost the way forward, that would be helpful! – dash2, May 23, 2022 at 15:43
  • I was likewise motivated by your answer, wanting to avoid the complication of the binomial formulas in favor of an approximation that lets us solve for $N$ or $p$ in terms of the other two variables. (I use $q$ as one of those other variables, defining it as a transformation of the $Q$ in the question, where you use them interchangeably; except for that notational difference, our answers about the normal seem to agree.) Even if the normal approximation doesn't cover all cases, it might be enough to let the OP make some progress on the proof, depending on the context. – Matt F., May 25, 2022 at 0:33
  • Both answers are very good. I'm sorry the bounty expired; I'm not used to the system. I've added another bounty and will split it between you if I can. – dash2, May 25, 2022 at 14:10

The conjecture is true for all sufficiently large $N$, namely when $N$ is large enough to use a normal approximation to the binomial, and $$N > \frac{\pi p}{2(1-p)}\left(\sqrt{\pi}q - \frac{\exp(-q^2)}{\text{erfc}(q)}\right)^{\!-2}$$ where

  • $q=(Q-Np)/\sqrt{2Np(1-p)}$, which is $1/\sqrt{2}$ times the number of standard deviations by which $Q$ exceeds the mean, and
  • $\text{erfc}$ is the complementary error function (a numerical sketch of this threshold follows below).
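Since $q$ itself depends on $N$, the threshold is implicit in $N$; still, it is easy to evaluate for given $p$ and $q$ (a sketch with arbitrary values):

```python
# Evaluate the N-threshold above for given p and q.
import numpy as np
from scipy.special import erfc

def N_threshold(p, q):
    inner = np.sqrt(np.pi) * q - np.exp(-q**2) / erfc(q)
    return np.pi * p / (2 * (1 - p)) / inner**2

for q in (0.5, 1.0, 2.0):
    print(q, N_threshold(0.3, q))
```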

The normal approximation to the binomial has mean $Np$ and variance $Np(1-p)$. Denoting its pdf by $f$, we can approximate \begin{align} ER^*&\simeq\frac{M_1}{M_0}\\ Var(R^*)&\simeq\frac{M_2}{M_0}-ER^{*2} \end{align} using the partial moments \begin{align} M_0=\int_Q^\infty f(x)dx = & \frac12\text{erfc}(q)\\ M_1=\int_Q^\infty xf(x)dx = & \frac{Np}{2}\text{erfc}(q) + \sqrt{\frac{Np(1-p)}{2\pi}}\exp(-q^2)\\ M_2=\int_Q^\infty x^2f(x)dx = & \frac{Np}{2}\text{erfc}(q)(1-p+Np)\\ & +\frac{Np\exp(-q^2)}{\sqrt{\pi}}(q-pq+\sqrt{2Np(1-p)}) \end{align} Under these approximations, the inequality in the question is equivalent to the first inequality above.
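The three partial moments can be checked against direct numerical integration of the approximating normal density (a sketch with arbitrary parameters):

```python
# Verify M0, M1, M2 against numerical integration of the normal pdf.
import numpy as np
from scipy.integrate import quad
from scipy.special import erfc
from scipy.stats import norm

N, p, Q = 80, 0.25, 25
mu, var = N * p, N * p * (1 - p)
q = (Q - mu) / np.sqrt(2 * var)

M0 = 0.5 * erfc(q)
M1 = 0.5 * mu * erfc(q) + np.sqrt(var / (2 * np.pi)) * np.exp(-q**2)
M2 = (0.5 * mu * erfc(q) * (1 - p + mu)
      + mu * np.exp(-q**2) / np.sqrt(np.pi) * (q - p * q + np.sqrt(2 * var)))

for k, M in enumerate((M0, M1, M2)):
    val, _ = quad(lambda x: x**k * norm.pdf(x, mu, np.sqrt(var)), Q, np.inf)
    print(k, np.isclose(M, val))
```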

The function $\left(\sqrt{\pi}q - \frac{\exp(-q^2)}{\text{erfc}(q)}\right)^{\!-2}$ has the following graph:

[Figure: plot of $\left(\sqrt{\pi}q - \frac{\exp(-q^2)}{\text{erfc}(q)}\right)^{\!-2}$ against $q$]

which is asymptotic to $(4q^2+8)/\pi$ at $+\infty$.
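As a quick sanity check of that asymptote (a sketch):

```python
# Compare the function to its stated asymptote (4q^2 + 8)/pi for large q.
import numpy as np
from scipy.special import erfc

for q in (2.0, 5.0, 10.0):
    g = (np.sqrt(np.pi) * q - np.exp(-q**2) / erfc(q)) ** -2
    print(q, g, (4 * q**2 + 8) / np.pi)
```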

  • Could you explain what you mean by the "first expression here"? – dash2, May 25, 2022 at 14:13
  • I edited; I hope it's clear now. – Matt F., May 25, 2022 at 14:24
