$\begingroup$

Suppose I have a set of bulbs that are known to be healthy. For each bulb I have a brightness value. The underlying distribution is not necessarily normal and may have a complex structure.

Now I have brightness values for another set of bulbs. In this set some bulbs have defects.

I need to calculate a lower bound on the number of bulbs with defects.

I developed a solution for this problem, described below. My questions are: are there better or standard non-parametric methods for this problem, and is my method correct?


Method

  • event $A$ - the observed bulb is intact
  • event $B=\bar{A}$ - the observed bulb has some defect
  • $p_0=P(A)$ - probability that a bulb in the mixed set is intact (the fraction of intact bulbs in my mixture)
  • $\delta$ - some range of brightness values
  • $f$ - brightness
  • $p_\delta^A=P(f \in \delta \:\vert\: A)$ - probability that the brightness of an intact bulb falls in $\delta$
  • $p_\delta^B=P(f \in \delta \:\vert\: B)$ - probability that the brightness of a bulb with a defect falls in $\delta$
  • $p_\delta=P(f \in \delta)$ - probability that the brightness of a bulb from the mixed set falls in $\delta$

Let's calculate $p_0$. By the law of total probability: $$ p_\delta = P(f \in \delta)=P(A)P(f \in \delta \:\vert\: A) + P(B)P(f \in \delta \:\vert\: B) = p_0p_\delta^A+(1-p_0)p_\delta^B $$

Solving for $p_0$:

$$ p_0 = \frac{p_\delta - p_\delta^B}{p_\delta^A - p_\delta^B} = 1 - \frac{p_\delta^A - p_\delta}{p_\delta^A - p_\delta^B} $$

The only unknown parameter in the final equation is $p_\delta^B$, and there is no way to calculate it, because the distribution of brightness values for bulbs with defects is unknown. To obtain an upper bound on $p_0$ we set $p_\delta^B = 0$: as long as $p_\delta \le p_\delta^A$, $p_0$ decreases as $p_\delta^B$ grows, so this choice gives the maximal possible value of $p_0$. Under this assumption the equation simplifies to:

$$ p_0 = \frac{p_\delta}{p_\delta^A} $$
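To make the bounding step concrete, here is a minimal sketch that solves the mixture equation for $p_0$ and checks numerically that $p_\delta^B = 0$ gives the largest value. The function name `intact_fraction` and the numbers 0.60 and 0.45 are made up for illustration; the check only covers the case $p_\delta \le p_\delta^A$.

```python
def intact_fraction(p_delta, p_delta_a, p_delta_b=0.0):
    """Solve p_delta = p0*p_delta_a + (1 - p0)*p_delta_b for p0."""
    if p_delta_a == p_delta_b:
        raise ValueError("p_delta_a and p_delta_b must differ")
    return (p_delta - p_delta_b) / (p_delta_a - p_delta_b)


# Illustrative numbers only (not from real data): 60% of intact bulbs
# fall in delta, but only 45% of the mixed set does.
p_delta_a, p_delta = 0.60, 0.45

# p0 for several hypothetical values of the unknown p_delta_b.
candidates = [intact_fraction(p_delta, p_delta_a, b) for b in (0.0, 0.1, 0.2, 0.3)]

# Setting p_delta_b = 0 reduces the formula to p_delta / p_delta_a.
upper_bound = intact_fraction(p_delta, p_delta_a)

print(candidates)
print(upper_bound)
```

In this toy case every nonzero $p_\delta^B$ yields a smaller $p_0$ than the simplified formula, which is why the simplified value acts as an upper bound.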

To account for the uncertainty in estimating $p_\delta^A$ and $p_\delta$ from finite samples, we draw values from their posterior distributions (Beta posteriors under the Jeffreys prior $\mathrm{Beta}(\frac{1}{2}, \frac{1}{2})$) to build a posterior distribution for $p_0$.

$$ p_\delta \sim \mathrm{Beta}\left(\frac{1}{2}+K_\delta,\ \frac{1}{2}+N_\delta-K_\delta\right) $$

$$ p_\delta^A \sim \mathrm{Beta}\left(\frac{1}{2}+K_\delta^A,\ \frac{1}{2}+N_\delta^A-K_\delta^A\right) $$

Here $N_\delta$ is the total number of bulbs in the mixed set, $K_\delta$ is the number of bulbs in the mixed set with brightness in $\delta$, $N_\delta^A$ is the total number of intact bulbs, and $K_\delta^A$ is the number of intact bulbs with brightness in $\delta$.

After building the distribution for $p_0$, we take its 99% quantile as the upper bound on the fraction of intact bulbs in the mixed set (p-value = 0.01). Subtracting it from one gives the required lower bound on the fraction of bulbs with defects.
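The sampling procedure can be sketched as follows, using only the Python standard library. The function name, the number of draws, the fixed seed, and the counts in the usage example are all assumptions made for illustration. Clipping the ratio at 1 is also an added assumption (a fraction cannot exceed one), not something stated in the method above.

```python
import random


def defect_fraction_lower_bound(k_mix, n_mix, k_intact, n_intact,
                                level=0.99, draws=20000, seed=0):
    """Monte Carlo lower bound on the fraction of defective bulbs.

    k_mix / n_mix:       mixed-set bulbs with brightness in delta / all mixed bulbs
    k_intact / n_intact: intact bulbs with brightness in delta / all intact bulbs
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        # Beta posteriors with the 1/2 offsets used in the text.
        p_delta = rng.betavariate(0.5 + k_mix, 0.5 + n_mix - k_mix)
        p_delta_a = rng.betavariate(0.5 + k_intact, 0.5 + n_intact - k_intact)
        # p0 = p_delta / p_delta_a, clipped to 1 (assumption, see lead-in).
        ratio = p_delta / p_delta_a if p_delta_a > 0 else 1.0
        samples.append(min(1.0, ratio))
    samples.sort()
    # Empirical `level` quantile of p0: upper bound on the intact fraction.
    upper_p0 = samples[min(draws - 1, int(level * draws))]
    return 1.0 - upper_p0


# Hypothetical counts: 450 of 1000 mixed bulbs and 600 of 1000 intact bulbs
# have brightness in delta.
lower_bound = defect_fraction_lower_bound(450, 1000, 600, 1000)
print(lower_bound)
```

With these made-up counts the point estimate of $p_0$ is 0.75, so the resulting lower bound on the defect fraction comes out somewhat below 0.25 once the sampling uncertainty is included.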

Selection of $\delta$

As $\delta$ can be arbitrary, we used the following algorithm to select it:

We take $\delta$ of the form $\delta = [x, +\infty)$ and choose $x$ to minimize the maximum-likelihood estimate of $p_0$:

$$ \hat{p_0} = \frac{\hat{p_\delta}}{\hat{p_\delta^A}} = \frac{\frac{K_\delta}{N_\delta}}{\frac{K_\delta^A}{N_\delta^A}} $$
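A minimal sketch of this threshold search, assuming the candidate values of $x$ are the observed brightness values themselves. The function name `select_threshold` and the toy data are hypothetical. Thresholds above which no intact bulb falls are skipped, since the estimate is undefined there.

```python
def select_threshold(mixed, intact):
    """Return (x, ratio) minimizing (K/N) / (K_A/N_A) for delta = [x, +inf)."""
    n_mix, n_intact = len(mixed), len(intact)
    best_x, best_ratio = None, float("inf")
    # Candidate thresholds: every observed brightness value (an assumption).
    for x in sorted(set(mixed) | set(intact)):
        k_intact = sum(1 for v in intact if v >= x)
        if k_intact == 0:
            continue  # estimate undefined: no intact bulb falls in delta
        k_mix = sum(1 for v in mixed if v >= x)
        ratio = (k_mix / n_mix) / (k_intact / n_intact)
        if ratio < best_ratio:
            best_x, best_ratio = x, ratio
    return best_x, best_ratio


# Toy data for illustration only: some mixed-set bulbs are dimmer than
# any intact bulb.
intact = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
mixed = [4, 5, 6, 7, 10, 12, 14, 16, 18, 19]

best_x, best_ratio = select_threshold(mixed, intact)
print(best_x, best_ratio)
```

One caveat with this design: choosing $x$ to minimize the estimate on the same data that is later used for the posterior is a selection step, so the resulting bound may be optimistic. Selecting $x$ on a held-out part of the data would be one way to guard against that.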

$\endgroup$
