2
$\begingroup$

We all know the birthday problem.

Suppose an inversion* of the birthday problem: given a number of distinct birthdays (c) from a set of people, how can you estimate the number of people you had (n)? (given num birthdays < 365). And for arbitrary bin spaces other than 365 (k)?

E.g. you have c=47 days marked off in your school's calendar as birthdays (k=365), how big is your school most likely (n)?

Birthday collision problems are usually expressed as the odds of getting 1 or more collisions, I think we want the maximally likely quantity of collisions given c and k and add that to c to get a likely n.

(* not sure if this is the right term or name for this question)

$\endgroup$
2

2 Answers 2

1
$\begingroup$

Say there are n people in your school, and k days in the year (so the normal birthday problem has k = 365). Then the probability that any given day is not a birthday is $(1-1/k)^n$. If $k$ is large then this is approximately $e^{-n/k}$. Therefore we expect there to be $k e^{-n/k}$ days that are nobody's birthday.

So say we observe $j$ days which are somebody's birthday; then there are $k-j$ days which are nobody's birthday, and approximate number of people in the school can be found by solving $k - j = k e^{-n/k}$ for $n$. This gives

$$n \approx -k \log \left(1-{j \over k}\right)$$

So in the case you've suggested with $j = 47, k = 365$ we get

$$ n \approx -365 \log (1 - 47/365) \approx 50 $$.

$\endgroup$
3
  • $\begingroup$ does this problem or formula have a name? $\endgroup$
    – ʞɔıu
    Commented Jan 31, 2014 at 23:18
  • $\begingroup$ also, I'm assuming you're using natural log? $\endgroup$
    – ʞɔıu
    Commented Jan 31, 2014 at 23:20
  • $\begingroup$ Yes, it's natural log. I don't know a name for this problem, although that doesn't mean there isn't one. $\endgroup$ Commented Jan 31, 2014 at 23:22
1
$\begingroup$

This type of inversion probably lies behind other applications that make use of the concept of posterior probability. The approach mentioned below has been used in other areas such as data mining.

Let $D$ be a random variable denoting the number of distinct birthdays. For a given number of birthdays $d$, the desired answer is the value of $i$ that maximizes $\Pr[n = i \vert D=d]$. Also note that by the Bayes' theorem, $$\Pr[n = i \vert D=d] = \Pr[D=d \vert n = i]\cdot \frac{\Pr[n = i] }{\Pr[D=d]}$$ And we want to choose the value of $i$ that maximizes the above formula. Since $d$ is fixed, $\Pr[D=d]$ is fixed. Furthermore, unless we have some knowledge about the distribution of $n$, we assume that all the values are equally probable, i.e., $\Pr[n = i] $ is equal for all $i$. Then the problem reduces to finding the $i$ that maximizes $\Pr[D=d \vert n = i]$, which can be solved using the birthday problem. In case we have more knowledge about the distribution of $n$, we can substitute for $\Pr[n=i]$ in the above formula to apply this knowledge.

$\endgroup$

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .