0
$\begingroup$

Many of us are probably familiar with the birthday problem where you have however many people in a room and the probability that at least two people in the room share a birthday is some percentage between zero and one (and likely higher than expected if you are someone that isn't familiar with this problem).

I am trying to extend this into a scenario where I can find the expected value (and maybe even variance if I have time) of said experiment.

Here is the scenario:

I enter an empty room with one door leading into it. Then let's suppose we have an infinite amount of people with a random birth date (year is irrelevent). I ask one person to come in at a time and when each person enters the room, we check to see if there are any shared birthdays in the room. If so, then the experiment is over. If not, then the next person is brought in and the birthdays are checked again, until a success happens. What is the expected number of people that will be in the room (including myself) when a success happens?

Assumptions
  • The current year is a non-leap year (with only $365$ days, removing Feb $29$ possibility)
  • Each day of the year is equally likely to have been a possible birth date.
  • There aren't any sets of twins, triplets, etc.
In the link provided above, it says that $20$ is the answer to this kind of problem, but it seems like a different approach was used that what I am trying. I am hoping my method also gives the same answer.

My Attempt

Expected Value is generally calculated by $$E(X) =\sum_k{x_k \space \space {P(x_k)}}$$

So substituting in to fit this scenario, we have: $$E(X) =\sum_{k=1}^{366}{k \space \space {P(X=k)}}$$ with ${P(X=k)}$ in this case being the probability that a success happened when the $k^{th}$ person walked into the room (again, including myself).

We can deduct a formula for ${P(X=k)}$, which is $${P(X=k)} = \left(\prod_{i=1}^{k-1} \frac{{i!}{365 \choose i}}{365^i} \right) \times \left(1-\frac{{k!}{365 \choose k}}{365^k} \right) $$ supposing we let $k \ge 2$. We also know that $P(X=1) = 0$ and that we don't need to go beyond $366$ people, as this is the highest possible amount of people needed in the room to be able to get the success.

The Problem

I feel that my formula for ${P(X=k)}$ is overestimating slightly. With the particular problem being this part: $$\left(1-\frac{{k!}{365 \choose k}}{365^k} \right)$$ This is the typical formula you may see for the birthday problem, but it is making a wrong assumption in that this formula is considering the possibility that more than two people could be sharing a birthday (or that everyone is sharing the same birthday!). In this fictional scenario, that isn't possible. There would be two and only two people with the same birthday in the room. Although I feel like this answer and the real answer may round to the same integer anyways $\left(20\right)$, I would like to be as precise as possible. This question tries to tackle this, but with a fixed group of $23$ people instead of a generalized solution.

There is also the problem of attempting to solve for $E(X)$ by machine (I tried what I have so far with a TI-89 but it couldn't calculate all this without freezing or giving an overload error), so I will need to try putting this into Mathematica later, but figuring out how to do all that is something I can figure out on my own.

So what I need help on is this part: when generalizing for a group of $k$ people, what is the probability that exactly 2 people share a birthday? (the second part of the formula for finding $P(X=k)$. )

Last second thoughts before posting

Doing some scratch work on paper my guess is that the true formula for $P(X=k)$ is $${P(X=k)} = \left(\prod_{i=1}^{k-1} \frac{{i!}{365 \choose i}}{365^i} \right) \times {\left( k-1 \right)}\space \frac{365!}{\left( 365-k+1\right)!} \left( \frac{1}{365} \right)^k $$ which came to me using some inspiration from this post.

$\endgroup$
1
  • $\begingroup$ It may be easier to use $E(X)=\sum_{k=1}^{366}P(X\ge k)$. $\endgroup$
    – Jason
    Commented Jul 17, 2017 at 4:41

1 Answer 1

2
$\begingroup$

Observe that $P(X\ge k)$ is much simpler to calculate: it is merely the probability that in a group of $k-1$ people, no two share a birthday. Thus

$$P(X\ge k)=1\cdot\frac{365-1}{365}\cdot\cdots\cdot\frac{365-(k-2)}{365}=\prod_{n=0}^{k-2}\left(1-\frac n{365}\right)$$

for $k\ge2$. Using the alternative formula for expected value of a random variable which takes only positive integer values, we find

$$E(X)=\sum_{k=1}^\infty P(X\ge k)=1+\sum_{k=2}^{366}\prod_{n=0}^{k-2}\left(1-\frac n{365}\right).$$

WolframAlpha calculates this to be approximately $24.62$.

$\endgroup$
6
  • $\begingroup$ I was not aware of this shortcut formula for expected value . Thank you for sharing this. I am curious as to why this answer differs a little from $20$. Seeing as to how this method is calculating it pretty much with brute force, I'm more inclined to say this answer is correct. $\endgroup$
    – WaveX
    Commented Jul 17, 2017 at 5:22
  • $\begingroup$ Unless the $20$ isn't for expected value but the most likely value, and the $24.62$ is the expected value I suppose. Perhaps I misinterpreted that part lol. $\endgroup$
    – WaveX
    Commented Jul 17, 2017 at 5:30
  • $\begingroup$ I think maybe you're conflating an approximate explanation of the birthday paradox ("did you know that if you have around $20$ people in a room, there's more than a $50\%$ chance that two share a birthday?") with the actual "most likely" outcome. If you have $23$ or more people in a room, there is a greater than $50\%$ chance that two or more share a birthday, and if you have $22$ or less, you have a less than $50\%$ chance. However, this doesn't make $23$ the "most likely" number when a success happens in your case. [Continued] $\endgroup$
    – Jason
    Commented Jul 17, 2017 at 15:02
  • $\begingroup$ For example, what if $P(X\le22)=0.499$ and $P(X=23)=0.002$? If this were the case, $23$ would still be the "tipping point", but $\{X=23\}$ itself is very unlikely. This isn't what actually happens, of course, but it does give you an indication to be a little careful. Basically what we're looking at is the difference between mean, median and mode. In a symmetric distribution, the mean, median and mode all coincide, but this distribution clearly is not symmetric. $\endgroup$
    – Jason
    Commented Jul 17, 2017 at 15:07
  • $\begingroup$ I understand. Your answer shed some light on this explanation as well. As for the mean, median, and mode of the scenario, the computer answer of $24.62$ would be the mean and would $20$ be the mode? (I know there wasn't any calculations for the mode, but I guess it can be found by maximizing $P(X=k)$? $\endgroup$
    – WaveX
    Commented Jul 17, 2017 at 16:50

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .