1
$\begingroup$

Suppose there's a set of 500 individuals. What is the expected number of coinciding birthday celebrators? That's easy - it's $500/365$, under the assumption of uniform distribution.

But a birthday can't be fractioned between different dates so of practical reasons, some days will have more and others less celebrators. I wonder what's the expected number of simultaneous birth dates on the most common day (i.e. the one with most people having been born).

My impression is that there's something I'm missing and that it's not as easy as the formula in the first paragraph. The reasoning is that if there's only 5 individuals, the expected number of celebrators on the most common day is not $5/365$ but only $1/365$ because it's most probable that the birthdays are spread across the year.

So, the suggested solution would be removing the first $365$ people, regarding them as likely having one day each. Then, keeping the remaining $135$, hence arriving at $1 + 135/365$.

But I'm so confused and uncertain that I need outside input.

edit based on comments

There's certainly at least two simultaneous celebrators on the most common day. So the expected number is $2 + something$. That something consists of the likelihood for each of the remaining $500 - 2$ individuals.

If there's $2$ people in the group, the expected value is $1 + 1/365$. If there are $3$ people in the group, the expected value is $1 + 2 * 1 / 365$. Is that correct?

Then, if we'd have $365$ people, the formula suggested above would render $1 + 365 * 1 / 365$, which is $2$. It seems intuitively correct to me. Am I missing something?

Finally, the original group size was $500$. Hence, the answer would be $2 + 135 * 1 / 365$ or, if you want, $1 + 500 * 1 / 365$. Can anybody shoot it down somehow?

$\endgroup$
15
  • $\begingroup$ With any (non-zero) number of people, the number of simultaneous birthdays on the most common day is at least $1$; so the expected value of this statistic is also at least $1$, and not $1/365$ as you claim. In fact it is a little more than $1$, because the probability that two or more of the five have the same birthday is not zero. $\endgroup$
    – TonyK
    Commented Feb 12, 2017 at 19:40
  • $\begingroup$ @TonyK I just verified with the source of my question and admittedly, you're right. In the light of your remark, what's the appropriate approach, then? I've got a suggestion in 5 minutes. Please take a look at the edit. $\endgroup$ Commented Feb 12, 2017 at 20:39
  • $\begingroup$ Your intuition is way out, i'm afraid. With 365 people, the chance is infinitesimal that they all have different birthdays, so the expected value will be at least 2; in fact, significantly more than 2, given that it is very likely that we see three or more people with the same birthday. This question is hard, unfortunately. $\endgroup$
    – TonyK
    Commented Feb 12, 2017 at 20:54
  • 1
    $\begingroup$ Check the Birthday problem $\endgroup$
    – Garmekain
    Commented Feb 12, 2017 at 20:55
  • 1
    $\begingroup$ OK, I've done that: one million simulations. The average value of the max number of people sharing a birthday is... $4.967$. So even higher than I thought. $\endgroup$
    – TonyK
    Commented Feb 12, 2017 at 21:40

1 Answer 1

1
$\begingroup$

Your comment asked for an answer:

The following R code allows a simulation

maxhits <- function(x){ max(table(x)) }
meanmax <- function(people, days=365, cases=10000){
  eg <- matrix(sample(days, people*cases, replace = TRUE), nrow=cases)
  maxfreq <- table(apply(eg, 1, maxhits) )
  sum(as.numeric(names(maxfreq)) * maxfreq / cases)
  }

such as

> set.seed(1)
> meanmax(365)
[1] 4.9799
> meanmax(400)
[1] 5.2131
> meanmax(500)
[1] 5.8776
> meanmax(1000)
[1] 8.7057

though with 10,000 simulated cases I would not trust these results past the first decimal place

Added: This chart show simulated means of the maximum for up to $1200$ people

enter image description here

and rather strangely I managed empirically to get something which looks like a reasonable approximation to the mean of the maximum rounded to integers on this range with something like qpois(0.9984, lambda=(1:1200)/365) as a quantile of the Poisson distribution, illustrated in red. I would expect there to be a continuous function which did even better, though the first $40$ or so values do not appear to be on the same smooth curve as the rest of them

$\endgroup$

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .