2
$\begingroup$

We learned about the birthday problem paradox: If people are in a room with randomly distributed birthdays, very few people are needed for at least two people to have the same birthday.

As I understand, the formula just uses logic.

Let $Q(n)$ be the probability that $n$ people all have different birthdays ($n+1$ is used as a correction factor to avoid double counting)

$$ Q(n) = \frac{365}{365} \times \frac{364}{365} \times \frac{363}{365} \times \ldots \times \frac{365-n+1}{365} $$

And since $P(n) = 1 - Q(n)$, the probability that at least two people have the same birthday is:

$$ P(n) = 1 - Q(n) = 1 - \left( \frac{365}{365} \times \frac{364}{365} \times \frac{363}{365} \times \ldots \times \frac{365-n+1}{365} \right) $$

Here is a simulation that shows the minimum number of people with random birthdays required to have at least one common birthday:

library(ggplot2)
num_simulations <- 1000
num_days <- 365
results <- numeric(num_simulations)

for (i in 1:num_simulations) {
    birthdays <- integer(0)
    while (TRUE) {
        new_birthday <- sample(1:num_days, 1)
        if (new_birthday %in% birthdays) {
            results[i] <- length(birthdays) + 1
            break
        } else {
            birthdays <- c(birthdays, new_birthday)
        }
    }
}

mean_trials <- mean(results)

ggplot(data.frame(Trials=results), aes(Trials)) +
    geom_histogram(binwidth=1, color="black", fill="lightblue") +
    ggtitle(paste("Distribution of Trials Until Match (Mean =", round(mean_trials, 2), ")")) +
    xlab("Number of Trials Until Match") +
    ylab("Frequency") +
    theme_minimal()

enter image description here

And here is a simulation which compares simulated results to the theoretical results over different numbers of people (we can see that the simulations are identical to the theoretical number):

birthday_probability <- function(n) {
  q <- 1
  for (i in 1:n) {
    q <- q * (365 - i + 1) / 365
  }
  return(1 - q)
}

simulate_and_percentage <- function(n, num_simulations) {
  num_days <- 365
  repeat_count <- 0
  for (i in 1:num_simulations) {
    birthdays <- sample(1:num_days, n, replace = TRUE)
    if (length(birthdays) != length(unique(birthdays))) {
      repeat_count <- repeat_count + 1
    }
  }
  return(repeat_count / num_simulations)
}


n_values <- 2:100
num_simulations <- 1000

theoretical_probabilities <- sapply(n_values, birthday_probability)

simulation_percentages <- sapply(n_values, simulate_and_percentage, num_simulations = num_simulations)

data <- data.frame(
  n_values = rep(n_values, 2),
  probabilities = c(theoretical_probabilities, simulation_percentages),
  type = rep(c("Theoretical", "Simulation"), each = length(n_values))
)


ggplot(data, aes(x = n_values, y = probabilities, color = type)) +
  geom_line() +
  ggtitle("Comparison of Theoretical vs Simulation Probabilities") +
  xlab("Number of People (n)") +
  ylab("Probability") +
  ylim(0, 1) +  # Ensure y-axis is limited between 0 and 1
  theme_minimal()

enter image description here

I am wondering if its possible to slightly modify the objective for this problem. For example, if there are $n$ people:

  • What is the probability of at least $k$ groups of people having the same the birthdays and how many people are likely to be in these groups?

For example: Suppose there are $n = 12$ people with the following birthdays:

  • 1 person with Jan 4th
  • 2 people with Feb 6th
  • 2 people with March 8th
  • 3 people with April 2nd
  • 1 person with May 7th
  • 1 person with June 15th
  • 2 people with July 1st

Thus we have:

  • 3 groups of size = 1
  • 3 groups of size = 2
  • 1 group of size = 3

What is the probability of observing this specific sequence? Or as another example, what is the probability of observing 1 group of size=9, 1 group of size=2 and 1 group of size =1?

Can we change the birthday problem formula for this?

$\endgroup$

2 Answers 2

1
$\begingroup$

We can use the multinomial distribution, thus for the problem with $12$ people with the particular birthdays as given, we have to imagine that there are $365$ slots filled in the manner given.

$$Pr = \dfrac{\dbinom{12}{1,2,2,3,1,1,2}}{365^{12}}$$

"Or another example" becomes a different problem because you are not putting the people into particular boxes, so taking the first example of grouping, viz

3 groups of size = 1

3 groups of size = 2

1 group of size = 3

You could place the groups and $\mathtt{permute}$ them to get the number of cases

$$\binom{12}{1,1,1,2,2,2,3}\times \frac{365!}{3!3!1!358!}$$

The denominator, of course would remain $365^{12}$

but such an extension of the original problem does not seem to serve much purpose

$\endgroup$
0
1
$\begingroup$

Suppose you have $d$ days in the year and for each non-negative integer $i$ you have $k_i$ days with $i$ people having birthdays that day, so $\sum\limits_{i=0}^d k_i = d$ while $\sum\limits_{i=0}^d i\,k_i=n$ is the total number of people, then the probability is $$\frac{d!\, n!}{d^n\,\prod\limits_{i=0}^n k_i! \,\prod\limits_{i=0}^n (i!)^{k_i}}.$$

With $d=365$ and your example of $n=12$ with $k_1=3, k_2=3,k_3=1$ giving $k_0=358$, after cancelling the $358!$ you get $\frac{365\times 364 \times 363 \times 362 \times 361 \times 360 \times 359 \times 12!}{365^{12} \times (3! \times 3! \times 1!) \times (1!^3 \times 2!^3 \times 3!^1)}$ which is about $4.04 \times 10^{-8}$.

The calculation in R:

probdist <- function(dist, days=365){
  people <- sum(dist * (1:length(dist)))
  remainingdays <- days - sum(dist)
  exp( lfactorial(days) + lfactorial(people) - people*log(days) - 
       sum( lfactorial(dist) + dist*lfactorial(1:length(dist)) ) - 
       lfactorial(remainingdays) ) 
  } 

probdist(c(3, 3, 1))
# 4.038243e-08

Your other example of $n=12$ with $k_1=1, k_2=1, k_9=1$ giving $k_0=362$ is clearly going to be very much less likely $\frac{365\times 364 \times 363 \times 12!}{365^{12} \times (1! \times 1! \times 1!) \times (1!^1 \times 2!^1 \times 9!^1)}$ which is about $5.69 \times 10^{-21}$

probdist(c(1, 1, 0, 0, 0, 0, 0, 0, 1))
# 5.692859e-21
 
$\endgroup$

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .