Modifying the Birthday Problem Paradox for Arbitrary Situations?

Question

We learned about the birthday problem paradox: If people are in a room with randomly distributed birthdays, very few people are needed for at least two people to have the same birthday.

As I understand, the formula just uses logic.

Let $Q(n)$ be the probability that $n$ people all have different birthdays ($n+1$ is used as a correction factor to avoid double counting)

$$ Q(n) = \frac{365}{365} \times \frac{364}{365} \times \frac{363}{365} \times \ldots \times \frac{365-n+1}{365} $$

And since $P(n) = 1 - Q(n)$, the probability that at least two people have the same birthday is:

$$ P(n) = 1 - Q(n) = 1 - \left( \frac{365}{365} \times \frac{364}{365} \times \frac{363}{365} \times \ldots \times \frac{365-n+1}{365} \right) $$

Here is a simulation that shows the minimum number of people with random birthdays required to have at least one common birthday:

library(ggplot2)
num_simulations <- 1000
num_days <- 365
results <- numeric(num_simulations)

for (i in 1:num_simulations) {
    birthdays <- integer(0)
    while (TRUE) {
        new_birthday <- sample(1:num_days, 1)
        if (new_birthday %in% birthdays) {
            results[i] <- length(birthdays) + 1
            break
        } else {
            birthdays <- c(birthdays, new_birthday)
        }
    }
}

mean_trials <- mean(results)

ggplot(data.frame(Trials=results), aes(Trials)) +
    geom_histogram(binwidth=1, color="black", fill="lightblue") +
    ggtitle(paste("Distribution of Trials Until Match (Mean =", round(mean_trials, 2), ")")) +
    xlab("Number of Trials Until Match") +
    ylab("Frequency") +
    theme_minimal()

And here is a simulation which compares simulated results to the theoretical results over different numbers of people (we can see that the simulations are identical to the theoretical number):

birthday_probability <- function(n) {
  q <- 1
  for (i in 1:n) {
    q <- q * (365 - i + 1) / 365
  }
  return(1 - q)
}

simulate_and_percentage <- function(n, num_simulations) {
  num_days <- 365
  repeat_count <- 0
  for (i in 1:num_simulations) {
    birthdays <- sample(1:num_days, n, replace = TRUE)
    if (length(birthdays) != length(unique(birthdays))) {
      repeat_count <- repeat_count + 1
    }
  }
  return(repeat_count / num_simulations)
}


n_values <- 2:100
num_simulations <- 1000

theoretical_probabilities <- sapply(n_values, birthday_probability)

simulation_percentages <- sapply(n_values, simulate_and_percentage, num_simulations = num_simulations)

data <- data.frame(
  n_values = rep(n_values, 2),
  probabilities = c(theoretical_probabilities, simulation_percentages),
  type = rep(c("Theoretical", "Simulation"), each = length(n_values))
)


ggplot(data, aes(x = n_values, y = probabilities, color = type)) +
  geom_line() +
  ggtitle("Comparison of Theoretical vs Simulation Probabilities") +
  xlab("Number of People (n)") +
  ylab("Probability") +
  ylim(0, 1) +  # Ensure y-axis is limited between 0 and 1
  theme_minimal()

I am wondering if its possible to slightly modify the objective for this problem. For example, if there are $n$ people:

What is the probability of at least $k$ groups of people having the same the birthdays and how many people are likely to be in these groups?

For example: Suppose there are $n = 12$ people with the following birthdays:

1 person with Jan 4th
2 people with Feb 6th
2 people with March 8th
3 people with April 2nd
1 person with May 7th
1 person with June 15th
2 people with July 1st

Thus we have:

3 groups of size = 1
3 groups of size = 2
1 group of size = 3

What is the probability of observing this specific sequence? Or as another example, what is the probability of observing 1 group of size=9, 1 group of size=2 and 1 group of size =1?

Can we change the birthday problem formula for this?

true blue anil · Accepted Answer · 2024-04-16 21:25:54Z

We can use the multinomial distribution, thus for the problem with $12$ people with the particular birthdays as given, we have to imagine that there are $365$ slots filled in the manner given.

$$Pr = \dfrac{\dbinom{12}{1,2,2,3,1,1,2}}{365^{12}}$$

"Or another example" becomes a different problem because you are not putting the people into particular boxes, so taking the first example of grouping, viz

3 groups of size = 1

3 groups of size = 2

1 group of size = 3

You could place the groups and $\mathtt{permute}$ them to get the number of cases

$$\binom{12}{1,1,1,2,2,2,3}\times \frac{365!}{3!3!1!358!}$$

The denominator, of course would remain $365^{12}$

but such an extension of the original problem does not seem to serve much purpose

Henry · Accepted Answer · 2024-04-20 10:37:01Z

Suppose you have $d$ days in the year and for each non-negative integer $i$ you have $k_i$ days with $i$ people having birthdays that day, so $\sum\limits_{i=0}^d k_i = d$ while $\sum\limits_{i=0}^d i\,k_i=n$ is the total number of people, then the probability is $$\frac{d!\, n!}{d^n\,\prod\limits_{i=0}^n k_i! \,\prod\limits_{i=0}^n (i!)^{k_i}}.$$

With $d=365$ and your example of $n=12$ with $k_1=3, k_2=3,k_3=1$ giving $k_0=358$, after cancelling the $358!$ you get $\frac{365\times 364 \times 363 \times 362 \times 361 \times 360 \times 359 \times 12!}{365^{12} \times (3! \times 3! \times 1!) \times (1!^3 \times 2!^3 \times 3!^1)}$ which is about $4.04 \times 10^{-8}$.

The calculation in R:

probdist <- function(dist, days=365){
  people <- sum(dist * (1:length(dist)))
  remainingdays <- days - sum(dist)
  exp( lfactorial(days) + lfactorial(people) - people*log(days) - 
       sum( lfactorial(dist) + dist*lfactorial(1:length(dist)) ) - 
       lfactorial(remainingdays) ) 
  } 

probdist(c(3, 3, 1))
# 4.038243e-08

Your other example of $n=12$ with $k_1=1, k_2=1, k_9=1$ giving $k_0=362$ is clearly going to be very much less likely $\frac{365\times 364 \times 363 \times 12!}{365^{12} \times (1! \times 1! \times 1!) \times (1!^1 \times 2!^1 \times 9!^1)}$ which is about $5.69 \times 10^{-21}$

probdist(c(1, 1, 0, 0, 0, 0, 0, 0, 1))
# 5.692859e-21

Stack Exchange Network

Modifying the Birthday Problem Paradox for Arbitrary Situations?

2 Answers 2

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged
probability
birthday
.

Hot Network Questions

Modifying the Birthday Problem Paradox for Arbitrary Situations?

2 Answers 2

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged probabilitybirthday.

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
probability
birthday
.