2
$\begingroup$

The problem statement is simple: Let there be a group of n people. What is the probability that exactly t pairs will share the same birthday?

All my attempts to solve this problem came down to the following steps:

  1. Consider the null graph where each vertex is unique. Vertex - person, edge - sharing same birthday.

  2. Add edges and reduce graph into a set of complete graphs such that the total number of edges equals t. If k people share the the same birthday, there will be $\frac{k*(k-1)}{2}$ pairs, and if we draw the picture of graph, this group of k people will look like a complete graph with k vertices. See linked pictures.

  3. For each following graph, find permutations of possible birthdays.

Example for n = 6, t = 3:

In this case, we can divide the group into 3 pairs of 2 people each (resulting in 3 different birthdays). Alternatively, we can take a group of three people, while others will have paired different birthdays.

Example of pairs: 3 pairs of 2 people and 1 group of 3 people

For the following case it's easy to count that the number of 3 different pairs is ${6 \choose 2} * {4 \choose 2}$ and to choose 3 people ${6 \choose 3}$

So, the number of birthdays that fit the first case: ${365 \choose 3} * {6 \choose 2} * {4 \choose 2}$ and for the second case ${365 \choose 1} * {6 \choose 3} * 364 * 363$

Then the probability is:

$$\frac{{365 \choose 3} * {6 \choose 2} * {4 \choose 2} + {365 \choose 1} * {6 \choose 3} * 364 * 363}{365^6}$$

So, here I struggle with many questions:

  1. How to find all possible partitions of graph for n people and t pairs?
  2. How to sum the matching cases taking into account binomial coefficients?
  3. Maybe there is another way to solve this problem?
$\endgroup$
0

2 Answers 2

0
$\begingroup$

I would have read the question as involving involves $t$ days each with exactly $2$ people having that that day as a birthday and $n-2t$ days each with exactly $1$ person having that that day as a birthday. You have read it as $t$ of the ${n \choose 2}$ pairs of people having the $2$ people in the pair having the same birthday and is more difficult for $t>2$.

For my interpretation, there are ${365 \choose t}{365-t \choose n-2t}$ ways of choosing the days and $\frac{n!}{2^t}$ ways of choosing the people to distribute among those days to meet the condition, out of a possible $365^n$ possible allocations, so an overall probability of $\dfrac{365!\, n! }{365^n \,t! \, (n-2t)!\,(365+t-n)!\,2^t }.$

For your interpretation, you want to consider all possible $(k_0,k_1,k_2,\ldots, k_n)$ where $\sum\limits_{i:i\ge0} k_i = 365$ and $\sum\limits_{i:i\ge1} ik_i = n$ and $\sum\limits_{i:i\ge2}{i \choose 2}k_i =t$, and the overall probability $$\frac{365!\, n! }{365^n}\sum\limits_{\{(k_0,k_1,k_2,\ldots, k_n)\}}1/\prod\limits_{i=0}^n k_i!\, (i!)^{k_i}.$$

Your expression for $n=6,t=3$ has a slight error as it is missing a $362$ term: $\frac{{365 \choose 3} \times {6 \choose 2} \times {4 \choose 2} + {365 \choose 1} \times {6 \choose 3} \times 364 \times 363 \times 362}{365^6}$ would give the correct probability.

Finding the possible $(k_0,k_1,k_2,\ldots, k_n)$ to sum over is quite complicated if $t$ is large, though note that $k_i=0$ for any $i \gt \sqrt{2t+\frac14}+\frac12$ and it would be possible to do this systematically. For small $t$ it is easy enough manually: in your example with $n=6, t=3$, the only possibilities are $(362,0,3,0,0,0,\cdots)$ and $(361,3,0,1,0,0,\cdots)$.


It is easy to find the total expected number of pairs in your interpretation for $n$ people: it is $\frac{n(n-1)}{2\times 365}$.

One approach to approximating the probabilities, at least for small to medium sized numbers of people, and numbers of pairs for which the probabilities are not too small, would be to use simulation, accepting some simulation noise. For example using R, with $n=10$, you could try something like

simpairs <- function(people, days=365){
 s <- sample(days, people, replace=TRUE)
 (sum(outer(s, s, "==")) - people)/2 
 }

set.seed(2024)
pairsinsims <- replicate(10^6, simpairs(6))
table(pairsinsims) / 10^6
# pairsinsims
#        0        1        2        3        4 
# 0.959351 0.040183 0.000324 0.000141 0.000001     
mean(pairsinsims)
# 0.041258

and that simulated mean compares quite well with the theoretical expectation of about $0.041096$.

The alternative is to work out all the possible patterns. I think this code does that, though no doubt there are better ways of doing so. Start with some preliminary simple functions

nc2 <- function(x){
  x * (x-1) / 2
  }

invnc2 <- function(y){
  (1 + sqrt(1+8*y)) / 2
  }

maxlevel <- function(y){
  floor(invnc2(y))
  }

daysindist <- function(dist){
  sum(dist)
  }
  
pairsindist <- function(dist){
  sum(nc2(1:length(dist)) * dist)
  }

peopleindist <- function(dist){
  sum(dist * (1:length(dist)))
  }

 

minmaxatlevel <- function(dist, level, totalpeople, totalpairs, days=365){
  peopleleft <- totalpeople - peopleindist(dist)
  pairsleft  <- totalpairs - pairsindist(dist)
  minatlevel <- max(0, 
                ceiling(2*(pairsleft + peopleleft)/level  - peopleleft))
  maxatlevel <- min(floor(peopleleft/level), days - daysindist(dist), 
                ifelse(level == 1, minatlevel, 
                       floor(pairsleft / nc2(level))))  
  c(minatlevel, maxatlevel)
  }

 

distmatrix <- function(totalpeople, totalpairs, days=365){
   width <- maxlevel(totalpairs)
   mat <- matrix(0, nrow=1, ncol=width)
   level <- width
   while (level > 0){
     for (r in 1:nrow(mat)){
       rangetoadd <- minmaxatlevel(mat[1,], level, 
                                   totalpeople, totalpairs, days)
       if (rangetoadd[1] <= rangetoadd[2]){
         for (add in rangetoadd[1]:rangetoadd[2]){
           mat <- rbind(mat,  
                  mat[1,] + c(rep(0, level-1), add, rep(0, width-level)))
           }
         }
       mat <- matrix(mat[-1,], ncol=width)             
       }
     if (nrow(mat) == 0){
       return(mat)
       }  
     level <- level-1
     }
   mat
   }

 

probdist <- function(dist, days=365){
  people <- peopleindist(dist) 
  remainingdays <- days - sum(dist)
  exp( lfactorial(days) + lfactorial(people) - people*log(days) - 
       sum( lfactorial(dist) + dist*lfactorial(1:length(dist)) ) - 
       lfactorial(remainingdays) ) 
  }

 

probability <- function(people, pairs, days=365){
  distributionmatrix <- distmatrix(people, pairs, days)
  if (nrow(distributionmatrix) == 0){
    return(0)
    }
  cumprob <- 0
  for (r in 1:nrow(distributionmatrix)){
    cumprob <- cumprob + probdist(distributionmatrix[r,], days)
    }
  cumprob
  }

which for $n=6$ and various numbers of pairs would give

probability(6, 0)
# 0.9595375
probability(6, 1)
# 0.03998073
probability(6, 2)
# 0.0003322498
probability(6, 3)
# 0.0001479725
probability(6, 4)
# 1.223756e-06
probability(6, 5)
# 0
probability(6, 6)
# 3.065009e-07
probability(6, 7)
# 8.428074e-10
probability(6, 8)
# 0
probability(6, 9)
# 0
probability(6,10)
# 3.371229e-10
probability(6,15)
# 1.543603e-13

summing to $1$. Allowing for noise, the simulations are close to this.

It works for larger numbers, but gets slower as the desired number of pairs increases: for example for $n=275$ people and looking for $100$ total pairs, it will find the $8676$ possible distributions giving that number of pairs and sum their probabilities:

probability(275, 100)
# 0.03892569
$\endgroup$
0
$\begingroup$

I'm assuming that all the $t$ pairs have different birthdays

For making $t$ pairs, we need to select $2t$ from $n$ (in $C(n,2t)$ ways)

Now, divide this group of $2t$ people in $t$ groups of group size $2$ each in $\frac{(2t)!} {(2!)^t×t!}$ ways

Now we have $t$ pairs and $n-2t$ other people (none of which share the same birthday). So it results in $n-t$ different birthdays

They can select their birthdays in $365×364×363×...×(365-(n-t-1)) = \frac{365!} {(365-(n-t))!}$ ways

So required probability = $\frac{C(n,2t)×\frac{(2t)!}{(2!)^t×t!}× \frac{365!}{(365-(n-t))!}}{365^n}$

$\endgroup$
2
  • 1
    $\begingroup$ So what is your answer to the $n = 6$ people and $t = 3$ pairs example? I think it should be either about $0.0000003059$ or about $0.0001479725$ depending on how you interpret "$3$ pairs". $\endgroup$
    – Henry
    Commented Apr 13 at 22:04
  • $\begingroup$ From the $1st$ interpretation, I'm getting the same answer as yours i.e. $0.0000003059$. But I'm not getting the other from the $2nd$ interpretation. I'll delete it till I figure out the correct way for this case. I got the mistake, thanks for pointing it out $\endgroup$
    – Fredrick
    Commented Apr 14 at 3:15

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .