Skip to main content
added 6 characters in body
Source Link
Qiaochu Yuan
  • 433.3k
  • 53
  • 969
  • 1.4k

Unfortunately the reason no one wants to do this calculation is that it's messy. This is $1$ minus the probability that every birthday is shared by at most $k-1$ people. To compute this, let $f_{n, k}$ be the number of ways $n$ people can have birthdays such that every birthday is shared by at most $k-1$ people, so that we want to compute $1 - \frac{f_{n, k}}{365^n}$. General results imply that $f$ has an exponential generating function

$$\boxed{ \sum_n f_{n, k} \frac{t^n}{n!} = \left( \sum_{i=0}^{k-1} \frac{t^i}{i!} \right)^{365} }.$$

For example, to find the probability for $k = 5, n = 1000$ as you ask for we need the coefficient of $t^{1000}$ in $\left( 1 + t + \frac{t^2}{2} + \frac{t^3}{6} + \frac{t^4}{24} \right)^{365}$. As far as I know, this is about as good as it gets for the exact count. You should be able to expand this out in a good computer algebra system but it's a little too big for WolframAlpha.

So instead of asking for the exact count let's ask for an approximation. There is a Poisson approximation available here, generalizing the one discussed on Wikipedia: the number of people having a given birthday is approximately Poisson with mean $\frac{n}{365}$, and moreover for different birthdays these Poisson random variables are approximately independent. This means that the probability that every birthday is shared by at most $k-1$ people is approximately

$$\frac{f_{n,k}}{365^n} \approx \left( \sum_{i=0}^{k-1} \frac{ \left( \frac{n}{365} \right)^i}{i!} e^{- \frac{n}{365}} \right)^{365} = \boxed{ e^{-n} \left( \sum_{i=0}^{k-1} \frac{ n^i}{365^i i!} \right)^{365} }.$$

This is actually much better; in particular for $k = 5, n = 1000$ this is now a computation we can easily ask WolframAlpha to do. It gives

$$\frac{f_{1000, 5}}{365^{1000}} \approx e^{-1000} \left( 1 + \frac{1000}{365} + \frac{1000^2}{365^2 2!} + \frac{1000^3}{365^3 3!} + \frac{1000^4}{365^4 4!} \right)^{365} = \boxed{ 3.36 \times 10^{-25} }$$$$\frac{f_{1000, 5}}{365^{1000}} \approx e^{-1000} \left( 1 + \frac{1000}{365} + \frac{1000^2}{365^2 2!} + \frac{1000^3}{365^3 3!} + \frac{1000^4}{365^4 4!} \right)^{365} \approx \boxed{ 3 \times 10^{-25} }$$

so the probability that in a group of $1000$ people at least $5$ share the same birthday is very close to $1$.

To get a better sense of how these numbers behave we could ask: for a given $n$, such as $n = 1000$, about what value of $k$ do we need for this probability to no longer be very small like this? In other words, what can we say about the distribution of the largest cluster of people sharing the same birthday? In the Poisson approximation this is equivalent to asking: what is the distribution of the maximum of $365$ iid Poisson random variables with mean $\frac{n}{365}$? It's known that this distribution clusters closely around

$$\frac{\log 365}{W \left( \frac{\log 365}{e \frac{n}{365} } \right)}$$

where $W$ is the Lambert W function, which is complicated-looking but for large $n$ grows roughly like $\frac{en}{365}$ and in any case for $n = 1000$ evaluates to about $\boxed{12}$. This could be tested experimentally by generating some random birthday distributions, as well as by substituting $k = 10, 11, 12, 13, 14$ into the Poisson approximation above.

Edit: An alternative argument which produces a rigorous upper bound starting from the generating function is the following. Write $p(t) = \sum_{i=0}^{k-1} \frac{t^i}{i!}$. Applying the saddle point bound to the generating function gives

$$\frac{f_{n, k}}{n!} \le \frac{p(r)^{365}}{r^n}$$

for any $r > 0$. The optimal value of $r$ is obtained by setting the logarithmic derivative of the RHS to zero; this gives $365 \frac{p'(r)}{p(r)} - \frac{n}{r} = 0$, which is complicated, but if we're willing to approximate $\frac{p'(r)}{p(r)} \approx 1$ then this gives $r \approx \frac{n}{365}$ which is much easier to work with. We get a bound

$$\frac{f_{n, k}}{n!} \le \frac{\left( \sum_{i=0}^{k-1} \frac{ \left( \frac{n}{365} \right)^i}{i!} \right)^{365}}{ \left( \frac{n}{365} \right)^n }$$

and rearranging this, then using the crude Stirling inequality $n! \le en \left( \frac{n}{e} \right)^n$, we get a rigorous upper bound

$$\boxed{ \frac{f_{n, k}}{365^n} \le \frac{n}{e^{n-1}} \left( \sum_{i=0}^{k-1} \frac{n^i}{365^i i!} \right)^{365} }$$

which is very similar to the above bound, and we could improve it a bit by using a more accurate form of Stirling's approximation and a bit more by estimating $r$ more accurately, and a bit more by using a more precise version of the saddle point bound. In any case this bound gives, similar to the above,

$$\frac{f_{1000, 5}}{365^{1000}} \le \frac{1000}{e^{999}} \left( 1 + \frac{1000}{365} + \frac{1000^2}{365^2 2!} + \frac{1000^3}{365^3 3!} + \frac{1000^4}{365^4 4!} \right)^{365} = \boxed{ 9.14 \times 10^{-22} }.$$$$\frac{f_{1000, 5}}{365^{1000}} \le \frac{1000}{e^{999}} \left( 1 + \frac{1000}{365} + \frac{1000^2}{365^2 2!} + \frac{1000^3}{365^3 3!} + \frac{1000^4}{365^4 4!} \right)^{365} \approx \boxed{ 9 \times 10^{-22} }.$$

Unfortunately the reason no one wants to do this calculation is that it's messy. This is $1$ minus the probability that every birthday is shared by at most $k-1$ people. To compute this, let $f_{n, k}$ be the number of ways $n$ people can have birthdays such that every birthday is shared by at most $k-1$ people, so that we want to compute $1 - \frac{f_{n, k}}{365^n}$. General results imply that $f$ has an exponential generating function

$$\boxed{ \sum_n f_{n, k} \frac{t^n}{n!} = \left( \sum_{i=0}^{k-1} \frac{t^i}{i!} \right)^{365} }.$$

For example, to find the probability for $k = 5, n = 1000$ as you ask for we need the coefficient of $t^{1000}$ in $\left( 1 + t + \frac{t^2}{2} + \frac{t^3}{6} + \frac{t^4}{24} \right)^{365}$. As far as I know, this is about as good as it gets for the exact count. You should be able to expand this out in a good computer algebra system but it's a little too big for WolframAlpha.

So instead of asking for the exact count let's ask for an approximation. There is a Poisson approximation available here, generalizing the one discussed on Wikipedia: the number of people having a given birthday is approximately Poisson with mean $\frac{n}{365}$, and moreover for different birthdays these Poisson random variables are approximately independent. This means that the probability that every birthday is shared by at most $k-1$ people is approximately

$$\frac{f_{n,k}}{365^n} \approx \left( \sum_{i=0}^{k-1} \frac{ \left( \frac{n}{365} \right)^i}{i!} e^{- \frac{n}{365}} \right)^{365} = \boxed{ e^{-n} \left( \sum_{i=0}^{k-1} \frac{ n^i}{365^i i!} \right)^{365} }.$$

This is actually much better; in particular for $k = 5, n = 1000$ this is now a computation we can easily ask WolframAlpha to do. It gives

$$\frac{f_{1000, 5}}{365^{1000}} \approx e^{-1000} \left( 1 + \frac{1000}{365} + \frac{1000^2}{365^2 2!} + \frac{1000^3}{365^3 3!} + \frac{1000^4}{365^4 4!} \right)^{365} = \boxed{ 3.36 \times 10^{-25} }$$

so the probability that in a group of $1000$ people at least $5$ share the same birthday is very close to $1$.

To get a better sense of how these numbers behave we could ask: for a given $n$, such as $n = 1000$, about what value of $k$ do we need for this probability to no longer be very small like this? In other words, what can we say about the distribution of the largest cluster of people sharing the same birthday? In the Poisson approximation this is equivalent to asking: what is the distribution of the maximum of $365$ iid Poisson random variables with mean $\frac{n}{365}$? It's known that this distribution clusters closely around

$$\frac{\log 365}{W \left( \frac{\log 365}{e \frac{n}{365} } \right)}$$

where $W$ is the Lambert W function, which is complicated-looking but for large $n$ grows roughly like $\frac{en}{365}$ and in any case for $n = 1000$ evaluates to about $\boxed{12}$. This could be tested experimentally by generating some random birthday distributions, as well as by substituting $k = 10, 11, 12, 13, 14$ into the Poisson approximation above.

Edit: An alternative argument which produces a rigorous upper bound starting from the generating function is the following. Write $p(t) = \sum_{i=0}^{k-1} \frac{t^i}{i!}$. Applying the saddle point bound to the generating function gives

$$\frac{f_{n, k}}{n!} \le \frac{p(r)^{365}}{r^n}$$

for any $r > 0$. The optimal value of $r$ is obtained by setting the logarithmic derivative of the RHS to zero; this gives $365 \frac{p'(r)}{p(r)} - \frac{n}{r} = 0$, which is complicated, but if we're willing to approximate $\frac{p'(r)}{p(r)} \approx 1$ then this gives $r \approx \frac{n}{365}$ which is much easier to work with. We get a bound

$$\frac{f_{n, k}}{n!} \le \frac{\left( \sum_{i=0}^{k-1} \frac{ \left( \frac{n}{365} \right)^i}{i!} \right)^{365}}{ \left( \frac{n}{365} \right)^n }$$

and rearranging this, then using the crude Stirling inequality $n! \le en \left( \frac{n}{e} \right)^n$, we get a rigorous upper bound

$$\boxed{ \frac{f_{n, k}}{365^n} \le \frac{n}{e^{n-1}} \left( \sum_{i=0}^{k-1} \frac{n^i}{365^i i!} \right)^{365} }$$

which is very similar to the above bound, and we could improve it a bit by using a more accurate form of Stirling's approximation and a bit more by estimating $r$ more accurately, and a bit more by using a more precise version of the saddle point bound. In any case this bound gives, similar to the above,

$$\frac{f_{1000, 5}}{365^{1000}} \le \frac{1000}{e^{999}} \left( 1 + \frac{1000}{365} + \frac{1000^2}{365^2 2!} + \frac{1000^3}{365^3 3!} + \frac{1000^4}{365^4 4!} \right)^{365} = \boxed{ 9.14 \times 10^{-22} }.$$

Unfortunately the reason no one wants to do this calculation is that it's messy. This is $1$ minus the probability that every birthday is shared by at most $k-1$ people. To compute this, let $f_{n, k}$ be the number of ways $n$ people can have birthdays such that every birthday is shared by at most $k-1$ people, so that we want to compute $1 - \frac{f_{n, k}}{365^n}$. General results imply that $f$ has an exponential generating function

$$\boxed{ \sum_n f_{n, k} \frac{t^n}{n!} = \left( \sum_{i=0}^{k-1} \frac{t^i}{i!} \right)^{365} }.$$

For example, to find the probability for $k = 5, n = 1000$ as you ask for we need the coefficient of $t^{1000}$ in $\left( 1 + t + \frac{t^2}{2} + \frac{t^3}{6} + \frac{t^4}{24} \right)^{365}$. As far as I know, this is about as good as it gets for the exact count. You should be able to expand this out in a good computer algebra system but it's a little too big for WolframAlpha.

So instead of asking for the exact count let's ask for an approximation. There is a Poisson approximation available here, generalizing the one discussed on Wikipedia: the number of people having a given birthday is approximately Poisson with mean $\frac{n}{365}$, and moreover for different birthdays these Poisson random variables are approximately independent. This means that the probability that every birthday is shared by at most $k-1$ people is approximately

$$\frac{f_{n,k}}{365^n} \approx \left( \sum_{i=0}^{k-1} \frac{ \left( \frac{n}{365} \right)^i}{i!} e^{- \frac{n}{365}} \right)^{365} = \boxed{ e^{-n} \left( \sum_{i=0}^{k-1} \frac{ n^i}{365^i i!} \right)^{365} }.$$

This is actually much better; in particular for $k = 5, n = 1000$ this is now a computation we can easily ask WolframAlpha to do. It gives

$$\frac{f_{1000, 5}}{365^{1000}} \approx e^{-1000} \left( 1 + \frac{1000}{365} + \frac{1000^2}{365^2 2!} + \frac{1000^3}{365^3 3!} + \frac{1000^4}{365^4 4!} \right)^{365} \approx \boxed{ 3 \times 10^{-25} }$$

so the probability that in a group of $1000$ people at least $5$ share the same birthday is very close to $1$.

To get a better sense of how these numbers behave we could ask: for a given $n$, such as $n = 1000$, about what value of $k$ do we need for this probability to no longer be very small like this? In other words, what can we say about the distribution of the largest cluster of people sharing the same birthday? In the Poisson approximation this is equivalent to asking: what is the distribution of the maximum of $365$ iid Poisson random variables with mean $\frac{n}{365}$? It's known that this distribution clusters closely around

$$\frac{\log 365}{W \left( \frac{\log 365}{e \frac{n}{365} } \right)}$$

where $W$ is the Lambert W function, which is complicated-looking but for large $n$ grows roughly like $\frac{en}{365}$ and in any case for $n = 1000$ evaluates to about $\boxed{12}$. This could be tested experimentally by generating some random birthday distributions, as well as by substituting $k = 10, 11, 12, 13, 14$ into the Poisson approximation above.

Edit: An alternative argument which produces a rigorous upper bound starting from the generating function is the following. Write $p(t) = \sum_{i=0}^{k-1} \frac{t^i}{i!}$. Applying the saddle point bound to the generating function gives

$$\frac{f_{n, k}}{n!} \le \frac{p(r)^{365}}{r^n}$$

for any $r > 0$. The optimal value of $r$ is obtained by setting the logarithmic derivative of the RHS to zero; this gives $365 \frac{p'(r)}{p(r)} - \frac{n}{r} = 0$, which is complicated, but if we're willing to approximate $\frac{p'(r)}{p(r)} \approx 1$ then this gives $r \approx \frac{n}{365}$ which is much easier to work with. We get a bound

$$\frac{f_{n, k}}{n!} \le \frac{\left( \sum_{i=0}^{k-1} \frac{ \left( \frac{n}{365} \right)^i}{i!} \right)^{365}}{ \left( \frac{n}{365} \right)^n }$$

and rearranging this, then using the crude Stirling inequality $n! \le en \left( \frac{n}{e} \right)^n$, we get a rigorous upper bound

$$\boxed{ \frac{f_{n, k}}{365^n} \le \frac{n}{e^{n-1}} \left( \sum_{i=0}^{k-1} \frac{n^i}{365^i i!} \right)^{365} }$$

which is very similar to the above bound, and we could improve it a bit by using a more accurate form of Stirling's approximation and a bit more by estimating $r$ more accurately, and a bit more by using a more precise version of the saddle point bound. In any case this bound gives, similar to the above,

$$\frac{f_{1000, 5}}{365^{1000}} \le \frac{1000}{e^{999}} \left( 1 + \frac{1000}{365} + \frac{1000^2}{365^2 2!} + \frac{1000^3}{365^3 3!} + \frac{1000^4}{365^4 4!} \right)^{365} \approx \boxed{ 9 \times 10^{-22} }.$$

deleted 4 characters in body
Source Link
Qiaochu Yuan
  • 433.3k
  • 53
  • 969
  • 1.4k

Unfortunately the reason no one wants to do this calculation is that it's messy. This is $1$ minus the probability that every birthday is shared by at most $k-1$ people. To compute this, let $f_{n, k}$ be the number of ways $n$ people can have birthdays such that every birthday is shared by at most $k-1$ people, so that we want to compute $1 - \frac{f_{n, k}}{365^n}$. General results imply that $f$ has an exponential generating function

$$\boxed{ \sum_n f_{n, k} \frac{t^n}{n!} = \left( \sum_{i=0}^{k-1} \frac{t^i}{i!} \right)^{365} }.$$

For example, to find the probability for $k = 5, n = 1000$ as you ask for we need the coefficient of $t^{1000}$ in $\left( 1 + t + \frac{t^2}{2} + \frac{t^3}{6} + \frac{t^4}{24} \right)^{365}$. As far as I know, this is about as good as it gets for the exact count. You should be able to expand this out in a good computer algebra system but it's a little too big for WolframAlpha.

So instead of asking for the exact count let's ask for an approximation. There is a Poisson approximation available here, generalizing the one discussed on Wikipedia: the number of people having a given birthday $i$ is approximately Poisson with mean $\frac{n}{365}$, and moreover for different birthdays these Poisson random variables are approximately independent. This means that the probability that every birthday is shared by at most $k-1$ people is approximately

$$\frac{f_{n,k}}{365^n} \approx \left( \sum_{i=0}^{k-1} \frac{ \left( \frac{n}{365} \right)^i}{i!} e^{- \frac{n}{365}} \right)^{365} = \boxed{ e^{-n} \left( \sum_{i=0}^{k-1} \frac{ n^i}{365^i i!} \right)^{365} }.$$

This is actually much better; in particular for $k = 5, n = 1000$ this is now a computation we can easily ask WolframAlpha to do. It gives

$$\frac{f_{1000, 5}}{365^{1000}} \approx e^{-1000} \left( 1 + \frac{1000}{365} + \frac{1000^2}{365^2 2!} + \frac{1000^3}{365^3 3!} + \frac{1000^4}{365^4 4!} \right)^{365} = \boxed{ 3.36 \times 10^{-25} }$$

so the probability that in a group of $1000$ people at least $5$ share the same birthday is very close to $1$.

To get a better sense of how these numbers behave we could ask: for a given $n$, such as $n = 1000$, about what value of $k$ do we need for this probability to no longer be very small like this? In other words, what can we say about the distribution of the largest cluster of people sharing the same birthday? In the Poisson approximation this is equivalent to asking: what is the distribution of the maximum of $365$ iid Poisson random variables with mean $\frac{n}{365}$? It's known that this distribution clusters closely around

$$\frac{\log 365}{W \left( \frac{\log 365}{e \frac{n}{365} } \right)}$$

where $W$ is the Lambert W function, which is complicated-looking but for large $n$ grows roughly like $\frac{en}{365}$ and in any case for $n = 1000$ evaluates to about $\boxed{12}$. This could be tested experimentally by generating some random birthday distributions, as well as by substituting $k = 10, 11, 12, 13, 14$ into the Poisson approximation above.

Edit: An alternative argument which produces a rigorous upper bound starting from the generating function is the following. Write $p(t) = \sum_{i=0}^{k-1} \frac{t^i}{i!}$. Applying the saddle point bound to the generating function gives

$$\frac{f_{n, k}}{n!} \le \frac{p(r)^{365}}{r^n}$$

for any $r > 0$. The optimal value of $r$ is obtained by setting the logarithmic derivative of the RHS to zero; this gives $365 \frac{p'(r)}{p(r)} - \frac{n}{r} = 0$, which is complicated, but if we're willing to approximate $\frac{p'(r)}{p(r)} \approx 1$ then this gives $r \approx \frac{n}{365}$ which is much easier to work with. We get a bound

$$\frac{f_{n, k}}{n!} \le \frac{\left( \sum_{i=0}^{k-1} \frac{ \left( \frac{n}{365} \right)^i}{i!} \right)^{365}}{ \left( \frac{n}{365} \right)^n }$$

and rearranging this, then using the crude Stirling inequality $n! \le en \left( \frac{n}{e} \right)^n$, we get a rigorous upper bound

$$\boxed{ \frac{f_{n, k}}{365^n} \le \frac{n}{e^{n-1}} \left( \sum_{i=0}^{k-1} \frac{n^i}{365^i i!} \right)^{365} }$$

which is very similar to the above bound, and we could improve it a bit by using a more accurate form of Stirling's approximation and a bit more by estimating $r$ more accurately, and a bit more by using a more precise version of the saddle point bound. In any case this bound gives, similar to the above,

$$\frac{f_{1000, 5}}{365^{1000}} \le \frac{1000}{e^{999}} \left( 1 + \frac{1000}{365} + \frac{1000^2}{365^2 2!} + \frac{1000^3}{365^3 3!} + \frac{1000^4}{365^4 4!} \right)^{365} = \boxed{ 9.14 \times 10^{-22} }.$$

Unfortunately the reason no one wants to do this calculation is that it's messy. This is $1$ minus the probability that every birthday is shared by at most $k-1$ people. To compute this, let $f_{n, k}$ be the number of ways $n$ people can have birthdays such that every birthday is shared by at most $k-1$ people, so that we want to compute $1 - \frac{f_{n, k}}{365^n}$. General results imply that $f$ has an exponential generating function

$$\boxed{ \sum_n f_{n, k} \frac{t^n}{n!} = \left( \sum_{i=0}^{k-1} \frac{t^i}{i!} \right)^{365} }.$$

For example, to find the probability for $k = 5, n = 1000$ as you ask for we need the coefficient of $t^{1000}$ in $\left( 1 + t + \frac{t^2}{2} + \frac{t^3}{6} + \frac{t^4}{24} \right)^{365}$. As far as I know, this is about as good as it gets for the exact count. You should be able to expand this out in a good computer algebra system but it's a little too big for WolframAlpha.

So instead of asking for the exact count let's ask for an approximation. There is a Poisson approximation available here, generalizing the one discussed on Wikipedia: the number of people having a given birthday $i$ is approximately Poisson with mean $\frac{n}{365}$, and moreover for different birthdays these Poisson random variables are approximately independent. This means that the probability that every birthday is shared by at most $k-1$ people is approximately

$$\frac{f_{n,k}}{365^n} \approx \left( \sum_{i=0}^{k-1} \frac{ \left( \frac{n}{365} \right)^i}{i!} e^{- \frac{n}{365}} \right)^{365} = \boxed{ e^{-n} \left( \sum_{i=0}^{k-1} \frac{ n^i}{365^i i!} \right)^{365} }.$$

This is actually much better; in particular for $k = 5, n = 1000$ this is now a computation we can easily ask WolframAlpha to do. It gives

$$\frac{f_{1000, 5}}{365^{1000}} \approx e^{-1000} \left( 1 + \frac{1000}{365} + \frac{1000^2}{365^2 2!} + \frac{1000^3}{365^3 3!} + \frac{1000^4}{365^4 4!} \right)^{365} = \boxed{ 3.36 \times 10^{-25} }$$

so the probability that in a group of $1000$ people at least $5$ share the same birthday is very close to $1$.

To get a better sense of how these numbers behave we could ask: for a given $n$, such as $n = 1000$, about what value of $k$ do we need for this probability to no longer be very small like this? In other words, what can we say about the distribution of the largest cluster of people sharing the same birthday? In the Poisson approximation this is equivalent to asking: what is the distribution of the maximum of $365$ iid Poisson random variables with mean $\frac{n}{365}$? It's known that this distribution clusters closely around

$$\frac{\log 365}{W \left( \frac{\log 365}{e \frac{n}{365} } \right)}$$

where $W$ is the Lambert W function, which is complicated-looking but for large $n$ grows roughly like $\frac{en}{365}$ and in any case for $n = 1000$ evaluates to about $\boxed{12}$. This could be tested experimentally by generating some random birthday distributions, as well as by substituting $k = 10, 11, 12, 13, 14$ into the Poisson approximation above.

Edit: An alternative argument which produces a rigorous upper bound starting from the generating function is the following. Write $p(t) = \sum_{i=0}^{k-1} \frac{t^i}{i!}$. Applying the saddle point bound to the generating function gives

$$\frac{f_{n, k}}{n!} \le \frac{p(r)^{365}}{r^n}$$

for any $r > 0$. The optimal value of $r$ is obtained by setting the logarithmic derivative of the RHS to zero; this gives $365 \frac{p'(r)}{p(r)} - \frac{n}{r} = 0$, which is complicated, but if we're willing to approximate $\frac{p'(r)}{p(r)} \approx 1$ then this gives $r \approx \frac{n}{365}$ which is much easier to work with. We get a bound

$$\frac{f_{n, k}}{n!} \le \frac{\left( \sum_{i=0}^{k-1} \frac{ \left( \frac{n}{365} \right)^i}{i!} \right)^{365}}{ \left( \frac{n}{365} \right)^n }$$

and rearranging this, then using the crude Stirling inequality $n! \le en \left( \frac{n}{e} \right)^n$, we get a rigorous upper bound

$$\boxed{ \frac{f_{n, k}}{365^n} \le \frac{n}{e^{n-1}} \left( \sum_{i=0}^{k-1} \frac{n^i}{365^i i!} \right)^{365} }$$

which is very similar to the above bound, and we could improve it a bit by using a more accurate form of Stirling's approximation and a bit more by estimating $r$ more accurately, and a bit more by using a more precise version of the saddle point bound. In any case this bound gives, similar to the above,

$$\frac{f_{1000, 5}}{365^{1000}} \le \frac{1000}{e^{999}} \left( 1 + \frac{1000}{365} + \frac{1000^2}{365^2 2!} + \frac{1000^3}{365^3 3!} + \frac{1000^4}{365^4 4!} \right)^{365} = \boxed{ 9.14 \times 10^{-22} }.$$

Unfortunately the reason no one wants to do this calculation is that it's messy. This is $1$ minus the probability that every birthday is shared by at most $k-1$ people. To compute this, let $f_{n, k}$ be the number of ways $n$ people can have birthdays such that every birthday is shared by at most $k-1$ people, so that we want to compute $1 - \frac{f_{n, k}}{365^n}$. General results imply that $f$ has an exponential generating function

$$\boxed{ \sum_n f_{n, k} \frac{t^n}{n!} = \left( \sum_{i=0}^{k-1} \frac{t^i}{i!} \right)^{365} }.$$

For example, to find the probability for $k = 5, n = 1000$ as you ask for we need the coefficient of $t^{1000}$ in $\left( 1 + t + \frac{t^2}{2} + \frac{t^3}{6} + \frac{t^4}{24} \right)^{365}$. As far as I know, this is about as good as it gets for the exact count. You should be able to expand this out in a good computer algebra system but it's a little too big for WolframAlpha.

So instead of asking for the exact count let's ask for an approximation. There is a Poisson approximation available here, generalizing the one discussed on Wikipedia: the number of people having a given birthday is approximately Poisson with mean $\frac{n}{365}$, and moreover for different birthdays these Poisson random variables are approximately independent. This means that the probability that every birthday is shared by at most $k-1$ people is approximately

$$\frac{f_{n,k}}{365^n} \approx \left( \sum_{i=0}^{k-1} \frac{ \left( \frac{n}{365} \right)^i}{i!} e^{- \frac{n}{365}} \right)^{365} = \boxed{ e^{-n} \left( \sum_{i=0}^{k-1} \frac{ n^i}{365^i i!} \right)^{365} }.$$

This is actually much better; in particular for $k = 5, n = 1000$ this is now a computation we can easily ask WolframAlpha to do. It gives

$$\frac{f_{1000, 5}}{365^{1000}} \approx e^{-1000} \left( 1 + \frac{1000}{365} + \frac{1000^2}{365^2 2!} + \frac{1000^3}{365^3 3!} + \frac{1000^4}{365^4 4!} \right)^{365} = \boxed{ 3.36 \times 10^{-25} }$$

so the probability that in a group of $1000$ people at least $5$ share the same birthday is very close to $1$.

To get a better sense of how these numbers behave we could ask: for a given $n$, such as $n = 1000$, about what value of $k$ do we need for this probability to no longer be very small like this? In other words, what can we say about the distribution of the largest cluster of people sharing the same birthday? In the Poisson approximation this is equivalent to asking: what is the distribution of the maximum of $365$ iid Poisson random variables with mean $\frac{n}{365}$? It's known that this distribution clusters closely around

$$\frac{\log 365}{W \left( \frac{\log 365}{e \frac{n}{365} } \right)}$$

where $W$ is the Lambert W function, which is complicated-looking but for large $n$ grows roughly like $\frac{en}{365}$ and in any case for $n = 1000$ evaluates to about $\boxed{12}$. This could be tested experimentally by generating some random birthday distributions, as well as by substituting $k = 10, 11, 12, 13, 14$ into the Poisson approximation above.

Edit: An alternative argument which produces a rigorous upper bound starting from the generating function is the following. Write $p(t) = \sum_{i=0}^{k-1} \frac{t^i}{i!}$. Applying the saddle point bound to the generating function gives

$$\frac{f_{n, k}}{n!} \le \frac{p(r)^{365}}{r^n}$$

for any $r > 0$. The optimal value of $r$ is obtained by setting the logarithmic derivative of the RHS to zero; this gives $365 \frac{p'(r)}{p(r)} - \frac{n}{r} = 0$, which is complicated, but if we're willing to approximate $\frac{p'(r)}{p(r)} \approx 1$ then this gives $r \approx \frac{n}{365}$ which is much easier to work with. We get a bound

$$\frac{f_{n, k}}{n!} \le \frac{\left( \sum_{i=0}^{k-1} \frac{ \left( \frac{n}{365} \right)^i}{i!} \right)^{365}}{ \left( \frac{n}{365} \right)^n }$$

and rearranging this, then using the crude Stirling inequality $n! \le en \left( \frac{n}{e} \right)^n$, we get a rigorous upper bound

$$\boxed{ \frac{f_{n, k}}{365^n} \le \frac{n}{e^{n-1}} \left( \sum_{i=0}^{k-1} \frac{n^i}{365^i i!} \right)^{365} }$$

which is very similar to the above bound, and we could improve it a bit by using a more accurate form of Stirling's approximation and a bit more by estimating $r$ more accurately, and a bit more by using a more precise version of the saddle point bound. In any case this bound gives, similar to the above,

$$\frac{f_{1000, 5}}{365^{1000}} \le \frac{1000}{e^{999}} \left( 1 + \frac{1000}{365} + \frac{1000^2}{365^2 2!} + \frac{1000^3}{365^3 3!} + \frac{1000^4}{365^4 4!} \right)^{365} = \boxed{ 9.14 \times 10^{-22} }.$$

deleted 208 characters in body
Source Link
Qiaochu Yuan
  • 433.3k
  • 53
  • 969
  • 1.4k

Unfortunately the reason no one wants to do this calculation is that it's messy. This is $1$ minus the probability that every birthday is shared by at most $k-1$ people. To compute this, let $f_{n, k}$ be the number of ways $n$ people can have birthdays such that every birthday is shared by at most $k-1$ people, so that we want to compute $1 - \frac{f_{n, k}}{365^n}$. General results imply that $f$ has an exponential generating function

$$\boxed{ \sum_n f_{n, k} \frac{t^n}{n!} = \left( \sum_{i=0}^{k-1} \frac{t^i}{i!} \right)^{365} }.$$

For example, to find the probability for $k = 5, n = 1000$ as you ask for we need the coefficient of $t^{1000}$ in $\left( 1 + t + \frac{t^2}{2} + \frac{t^3}{6} + \frac{t^4}{24} \right)^{365}$. As far as I know, this is about as good as it gets for the exact count. You should be able to expand this out in a good computer algebra system but it's a little too big for WolframAlpha.

So instead of asking for the exact count let's ask for an approximation. There is a Poisson approximation available here, generalizing the one discussed on Wikipedia: the number of people having a given birthday $i$ is approximately Poisson with mean $\frac{n}{365}$, and moreover for different birthdays these Poisson random variables are approximately independent. This means that the probability that every birthday is shared by at most $k-1$ people is approximately

$$\frac{f_{n,k}}{365^n} \approx \left( \sum_{i=0}^{k-1} \frac{ \left( \frac{n}{365} \right)^i}{i!} e^{- \frac{n}{365}} \right)^{365} = \boxed{ e^{-n} \left( \sum_{i=0}^{k-1} \frac{ n^i}{365^i i!} \right)^{365} }.$$

This is actually much better; in particular for $k = 5, n = 1000$ this is now a computation we can easily ask WolframAlpha to do. It gives

$$\frac{f_{1000, 5}}{365^{1000}} \approx e^{-1000} \left( 1 + \frac{1000}{365} + \frac{1000^2}{365^2 2!} + \frac{1000^3}{365^3 3!} + \frac{1000^4}{365^4 4!} \right)^{365} = \boxed{ 3.36 \times 10^{-25} }$$

so the probability that in a group of $1000$ people at least $5$ share the same birthday is very close to $1$. (The Poisson approximation isn't likely to be accurate enough that even the order of magnitude of this estimate should be trusted, but it should at least accurately tell us when the answer is big vs. small.)

To get a better sense of how these numbers behave we could ask: for a given $n$, such as $n = 1000$, about what value of $k$ do we need for this probability to no longer be very small like this? In other words, what can we say about the distribution of the largest cluster of people sharing the same birthday? In the Poisson approximation this is equivalent to asking: what is the distribution of the maximum of $365$ iid Poisson random variables with mean $\frac{n}{365}$? It's known that this distribution clusters closely around

$$\frac{\log 365}{W \left( \frac{\log 365}{e \frac{n}{365} } \right)}$$

where $W$ is the Lambert W function, which is complicated-looking but for large $n$ grows roughly like $\frac{en}{365}$ and in any case for $n = 1000$ evaluates to about $\boxed{12}$. This could be tested experimentally by generating some random birthday distributions, as well as by substituting $k = 10, 11, 12, 13, 14$ into the Poisson approximation above.

Edit: An alternative argument which produces a rigorous upper bound starting from the generating function is the following. Write $p(t) = \sum_{i=0}^{k-1} \frac{t^i}{i!}$. Applying the saddle point bound to the generating function gives

$$\frac{f_{n, k}}{n!} \le \frac{p(r)^{365}}{r^n}$$

for any $r > 0$. The optimal value of $r$ is obtained by setting the logarithmic derivative of the RHS to zero; this gives $365 \frac{p'(r)}{p(r)} - \frac{n}{r} = 0$, which is complicated, but if we're willing to approximate $\frac{p'(r)}{p(r)} \approx 1$ then this gives $r \approx \frac{n}{365}$ which is much easier to work with. We get a bound

$$\frac{f_{n, k}}{n!} \le \frac{\left( \sum_{i=0}^{k-1} \frac{ \left( \frac{n}{365} \right)^i}{i!} \right)^{365}}{ \left( \frac{n}{365} \right)^n }$$

and rearranging this, then using the crude Stirling inequality $n! \le en \left( \frac{n}{e} \right)^n$, we get a rigorous upper bound

$$\boxed{ \frac{f_{n, k}}{365^n} \le \frac{n}{e^{n-1}} \left( \sum_{i=0}^{k-1} \frac{n^i}{365^i i!} \right)^{365} }$$

which is very similar to the above bound, and we could improve it a bit by using a more accurate form of Stirling's approximation and a bit more by estimating $r$ more accurately, and a bit more by using a more precise version of the saddle point bound. ThisIn any case this bound gives, similar to the above,

$$\frac{f_{1000, 5}}{365^{1000}} \le \frac{1000}{e^{999}} \left( 1 + \frac{1000}{365} + \frac{1000^2}{365^2 2!} + \frac{1000^3}{365^3 3!} + \frac{1000^4}{365^4 4!} \right)^{365} = \boxed{ 9.14 \times 10^{-22} }.$$

Unfortunately the reason no one wants to do this calculation is that it's messy. This is $1$ minus the probability that every birthday is shared by at most $k-1$ people. To compute this, let $f_{n, k}$ be the number of ways $n$ people can have birthdays such that every birthday is shared by at most $k-1$ people, so that we want to compute $1 - \frac{f_{n, k}}{365^n}$. General results imply that $f$ has an exponential generating function

$$\boxed{ \sum_n f_{n, k} \frac{t^n}{n!} = \left( \sum_{i=0}^{k-1} \frac{t^i}{i!} \right)^{365} }.$$

For example, to find the probability for $k = 5, n = 1000$ as you ask for we need the coefficient of $t^{1000}$ in $\left( 1 + t + \frac{t^2}{2} + \frac{t^3}{6} + \frac{t^4}{24} \right)^{365}$. As far as I know, this is about as good as it gets for the exact count. You should be able to expand this out in a good computer algebra system but it's a little too big for WolframAlpha.

So instead of asking for the exact count let's ask for an approximation. There is a Poisson approximation available here, generalizing the one discussed on Wikipedia: the number of people having a given birthday $i$ is approximately Poisson with mean $\frac{n}{365}$, and moreover for different birthdays these Poisson random variables are approximately independent. This means that the probability that every birthday is shared by at most $k-1$ people is approximately

$$\frac{f_{n,k}}{365^n} \approx \left( \sum_{i=0}^{k-1} \frac{ \left( \frac{n}{365} \right)^i}{i!} e^{- \frac{n}{365}} \right)^{365} = \boxed{ e^{-n} \left( \sum_{i=0}^{k-1} \frac{ n^i}{365^i i!} \right)^{365} }.$$

This is actually much better; in particular for $k = 5, n = 1000$ this is now a computation we can easily ask WolframAlpha to do. It gives

$$\frac{f_{1000, 5}}{365^{1000}} \approx e^{-1000} \left( 1 + \frac{1000}{365} + \frac{1000^2}{365^2 2!} + \frac{1000^3}{365^3 3!} + \frac{1000^4}{365^4 4!} \right)^{365} = \boxed{ 3.36 \times 10^{-25} }$$

so the probability that in a group of $1000$ people at least $5$ share the same birthday is very close to $1$. (The Poisson approximation isn't likely to be accurate enough that even the order of magnitude of this estimate should be trusted, but it should at least accurately tell us when the answer is big vs. small.)

To get a better sense of how these numbers behave we could ask: for a given $n$, such as $n = 1000$, about what value of $k$ do we need for this probability to no longer be very small like this? In other words, what can we say about the distribution of the largest cluster of people sharing the same birthday? In the Poisson approximation this is equivalent to asking: what is the distribution of the maximum of $365$ iid Poisson random variables with mean $\frac{n}{365}$? It's known that this distribution clusters closely around

$$\frac{\log 365}{W \left( \frac{\log 365}{e \frac{n}{365} } \right)}$$

where $W$ is the Lambert W function, which is complicated-looking but for large $n$ grows roughly like $\frac{en}{365}$ and in any case for $n = 1000$ evaluates to about $\boxed{12}$. This could be tested experimentally by generating some random birthday distributions, as well as by substituting $k = 10, 11, 12, 13, 14$ into the Poisson approximation above.

Edit: An alternative argument which produces a rigorous upper bound starting from the generating function is the following. Write $p(t) = \sum_{i=0}^{k-1} \frac{t^i}{i!}$. Applying the saddle point bound to the generating function gives

$$\frac{f_{n, k}}{n!} \le \frac{p(r)^{365}}{r^n}$$

for any $r > 0$. The optimal value of $r$ is obtained by setting the logarithmic derivative of the RHS to zero; this gives $365 \frac{p'(r)}{p(r)} - \frac{n}{r} = 0$, which is complicated, but if we're willing to approximate $\frac{p'(r)}{p(r)} \approx 1$ then this gives $r \approx \frac{n}{365}$ which is much easier to work with. We get a bound

$$\frac{f_{n, k}}{n!} \le \frac{\left( \sum_{i=0}^{k-1} \frac{ \left( \frac{n}{365} \right)^i}{i!} \right)^{365}}{ \left( \frac{n}{365} \right)^n }$$

and rearranging this, then using the crude Stirling inequality $n! \le en \left( \frac{n}{e} \right)^n$, we get a rigorous upper bound

$$\boxed{ \frac{f_{n, k}}{365^n} \le \frac{n}{e^{n-1}} \left( \sum_{i=0}^{k-1} \frac{n^i}{365^i i!} \right)^{365} }$$

which is very similar to the above bound, and we could improve it a bit by using a more accurate form of Stirling's approximation and a bit more by estimating $r$ more accurately. This gives, similar to the above,

$$\frac{f_{1000, 5}}{365^{1000}} \le \frac{1000}{e^{999}} \left( 1 + \frac{1000}{365} + \frac{1000^2}{365^2 2!} + \frac{1000^3}{365^3 3!} + \frac{1000^4}{365^4 4!} \right)^{365} = \boxed{ 9.14 \times 10^{-22} }.$$

Unfortunately the reason no one wants to do this calculation is that it's messy. This is $1$ minus the probability that every birthday is shared by at most $k-1$ people. To compute this, let $f_{n, k}$ be the number of ways $n$ people can have birthdays such that every birthday is shared by at most $k-1$ people, so that we want to compute $1 - \frac{f_{n, k}}{365^n}$. General results imply that $f$ has an exponential generating function

$$\boxed{ \sum_n f_{n, k} \frac{t^n}{n!} = \left( \sum_{i=0}^{k-1} \frac{t^i}{i!} \right)^{365} }.$$

For example, to find the probability for $k = 5, n = 1000$ as you ask for we need the coefficient of $t^{1000}$ in $\left( 1 + t + \frac{t^2}{2} + \frac{t^3}{6} + \frac{t^4}{24} \right)^{365}$. As far as I know, this is about as good as it gets for the exact count. You should be able to expand this out in a good computer algebra system but it's a little too big for WolframAlpha.

So instead of asking for the exact count let's ask for an approximation. There is a Poisson approximation available here, generalizing the one discussed on Wikipedia: the number of people having a given birthday $i$ is approximately Poisson with mean $\frac{n}{365}$, and moreover for different birthdays these Poisson random variables are approximately independent. This means that the probability that every birthday is shared by at most $k-1$ people is approximately

$$\frac{f_{n,k}}{365^n} \approx \left( \sum_{i=0}^{k-1} \frac{ \left( \frac{n}{365} \right)^i}{i!} e^{- \frac{n}{365}} \right)^{365} = \boxed{ e^{-n} \left( \sum_{i=0}^{k-1} \frac{ n^i}{365^i i!} \right)^{365} }.$$

This is actually much better; in particular for $k = 5, n = 1000$ this is now a computation we can easily ask WolframAlpha to do. It gives

$$\frac{f_{1000, 5}}{365^{1000}} \approx e^{-1000} \left( 1 + \frac{1000}{365} + \frac{1000^2}{365^2 2!} + \frac{1000^3}{365^3 3!} + \frac{1000^4}{365^4 4!} \right)^{365} = \boxed{ 3.36 \times 10^{-25} }$$

so the probability that in a group of $1000$ people at least $5$ share the same birthday is very close to $1$.

To get a better sense of how these numbers behave we could ask: for a given $n$, such as $n = 1000$, about what value of $k$ do we need for this probability to no longer be very small like this? In other words, what can we say about the distribution of the largest cluster of people sharing the same birthday? In the Poisson approximation this is equivalent to asking: what is the distribution of the maximum of $365$ iid Poisson random variables with mean $\frac{n}{365}$? It's known that this distribution clusters closely around

$$\frac{\log 365}{W \left( \frac{\log 365}{e \frac{n}{365} } \right)}$$

where $W$ is the Lambert W function, which is complicated-looking but for large $n$ grows roughly like $\frac{en}{365}$ and in any case for $n = 1000$ evaluates to about $\boxed{12}$. This could be tested experimentally by generating some random birthday distributions, as well as by substituting $k = 10, 11, 12, 13, 14$ into the Poisson approximation above.

Edit: An alternative argument which produces a rigorous upper bound starting from the generating function is the following. Write $p(t) = \sum_{i=0}^{k-1} \frac{t^i}{i!}$. Applying the saddle point bound to the generating function gives

$$\frac{f_{n, k}}{n!} \le \frac{p(r)^{365}}{r^n}$$

for any $r > 0$. The optimal value of $r$ is obtained by setting the logarithmic derivative of the RHS to zero; this gives $365 \frac{p'(r)}{p(r)} - \frac{n}{r} = 0$, which is complicated, but if we're willing to approximate $\frac{p'(r)}{p(r)} \approx 1$ then this gives $r \approx \frac{n}{365}$ which is much easier to work with. We get a bound

$$\frac{f_{n, k}}{n!} \le \frac{\left( \sum_{i=0}^{k-1} \frac{ \left( \frac{n}{365} \right)^i}{i!} \right)^{365}}{ \left( \frac{n}{365} \right)^n }$$

and rearranging this, then using the crude Stirling inequality $n! \le en \left( \frac{n}{e} \right)^n$, we get a rigorous upper bound

$$\boxed{ \frac{f_{n, k}}{365^n} \le \frac{n}{e^{n-1}} \left( \sum_{i=0}^{k-1} \frac{n^i}{365^i i!} \right)^{365} }$$

which is very similar to the above bound, and we could improve it a bit by using a more accurate form of Stirling's approximation and a bit more by estimating $r$ more accurately, and a bit more by using a more precise version of the saddle point bound. In any case this bound gives, similar to the above,

$$\frac{f_{1000, 5}}{365^{1000}} \le \frac{1000}{e^{999}} \left( 1 + \frac{1000}{365} + \frac{1000^2}{365^2 2!} + \frac{1000^3}{365^3 3!} + \frac{1000^4}{365^4 4!} \right)^{365} = \boxed{ 9.14 \times 10^{-22} }.$$

added 1589 characters in body
Source Link
Qiaochu Yuan
  • 433.3k
  • 53
  • 969
  • 1.4k
Loading
added 100 characters in body
Source Link
Qiaochu Yuan
  • 433.3k
  • 53
  • 969
  • 1.4k
Loading
Source Link
Qiaochu Yuan
  • 433.3k
  • 53
  • 969
  • 1.4k
Loading