12
$\begingroup$

This is a question that has bugged me for quite some time: what is the chance that x people happen to have their birthdays within n days of each other?

A bit more specific, since this is how a colleague one phrased it: what is the probability that 5 people have their birthdays within 40 days?

Both the birthdays and the "distance" are supposed to be random: there is no fixed time span (e.g., April 1 to May 10) that the birthdays are to lie within. The birthdays should be such, that two birthdays are always within 40 days of each other.

The thing that bugs me, is that it seems to be some kind of recursive calculation, and that I can't find a way to put it into a straightforward mathematical formulation.

To explain that, consider 2 people: the first person is free to have his or her birthday $b_1$ any day of the year, and the second person has 81 days to pick a birthday date $b_2$ from (the 40 day timespan is inclusive, so up to 40 days before $b_1$, plus up to days after $b_1$, plus one on $b_1$ itself. This may be more logically phrased as 41 days for some; I don't know what is best, so please be clear about it in your answer).

Now, for the third person, the number of birthdays he or she can have, is limited by the second person's birthday: if $b_2 = b_1$, then $b_3$ can be among 81 days, but if $b_2 = b_1 + 1$ or $b_2 = b_1 - 1$, there are only 80 days for each option, and 79 for $\|b_1 - b_2\| = 2$, etc.

For the fourth person, the limitation is given by person 2 and 3, complicating things; the fifth person makes things even more complicated.

I've also tried to go the "exclusion" way (what is the chance that 5 people do not share their birthdays within 40 days of each other), but I didn't get anywhere that way.

But perhaps I'm going entirely the wrong way about this.

By now, I've computed it in various way, and I'm quite confident of the answer, but I'm still looking for the mathematical formulation of the general (x birthdays, n days) problem.

The answer I've got, btw, is $7.581428 \cdot 10^{-4}$, or $\frac{13456201}{365^4}$.

NB: this obviously assumes no leap years.
NB2: Extension of the Birthday Problem appears related, though I can't readily see if I can use any of that formulation here.

$\endgroup$
3
  • $\begingroup$ Are we assuming that a person is equally likely to have his or her birthday on any day of the year? $\endgroup$
    – Kamster
    Commented Jul 7, 2014 at 1:50
  • $\begingroup$ Notice that this problem is related to the problem "What is the chance for a single person's birthday to precede $x-1$ people's birthdays by no more than $n$ days." $\endgroup$
    – Ryan
    Commented Jul 7, 2014 at 2:40
  • $\begingroup$ @Evert It is all about frame of reference. In a group of $x$ people whose birthdays all fall within a time window of $n$ days, there is a unique "earliest" birthday and unique "latest" birthday in the group (assuming $n < \frac{365}{2}$). If you use this knowledge to your advantage (like in mjqxxxx's answer), the general algorithm becomes a lot simpler. $\endgroup$
    – Ryan
    Commented Jul 8, 2014 at 1:07

7 Answers 7

5
+50
$\begingroup$

Let $B_1,\dots,B_x$ be the birthdays of persons $1,\dots,x$ and consider for all $m \in [0,364]$ the event "all birthdays happen between days $m$ and $m+n$, and $m$ is one of the birthdays", that is $$ E_m = \{\forall i,\; B_i\in [m, m + n]\}\cap \{\exists i,\; B_i = m\}. $$ (interpret $m$ as some kind of "minimal" birthday)

If $n < 365/2$, the events $E_m$ are mutually exclusive: at most one of them can happen. On the other hand, the probability that all $x$ birthdays are contained in a block of $n+1$ consecutive days is the probability that at least one of these events $E_m$ happens. This probability is therefore $$ \sum_{m=0}^{364} \Pr(E_m) = \sum_{m=0}^{364} \left[\left(\frac{n+1}{365}\right)^x- \left(\frac{n}{365}\right)^x\right] = \frac{(n+1)^x - n^x}{365^{x-1}}. $$

Indeed, $\Pr(E_m)$ is obtained by a simple inclusion-exclusion counting: it is the probability that the birthdays are contained in the block $[m,m+n]$ of size $n+1$ but not all of them in the subblock $[m+1,m+n]$ of size $n$.


What can be said when $n \geq \frac{365}{2}$? The events $E_m$ are not mutually exclusive anymore so we should use the Inclusion–exclusion formula: $$ \sum_m \Pr(E_m) - \sum_{m_1 < m_2} \Pr(E_{m_1}\cap E_{m_2})+ \dots + (-1)^n \sum_{m_1 < \dots < m_n} \Pr(E_{m_1}\cap \dots\cap E_{m_n}). $$

$\endgroup$
3
  • $\begingroup$ @Evert: is it any better now? $\endgroup$
    – Siméon
    Commented Jul 7, 2014 at 16:21
  • $\begingroup$ +1 -- this is the right way to do it. All $x$ birthdays are contained in an $n$-day interval that is the earliest such interval containing all $x$ birthdays if all $x$ birthdays are in the interval ($p=(n/365)^x$) but not all $x$ are in its first $n-1$ days ($p=((n-1)/365)^x$). Multiply the difference of those probabilities by the number of possible intervals ($365$). For the linear calendar (no wrap-around), there are only $366-n$ intervals to consider, and for the one starting on Jan. 1, the exclusion term should be dropped: this reproduces @ByronSchmuland's result. $\endgroup$
    – mjqxxxx
    Commented Jul 7, 2014 at 18:37
  • $\begingroup$ @Evert: drawing a picture might be more convincing, but here is the reasoning. If $E_m$ and $E_{m'}$ both happen and $m < m'$, it implies that $m'$, which is one of the birthdays, lies in $[m,m+n]$. Similarly $m \in [m',m'+n]$. Since $m < m'$, this is only possible if $m' + n - 365 \geq m$, which in turn implies $2n \geq 365$. This is absurd since we assumed $n < 365/2$. Notice that this bound cannot be improved. $\endgroup$
    – Siméon
    Commented Jul 9, 2014 at 11:59
5
$\begingroup$

Consider the smallest interval containing all $x$ birthdays, and suppose its length is $m$ (with $2 \le m \le n$: we will account separately for the $m=1$ case). Suppose that $y$ of the birthdays (with $2\le y\le x$) are at endpoints of the interval, and the remaining $x-y$ birthdays are in the interior. Then there are ${{x}\choose{y}}$ ways to choose which $y$ people have their birthdays on the endpoints, for each of which there are $2^y-2$ ways to choose which endpoints hold those birthdays (omitting the two ways for which all $y$ are at a single endpoint). The probability that these $y$ birthdays are as specified is $\left(\frac{1}{365}\right)^y$, and the probability that the remaining $x-y$ birthdays are inside this interval is $\left(\frac{m-2}{365}\right)^{x-y}$. There are $365$ possible starting points for the interval. Summing over the possible values for $m$ and $y$, and adding the $m=1$ term (for which the probability is $365^{-(x-1)}$), gives the final result: $$ P(n,x)=\frac{1}{365^{x-1}}+365\sum_{m=2}^{n}\sum_{y=2}^{x}\left(\frac{m-2}{365}\right)^{x-y} \left(\frac{1}{365}\right)^y{{x}\choose{y}}\left(2^y-2\right) \\ =\frac{1}{365^{x-1}} + 365\sum_{m=2}^{n}\left[\sum_{y=2}^{x}\left(\frac{m-2}{365}\right)^{x-y} \left(\frac{2}{365}\right)^y{{x}\choose{y}} - 2 \sum_{y=2}^{x}\left(\frac{m-2}{365}\right)^{x-y} \left(\frac{1}{365}\right)^y{{x}\choose{y}}\right]. $$ The sums over $y$ can be simplified, since $$ \sum_{y=2}^{x}a^{x-y}b^{y}{{x}\choose{y}}=\sum_{y=0}^{x}a^{x-y}b^{y}{{x}\choose{y}}-a^{x-1}bx-a^x=(a+b)^x-a^{x-1}bx-a^x. $$ We find $$ P(n,x)=\frac{1}{365^{x-1}}\left(1+\sum_{m=2}^{n}\left[m^{x}-2\left(m-1\right)^{x}+\left(m-2\right)^{x}\right]\right) $$ after some algebra. The remaining sum telescopes, yielding the final, very elegant, result: $$ P(n,x)=\frac{n^x - (n-1)^x}{365^{x-1}}. $$ (This agrees with OP's answer after setting $n=41$ and $x=5$. The question is slightly ambiguous: the title says "within $n$ days of each other", but the first sentence says "within a timespan of $n$ days", which I interpreted to mean "within a block of $n$ consecutive days".)

$\endgroup$
14
  • $\begingroup$ If $x$ is the number of persons and $n$ is the period, then we should have $P(365,1) = 1$... But we have $P(365,1) = 0$... $\endgroup$ Commented Jul 7, 2014 at 1:27
  • 2
    $\begingroup$ A term $1/365^{x-1}$ needs to be added to account for the possibility that all $x$ birthdays are on the same day. Also the derivation only works for $2n<365$, since otherwise "smallest interval containing all birthdays" is not well-defined. $\endgroup$
    – mjqxxxx
    Commented Jul 7, 2014 at 1:37
  • $\begingroup$ I've generated $10^7$ examples with $x=5$; the number where all birthdays fit into a $40$-day interval was $6774$. This indicates the correct result is $P(40,5)=(6.77 \pm 0.08)\times 10^{-4}$. My formula gives $6.86\times 10^{-4}$, which is consistent with the simulation; OP's result does not appear to be. $\endgroup$
    – mjqxxxx
    Commented Jul 7, 2014 at 2:03
  • $\begingroup$ For $x=5$ the number of permutations is $6.47835\times 10^{+12}$, the number of permutation that fits in the 40-day interval is $9.34 \times 10^{+8}$. $\endgroup$ Commented Jul 7, 2014 at 3:05
  • $\begingroup$ @mjqxxxx It does agree with my result, except that you forgot that the interval is inclusive: replace $n$ by $n+1$ and you'll find the same answer (i.e., $n = 0$ for birthdays on the same day). $\endgroup$
    – user88701
    Commented Jul 7, 2014 at 8:58
3
$\begingroup$

Here's is another elementary approach based on combinatorial enumeration.

Restriction: This is only a partial answer valid for $1\leq n\leq 183$. A complete answer is planned to follow.

We assume the year having $365$ days. We assume a situation with $x \geq 2$ people and ask for a birthday of them within an interval of length $1\leq n\leq 183$ days.

A starter: $n=2,x=3$

To motivate the approach I would like to give a small example, namely asking for the probability $P(n=2,x=3)$ of three people having their birthday within two consecutive days.

First we determine the number $T$ of all possible configurations. \begin{align*} T=365^x=365^3\tag{1} \end{align*}

Note: Please note, that if we agree that $(1)$ is correct, we observe that e.g. the tripel $(1,365,1)$ means, that the first person's birthday and the third person's birthday is the first day in the year. The second persons birthday is the last day in the year and the configurations $(1,365,1)$ and $(1,1,365)$ are different.

Now, we derive the number $X$ of proper configurations by simply enumerating them.

There are $X_1=365$ proper configurations of three people with birthday on exactly one day \begin{array}{lllr} (1,1,1)&\quad\dots\quad&(364,364,364),&(365,365,365)\tag{2}\\ \end{array} and there are $X_2=6\cdot365$ proper configurations of three people having their birthday on exactly two consecutive days \begin{array}{lllr} (1,1,2)&\quad\dots\quad&(364,364,365),&(365,365,1)\tag{3}\\ (1,2,1)&\quad\dots\quad&(364,365,364),&(365,1,365)\\ (2,1,1)&\quad\dots\quad&(365,364,364),&(1,365,365)\\ (1,2,2)&\quad\dots\quad&(364,365,365),&(365,1,1)\\ (2,1,2)&\quad\dots\quad&(365,364,365),&(1,365,1)\\ (2,2,1)&\quad\dots\quad&(365,365,364),&(1,1,365)\\ \end{array}

This gives a total of $X=X_1+X_2=7\cdot365$ proper configurations.

We conclude:

\begin{align*} P(2,3)=\frac{X}{T}=\frac{7}{365^2}\simeq5.3\cdot10^{-5} \end{align*}

Now we generalise this approach.

Generalisation: $P(n,x)$ for $x\geq 2$ and $1\leq n\leq 183$

First step: $n=1$

We start the easy way with $n=1$. If there are $x$ people and we ask for having their birthay within exactly one day, there are $365$ proper configurations corresponding to $(1)$, namely the $x$-tuples \begin{array}{lllr} \underbrace{(1,1,\dots,1)}_{x}&\quad\dots\quad&\underbrace{(364,364,\dots,364)}_{x},&\underbrace{(365,365,\dots,365)}_{x}\\ \end{array}

So we get for

$n=1$:

\begin{align*} X_1=365 \end{align*}

Second step: $2\leq n\leq 183$

Let's assume an interval of length $k$ with $2\leq k \leq n$ and let's ask for the number of proper configurations of people having their birthday within an interval of length exactly $k$.

So, we have to determine the number $Y_k$ of $x$-tuples which have values only within the interval and which also contain at least one left endpoint and at least one right endpoint of the interval. This number is \begin{align*} Y_k=k^x-2(k-1)^x+(k-2)^x\qquad(x \geq 2) \end{align*}

Observe that $k^x$ gives the number of all $x$-tuples with (the $k$ different) values from the interval. We subtract $2(k-1)^x$ which is the number of $x$-tuples having no left endpoint or having no right endpoint of the interval. Since we subtracted the number of $x$-tuples, which have neither left nor right endpoint twice, we have to add $(k-2)^x$ for compensation according to the Inclusion-Exclusion Principle.

There are $365$ different intervals of length $k$, namely:

\begin{align*} &[1,2,\dots,k]\\ &[2,3,\dots,k+1]\\ &\quad\dots\\ &[364,365]\cup[1,2,\dots,k-3,k-2]\\ &[365]\cup[1,2,\dots,k-2,k-1]\\ \end{align*}

And since we can vary $k$ between $2$ and $n$, we observe that $X_2$, the number of proper configurations of $x\geq 2$ people having their birthdays within $n$ days is after some simplifications for

$2 \leq n\leq 183$: \begin{align*} X_2&=365\sum_{k=2}^{n}Y_k\\ &=365\sum_{k=2}^{n}\left(k^x-2(k-1)^x+(k-2)^x\right)\\ &=365\left(n^x-(n-1)^{x}-1\right) \end{align*}

This gives a total of $X=X_1+X_2=365\left(n^x-(n-1)^x\right)$

We conclude:

\begin{align*} P(n,x)=\frac{X}{T}=\frac{n^x-(n-1)^x}{365^{x-1}}\qquad(1\leq n \leq 183), \quad (x\geq2) \end{align*}

in accordance with the answer of mjqxxxx if we additionally respect $n\leq 183$.

Note: Observe, that if we ask for the probability of two people having their birthday within $183$ days, the result should give $1$. Indeed we get \begin{align*} P(n,x)=P(183,2)=\frac{183^2-182^2}{365}=\frac{365}{365}=1 \end{align*}

Note: Observe, that due to some circular overlappings of intervals in case of $184 \leq n \leq 365$ some proper configurations are repeatedly counted, so that for some values of $x$ \begin{align*} P(n,x) \geq 1\qquad (184 \leq n \leq 365) \end{align*}

To correct the formula of $P(n,x)$ in this case should be the next activitiy :-)

$\endgroup$
2
  • $\begingroup$ @Evert: Yes, if the interval is e.g $[1,2,3,4]$ then $k^x=4^3$ counts all triples with values from \{1,2,3,4\} especially the non proper configurations, which do not include both endpoints, e,g, your example $(2,2,3)$. They have to be subtracted by $2(k-1)^x-(k-2)^x$ $\endgroup$ Commented Jul 9, 2014 at 9:02
  • $\begingroup$ @Evert: Small note to your second comment. $n>183$ is nuce, since it seems to be a bit more challenging than the other part. :-) Anyways, we should always clearly state the range of validity of our formulas to help preventing misinterpretations or wrong usage. Regards, $\endgroup$ Commented Jul 9, 2014 at 12:41
2
$\begingroup$

Using ideas similar to mjqxxxx, and following his notation I got $$P(n,x)=(N-n+1)\left({n\over N}\right)^x-(N-n)\left({n-1\over N}\right)^x,\quad 1\leq n\leq N.$$ Here $N$ is the length of the calendar; the OP takes $N=365$. With $n=40$ and $x=5$, this gives $${162381413\over 259133949125}\approx .0006266.$$


This solution is for a linear calendar where Dec. 31 and Jan. 1 are not considered consecutive. I am still working on the circular calendar.

$\endgroup$
4
  • $\begingroup$ Hmmmm. If we take $N=4$ and $n=2$ and $x=2$ we get $$\{ 11,12,13,14,21,22,23,24,31,32,33,34,41,42,43,44 \}$$ permutations... $\endgroup$ Commented Jul 7, 2014 at 3:19
  • $\begingroup$ ... but only $$\{ 11,12,14,21,22,23,32,33,34,42,43,44 \}$$ satisfy the period condition, that gives $75\%$. $\endgroup$ Commented Jul 7, 2014 at 3:26
  • $\begingroup$ @johannesvalks 14 and 24 don't satisfy the period condition because I am working with a linear calendar, not a circular calendar. $\endgroup$
    – user940
    Commented Jul 7, 2014 at 4:18
  • $\begingroup$ I see. Thanx... $\endgroup$ Commented Jul 7, 2014 at 10:34
1
$\begingroup$

I consider a brute method in the first place by considering real numbers instead of integers.


Let us consider two persons and let us plot the permutations in a square (see illustration). These squares are periodic because the year is cyclic. We are interested in the brown area and that is given by

$$ P(z) = 1 - \Big( 1 - z \Big)^2 + z^2,\ z \le \frac{1}{2}. $$

And clearly we have $P(0)=0$ and $P(\tfrac{1}{2}) = 1$, as expected. Note that this is a brute method because we do not consider integers.

Birthday permutations of two persons

We can find the brown area more general by

$$ \int_0^1 dx \int_{x-z}^{x+z} dy = \int_0^1 dx \Big[y\Big]_{x-z}^{x+z} = \int_0^1 dx 2z = 2z, $$

and

$$ 2z = 1 - \Big( 1 - z \Big)^2 + z^2. $$


We expand this idea to $p$ persons, so we get a $p$-cube. We obtain

$$ P(z) = \int_0^1 dx_1 \int_{x_1-z}^{x_1+z} dx_2 \int_{x_1-z}^{x_1+z} dx_3 \cdots \int_{x_1-z}^{x_1+z} dx_p = \Big(2z\Big)^{p-1} $$

As $2z$ is the period $n$ divided by the number of years, we have $\displaystyle 2z=\frac{n}{N}$, whence

$$ P = \Big(\frac{n}{N}\Big)^{p-1}. $$

That is why I came with

$$ \Big( \frac{40}{365} \Big)^k $$

in the first place.


When we consider integers, we need to replace the integrals by a summations.

The total number of permutations for birthdays in the year for $p$ persons is given by

$$ P_\textrm{total} = \sum_{k_1=1}^N \sum_{k_2=1}^N \sum_{k_3=2}^N \cdots \sum_{k_p=1}^N = N^p. $$

The number of permutations for birthdays that lie in a period of $n$ days for $p$ persons is given by

$$ P_\textrm{period} = \sum_{k_1=1}^N \sum_{k_2=k_1-n/2}^{k_1+n/2} \sum_{k_3=k_1-n/2}^{k_1+n/2} \cdots \sum_{k_2=k_1-n/2}^{k_1+n/2} = N \Big( n + 1 \Big)^{p-1}. $$

The change is given by

$$ \frac{P_\textrm{period}}{P_\textrm{total}} = \left( \frac{n+1}{N} \right)^{p-1},\ n < N. $$


Small tests:

Consider $N=4$, $n=2$ and $p=2$. We get

$$ \left( \frac{2+1}{4} \right)^{2-1} = \frac{3}{4}. $$

We have the permutations

$$ \begin{array}{c|cccc} &1&2&3&4\\ \hline 1&\times&\times&&\times\\ 2&\times&\times&\times&\\ 3&&\times&\times&\times\\ 4&\times&&\times&\times\\ \end{array} $$

where the $\times$ denotes the permutations that satisfy the period condition. We have totaly $16$ permutation and $12$ permutations that satisfy the condition, thus

$$ P = \frac{12}{16} = \frac{3}{4}. $$


The case $n=40$, $N=365$ and $p=5$ gives

$$ \left( \frac{40+1}{365} \right)^{5-1} = \left( \frac{41}{365} \right)^{4} \rightarrow 0.02\% $$

$\endgroup$
7
  • $\begingroup$ I used permutations, and I place persons on the day of the year, so we automaticaly get the 40 days back as both $p_1 \cdots p_2$ and $p_2 \cdots p_1$ are permutations that satisfy the condition. $\endgroup$ Commented Jul 7, 2014 at 11:02
  • $\begingroup$ I note that my equation may not hold - for I did not include that the given period of $40$ days is always $40$ days - it might be less as well... I need to go back to the drawingboard! $\endgroup$ Commented Jul 7, 2014 at 11:06
  • $\begingroup$ I have updated my answer... $\endgroup$ Commented Jul 7, 2014 at 13:14
  • $\begingroup$ @Evert - damn - I wrote out that diagram as well in excel and it did work out for my formula - but I did not save it. I will make it again... $\endgroup$ Commented Jul 7, 2014 at 14:03
  • $\begingroup$ Ahhhhhh - the comment does not allow long posts... I write a new answer for this reply... $\endgroup$ Commented Jul 7, 2014 at 14:13
1
$\begingroup$

Additional post for a comment...

Given the case $p=3$ and $N=4$ we get the following permutations:

$$\begin{array}{ccccccc} 0&1&2&3&n=0&n=1&n=2\\ p_1p_2p_3&&&&\times&&\\ p_1p_2&p_3&&&&\times&\\ p_1p_2&&p_3&&&&\times\\ p_1p_2&&&p_3&&\times&\\ p_1p_3&p_2&&&&\times&\\ p_1&p_2p_3&&&&\times&\\ p_1&p_2&p_3&&&&\times\\ p_1&p_2&&p_3&&&\times\\ p_1p_3&&p_2&&&&\times\\ p_1&p_3&p_2&&&&\times\\ p_1&&p_2p_3&&&&\times\\ p_1&&p_2&p_3&&&\times\\ p_1p_3&&&p_2&&\times&\\ p_1&p_3&&p_2&&&\times\\ p_1&&p_3&p_2&&&\times\\ p_1&&&p_2p_3&&\times&\\ p_2p_3&p_1&&&&\times&\\ p_2&p_1p_3&&&&\times&\\ p_2&p_1&p_3&&&&\times\\ p_2&p_1&&p_3&&&\times\\ p_3&p_1p_2&&&&\times&\\ &p_1p_2p_3&&&\times&&\\ &p_1p_2&p_3&&&\times&\\ &p_1p_2&&p_3&&&\times\\ p_3&p_1&p_2&&&&\times\\ &p_1p_3&p_2&&&\times&\\ &p_1&p_2p_3&&&\times&\\ &p_1&p_2&p_3&&&\times\\ p_3&p_1&&p_2&&&\times\\ &p_1p_3&&p_2&&&\times\\ &p_1&p_3&p_2&&&\times\\ &p_1&&p_2p_3&&&\times\\ p_2p_3&&p_1&&&&\times\\ p_2&p_3&p_1&&&&\times\\ p_2&&p_1p_3&&&&\times\\ p_2&&p_1&p_3&&&\times\\ p_3&p_2&p_1&&&&\times\\ &p_2p_3&p_1&&&\times&\\ &p_2&p_1p_3&&&\times&\\ &p_2&p_1&p_3&&&\times\\ p_3&&p_1p_2&&&&\times\\ &p_3&p_1p_2&&&\times&\\ &&p_1p_2p_3&&\times&&\\ &&p_1p_2&p_3&&\times&\\ p_3&&p_1&p_2&&&\times\\ &p_3&p_1&p_2&&&\times\\ &&p_1p_3&p_2&&\times&\\ &&p_1&p_2p_3&&\times&\\ p_2p_3&&&p_1&&\times&\\ p_2&p_3&&p_1&&&\times\\ p_2&&p_3&p_1&&&\times\\ p_2&&&p_1p_3&&\times&\\ p_3&p_2&&p_1&&&\times\\ &p_2p_3&&p_1&&&\times\\ &p_2&p_3&p_1&&&\times\\ &p_2&&p_1p_3&&&\times\\ p_3&&p_2&p_1&&&\times\\ &p_3&p_2&p_1&&&\times\\ &&p_2p_3&p_1&&\times&\\ &&p_2&p_1p_3&&\times&\\ p_3&&&p_1p_2&&\times&\\ &p_3&&p_1p_2&&&\times\\ &&p_3&p_1p_2&&\times&\\ &&&p_1p_2p_3&\times&&\\ \hline &&&&4&24&36\\ \end{array}$$

$\endgroup$
1
$\begingroup$

Fixing two endpoints argument by mjqxxxx gave me an idea, which makes possible to use exponential generating functions.

For the problem, consider a multiset (any day can be selected multiple times) $$[d_1,d_2,d_3,\ldots, d_{41}]$$

We have to make a string of 5 characters, which must have at least one of $d_1$ and $d_{41}$, and any of the remaining characters in the multiset.

So, the number of possible strings are given by the egf \begin{align*} G_{41}(x) &= \left(e^x-1\right)^2\cdot e^{39x} \end{align*}

That was for the range of 41 days.

Now, reduce the available multiset by one element: $$[d_1,d_2,d_3,\ldots, d_{40}]$$

The egf for this is: \begin{align*} G_{40}(x) &= \left(e^x-1\right)^2\cdot e^{38x} \end{align*} and we go on till: \begin{align*} G_{2}(x) &= \left(e^x-1\right)^2 \end{align*}

and $$G_1(x)=e^x$$

Hence, the egf for the range to be in between 1 and 41 days, inclusive, is:

\begin{align*} E(x) &= e^x + \sum_{i=2}^{41} G_i(x) \\ &= e^x + \sum_{i=2}^{41} \left(e^x-1\right)^2\cdot e^{(i-2)x} \\ &= e^x + \left(e^x-1\right)^2\frac{\left(e^{40x}-1\right)}{e^x-1} \\ &= e^x + \left(e^x-1\right)\cdot \left(e^{40x}-1\right) \\ &= e^{41x}-e^{40x}+1 \end{align*}

Since we require the number of permutations for the 5 character string, we extract $\left[\frac{x^5}{5!}\right]$ from that, which is $41^5-40^5$

and since the range of days can start in any of the 365 days, times 365 the above.

And the final probability is: \begin{align*} \mathbb{P} &= 365\times \frac{41^5-40^5}{365^5} \end{align*}

The above result can then be generalized with little effort: \begin{align*} P(n,x) &= \frac{n^x-(n-1)^x}{365^{x-1}} \end{align*} $$\Big(\mathbb{P}=P(41,5)\Big)$$

$\endgroup$
2
  • $\begingroup$ Gar, I'm not familiar with exponential generating functions. I can search for it, but perhaps you can provide a link with good information in your answer? Because I don't understand how you arrive at your first equation (for $G_{41}$). $\endgroup$
    – user88701
    Commented Jul 7, 2014 at 15:00
  • $\begingroup$ Okay, I think this file may provide a quick introduction, which has related examples in slides 18 and 19. I think you can make out how I chose my egf once you read that. $\endgroup$
    – gar
    Commented Jul 7, 2014 at 15:14

You must log in to answer this question.