Skip to main content
added 152 characters in body
Source Link
André Nicolas
  • 508.6k
  • 47
  • 566
  • 1k

I will try to get control of the most standard interpretation of our question by using (at first) very informal language. Let us call someone unhappy if one or more people share his/her "birthday." We want to find the "expected number" of unhappy people.

Define the random variable $X$ by saying that $X$ is the number of unhappy people. We want to find $\text{E}(X)$. Let $p_i$ be the probability that $X=i$. Then $$\text{E}(X)=\sum_{i=0}^{n} i\,p_i$$ That is roughly the approach that you took. That approach is correct, and a very reasonable thing to try. Unfortunately Indeed have been trained to use this approach, since that's exactly how you solved the exercises that followed the definition of expectation.

Unfortunately, in this problem, finding the $p_i$ is very difficult. One could, as you did, decide that for a good approximation, only the first few $p_i$ really matter. That is sometimes true, but depends quite a bit on the values $N$ of "days in the year" and the number $n$ of people.

Fortunately, in this problem, and many others like it, there is an alternative very effective approach. It involves a bit of theory, but the payoff is considerable.

Line the people up in a row. Define the random variables $U_1,U_2,U_3,\dots,U_n$ by saying that $U_k=1$ if the $k$-th person is unhappy, and $U_k=0$ if the $k$-th person is not unhappy. The crucial observation is that $$X=U_1+U_2+U_3+\cdots + U_n$$

One way to interpret this is that you, the observer, go down the line of people, making a tick mark on your tally sheet if the person is unhappy, and making no mark if the person is not unhappy. The number of tick marks is $X$, the number of unhappy people. It is also the sum of the $U_k$.

We next use the following very important theorem: The expectation of a sum is the sum of the expectations. This theorem holds "always." The random variables you are summing need not be independent. In our situation, the $U_k$ are not independent, but, for expectation of a sum, that does not matter. So we have $$\text{E}(X)=\text{E}(U_1) + \text{E}(U_2)+ \text{E}(U_3)+\cdots +\text{E}(U_n)$$

Finally, note that the probability that $U_k=1$ is, as carefully explained by @Henry, equal to $p$, where $$p=1-(1-1/N)^{n-1}$$ It follows that $\text{E}(U_k)=p$ for any $k$, and therefore $\text{E}(X)=np$.

I will try to get control of the most standard interpretation of our question by using (at first) very informal language. Let us call someone unhappy if one or more people share his/her "birthday." We want to find the "expected number" of unhappy people.

Define the random variable $X$ by saying that $X$ is the number of unhappy people. We want to find $\text{E}(X)$. Let $p_i$ be the probability that $X=i$. Then $$\text{E}(X)=\sum_{i=0}^{n} i\,p_i$$ That is roughly the approach that you took. That approach is correct, and a very reasonable thing to try. Unfortunately, in this problem, finding the $p_i$ is very difficult. One could, as you did, decide that for a good approximation, only the first few $p_i$ really matter. That is sometimes true, but depends quite a bit on the values $N$ of "days in the year" and the number $n$ of people.

Fortunately, in this problem, and many others like it, there is an alternative very effective approach. It involves a bit of theory, but the payoff is considerable.

Line the people up in a row. Define the random variables $U_1,U_2,U_3,\dots,U_n$ by saying that $U_k=1$ if the $k$-th person is unhappy, and $U_k=0$ if the $k$-th person is not unhappy. The crucial observation is that $$X=U_1+U_2+U_3+\cdots + U_n$$

One way to interpret this is that you, the observer, go down the line of people, making a tick mark on your tally sheet if the person is unhappy, and making no mark if the person is not unhappy. The number of tick marks is $X$, the number of unhappy people. It is also the sum of the $U_k$.

We next use the following very important theorem: The expectation of a sum is the sum of the expectations. This theorem holds "always." The random variables you are summing need not be independent. In our situation, the $U_k$ are not independent, but, for expectation of a sum, that does not matter. So we have $$\text{E}(X)=\text{E}(U_1) + \text{E}(U_2)+ \text{E}(U_3)+\cdots +\text{E}(U_n)$$

Finally, note that the probability that $U_k=1$ is, as carefully explained by @Henry, equal to $p$, where $$p=1-(1-1/N)^{n-1}$$ It follows that $\text{E}(U_k)=p$ for any $k$, and therefore $\text{E}(X)=np$.

I will try to get control of the most standard interpretation of our question by using (at first) very informal language. Let us call someone unhappy if one or more people share his/her "birthday." We want to find the "expected number" of unhappy people.

Define the random variable $X$ by saying that $X$ is the number of unhappy people. We want to find $\text{E}(X)$. Let $p_i$ be the probability that $X=i$. Then $$\text{E}(X)=\sum_{i=0}^{n} i\,p_i$$ That is roughly the approach that you took. That approach is correct, and a very reasonable thing to try. Indeed have been trained to use this approach, since that's exactly how you solved the exercises that followed the definition of expectation.

Unfortunately, in this problem, finding the $p_i$ is very difficult. One could, as you did, decide that for a good approximation, only the first few $p_i$ really matter. That is sometimes true, but depends quite a bit on the values $N$ of "days in the year" and the number $n$ of people.

Fortunately, in this problem, and many others like it, there is an alternative very effective approach. It involves a bit of theory, but the payoff is considerable.

Line the people up in a row. Define the random variables $U_1,U_2,U_3,\dots,U_n$ by saying that $U_k=1$ if the $k$-th person is unhappy, and $U_k=0$ if the $k$-th person is not unhappy. The crucial observation is that $$X=U_1+U_2+U_3+\cdots + U_n$$

One way to interpret this is that you, the observer, go down the line of people, making a tick mark on your tally sheet if the person is unhappy, and making no mark if the person is not unhappy. The number of tick marks is $X$, the number of unhappy people. It is also the sum of the $U_k$.

We next use the following very important theorem: The expectation of a sum is the sum of the expectations. This theorem holds "always." The random variables you are summing need not be independent. In our situation, the $U_k$ are not independent, but, for expectation of a sum, that does not matter. So we have $$\text{E}(X)=\text{E}(U_1) + \text{E}(U_2)+ \text{E}(U_3)+\cdots +\text{E}(U_n)$$

Finally, note that the probability that $U_k=1$ is, as carefully explained by @Henry, equal to $p$, where $$p=1-(1-1/N)^{n-1}$$ It follows that $\text{E}(U_k)=p$ for any $k$, and therefore $\text{E}(X)=np$.

Source Link
André Nicolas
  • 508.6k
  • 47
  • 566
  • 1k

I will try to get control of the most standard interpretation of our question by using (at first) very informal language. Let us call someone unhappy if one or more people share his/her "birthday." We want to find the "expected number" of unhappy people.

Define the random variable $X$ by saying that $X$ is the number of unhappy people. We want to find $\text{E}(X)$. Let $p_i$ be the probability that $X=i$. Then $$\text{E}(X)=\sum_{i=0}^{n} i\,p_i$$ That is roughly the approach that you took. That approach is correct, and a very reasonable thing to try. Unfortunately, in this problem, finding the $p_i$ is very difficult. One could, as you did, decide that for a good approximation, only the first few $p_i$ really matter. That is sometimes true, but depends quite a bit on the values $N$ of "days in the year" and the number $n$ of people.

Fortunately, in this problem, and many others like it, there is an alternative very effective approach. It involves a bit of theory, but the payoff is considerable.

Line the people up in a row. Define the random variables $U_1,U_2,U_3,\dots,U_n$ by saying that $U_k=1$ if the $k$-th person is unhappy, and $U_k=0$ if the $k$-th person is not unhappy. The crucial observation is that $$X=U_1+U_2+U_3+\cdots + U_n$$

One way to interpret this is that you, the observer, go down the line of people, making a tick mark on your tally sheet if the person is unhappy, and making no mark if the person is not unhappy. The number of tick marks is $X$, the number of unhappy people. It is also the sum of the $U_k$.

We next use the following very important theorem: The expectation of a sum is the sum of the expectations. This theorem holds "always." The random variables you are summing need not be independent. In our situation, the $U_k$ are not independent, but, for expectation of a sum, that does not matter. So we have $$\text{E}(X)=\text{E}(U_1) + \text{E}(U_2)+ \text{E}(U_3)+\cdots +\text{E}(U_n)$$

Finally, note that the probability that $U_k=1$ is, as carefully explained by @Henry, equal to $p$, where $$p=1-(1-1/N)^{n-1}$$ It follows that $\text{E}(U_k)=p$ for any $k$, and therefore $\text{E}(X)=np$.