
When you download a file from a website, the file gets corrupted with probability 0.8. What is the expected number of downloads to get an uncorrupted file?

I have no idea how to do this. I only know the probability that a file isn't corrupted should be 0.2, but how do I get the expectation? Can anyone help me?


7 Answers


Your question essentially boils down to finding the expected value of a geometric random variable. That is, if $X$ is the number of trials needed to download one non-corrupt file, then $$X\sim Geo(0.2)$$ In general, if $X\sim Geo(p)$ then $$E(X)=\frac1 p$$ So in your case the expected number of trials to download an uncorrupted file is $$E(X)=\frac{1}{0.2}=5$$ Addendum: Here is a derivation of the above-mentioned result.

First note that $P(X=k)=p(1-p)^{k-1}$. The expected value is thus $$\begin{align*}E(X)&=\sum_{k=1}^{\infty}kp(1-p)^{k-1} \\ &=p\sum_{k=1}^{\infty}k(1-p)^{k-1} \\ &=p\left(-\frac{d}{dp}\sum_{k=1}^{\infty}(1-p)^k\right) \\ &=p\left(-\frac{d}{dp}\frac{1-p}{p}\right) \\ &=p\left(\frac{d}{dp}\left(1-\frac{1}{p}\right)\right)=p\left(\frac{1}{p^2}\right)=\frac1p\end{align*}$$

Derivative step: (answer to comment)

Differentiating the geometric series term by term (chain rule) gives $$ -\frac{d}{dp}\sum_{k=1}^{\infty}(1-p)^k = \sum_{k=1}^{\infty}k(1-p)^{k-1}, $$ where interchanging the derivative and the sum is justified because the power series converges for $0 < 1-p < 1$. On that same interval the geometric series formula gives $$ \sum_{k=1}^{\infty}(1-p)^k = \frac{1-p}{p}, $$ and the proof follows.
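As a sanity check, a minimal Monte Carlo sketch (the helper name and parameters are illustrative, not from the original post) simulates repeated downloads with success probability $p=0.2$ and produces a sample mean close to $1/p=5$:

```python
import random

def downloads_until_success(p, rng):
    """Count Bernoulli(p) trials until the first success (an uncorrupted file)."""
    n = 1
    while rng.random() >= p:  # with probability 1-p the file is corrupted; try again
        n += 1
    return n

rng = random.Random(0)
p = 0.2
trials = 100_000
mean = sum(downloads_until_success(p, rng) for _ in range(trials)) / trials
print(round(mean, 2))  # sample mean, close to 1/p = 5
```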

  • I have been googling for hours to find a proof for the expectation; that's always what I see. The problem is that I don't understand the derivation part. Can you explain it more or share a resource where I can understand it? Commented Nov 2, 2015 at 20:02
  • You may be looking for the convergence of the geometric series, which is given because $1-p < 1$.
    – Scriddie
    Commented Nov 3, 2021 at 21:10
  • Brilliant proof. Especially the derivative step.
    – Dude156
    Commented Oct 10, 2022 at 19:09

An intuitive and telling approach to this is to find a functional identity (see note at the end) that the random number $X$ of downloads necessary to get an uncorrupted file satisfies. The everyday situation you describe amounts to the following:

  • With probability $p=0.2$, $X=1$ (first file uncorrupted).
  • With probability $1-p=0.8$, $X=1+Y$, where $Y$ is distributed like $X$ (first file corrupted, then continue with the next files).

Thus, $E[Y]=E[X]$ hence $$E[X]=p\cdot1+(1-p)\cdot(1+E[X]), $$ from which the arch-classical formula $E[X]=1/p$ follows.
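The identity can also be checked numerically: iterating the map $x \mapsto p\cdot1+(1-p)(1+x)$ converges to its fixed point $1/p$ (a quick illustrative sketch, not part of the original answer):

```python
p = 0.2
x = 0.0  # initial guess for E[X]
for _ in range(200):
    # one step of the recursion E[X] = p*1 + (1-p)*(1 + E[X])
    x = p * 1 + (1 - p) * (1 + x)
print(x)  # converges to the fixed point 1/p = 5
```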

Note that this also yields the full distribution of $X$, for example, for every $|s|\leqslant1$, $g(s)=E[s^X]$ is such that $E[s^{Y}]=g(s)$ hence $g(s)$ must solve the corresponding identity $$g(s)=p\cdot s+(1-p)\cdot s\cdot g(s), $$ hence $$ \sum_{n\geqslant0}P[X=n]s^n=g(s)=\frac{ps}{1-(1-p)s}=ps\sum_{n\geqslant0}(1-p)^ns^n, $$ from which $P[X=n]=p(1-p)^{n-1}$ follows, for every $n\geqslant1$.


Note: Since some user was kind enough to upvote this a long time after it was written, I just reread the whole page. Frankly, I found the insistence of a character on confusing binomial distributions with geometric distributions appalling, but I also realized that the functional identity referred to in the first sentence of the present answer had not been made explicit, so here it is.

The distribution of the number $X$ of downloads to get an uncorrupted file is the only solution of the identity in distribution $$X\stackrel{(d)}{=}1+BX,$$ where the random variable $B$ on the RHS is independent of $X$ on the RHS and Bernoulli distributed with $$P(B=0)=p,\qquad P(B=1)=1-p.$$

This merely summarizes the description in words at the beginning of this post, and allows one to deduce all the mathematical results above. It also yields a representation of $X$ as

$$X\stackrel{(d)}{=}1+\sum_{n=1}^\infty\prod_{k=1}^nB_k,\qquad\text{with $(B_k)$ i.i.d. and distributed as }B.$$

Finally, note that every positive integer valued random variable $X$ can be represented as the sum of such a series for some independent sequence of Bernoulli random variables $(B_k)$, but that the distribution of $B_k$ being independent of $k$ characterizes the fact that the distribution of the sum $X$ is geometric.

  • Thank you for this answer! It really sheds light on a way of thinking about this that you don't get in some textbooks. I was wondering, though: do you know where I can find a proof of the result that "every positive integer valued random variable $X$ can be represented as the sum of such a series for some independent sequence of Bernoulli random variables $(B_k)$"?
    – Rasputin
    Commented Jan 17, 2019 at 17:34
  • @Rasputin Where? I do not know. The reason it holds is simply that, if $X$ is the sum of the random series on the RHS, then, for every $n$, $$\{X\geqslant n+1\}=\{B_1=\cdots=B_n=1\}$$ hence $$P(X\geqslant n+1\mid X\geqslant n)=P(B_n=1),$$ which is a free parameter of the model. Thus, every family of such conditional probabilities corresponds to a family of parameters of the Bernoulli random variables $(B_n)$, and we are done.
    – Did
    Commented Jan 17, 2019 at 18:28
  • Alright, that makes sense. Thanks!
    – Rasputin
    Commented Jan 17, 2019 at 21:14
  • +1 I miss you so much....
    – drhab
    Commented Jun 3, 2020 at 5:34

A clever way to find the expected value of a geometric r.v. is the one employed in this video lecture of the MITx course "Introduction to Probability: Part 1 - The Fundamentals" (by the way, an extremely enjoyable course), based on (a) the memoryless property of the geometric r.v. and (b) the total expectation theorem.

If you compute $E[X]$ by conditioning on the first outcome (the two leaves of the probability tree), you end up with:

$$E[X] = 1 + p\,E[X-1\mid X=1] + (1-p)\,E[X-1\mid X>1] = 1 + 0 + (1-p)E[X],$$

where the second step uses memorylessness: given $X>1$, the remaining number of trials $X-1$ is distributed like $X$ itself. Solving for $E[X]$ gives $E[X] = 1/p$.


Let $X \sim Geom(p)$. Then

$\begin{align} \mathbb{E}[X] & = \sum_{n=1}^\infty n(1-p)^{n-1}p\\ & = p\Sigma_1 \end{align}$

where $\Sigma_1 = \sum_{n=1}^\infty n(1-p)^{n-1}$. Then let $\Sigma_0 = 1 + (1-p) + (1-p)^2 + \ldots = \frac{1}{1 - (1-p)} = \frac{1}{p}$, as we have a geometric series. Then we have

$\begin{align} \Sigma_1 & = 1 + 2(1-p) + 3(1-p)^2 + \ldots\\ (1-p)\Sigma_1 & = (1-p) + 2(1-p)^2 + \ldots\\ (1 - 1 + p)\Sigma_1 & = p\Sigma_1 = 1 + (1-p) + (1-p)^2 + \ldots = \Sigma_0 = \frac{1}{p}\\ \Sigma_1 & = \frac{1}{p^2} \end{align}$

And from above we know that $\mathbb{E}[X] = p\Sigma_1$. So finally:

$\begin{equation*} \mathbb{E}[X] = p\Sigma_1 = \frac{1}{p} \end{equation*}$
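The partial sums behave exactly as claimed; a quick numeric check with $p=0.2$ (illustrative only, truncating $\Sigma_1$ where the tail is negligible):

```python
p = 0.2
# partial sum of Sigma_1 = sum_{n>=1} n(1-p)^(n-1); the tail beyond n=500 is negligible
sigma1 = sum(n * (1 - p) ** (n - 1) for n in range(1, 501))
expected = p * sigma1  # E[X] = p * Sigma_1
print(sigma1, expected)  # approximately 1/p^2 = 25 and 1/p = 5
```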


Here is a different approach (using the Tail-Sum formula) to find the expected value of a random variable $X\sim Geom(p)$.

$$\mathbb{E}[X] = \sum_{k=1}^{\infty}\mathbb{P}(X \ge k) =\sum_{k=1}^{\infty} (1-p)^{k-1} = \sum_{k=0}^{\infty} (1-p)^k = \frac{1}{1-(1-p)} = \frac1p$$
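Truncating the tail sum numerically reproduces the same value (a quick sketch with $p=0.2$; the cutoff is arbitrary):

```python
p = 0.2
# P(X >= k) = (1-p)^(k-1) for a geometric variable counting trials to first success;
# truncate the tail sum at k = 200, where the remainder is negligible
tail_sum = sum((1 - p) ** (k - 1) for k in range(1, 201))
print(tail_sum)  # approximately 1/p = 5
```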

  • This is a really nice proof, and the best one, considering that no other answer really justifies the interchange of the summation and the derivative. Commented Mar 18, 2023 at 18:47

Expanding/complementing on what aflous said: your random variable is "number of non-corrupt files downloaded", which has $p=1-0.8$... and the expected number of non-corrupt files after _ trials is $\geq 1$....

EDIT: While it is true that the original question asks about a geometric random variable, one can look at the same problem from a different perspective and still answer the question correctly. One can focus instead on whether a file is corrupt or not, and then define a new binomial random variable counting the number of non-corrupt files in $n$ trials. This new random variable has mean $np$ for $n$ trials. We then want to find $n$ so that $np \geq 1$, and, since $p=0.2$, we have $np \geq 1$ when $n \geq 5$.
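Under this binomial reading, the answer is simply the smallest $n$ with $np \geq 1$; a one-line sketch:

```python
import math

p = 0.2
n = math.ceil(1 / p)  # smallest integer n with n * p >= 1
print(n)  # 5
```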

  • "Your random variable is 'number of non-corrupt files downloaded'"... that isn't true. The RV is the number of trials until a success.
    – Tyler
    Commented Dec 13, 2013 at 6:17
  • But if you multiply the units in $E(X)=np$, you get (trials)(successes/trial) = successes. Commented Dec 13, 2013 at 6:25
  • Actually, you can describe it as either type of random variable, depending on what you're interested in observing/studying. Commented Dec 13, 2013 at 6:32
  • $X$ isn't a binomial RV, so saying $E(X) = np$ makes no sense. Of course we can associate both a binomial AND a geometric RV to some Bernoulli trials, but that doesn't mean they are interchangeable as distributions.
    – Tyler
    Commented Dec 13, 2013 at 6:35
  • What do you mean? If I'm only interested in whether the file I downloaded is corrupt or not, then it is a binomial. If I want to know the number of trials before success, it is a geometric R.V. I can then say that the expected number of non-corrupt files in $5$ trials is $1$. It is a different approach to the problem, just as valid. Commented Dec 13, 2013 at 6:39

(This answer is partially based on this answer)

I recently had some trouble understanding $E[X]=\frac1p$. This helped me understand:

$$E[X]=\sum_{i=1}^{\infty} i \cdot (1-p)^{i-1} \cdot p=p + (1-p) \cdot \left(\sum_{i=2}^{\infty} i\cdot(1-p)^{i-2}\cdot p\right)= \\ = p + (1-p)\cdot\left(\sum_{i=1}^{\infty} (i+1)\cdot(1-p)^{i-1}\cdot p\right)=p+(1-p)E[X+1] \ ; \\ E[X+1]=E[X]+1 \ ; \\ E[X]=p+(1-p)(E[X]+1) \to E[X]=p+E[X]+1-pE[X]-p \to pE[X]=1 \to E[X]=\frac1p \ .$$

  • This does not provide an answer to the question. To critique or request clarification from an author, leave a comment below their post. - From Review
    – emonHR
    Commented Nov 2, 2023 at 19:48
  • While it does not answer the original question as stated, the question seems to have become a place to show the expected value of a geometric random variable in general; I found some of those answers confusing or vague, so I made my own that (I hope) is clearer. Commented Nov 2, 2023 at 21:06
