
Consider a game in which you flip a coin until you flip tails. Your score is then the number of heads you flipped. So, for example, the sequence $H$, $H$, $H$, $T$ has a score of three, while the sequence $T$ has a score of zero. You play this game $n$ times. Let $X$ denote your maximum score over those $n$ plays. I'd like to know what $E[X]$ looks like as a function of $n$.
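
For concreteness, here's a quick simulation sketch of the game (the function names are just illustrative):

```python
import random

def play_once() -> int:
    """Flip a fair coin until tails; return the number of heads flipped."""
    score = 0
    while random.random() < 0.5:  # heads with probability 1/2
        score += 1
    return score

def max_score(n: int) -> int:
    """Play the game n times and return the maximum score."""
    return max(play_once() for _ in range(n))

# Monte Carlo estimate of E[X] for a few values of n.
trials = 10_000
for n in (10, 100, 1000):
    estimate = sum(max_score(n) for _ in range(trials)) / trials
    print(n, estimate)
```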

Here's what I know so far. The probability that your maximum score is greater than or equal to some number $k \ge 1$ is $$1 - \left(1 - 2^{-k}\right)^n,$$ which is the probability that not all $n$ of your scores are strictly below $k$. We can then write $E[X]$ as follows:

$$\begin{aligned} E[X] & = \sum_{k = 0}^{\infty} k \cdot \Pr[X = k] \\ & = \sum_{k = 1}^{\infty} \Pr[X \ge k] \\ & = \sum_{k = 1}^{\infty} \left(1 - \left(1 - 2^{-k}\right)^n\right). \end{aligned}$$
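
This sum is at least easy to evaluate numerically, since each term is at most $n 2^{-k}$ and truncation therefore converges fast; here's a quick sketch:

```python
def expected_max(n: int, terms: int = 200) -> float:
    """Truncated evaluation of E[X] = sum_{k>=1} (1 - (1 - 2^{-k})^n).

    Each term is at most n * 2^{-k}, so 200 terms is plenty for moderate n.
    """
    return sum(1 - (1 - 2.0 ** -k) ** n for k in range(1, terms + 1))

print(expected_max(50))  # roughly 5.991
```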

At this point, I'm completely out of my league in terms of sequence manipulations, and I'm not sure how to simplify this or even interpret what this looks like.

I know that I can write

$$(1 - 2^{-k})^n = ((1 - 2^{-k})^{2^{k}})^{n 2^{-k}} \approx e^{-n 2^{-k}},$$

which lets me rewrite the sum as being (approximately)

$$\sum_{k = 1}^{\infty} (1 - e^{-n 2^{-k}}),$$

but (1) I don't know how much this messes up the estimate and (2) I still have no idea how to simplify or manipulate this sum.
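
Numerically, at least, the two sums can be compared directly; here's a quick sketch:

```python
import math

def exact_term(n, k):
    return 1 - (1 - 2.0 ** -k) ** n

def approx_term(n, k):
    return 1 - math.exp(-n * 2.0 ** -k)

def total(term, n, terms=200):
    """Truncated sum over k = 1, ..., terms."""
    return sum(term(n, k) for k in range(1, terms + 1))

for n in (10, 100, 1000, 10_000):
    exact = total(exact_term, n)
    approx = total(approx_term, n)
    print(f"n={n}: exact={exact:.6f}, approx={approx:.6f}, diff={approx - exact:+.6f}")
```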

I know that for $n \ll 2^k$ the term in the sum is close to $0$ and that for $n \gg 2^k$ the term is close to $1$, so I'm expecting this sum to come out to something like $\alpha \log_2 n + o(\log_2 n)$, but I'm not sure how to quantify this rigorously.

Is there a nice way to simplify this sum or otherwise get a decent approximation of it?

(Context: I'm teaching the HyperLogLog estimator, and while the full analysis is far too complex to put into a lecture, I'd still like to give some intuition as to why it works. The game discussed here is equivalent to counting leading zeros in a random hash, and the distribution of the maximum score is useful for analyzing the expected value of a simple estimator.)

  • Did you try using the binomial theorem? – Jonah (Apr 28, 2021 at 0:00)
  • @DavidG.Stork Sorry if I'm missing something obvious here, but I'm not sure what you mean by that. (Apr 28, 2021 at 0:03)
  • @Jonah Hmm, I hadn't thought to try that. I suspect that manipulating the resulting sum is way above my level of expertise in this area, unfortunately. :-( (Apr 28, 2021 at 0:04)
  • See math.stackexchange.com/a/26214/683666 – RobPratt (Apr 28, 2021 at 1:05)
  • I did some asymptotic analysis here. – metamorphy (Apr 28, 2021 at 4:09)

4 Answers


Using @Mike Spivey's answer to this question: with $$E_n = \sum_{k=0}^{\infty} \left(1 - (1-q^k)^n\right),$$ we have $$\int_0^{\infty} \left(1 - (1 - q^x)^n\right)\, dx \leq E_n \leq 1 + \int_0^{\infty} \left(1 - (1 - q^x)^n\right)\, dx.$$ For $q=\frac 12$ this gives $$\frac{n (\psi (n)+\gamma )+1}{n \log (2)}\leq E_n \leq 1 +\frac{n (\psi (n)+\gamma )+1}{n \log (2)}.$$

So, since $\psi(n)+\gamma = H_{n-1}$, taking the midpoint of the two bounds and subtracting the $k=0$ term (which equals $1$) gives, almost exactly, $$\sum_{k = 1}^{\infty} \left(1 - \left(1 - 2^{-k}\right)^n\right)\sim \frac{n H_{n-1}+1}{n \log (2)}-\frac{1}{2}.$$

For example, for $n=50$, the exact value is $5.9909779$ while the approximation gives $5.9909812$.
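
For a quick numeric check (a Python sketch, with illustrative helper names; note that $\frac{n H_{n-1}+1}{n\log 2} = \frac{H_n}{\log 2}$, since $n H_{n-1} + 1 = n H_n$):

```python
import math

def exact_sum(n, terms=200):
    """Truncated evaluation of sum_{k>=1} (1 - (1 - 2^{-k})^n)."""
    return sum(1 - (1 - 2.0 ** -k) ** n for k in range(1, terms + 1))

def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / i for i in range(1, n + 1))

n = 50
approx = harmonic(n) / math.log(2) - 0.5  # H_n / log 2 - 1/2
print(exact_sum(n), approx)  # ~5.9909779 vs ~5.9909812
```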

  • Technically, the constants $1, -\frac{1}{2}$ don't affect $\sim$; it's much closer than $\sim$. (Apr 30, 2021 at 2:57)
  • @ThomasAndrews Thank you. (Apr 30, 2021 at 3:31)

Since there is a transition in the sum around $k=\log_2n$, as the OP points out, let's define $y=\log_2n-\log_2\log n$ and $z=\log_2n+\log_2\log_2n$ (natural log inside the first expression). Since the summand is decreasing in $k$ and there are at least $y-1$ integers in $[1,y]$, we have \begin{align*} \sum_{1\le k\le y} (1 - (1 - 2^{-k})^n) &\ge (y-1)(1-(1-2^{-y})^n) \\ &= (y-1)\bigl(1-(1-\tfrac{\log_e n}n)^n\bigr) \\ &\ge (y-1)\bigl(1-(e^{-(\log_e n)/n})^n\bigr) = (y-1)(1-\tfrac1n), \end{align*} using $1-t \le e^{-t}$. On the other hand, using $(1-t)^n \ge 1-tn$ for $0\le t\le 1$, we also have \begin{align*} \sum_{k>z} (1 - (1 - 2^{-k})^n) &\le \sum_{k>z} n2^{-k} < n2^{1-z} = \frac2{\log_2 n}. \end{align*} Therefore the entire series is at least $y(1-\tfrac1n)-1 = \log_2n+O(\log_2\log n)$, and at most $z + \frac2{\log_2n} = \log_2n+O(\log_2\log n)$ as well.
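
A numeric illustration of how the sandwich plays out (a Python sketch, with illustrative helper names):

```python
import math

def exact_sum(n, terms=300):
    """Truncated evaluation of sum_{k>=1} (1 - (1 - 2^{-k})^n)."""
    return sum(1 - (1 - 2.0 ** -k) ** n for k in range(1, terms + 1))

for n in (100, 10_000, 1_000_000):
    y = math.log2(n) - math.log2(math.log(n))   # natural log inside, as above
    z = math.log2(n) + math.log2(math.log2(n))
    lower = y * (1 - 1 / n) - 1
    upper = z + 2 / math.log2(n)
    print(f"n={n}: {lower:.3f} <= {exact_sum(n):.3f} <= {upper:.3f}")
```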

  • Oh, that's beautiful. Thanks! (Apr 28, 2021 at 0:29)
  • Note that the mathematical "trick" to this method isn't really a trick at all in hindsight: you already knew that the summand is close to $1$ when $k$ is small and close to $0$ when $k$ is large, and this answer just changed those qualitative observations into quantitative ones: how close, how small/large. :) (Apr 28, 2021 at 0:31)

I give an asymptotic approximation, with a reference for its derivation, in my answer to Asymptotic equivalent of $\sum_{k=1} ^N\binom{N}{k}\frac{(-1)^{k+1}}{1-p^k}$.

Plug $p=1/2$ into the following:

$$ \sum_{k=0}^\infty \big(1-(1-p^k)^n \big) \sim \frac{\log{n} + \gamma}{\log{(1/p)}} + 1/2, \quad 0<p<1 $$
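
A quick numeric check for $p=1/2$ (a Python sketch, with illustrative names; note, as the comments below point out, that this sum starts at $k=0$ and thus exceeds the sum in the question by exactly $1$, since the $k=0$ term is $1$):

```python
import math

GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def sum_from_zero(n, p=0.5, terms=200):
    """Truncated evaluation of sum_{k>=0} (1 - (1 - p^k)^n)."""
    return sum(1 - (1 - p ** k) ** n for k in range(terms + 1))

for n in (100, 10_000, 1_000_000):
    asymptotic = (math.log(n) + GAMMA) / math.log(2) + 0.5
    print(f"n={n}: sum={sum_from_zero(n):.5f}, asymptotic={asymptotic:.5f}")
```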

  • What is the error? $\gamma$ and $\frac12$ are irrelevant for $\sim$. Also, this doesn't agree with my data, which hints that $$E(X_n)-\log_2(n)<\frac12.$$ (Apr 29, 2021 at 20:23)
  • Okay, when $p=1/2$, I get, from data alone, that you should change $+\frac12$ to $-\frac{1}{2}$. In particular, in my data, the difference is around $0.333$ for large $n$, and $$\frac{\gamma}{\log 2}-\frac{1}{2}\approx 0.3327.$$ (Apr 29, 2021 at 21:40)
  • Oh, I think I see your error. The expected value you've obtained is for the maximum number of tosses, not the maximum number of heads, which over-counts by $1$. Specifically, your sum really does start at $k=0$ in the last line, but the actual question's sum starts at $k=1$. (Apr 29, 2021 at 22:05)
  • For what it's worth, with the $-1$ fix, the difference between your approximation and the actual data is less than $1/n$ for $n=1$ to $20{,}000$. (Apr 29, 2021 at 22:28)

Easier approach than my first answer.

If $X_n$ is your maximum score in $n$ plays, then, since a single play scores at least $m+1$ with probability $2^{-(m+1)}$ and hence $P(X_n\le m)=\left(1-\frac1{2^{m+1}}\right)^n$, we get:

$$\begin{align}P(X_n=m)&=\left(1-\frac1{2^{m+1}}\right)^n-\left(1-\frac1{2^m}\right)^n\\ &=\sum_{i=1}^n(-1)^{i-1}\binom n i \left(\frac1{2^{mi}}-\frac1{2^{(m+1)i}}\right)\\ &=\sum_{i=1}^n(-1)^{i-1}\binom n i \left(1-\frac1{2^i}\right)\frac1{2^{mi}} \end{align}$$

So $$\begin{align}E(X_n)&=\sum_{m=0}^\infty m P(X_n=m)\\ &=\sum_{i=1}^n (-1)^{i-1}\binom n i \left(1-\frac1{2^i}\right)\sum_{m=0}^\infty \frac{m}{2^{mi}}\\ &=\sum_{i=1}^n (-1)^{i -1}\binom n i (1-2^{-i})\frac{2^{-i }}{(1-2^{-i})^2}\\ &=\sum_{i=1}^n (-1)^{i-1}\binom n i \frac{1}{2^i-1} \end{align}$$

If the coin has probability $p$ of getting heads, then the expected value is:

$$E(X_n)=\sum_{i=1}^n(-1)^{i-1}\binom n i \frac{p^i}{1-p^i}$$


Examples

$$\frac{2}{1}-\frac{1}{3}=\frac 53\tag{n=2}$$ $$\frac{3}{1}-\frac{3}{3}+\frac{1}{7}=\frac{15}7\tag{n=3}$$ $$\frac{4}1-\frac{6}{3}+\frac{4}{7}-\frac1{15}=\frac{263}{105}\tag{n=4}$$ $$\frac{5}1-\frac{10}{3}+\frac{10}7-\frac5{15}+\frac1{31}=\frac{1819}{651}\tag{n=5}$$
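
The table below was generated by a script along these lines (a minimal reconstruction, with names of my choosing; the exact rationals avoid the cancellation that the alternating binomial sum would suffer in floating point):

```python
from fractions import Fraction
from math import comb

def expected_max(n: int, p: Fraction = Fraction(1, 2)) -> Fraction:
    """E(X_n) = sum_{i=1}^n (-1)^(i-1) C(n,i) p^i / (1 - p^i), exactly."""
    return sum(
        (-1) ** (i - 1) * comb(n, i) * p ** i / (1 - p ** i)
        for i in range(1, n + 1)
    )

for n in (2, 3, 4, 5):
    print(n, expected_max(n))  # 5/3, 15/7, 263/105, 1819/651

# Float entry as in the table:
print(50, float(expected_max(50)))  # ~5.991
```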


From the data in this table (generated by a Python script using infinite-precision rationals to compute $E(X_n)$ and then converting to floats), we see that for large $n$ it seems $E(X_{2n})\approx 1+E(X_n)$. So it would appear that $c+\log_2 n$ is a good approximation for large $n$. Moreover, $E(X_{n})-\log_2(n)$ appears to be positive and decreasing.

More computations gives me: $E(X_{10000})-\log_2(10000)\approx 0.3328.$

So $E(X_n)-\log_2 n$ seems to be decreasing for all $n$ and bounded below; by what exactly is unclear, but the limit appears to be a positive value.

I think the accepted answer has a sign wrong. The data is best predicted by:

$$\log_2(n)+\frac{\gamma}{\log 2} \color{red}{\mathbf -}\frac{1}{2},$$

because $\frac{\gamma}{\log 2}-\frac{1}{2}\approx 0.3327$ is very close to what I'm seeing in the data.

$$\begin{array}{|r|r|r|} \hline n&E(X_n)&E(X_n)-\log_2(n)\\ \hline 1 & 1.000 & 1.000\\ \hline 2 & 1.667 & 0.667\\ \hline 3 & 2.143 & 0.558\\ \hline 4 & 2.505 & 0.505\\ \hline 5 & 2.794 & 0.472\\ \hline 6 & 3.035 & 0.450\\ \hline 7 & 3.241 & 0.433\\ \hline 8 & 3.421 & 0.421\\ \hline 9 & 3.581 & 0.411\\ \hline 10 & 3.726 & 0.404\\ \hline 11 & 3.857 & 0.397\\ \hline 12 & 3.977 & 0.392\\ \hline 13 & 4.088 & 0.388\\ \hline 14 & 4.191 & 0.384\\ \hline 15 & 4.287 & 0.380\\ \hline 16 & 4.377 & 0.377\\ \hline 17 & 4.462 & 0.375\\ \hline 18 & 4.542 & 0.372\\ \hline 19 & 4.618 & 0.370\\ \hline 20 & 4.690 & 0.369\\ \hline 21 & 4.759 & 0.367\\ \hline 22 & 4.825 & 0.365\\ \hline 23 & 4.887 & 0.364\\ \hline 24 & 4.948 & 0.363\\ \hline 25 & 5.005 & 0.361\\ \hline 26 & 5.061 & 0.360\\ \hline 27 & 5.114 & 0.359\\ \hline 28 & 5.166 & 0.358\\ \hline 29 & 5.215 & 0.357\\ \hline 30 & 5.264 & 0.357\\ \hline 31 & 5.310 & 0.356\\ \hline 32 & 5.355 & 0.355\\ \hline 33 & 5.399 & 0.355\\ \hline 34 & 5.441 & 0.354\\ \hline 35 & 5.483 & 0.353\\ \hline 36 & 5.523 & 0.353\\ \hline 37 & 5.562 & 0.352\\ \hline 38 & 5.600 & 0.352\\ \hline 39 & 5.637 & 0.351\\ \hline 40 & 5.673 & 0.351\\ \hline 41 & 5.708 & 0.350\\ \hline 42 & 5.742 & 0.350\\ \hline 43 & 5.776 & 0.349\\ \hline 44 & 5.809 & 0.349\\ \hline 45 & 5.841 & 0.349\\ \hline 46 & 5.872 & 0.348\\ \hline 47 & 5.903 & 0.348\\ \hline 48 & 5.933 & 0.348\\ \hline 49 & 5.962 & 0.347\\ \hline 50 & 5.991 & 0.347\\ \hline 51 & 6.019 & 0.347\\ \hline 52 & 6.047 & 0.347\\ \hline 53 & 6.074 & 0.346\\ \hline 54 & 6.101 & 0.346\\ \hline 55 & 6.127 & 0.346\\ \hline 56 & 6.153 & 0.346\\ \hline 57 & 6.178 & 0.345\\ \hline 58 & 6.203 & 0.345\\ \hline 59 & 6.228 & 0.345\\ \hline 60 & 6.252 & 0.345\\ \hline 61 & 6.275 & 0.345\\ \hline 62 & 6.299 & 0.344\\ \hline 63 & 6.321 & 0.344\\ \hline 64 & 6.344 & 0.344\\ \hline 65 & 6.366 & 0.344\\ \hline 66 & 6.388 & 0.344\\ \hline 67 & 6.410 & 0.343\\ \hline 68 & 6.431 & 0.343\\ \hline 69 & 6.452 & 0.343\\ \hline 70 & 6.472 & 0.343\\ \hline 71 & 6.493 & 0.343\\ \hline 72 & 6.513 & 0.343\\ \hline 73 & 6.532 & 0.343\\ \hline 74 & 6.552 & 0.342\\ \hline 75 & 6.571 & 0.342\\ \hline 76 & 6.590 & 0.342\\ \hline 77 & 6.609 & 0.342\\ \hline 78 & 6.627 & 0.342\\ \hline 79 & 6.646 & 0.342\\ \hline 80 & 6.664 & 0.342\\ \hline 81 & 6.681 & 0.342\\ \hline 82 & 6.699 & 0.342\\ \hline 83 & 6.716 & 0.341\\ \hline 84 & 6.734 & 0.341\\ \hline 85 & 6.751 & 0.341\\ \hline 86 & 6.767 & 0.341\\ \hline 87 & 6.784 & 0.341\\ \hline 88 & 6.800 & 0.341\\ \hline 89 & 6.817 & 0.341\\ \hline 90 & 6.833 & 0.341\\ \hline 91 & 6.848 & 0.341\\ \hline 92 & 6.864 & 0.341\\ \hline 93 & 6.880 & 0.340\\ \hline 94 & 6.895 & 0.340\\ \hline 95 & 6.910 & 0.340\\ \hline 96 & 6.925 & 0.340\\ \hline 97 & 6.940 & 0.340\\ \hline 98 & 6.955 & 0.340\\ \hline 99 & 6.969 & 0.340\\ \hline 100 & 6.984 & 0.340\\ \hline 101 & 6.998 & 0.340\\ \hline 102 & 7.012 & 0.340\\ \hline 103 & 7.026 & 0.340\\ \hline 104 & 7.040 & 0.340\\ \hline 105 & 7.054 & 0.340\\ \hline 106 & 7.067 & 0.340\\ \hline 107 & 7.081 & 0.339\\ \hline 108 & 7.094 & 0.339\\ \hline 109 & 7.108 & 0.339\\ \hline 110 & 7.121 & 0.339\\ \hline 111 & 7.134 & 0.339\\ \hline 112 & 7.147 & 0.339\\ \hline 113 & 7.159 & 0.339\\ \hline 114 & 7.172 & 0.339\\ \hline 115 & 7.184 & 0.339\\ \hline 116 & 7.197 & 0.339\\ \hline 117 & 7.209 & 0.339\\ \hline 118 & 7.221 & 0.339\\ \hline 119 & 7.234 & 0.339\\ \hline 120 & 7.246 & 0.339\\ \hline 121 & 7.258 & 0.339\\ \hline 122 & 7.269 & 0.339\\ \hline 123 & 7.281 & 0.339\\ \hline 124 & 7.293 & 0.339\\ \hline 
125 & 7.304 & 0.339\\ \hline 126 & 7.316 & 0.338\\ \hline 127 & 7.327 & 0.338\\ \hline 128 & 7.338 & 0.338\\ \hline 129 & 7.350 & 0.338\\ \hline 130 & 7.361 & 0.338\\ \hline 131 & 7.372 & 0.338\\ \hline 132 & 7.383 & 0.338\\ \hline 133 & 7.393 & 0.338\\ \hline 134 & 7.404 & 0.338\\ \hline 135 & 7.415 & 0.338\\ \hline 136 & 7.426 & 0.338\\ \hline 137 & 7.436 & 0.338\\ \hline 138 & 7.446 & 0.338\\ \hline 139 & 7.457 & 0.338\\ \hline 140 & 7.467 & 0.338\\ \hline 141 & 7.477 & 0.338\\ \hline 142 & 7.488 & 0.338\\ \hline 143 & 7.498 & 0.338\\ \hline 144 & 7.508 & 0.338\\ \hline 145 & 7.518 & 0.338\\ \hline 146 & 7.528 & 0.338\\ \hline 147 & 7.537 & 0.338\\ \hline 148 & 7.547 & 0.338\\ \hline 149 & 7.557 & 0.338\\ \hline 150 & 7.566 & 0.338\\ \hline 151 & 7.576 & 0.338\\ \hline 152 & 7.585 & 0.337\\ \hline 153 & 7.595 & 0.337\\ \hline 154 & 7.604 & 0.337\\ \hline 155 & 7.614 & 0.337\\ \hline 156 & 7.623 & 0.337\\ \hline 157 & 7.632 & 0.337\\ \hline 158 & 7.641 & 0.337\\ \hline 159 & 7.650 & 0.337\\ \hline 160 & 7.659 & 0.337\\ \hline 161 & 7.668 & 0.337\\ \hline 162 & 7.677 & 0.337\\ \hline 163 & 7.686 & 0.337\\ \hline 164 & 7.695 & 0.337\\ \hline 165 & 7.703 & 0.337\\ \hline 166 & 7.712 & 0.337\\ \hline 167 & 7.721 & 0.337\\ \hline 168 & 7.729 & 0.337\\ \hline 169 & 7.738 & 0.337\\ \hline 170 & 7.746 & 0.337\\ \hline 171 & 7.755 & 0.337\\ \hline 172 & 7.763 & 0.337\\ \hline 173 & 7.772 & 0.337\\ \hline 174 & 7.780 & 0.337\\ \hline 175 & 7.788 & 0.337\\ \hline 176 & 7.796 & 0.337\\ \hline 177 & 7.804 & 0.337\\ \hline 178 & 7.813 & 0.337\\ \hline 179 & 7.821 & 0.337\\ \hline 180 & 7.829 & 0.337\\ \hline 181 & 7.837 & 0.337\\ \hline 182 & 7.844 & 0.337\\ \hline 183 & 7.852 & 0.337\\ \hline 184 & 7.860 & 0.337\\ \hline 185 & 7.868 & 0.337\\ \hline 186 & 7.876 & 0.337\\ \hline 187 & 7.883 & 0.337\\ \hline 188 & 7.891 & 0.337\\ \hline 189 & 7.899 & 0.337\\ \hline 190 & 7.906 & 0.337\\ \hline 191 & 7.914 & 0.337\\ \hline 192 & 7.921 & 0.336\\ \hline 193 & 7.929 & 0.336\\ \hline 194 & 7.936 & 0.336\\ \hline 195 & 7.944 & 0.336\\ \hline 196 & 7.951 & 0.336\\ \hline 197 & 7.958 & 0.336\\ \hline 198 & 7.966 & 0.336\\ \hline 199 & 7.973 & 0.336\\ \hline 200 & 7.980 & 0.336\\ \hline \end{array} $$
