65
$\begingroup$

Question: Suppose we have one hundred seats, numbered 1 through 100. We randomly select 25 of these seats. What is the expected number of selected pairs of seats that are consecutive? (To clarify: we would count two consecutive selected seats as a single pair.)

For example, if the selected seats are all consecutive (eg 1-25), then we have 24 consecutive pairs (eg 1&2, 2&3, 3&4, ..., 24&25). The probability of this happening is 75/($_{100}C_{25}$). So this contributes $24\cdot 75/(_{100}C_{25}$) to the expected number of consecutive pairs.

Motivation: I teach. Near the end of an exam, when most of the students have left, I notice that there are still many pairs of students next to each other. I want to know if the number that remain should be expected or not.

$\endgroup$
5
  • 2
    $\begingroup$ That probability is $76/_{100}C_{25}$, not $75/_{100}C_{25}$. $\endgroup$
    – TonyK
    Commented Dec 9, 2015 at 20:06
  • $\begingroup$ Thanks for the correction :-) $\endgroup$ Commented Dec 9, 2015 at 21:57
  • 13
    $\begingroup$ Without math I can tell you yes that is to be expected. They're called friends. $\endgroup$
    – corsiKa
    Commented Dec 9, 2015 at 22:39
  • 4
    $\begingroup$ + for the motivation $\endgroup$ Commented Dec 9, 2015 at 22:46
  • $\begingroup$ @corsiKa, I also think it is because if my neighbor leaves, it makes me feel "I should have been ready by now also", and therefore leave quite soon after. While it is the other way around when they still sit next to me, "Why is * not finished yet, have i missed something". And then finally when * leaves, I will go about the same time. But it is still interesting to compare how much difference there is between reality and randomly. $\endgroup$ Commented Dec 10, 2015 at 13:27

6 Answers 6

49
$\begingroup$

If you're just interested in the expectation, you can use the fact that expectation is additive to compute

  • The expected number of consecutive integers among $\{1,2\}$, plus
  • The expected number of consecutive integers among $\{2,3\}$, plus
  • ....
  • plus the expected number of consecutive integers among $\{99,100\}$.

Each of these 99 expectations is simply the probability that $n$ and $n+1$ are both chosen, which is $\frac{25}{100}\frac{24}{99}$.

So the expected number of pairs is $99\frac{25}{100}\frac{24}{99} = 6$.

$\endgroup$
9
  • $\begingroup$ Can it be argued that you also need to consider the $n-1$ as a possible pair candidate? And how would that affect expectation? $\endgroup$
    – holroy
    Commented Dec 9, 2015 at 16:18
  • 1
    $\begingroup$ @holroy: If there's an $n-1$, that pair is covered when we consider $n$ one lower. That, is the pair $\{17,18\}$ is $n$ and $n+1$ when $n=17$; the pair $\{16,17\}$ is covered by setting $n=16$, not by keeping $n$ at $17$ and saying $\{n-1,n\}$. $\endgroup$ Commented Dec 9, 2015 at 16:26
  • 1
    $\begingroup$ Because the asker is interested in the expected number when most students have left, it makes sense to generalize this from $25$ to other numbers $k<100$ and even to values of $n$ different from $100$. The answer is always $k(k-1)/n$ pairs of students. It may be worth noting this in the answer. $\endgroup$
    – msh210
    Commented Dec 9, 2015 at 22:12
  • 3
    $\begingroup$ Is independence needed to take advantage of additivity of expectation? $\endgroup$
    – Brian Tung
    Commented Dec 10, 2015 at 4:56
  • 1
    $\begingroup$ @user17625: That doesn't matter, which is the nice thing about additivity of expectations: $E(X+Y)=E(X)+E(Y)$ no matter whether $X$ and $Y$ are independent or not. $\endgroup$ Commented Dec 10, 2015 at 8:03
19
$\begingroup$

Let me present the approach proposed by Henning in another way.

We have $99$ possible pairs: $\{(1,2),(2,3),\ldots,(99,100)\}$. Let's define $X_i$ as

$$ X_i = \left \{ \begin{array}{ll} 1 & i\text{-th pair is chosen}\\ 0 & \text{otherwise}\\ \end{array} \right . $$

The $i$-th pair is denoted as $(i, i+1)$. That pair is chosen when the integer $i$ is chosen, with probability $25/100$, and the integer $i+1$ is also chosen, with probability $24/99$. Then,

$$E[X_i] = P(X_i = 1) = \frac{25}{100}\frac{24}{99},$$

and this holds for $i = 1,2,\ldots,99$. The total number of chosen pairs is then given as

$$X = X_1 + X_2 + \ldots + X_{99},$$

and using the linearity of the expectation, we get

\begin{align} E[X] &= E[X_1] + E[X_2] + \cdots +E[X_{99}]\\ &= 99E[X_1]\\ &= 99\frac{25}{100}\frac{24}{99} = 6 \end{align}

$\endgroup$
12
$\begingroup$

Henning Malcolm has already answered the question about the expected value. But that says very little about how likely or unlikely it would be to get deviations from the expected value - what if you observed there were 20 pairs out of 25 students left, could that happen by chance or is there some other factor at work? To answer questions like this one would need more details about the distribution, like the variance. Unfortunately, the analytic formula for the distribution (if there is one) is likely to be very complicated. So, from a practical perspective it might be interesting to investigate things numerically. That is the purpose of this post.

Here's a histogram of the number of pairs, generated numerically from 10000 trials.

seating pairs histogram

You can look at this and draw your own conclusions. I would conclude the following:

  • anywhere from 4 to 8 pairs would be totally reasonable,
  • 0 pairs or 13+ pairs would be very strong evidence of there being another factor
  • anything inbetween would be unusual, but possible

Here's the Matlab code used to generate it:

n=100; 
k=25; 

num_trials = 1e4;
num_pairs = zeros(num_trials,1);
for ii=1:num_trials
    disp(ii)
    filled_seats = sort(randperm(n,k));
    gaps = filled_seats(2:end) - filled_seats(1:end-1);
    num_pairs(ii) = length(find(gaps == 1));
end

histogram(num_pairs)
title(sprintf('histogram of seating pairs (out of %d trials)',num_trials))
$\endgroup$
6
  • $\begingroup$ Can you run the same simulations for number of maximal clumps of size at least 2 and compare variances and ratio of variances to the means (to see deviations from Poisson distribution)? $\endgroup$
    – A.S.
    Commented Dec 10, 2015 at 10:27
  • 2
    $\begingroup$ +1 for the most useful answer. I think what OP really wants to know is: given that I see $x$ pairs, what is the probability of this happening by chance? (either this or a p-value version, i.e. $x$ or more). $\endgroup$
    – Bitwise
    Commented Dec 10, 2015 at 14:19
  • $\begingroup$ @A.S. I think that would require a lot more work then just re-running the code with minor changes. If you want, feel free to use my code as a starting point to do that; I release it to the public domain. $\endgroup$
    – Nick Alger
    Commented Dec 10, 2015 at 17:28
  • $\begingroup$ I don't have access to computing environment atm. Here is code based on yours: empty_seats=[0,sort(randperm(n,n-k)),n+1]; gaps = empty_seats(2:end) - empty_seats(1:end-1); num_clumps(ii)=length(find(gaps>1)) $\endgroup$
    – A.S.
    Commented Dec 10, 2015 at 17:48
  • $\begingroup$ @A.S. Ok, I think (?) I changed it according to your intentions. Here is the clumps of empty seats histogram, i.imgur.com/YmXccl2.png?1 and here is the code, pastebin.com/3YujFZs9 Since this seems to be more related to your answer than mine, maybe it would be better if you post it there? Also, even if you don't have matlab available, this code will run in octave online ( octave-online.net ), though it is slower and the plotting doesn't work. $\endgroup$
    – Nick Alger
    Commented Dec 10, 2015 at 18:45
7
$\begingroup$

It is not difficult to construct the bivariate generating function $G(z,u)$ of $25$-tuples by the maximum element (variable $z$) and the number of adjacent elements (variable $u$.) To do this we must choose the first element:

$$\frac{z}{1-z}$$

and then choose $24$ gaps between elements with the gap of value one marked with $u:$

$$\left(uz + \frac{z^2}{1-z}\right)^{24}.$$

The product of these is $G(z,u)$ but we are interested in the maximum being some value at most $100$ so we get

$$[z^{100}] \frac{1}{1-z} G(z, u) = [z^{100}] \frac{1}{1-z} \frac{z}{1-z} \left(uz + \frac{z^2}{1-z}\right)^{24}.$$

We need the total count of the number of adjacent pairs so we compute $$[z^{100}] \left.\frac{\partial}{\partial u} \frac{1}{1-z} G(z, u) \right|_{u=1} =[z^{100}] \left. \frac{z}{(1-z)^2} \times 24 \left(uz + \frac{z^2}{1-z}\right)^{23} \times z\right|_{u=1} \\ = [z^{100}] \frac{z}{(1-z)^2} \times 24 \left(\frac{z}{1-z}\right)^{23} \times z \\ = 24 [z^{100}] \frac{z^{25}}{(1-z)^{25}} = 24 [z^{75}] \frac{1}{(1-z)^{25}} = 24 {75+24\choose 24}.$$

This yields a final answer of $$ 24 {99\choose 24} \times {100\choose 25}^{-1} = 6.$$

$\endgroup$
1
  • $\begingroup$ Can you compute expected number (or full distribution) of student "clumps" of size at least $i$ using this method? A clump is a maximal group of adjacent students. Assume $i=2$ for simplicity if needed. $\endgroup$
    – A.S.
    Commented Dec 8, 2015 at 23:18
5
$\begingroup$

Suppose there are $N\gg 10$ seats (in a circle) and density of students is $p$ (not too large, not too small: $Np\gg r$). Then probability of $r\ll N$ seats next to each other being occupied is $\approx p^r$ - that is average density of such runs. However these runs clump together and a fresh run is followed by approximately $\frac p {1-p}$ more students for a total of $1+\frac p{1-p}=\frac {1}{1-p}$ runs clumped together. This yields density of maximal "clumps" of size at least $r$:

$$d={p^r}(1-p)$$

For $p=0.25, r=2$ we get $4.7\%$. Using exact value for density of runs of length $2$ ($6$) we'd get $4.5\%$ instead. Unlike runs - which clump together - clumps repel and hence their number has a smaller variance and is better for quick comparison with the mean value.

You can do the same computation for $r=1$ to get density of maximal clumps of size at least $1$ $p(1-p)=0.19$.

$\endgroup$
5
$\begingroup$

Suppose we ask about the expected maximum run length when we select $m$ elements from a total of $n.$

For a run length at most $k$ we must first choose a run of some length possibly preceded by a gap, giving (the variable $w$ counts the number of elements in the tuple)

$$\frac{1}{1-z} \sum_{q=1}^k w^q z^q $$

followed by a sequence of gaps followed by a run $$\frac{z}{1-z} \sum_{q=1}^k w^q z^q$$

possibly followed at the end by another gap $$\frac{1}{1-z}.$$

This gives the generating function $$f_k(z,w) = \frac{1}{(1-z)^2} \left(\sum_{q=1}^k w^q z^q \right) \sum_{p\ge 0} \left( \frac{z}{1-z} \sum_{q=1}^k w^q z^q \right)^p.$$

We are interested in the total count from

$$[z^n] [w^m] \sum_{k=1}^m k (f_k(z,w) - f_{k-1}(z, w)) \\ = [z^n] [w^m] \left(m f_m(z,w) - \sum_{k=1}^{m-1} f_k(z, w)\right).$$

The sum component in $f_k(z, w)$ is $$\sum_{p\ge 0} \left( \frac{wz^2}{1-z} \sum_{q=0}^{k-1} w^q z^q \right)^p \\ = \frac{1}{1-wz^2(1-w^k z^k)/(1-z)/(1-wz)}.$$

At this point it appears that we are better off working with

$$g_k(z,w) = \frac{1}{(1-wz)^2} \left(\sum_{q=1}^k z^q \right) \sum_{p\ge 0} \left( \frac{wz}{1-wz} \sum_{q=1}^k z^q \right)^p$$

which gives for the sum component $$\sum_{p\ge 0} \left( \frac{wz^2}{1-wz} \sum_{q=0}^{k-1} z^q \right)^p \\ = \frac{1}{1-wz^2(1-z^k)/(1-wz)/(1-z)}.$$

This yields $$g_k(z,w) = \frac{1}{1-wz} \left(\sum_{q=1}^k z^q \right) \frac{1}{1-wz-wz^2(1-z^k)/(1-z)} \\ = \frac{1}{1-wz} \left(\sum_{q=1}^k z^q \right) \frac{1}{1-w(z-z^2+z^2(1-z^k))/(1-z)} \\ = \frac{1}{1-wz} \left(\sum_{q=1}^k z^q \right) \frac{1}{1-w(z-z^{k+2}))/(1-z)}.$$

Extracting the coefficient on $[w^{n-m}]$ we get

$$\left(\sum_{q=1}^k z^q \right) \sum_{p=0}^{n-m} z^{n-m-p} \frac{z^p(1-z^{k+1})^p}{(1-z)^p} \\ = z^{n-m} \left(\sum_{q=1}^k z^q \right) \sum_{p=0}^{n-m} \frac{(1-z^{k+1})^p}{(1-z)^p} \\ = z^{n-m} \left(\left(\frac{1-z^{k+1}}{1-z}\right)^{n-m+1}-1\right).$$

Extracting the coefficient on $[z^n]$ we obtain $$[z^n] z^{n-m} \left(\frac{1-z^{k+1}}{1-z}\right)^{n-m+1} = [z^m] \left(\frac{1-z^{k+1}}{1-z}\right)^{n-m+1}.$$

This is $$\sum_{p=0}^{\lfloor m/(k+1)\rfloor} {n-m+1\choose p} (-1)^p {n-m+m-p(k+1)\choose n-m} \\ = \sum_{p=0}^{\lfloor m/(k+1)\rfloor} {n-m+1\choose p} (-1)^p {n-p(k+1)\choose n-m}.$$

Substitute this into the summation formula to obtain $$m {n\choose m} - \sum_{k=1}^{m-1} \sum_{p=0}^{\lfloor m/(k+1)\rfloor} {n-m+1\choose p} (-1)^p {n-p(k+1)\choose n-m} \\ = {n\choose m} - \sum_{k=1}^{m-1} \sum_{p=1}^{\lfloor m/(k+1)\rfloor} {n-m+1\choose p} (-1)^p {n-p(k+1)\choose n-m}.$$

This is $${n\choose m} + \sum_{p=1}^m {n-m+1\choose p} (-1)^p {n-p\choose n-m} \\ - \sum_{k=0}^{m-1} \sum_{p=1}^{\lfloor m/(k+1)\rfloor} {n-m+1\choose p} (-1)^p {n-p(k+1)\choose n-m} \\ = {n\choose m} + \sum_{p=1}^m {n-m+1\choose p} (-1)^p {n-p\choose n-m} \\ - \sum_{1\le pq \le m} {n-m+1\choose p} (-1)^p {n-pq\choose n-m}.$$

Now we do one last simplification on the first and the second term together which are $$\sum_{p=0}^m {n-m+1\choose p} (-1)^p {n-p\choose n-m}$$

and introduce for use with the Egorychev method $${n-p\choose n-m} = {n-p\choose m-p} = \frac{1}{2\pi i} \int_{|z|=\epsilon} \frac{1}{z^{m-p+1}} (1+z)^{n-p} \; dz.$$

Note that this is zero when $p\gt m$ so we may extend $p$ to infinity, getting $$\frac{1}{2\pi i} \int_{|z|=\epsilon} \frac{1}{z^{m+1}} (1+z)^{n} \sum_{p\ge 0} {n-m+1\choose p} (-1)^p \frac{z^p}{(1+z)^p} \; dz \\ = \frac{1}{2\pi i} \int_{|z|=\epsilon} \frac{1}{z^{m+1}} (1+z)^{n} \left(1-\frac{z}{1+z}\right)^{n-m+1} \; dz \\ = \frac{1}{2\pi i} \int_{|z|=\epsilon} \frac{1}{z^{m+1}} (1+z)^{n} \frac{1}{(1+z)^{n-m+1}} \; dz \\ = \frac{1}{2\pi i} \int_{|z|=\epsilon} \frac{1}{z^{m+1}} (1+z)^{m-1} \; dz = [z^m] (1+z)^{m-1} = 0.$$

Dividing by ${n\choose m}$ for the expectation and without the zero contribution from the first two terms we thus obtain for the expectation of the maximum run length

$${\large\color{#080}{{n\choose m}^{-1} \sum_{1\le pq \le m} {n-m+1\choose p} (-1)^{p+1} {n-pq\choose n-m}}}.$$

The reader is invited to prove this in a different way, perhaps without using ordinary generating functions.

The Maple code that was used to verify these including a total enumeration routine was as follows:

with(combinat);

sr :=
proc(n, m)
    local sel, sel2, maxr, cur, res, pos;

    sel := firstcomb(n, m);
    res := 0;

    while type(sel, `set`) do
        sel2 := convert(sel, `list`);

        maxr := -1; cur := 1;

        for pos from 2 to m do
            if sel2[pos] = 1 + sel2[pos-1] then
                cur := cur + 1;
            else
                if cur > maxr then
                    maxr := cur;
                fi;
                cur := 1;
            fi;
        od;

        if cur > maxr then
            maxr := cur;
        fi;

        res := res + maxr;
        sel := nextcomb(sel, n);
    od;

    res;
end;

f :=
proc(k)
    local base;

    base := add(w^q*z^q, q=1..k);

    1/(1-z)^2*base*sum((z/(1-z)*base)^p, p=0..infinity);
end;

Hf :=
proc(m)
    m*f(m)-add(f(k), k=1..m-1);
end;

Qf :=
proc(n, m)
    coeftayl(coeftayl(Hf(m), w=0, m), z=0, n);
end;

g :=
proc(k)
    local base;

    base := add(z^q, q=1..k);

    1/(1-w*z)^2*base*sum((w*z/(1-w*z)*base)^p, p=0..infinity);
end;

Hg :=
proc(m)
    m*g(m)-add(g(k), k=1..m-1);
end;

Qg :=
proc(n, m)
    coeftayl(coeftayl(Hg(m), w=0, n-m), z=0, n);
end;

X :=
proc(n, m)
    local CF;

    CF := (n,m,k)->
    add(binomial(n-m+1,p)*(-1)^p*binomial(n-p*(k+1), n-m),
        p=0..floor(m/(k+1)));

    m*binomial(n,m) - add(CF(n,m, k), k=1..m-1);
end;

X1 :=
proc(n, m)
    local p, q, res;

    res := binomial(n,m) +
    add(binomial(n-m+1,p)*(-1)^p*binomial(n-p,n-m), p=1..m);

    for p to m do
        for q to floor(m/p) do
            res := res -
            binomial(n-m+1,p)*(-1)^p*binomial(n-p*q,n-m);
        od;
    od;

    res;
end;

X2 :=
proc(n, m)
    local p, q, res;

    res := 0;

    for p to m do
        for q to floor(m/p) do
            res := res +
            binomial(n-m+1,p)*(-1)^(p+1)*binomial(n-p*q,n-m);
        od;
    od;

    res;
end;

$\endgroup$

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .