45
$\begingroup$

Like all combinatorial problems, this one is probably equivalent to another, well-known one, but I haven't managed to find such an equivalent problem (and OEIS didn't help), so I offer this one as being possibly new and possibly interesting.

Problem statement

I have $2N$ socks in a laundry basket, and I am hanging them on the hot pipes to dry. To make life easier later, I want to hang them in pairs. Since it is dark where the pipes are, I adopt the following algorithm:

  1. Take a sock at random from the basket.
  2. If it matches one that is already on my arm, hang them both on the pipes: the one in my hand and the matching one taken from my arm.
  3. If it does not match one that is already on my arm, hang it on my arm with the others.
  4. Do this $2N$ times.

The question is: How long does my arm have to be?

Clearly, the minimum length is $1$, for instance if the socks come out in the order $AABBCC$. Equally clearly, the maximum length is $N$, for instance if the socks come out as $ABCABC$. But what is the likeliest length? Or the average length? Or what sort of distribution do the required lengths have?
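The two extreme cases above can be checked mechanically. The following is a minimal sketch of the hanging procedure, assuming each sock is represented by a letter and a pair shares the same letter; the class name `ArmLength` and the method `maxArmLength` are made up for this illustration.

```java
import java.util.HashSet;
import java.util.Set;

class ArmLength {
    // Returns the largest number of unmatched socks held on the arm
    // while the socks come out of the basket in the given order.
    static int maxArmLength(String order) {
        Set<Character> arm = new HashSet<>();
        int max = 0;
        for (char sock : order.toCharArray()) {
            // remove() returns true when the matching sock was on the arm,
            // in which case the pair goes onto the pipes
            if (!arm.remove(sock)) {
                // no match yet: the sock goes onto the arm
                arm.add(sock);
                max = Math.max(max, arm.size());
            }
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(maxArmLength("AABBCC")); // prints 1
        System.out.println(maxArmLength("ABCABC")); // prints 3
    }
}
```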

It turns out to be easiest to parameterise the results not by $2N$, the number of socks, but by $2N-1$, which I will call $M$.

The first few results

(Notation: $n!!$ is the semifactorial, which for odd $n$ is the product of the odd numbers up to $n$; thus $7!!=7\times 5\times 3\times 1$.)

In each case I provide the frequency for each possible arm length, starting with a length of 1. I use frequencies rather than probabilities because they are easier to type, but you can get the probabilities by dividing by $M!!$.

$$ \begin{array}{c|rrrrr} M \backslash \text{length} & 1 & 2 & 3 & 4 & 5 \\ \hline 1 & 1 \\ 3 & 1 & 2 \\ 5 & 1 & 8 & 6 \\ 7 & 1 & 30 & 50 & 24 \\ 9 & 1 & 148 & 340 & 336 & 120 \\ \end{array} $$

It would be good to know (for example) whether these frequencies tend to some sort of known distribution as $M\to\infty$, just as the binomial coefficients do.
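For small $M$ the frequencies can be cross-checked by brute force, independently of any recurrence. The sketch below assumes that every way of pairing up the $2N$ draw positions is equally likely (there are $M!!$ of them), enumerates all of them, and tallies the maximum arm length of each. The class and method names are invented for this illustration.

```java
import java.util.Arrays;

class PairingCounts {
    // counts[k] = number of pairings of the 2n draw positions
    // whose maximum arm length is exactly k
    static long[] count(int n) {
        long[] counts = new long[n + 1];
        int[] partner = new int[2 * n];
        Arrays.fill(partner, -1);
        enumerate(partner, counts);
        return counts;
    }

    // Pair the first unmatched position with each later unmatched position in turn.
    private static void enumerate(int[] partner, long[] counts) {
        int first = -1;
        for (int i = 0; i < partner.length; i++) {
            if (partner[i] < 0) { first = i; break; }
        }
        if (first < 0) {              // every position is paired: tally this pairing
            counts[maxArm(partner)]++;
            return;
        }
        for (int j = first + 1; j < partner.length; j++) {
            if (partner[j] < 0) {
                partner[first] = j;
                partner[j] = first;
                enumerate(partner, counts);
                partner[first] = -1;  // undo and try the next partner
                partner[j] = -1;
            }
        }
    }

    // A position whose partner comes later puts a sock on the arm;
    // a position whose partner came earlier takes one off.
    static int maxArm(int[] partner) {
        int onArm = 0, max = 0;
        for (int i = 0; i < partner.length; i++) {
            if (partner[i] > i) { onArm++; max = Math.max(max, onArm); }
            else { onArm--; }
        }
        return max;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println("M=" + (2 * n - 1) + ": " + Arrays.toString(count(n)));
        }
    }
}
```

Index 0 of each printed array is always 0. For $M=5$ this enumeration gives 1, 8, 6, in agreement with the table. For $M=7$ and $M=9$ it gives 1, 26, 54, 24 and 1, 80, 360, 384, 120, which differ from the last two rows; a comment under the first answer notes that the recurrence was later found to contain an error.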

But, as I said at the beginning, this may just be a re-encoding of a known combinatorial problem, carrying a lot of previously worked out results along with it. I thought, for instance, of the lengths of random walks in $N$ dimensions with only one step forward and one step back being allowed in each dimension – but that looked too complicated to give any straightforward direction to follow.

Background: methods

In case it is interesting or helpful, I obtained the results above by means of a two-variable generating function, in which the exponent of $y$ recorded the arm length needed and the exponent of $x$ recorded how many socks had been retrieved at the [first] time that this length was reached. Calling the resulting generating function $A_M(x,y)$, the recurrence I used was:

$$A_M=MxyA_{M-2}+x^2(x-y)\frac\partial{\partial x}A_{M-2}+(1-x^2)xy$$

which is based on sound first principles and matches the results of manual calculation up to $M=5$. Having found a polynomial, I substitute $x=1$ and the numbers in the table above are then the coefficients of the powers of $y$.

But, mathematics being close to comedy, all this elaboration may be an unnecessarily complicated way to get to a result too trivial to be found even in OEIS. Is it?

$\endgroup$

3 Answers

24
$\begingroup$

I ran some Monte Carlo simulations of this problem and came to some interesting conclusions. If you have $N$ pairs of socks, the expected maximum arm length is slightly above $N/2$.

First, I ran 1,000,000 experiments with 100 pairs of socks and recorded the maximum arm length reached in each one. For example, a maximum arm length of 54 was reached about 90,000 times. The whole distribution looks normal to me. The average maximum arm length was 53.91, confirmed several times in a row.


Nothing changed with 100 pairs of socks and 10,000,000 experiments: the average value remained the same. So it looks like about a million runs is enough to draw a meaningful conclusion.


Here is what I got when I doubled the number of socks to 200 pairs. The average maximum arm length was 105.12, still above 50% of the number of pairs. I got the same value in several repeated experiments ($\pm0.01$).


Finally, I decided to check the expected maximum arm length for different numbers of sock pairs, from 10 to 250. Each number of pairs was tested 2,000,000 times before the average value was calculated. Here are the results:

$$ \begin{array}{c|rr} \textbf{Pairs} & \textbf{Arm Length} & \textbf{Increment} \\ \hline 10 & 6.49 & \\ 20 & 12.03 & 5.54 \\ 30 & 17.41 & 5.38 \\ 40 & 22.71 & 5.30 \\ 50 & 27.97 & 5.26 \\ 60 & 33.20 & 5.23 \\ 70 & 38.40 & 5.20 \\ 80 & 43.59 & 5.19 \\ 90 & 48.75 & 5.16 \\ 100 & 53.91 & 5.16 \\ 110 & 59.07 & 5.16 \\ 120 & 64.20 & 5.13 \\ 130 & 69.33 & 5.13 \\ 140 & 74.46 & 5.13 \\ 150 & 79.58 & 5.12 \\ 160 & 84.69 & 5.11 \\ 170 & 89.80 & 5.11 \\ 180 & 94.91 & 5.11 \\ 190 & 100.02 & 5.11 \\ 200 & 105.11 & 5.09 \\ 210 & 110.20 & 5.09 \\ 220 & 115.29 & 5.09 \\ 230 & 120.38 & 5.09 \\ 240 & 125.47 & 5.09 \\ 250 & 130.56 & 5.09 \end{array} $$

Plotted against the number of pairs, the average looks like a straight line, but it is actually an arc, bent slightly downwards (take a look at the increment column).

Finally, here is the Java code that I used for my experiments.

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class Basket {
    public static final int PAIRS = 250;
    public static final int NUM_EXPERIMENTS = 2_000_000;    

    int n;
    List<Integer> basket;
    Set<Integer> arm;

    public Basket(int n) {
        // basket size
        this.n = n;
        // socks are here
        this.basket = new ArrayList<Integer>();
        // arm is just a set of different socks
        this.arm = new HashSet<Integer>();
        // add a pair of same socks to the basket
        for(int i = 0; i < n; i++) {
            basket.add(i);
            basket.add(i);
        }
        // shuffle the basket
        Collections.shuffle(basket);
    }

    // returns maximum arm length
    int hangSocks() {
        // maximum arm length
        int maxArmLength = 0;
        // we have to hang all socks
        for(int i = 0; i < 2 * n; i++) {
            // take one sock from the basket
            int sock = basket.get(i);
            // if the sock of the same color is already on your arm...
            if(arm.contains(sock)) {
                // ...remove sock from your arm and put the pair over the hot pipe
                arm.remove(sock);
            }
            else {
                // put the sock on your arm
                arm.add(sock);
                // update maximum arm length
                maxArmLength = Math.max(maxArmLength, arm.size());
            }
        }
        return maxArmLength;
    }

    public static void main(String[] args) {
        // results of our experiments will be stored here
        int[] results = new int[PAIRS + 1];
        // run millions of experiments
        for(int i = 0; i < NUM_EXPERIMENTS; i++) {
            Basket b = new Basket(PAIRS);
            // arm length in a single experiment
            int length = b.hangSocks();
            // remember how often this result appeared
            results[length]++;
        }
        // print results in CSV format so that we can plot them in Excel
        for(int i = 0; i < results.length; i++) {
            System.out.println(i + "," + results[i]);
        }
        // find average arm length; a long accumulator avoids int overflow
        // when PAIRS or NUM_EXPERIMENTS is increased
        long sum = 0;
        for(int i = 0; i < results.length; i++) {
            sum += (long) i * results[i];
        }
        double average = (double) sum / (double) NUM_EXPERIMENTS;
        System.out.println(String.format("Average arm length is %.2f", average)); 
    }

}

EDIT: For $N=500$, the average maximum arm length over 2,000,000 tests is 257.19. For $N=1000$, the result is 509.23.

It seems that as $N\to\infty$, the ratio of the result to $N$ goes down to $1/2$. I don't know how to prove this.

$\endgroup$
  • $\begingroup$ Splendid. There is just one snag (or one advantage!): these figures contradict mine quite strongly. I can't verifiably take my recurrence as far up as your numbers, so I wonder: could you let me know what (a) your average maximum arm length and (b) your commonest maximum arm length are for each of 1 to 9 pairs? With any luck we will get a clear enough and noticeable enough deviation for me to identify where I've been going wrong. I am only asking because you have already written the program and plugging in some small numbers shouldn't be any work. $\endgroup$ Commented May 21, 2019 at 17:57
  • $\begingroup$ @MartinKochanski I'll get back to you tomorrow. $\endgroup$
    – Saša
    Commented May 21, 2019 at 19:45
  • $\begingroup$ Thanks: I have just done some more working out, and I can see where my recurrence went wrong. So don't bother with any more simulations after all. $\endgroup$ Commented May 21, 2019 at 20:01
14
$\begingroup$

The expected number of single socks is maximized when you are halfway through. When you have drawn $N$ socks, the chance that a given pair has one on your arm is $\frac {2N^2}{2N^2+2N(N-1)}=\frac{N^2}{2N^2-N}=\frac N{2N-1}\approx \frac 12+\frac 1{4N}$. To see this, make the socks distinguishable. To have one on your arm, sock $1$ of a pair has $2N$ positions it can be in, then sock $2$ has $N$ choices, since it must be in the other half of the run. To not have one on your arm, sock $1$ again has $2N$ choices but sock $2$ has only $N-1$, as it must be in the same half of the run. By linearity of expectation, the expected number on your arm is $\frac {N^2}{2N-1}\approx \frac N2+\frac 14$.
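The claim that the expectation peaks at the halfway point can be checked by running the same counting argument after $k$ draws instead of $N$. This is a sketch; the symbols $k$ and $E_k$ are introduced here for illustration. A given pair has one sock on your arm exactly when one of its socks is among the first $k$ positions and the other is among the last $2N-k$, which can happen in $2k(2N-k)$ of the $2N(2N-1)$ ordered placements:

$$E_k = N\cdot\frac{2k(2N-k)}{2N(2N-1)}=\frac{k(2N-k)}{2N-1}.$$

The numerator $k(2N-k)$ is a downward parabola in $k$ with its vertex at $k=N$, where it gives $E_N=\frac{N^2}{2N-1}$, the value quoted above.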

The expected value being below the mode of Oldboy's distributions says that the distribution is not symmetric around the mode.

Note that this addresses the expected number of socks on your arm at a given point in the run. The expected maximum over the whole run can be higher, as Empy2 explains.

$\endgroup$
  • $\begingroup$ Could you please explain why when you have drawn $N$ socks the chance that a given pair has one on your arm is $\frac {N^2}{N^2+N(N-1)}$? $\endgroup$
    – Hans
    Commented May 23, 2019 at 5:47
  • $\begingroup$ @Hans: I said the expected value of the number of socks on your arm was that. I computed the chance you have one of a particular pair on your arm, then multiplied by $N$ using the linearity of expectation to get the total expectation. $\endgroup$ Commented May 23, 2019 at 14:36
  • $\begingroup$ I am confused. I just copied your claim: "When you have drawn $N$ socks the chance that a given pair has one on your arm is $\frac {N^2}{N^2+N(N-1)}$." Are you saying you do not mean this expression to be a probability but an expectation? If you do mean this to be a probability, I do not understand your rationale for this expression. In fact, I got a different expression. Could you please derive this expression of the probability? $\endgroup$
    – Hans
    Commented May 23, 2019 at 17:51
  • $\begingroup$ @Hans: For one pair, the probability and the expectation are the same because you either have one sock on your arm or not. I justify the probability in the next sentence, though I left out a factor $2$. There are $2N^2$ ways to choose the positions of the socks of one pair so you have one on your arm at the midpoint and $2N(N-1)$ ways to not have one on your arm. $\endgroup$ Commented May 23, 2019 at 19:35
  • $\begingroup$ You are right. +1. I took the individual socks of a pair as indistinguishable which was wrong. $\endgroup$
    – Hans
    Commented May 25, 2019 at 4:22
8
$\begingroup$

The extra bits, above $N/2$ in Oldboy's table, are near $\sqrt[3]{N}$. I have some ideas why that might be true.

First, the expected number of socks on the arm at the $(N+x)$th sock is $(N^2-x^2)/(2N-1)$.

Near the $N$th sock, the number of socks on the arm follows a random walk. It is symmetric at the $N$th sock, but has a negative bias of $x^2/(2N)$ at the $(N+x)$th sock.
Around the $(N+yN^{2/3})$th sock, a symmetric random walk would have moved $O(\sqrt{yN^{2/3}})$, but the negative bias is also $O((yN^{2/3})^2/N)$, so both are $O(N^{1/3})$. The negative bias will dominate for large $y$, so the maximum value will be in this domain. The random variation dominates for small $y$.

So the maximum is likely to be $N/2+O(N^{1/3})$.
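As a rough sanity check of this scaling, the excess over $N/2$ can be compared with $\sqrt[3]{N}$ using a few of the simulated averages reported in the first answer. The sketch below only rearranges those published figures and runs no new simulation; the class and method names are made up for this illustration.

```java
class CubeRootCheck {
    // {number of pairs N, simulated average maximum arm length},
    // copied from the table and the EDIT in the first answer
    static final double[][] REPORTED = {
        {10, 6.49}, {100, 53.91}, {250, 130.56}, {500, 257.19}, {1000, 509.23}
    };

    // (average maximum - N/2) divided by the cube root of N
    static double excessRatio(double pairs, double averageMax) {
        return (averageMax - pairs / 2) / Math.cbrt(pairs);
    }

    public static void main(String[] args) {
        for (double[] row : REPORTED) {
            System.out.printf("N=%.0f excess=%.2f ratio=%.3f%n",
                    row[0], row[1] - row[0] / 2, excessRatio(row[0], row[1]));
        }
    }
}
```

On these figures the ratio rises slowly from roughly 0.7 at $N=10$ to roughly 0.92 at $N=1000$, which is consistent with an excess of order $N^{1/3}$ but does not by itself pin down the constant.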

EDIT: the variance in the number of socks on the arm at the $N$th sock is $$\frac{2N^2(N-1)^2}{(2N-1)^2(2N-3)}\approx\frac N4$$ so the width of the bell curve in Oldboy's graphs is of order $\sqrt{N}$. But this effect is symmetric above and below the mean $N^2/(2N-1)$. The maximum of the random walk is not symmetric, and shifts the bell curve to the right, but that effect ($O(N^{1/3})$) is smaller than the variation from one laundry basket to the next ($O(N^{1/2})$).
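One way to arrive at this variance, sketched here with notation introduced only for illustration: let $I_j$ indicate that pair $j$ has exactly one sock among the first $N$ drawn, and write $p=P(I_j=1)$ and $q=P(I_1=I_2=1)$. Counting which $N$ of the $2N$ socks are drawn gives

$$p=\frac{N}{2N-1},\qquad q=\frac{4\binom{2N-4}{N-2}}{\binom{2N}{N}}=\frac{N(N-1)}{(2N-1)(2N-3)},\qquad q-p^2=\frac{N}{(2N-1)^2(2N-3)}.$$

Then

$$\operatorname{Var}\Big(\sum_j I_j\Big)=Np(1-p)+N(N-1)(q-p^2)=\frac{N^2(N-1)}{(2N-1)^2}\left(1+\frac1{2N-3}\right)=\frac{2N^2(N-1)^2}{(2N-1)^2(2N-3)}.$$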

$\endgroup$
  • $\begingroup$ +1 Excellent observation and it fits my data almost perfectly. $\endgroup$
    – Saša
    Commented May 22, 2019 at 6:59
  • $\begingroup$ The denominator of the variance approximation of the number of socks on the arm should be $4$ not $2$. $\endgroup$
    – Hans
    Commented May 26, 2019 at 22:45
  • $\begingroup$ 1) By the random walk of the number of socks on the arm "has a negative bias", do you simply mean the difference between the mean and $\frac N2$ which is $-\frac{x^2}{2N-1}$? 2) Could you please explain your statement "a symmetric random walk would have moved $O(\sqrt{yN^{2/3}})$"? Should the standard deviation be instead $\sqrt{N+yN^{2/3}}$? $\endgroup$
    – Hans
    Commented May 27, 2019 at 3:20
  • $\begingroup$ I meant that the middle, at $N$, is expected to be the highest point, but as you wander away from there, there is a random walk that goes above and below, so the maximum is likely to be above the centre point. It will probably only get $O(N^{1/3})$ above before the mean drags it back below the centre point. $\endgroup$
    – Empy2
    Commented May 27, 2019 at 3:32
