
Suppose there is a sequence of $N$ numbers $x_1, x_2, \ldots, x_N$.

There are then gaps $|x_i - x_j|$, and the minimum gap:

$\delta (N) = \min_{i \ne j \le N} \{ | x_i -x_j | \}$.

Let the mean gap be normalized to $1/N$.

If the sequence of numbers $x_1, x_2, \ldots, x_N$ is random, then the minimum gap is

$\delta (N) \simeq \frac{1}{N^2}$

This is the birthday problem!
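A quick simulation (my own sketch; the values of $N$ and the number of replicates are arbitrary choices) illustrates the claimed scaling for iid uniform numbers on $[0,1]$, which have mean gap $\simeq 1/N$:

```r
# Empirical check of the 1/N^2 scaling of the minimum gap
# for N iid Uniform(0,1) numbers (mean gap ~ 1/N).
set.seed(1)
min_gap <- function(N) min(diff(sort(runif(N))))
for (N in c(10, 100, 1000)) {
  m <- mean(replicate(2000, min_gap(N)))
  # m * N^2 should stay roughly constant (close to 1)
  cat("N =", N, " mean min gap =", m, " times N^2 =", m * N^2, "\n")
}
```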

How can this $1/N^2$ be calculated explicitly?

Or, where is the calculation shown?

See the problem described here: https://youtu.be/n8IMb2mW6TM?t=2396


2 Answers


You can do this using a sort of continuous version of the combinatorics technique sometimes called "stars and bars".

Let's just take the $x_k$ to be $N$ independent random numbers chosen uniformly in $[0,1]$; that will (at least to a good enough approximation) ensure that the mean gap is $1/N$.

Now the probability that the minimum gap is at least $\delta$ is just $(1-(N-1)\delta)^N$.

Imagine distributing within the unit interval not $N$ mere points, but the left-hand endpoints of $N$ intervals of length $\delta$. The probability we seek is the probability that no two of these intervals overlap, which we should think of as the $N$-dimensional volume of the region in $[0,1]^N$ where that's true. When this happens we can imagine shrinking all those intervals to zero, moving each $x_k$ to $x_k-m_k\delta$ where $m_k$ is the number of points to the left of $x_k$; this transforms our $N$-dimensional region, in a volume-preserving way, to precisely $[0,1-(N-1)\delta]^N$.
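A quick Monte Carlo check of this formula (my own sketch; the values of $N$ and $\delta$ are arbitrary choices):

```r
# Check P(min gap >= delta) = (1 - (N-1)*delta)^N for N uniform points in [0,1].
set.seed(42)
N <- 5
delta <- 0.05
hits <- replicate(10^5, min(diff(sort(runif(N)))) >= delta)
mean(hits)                # empirical probability, roughly 0.328
(1 - (N - 1) * delta)^N   # exact formula: 0.8^5 = 0.32768
```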

Now, when $N$ is large and $\delta$ is small, this is approximately $\exp(-N^2\delta)$. (Ugly details below.)

So, for instance:

  • The probability that $\delta>\frac{\log N}{N^2}$ is at most about $\exp(-\log N)=1/N$.
  • The probability that $\delta<\frac1{N^{2+\epsilon}}$ is at most about $1-\exp(-N^{-\epsilon})\simeq N^{-\epsilon}$.
  • The median value of $\delta$ occurs when $\exp(-N^2\delta)\simeq\frac12$, i.e., when $\delta\simeq\frac{\log2}{N^2}$.
  • The expected value of $\delta$ is approximately $\int_0^1N^2\exp(-N^2\delta)\delta\,d\delta=\frac1{N^2}[1-\exp(-N^2)(N^2+1)]\simeq\frac1{N^2}$.
    • Here I'm assuming not only that $\exp(-N^2\delta)$ is a good approximation to $(1-(N-1)\delta)^N$, but also that the derivative of the former is a good approximation to the derivative of the latter. I'm pretty sure this is also true.
    • For the expected value, it's actually easier just to use the exact expression for the probability. If I've fed the formulae into Mathematica (which makes fewer mistakes than I do) correctly, what we get is exactly $1/(N^2-1)$.
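That exact value can also be checked by hand, integrating the survival function (a short derivation confirming the Mathematica result quoted above):

$$E[\delta] = \int_0^{\infty} P(\delta > t)\,dt = \int_0^{1/(N-1)} \bigl(1-(N-1)t\bigr)^{N}\,dt = \left[\frac{-\bigl(1-(N-1)t\bigr)^{N+1}}{(N-1)(N+1)}\right]_0^{1/(N-1)} = \frac{1}{N^2-1}.$$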

Here are (kinda) the ugly details of the $\exp(-N^2\delta)$ thing, for those who care:

Let's first of all work with $f(N,\delta):=(1-N\delta)^N$ instead of $g(N,\delta):=(1-(N-1)\delta)^N$. The latter lies between $(1-N\delta)^N$ and $(1-(N-1)\delta)^{N-1}$, so if we have good bounds for $f$ then we can use them to bound $g$.

Now, rather than actually doing the calculations, I refer the reader to this other math.stackexchange.com answer and the comment below it, which uses the Taylor series for the logarithm to find that $(1-x/n)^n$ differs from $\exp(-x)$ by at most $(1-a)^{-2} \frac{x^2}{2n}\exp(-x)$ provided $x<a$. That is, $(1-N\delta)^N$ differs from $\exp(-N^2\delta)$ by at most $\frac{(1-a)^{-2}}2 N\delta \exp(-N^2\delta)$ when $N\delta<a$. (And when $N\delta>a$, both are nice and small.) This bound is plenty good enough to justify the claims above.
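For those who want the one-line version of that Taylor argument (a sketch, not a full error bound): taking logarithms,

$$N\log(1-N\delta) = -N^2\delta - \frac{N^3\delta^2}{2} - \cdots, \qquad\text{so}\qquad (1-N\delta)^N = \exp(-N^2\delta)\,\exp\!\left(-\tfrac12 N^3\delta^2 + O(N^4\delta^3)\right),$$

and the correction factor is close to $1$ whenever $N^3\delta^2$ is small, which in particular holds when $\delta$ is of order $1/N^2$.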


To deal with the $\frac1n$ issue and get an exact result, let's in fact choose $n$ iid uniform positions on a circle of circumference $1$ (the analogue of days in a year in the birthday problem), pick one of them as the starting point, and measure the distances clockwise from there. This is equivalent to choosing $n-1$ iid uniform numbers in $(0,1)$ and sorting them into $0 \le x_{(1)} \le x_{(2)}\le \cdots \le x_{(n-1)} \le 1$; setting $x_{(0)}=0$ and $x_{(n)}=1$ then gives $n$ gaps $G_i= x_{(i)}-x_{(i-1)}$.

The $n$ gaps $G_i$ are identically distributed and sum to $1$, so each has expectation $E[G_i]=\frac1n$. Their common distribution is $P(G_i \le y) = 1-(1-y)^{n-1}$ for $0 < y <1$, with density $(n-1)(1-y)^{n-2}$, which indeed gives expectation $\frac 1n$.

The $G_i$ are not independent of each other: they sum to $1$, so the smallest cannot exceed $\frac1n$; but as $n$ increases the dependence between them becomes weaker. Going back to the circle picture, the probability that all the gaps exceed some $y$ is $(1-ny)^{n-1}$, since a combined length $ny$ of the circumference of $1$ is excluded as a location for the other $n-1$ points. Hence $P(\min(G_i) \le y)=1 - (1-ny)^{n-1}$, with density $n(n-1)(1-ny)^{n-2}$ for $0 < y < \frac1n$ and expectation $E[\min(G_i)]=\frac1{n^2}$, which is what you are looking for.
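That expectation can be verified numerically by integrating the survival function $P(\min(G_i) > y) = (1-ny)^{n-1}$ over $(0, \frac1n)$ (my own sketch; $n=10$ is an arbitrary choice):

```r
# E[min gap] = integral_0^{1/n} (1 - n*y)^(n-1) dy, which should equal 1/n^2.
n <- 10
ev <- integrate(function(y) (1 - n * y)^(n - 1), lower = 0, upper = 1 / n)$value
ev        # ~ 0.01
1 / n^2   # 0.01
```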

There is also a convergence property for large $n$:

  • If you were to consider $nG_i$ then you would get convergence in distribution to $Exp(1)$ as $n$ increases, because $P(nG_i \le z) = 1-\left(1-\frac zn\right)^{n-1} \to 1-e^{-z}$ for $z>0$. An earlier question was similar.
  • Similarly if you were to consider $n^2\min(G_i)$, it also converges in distribution to $Exp(1)$ as $n$ increases because $P(n^2\min(G_i) \le w)= 1-\left(1-\frac wn\right)^{n-1} \to 1-e^{-w}$ for $w>0$ too.
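The second convergence claim can be illustrated by simulation (my own sketch; $n$ and the number of replicates are arbitrary choices), comparing the empirical distribution of $n^2\min(G_i)$ at one point with the $Exp(1)$ limit:

```r
# Compare the empirical CDF of n^2 * min(G_i) at w = 1 with the Exp(1) limit.
set.seed(7)
n <- 200
mins <- replicate(5000, {
  x <- runif(n - 1)
  min(diff(c(0, sort(x), 1)))  # minimum of the n gaps
})
mean(n^2 * mins <= 1)  # empirical P(n^2 min <= 1), close to 0.632
1 - exp(-1)            # limiting value 1 - e^{-1} ~ 0.632
```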

Here is a simulation in R to confirm this expectation, using $n=10$.

mingap <- function(n) {
  x <- runif(n - 1)              # n-1 uniform points in (0,1)
  gap <- diff(c(0, sort(x), 1))  # the n gaps, including the endpoints
  min(gap)
}

set.seed(2024)
n <- 10
sims <- replicate(10^5, mingap(n))
mean(sims)      # expected 1/n^2
# 0.01000694

The black curve below is the empirical CDF from the simulation, with the red theoretical CDF $1 - (1-ny)^{n-1}$ almost exactly matching it and so obscuring it. The grey exponential CDF for $Exp(n^2)$ gives a fairly close approximation.

plot.ecdf(sims)
curve(1-(1-n*x)^(n-1),from=0, to=1/n, add=TRUE, col="red" ) 
curve(1-exp(-x*n^2),from=0, to=1/n, add=TRUE, col="grey" ) 

[Figure: empirical CDF (black), theoretical CDF (red), and $Exp(n^2)$ approximation (grey)]

$\endgroup$
