
I have a list (let's call it $ \{L_N\} $) of $N$ random numbers $R\in(0,1)$, drawn from a uniform distribution. Next, I draw another random number $b$ from the same distribution, find the element of $ \{L_N\} $ closest to $b$, and record that distance.

Repeating this process many times, I can plot the distribution of the resulting distances.

When $N\to \infty$, what does this distribution approach?

When I simulate this in Mathematica, the distribution appears to approach an exponential shape. And if the list were only 1 element long, I believe the distance would exactly follow an exponential distribution.

Looking at the Wikipedia article on the exponential distribution, I can see that there is some discussion on the topic:

[Image: excerpt from the Wikipedia article on the exponential distribution, discussing the minimum of independent exponential random variables]

But I'm having trouble interpreting what they are saying. What is $k$ here? Is my case what they are describing, in the limit where $n\to \infty$?

EDIT: After a very helpful, intuitive answer by Bayequentist, I understand now that the behavior as $N \to \infty$ should approach a Dirac delta function. But I'd still like to understand why my data (which is like the minimum of a bunch of exponential distributions) appears to also be exponential. And is there a way that I can figure out what this distribution is exactly (for large but finite $N$)?

Here is a picture of what such a distribution looks like for large but finite $N$:

[Histogram of the simulated signed distances: a sharp, exponential-looking peak at 0]

EDIT2: Here's some Python code to simulate these distributions:

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

numpoints = 10000
NBINS = 1000

# the list L_N and the query points b, all uniform on (0, 1)
randarray1 = np.random.random_sample((numpoints,))
randarray2 = np.random.random_sample((numpoints,))

dtbin = []

# for each query point, keep the signed difference of smallest magnitude
for i in range(len(randarray1)):
    dt = 10000000
    for j in range(len(randarray2)):
        delta = randarray1[i] - randarray2[j]
        if abs(delta) < abs(dt):
            dt = delta
    dtbin.append(dt)

plt.figure()
plt.hist(dtbin, bins=NBINS)
plt.show()
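
The nested loop above is quadratic in numpoints. For the larger $N$ values suggested in the comments, here is a vectorized sketch of the same computation (assuming NumPy; nearest_signed_distances is just an illustrative name, not a library function):

import numpy as np

def nearest_signed_distances(L, b):
    # signed distance from each query point in b to its nearest element of L
    L = np.sort(L)
    idx = np.searchsorted(L, b)           # first element of L that is >= b
    lo = np.clip(idx - 1, 0, len(L) - 1)  # candidate neighbour below
    hi = np.clip(idx, 0, len(L) - 1)      # candidate neighbour above
    d_lo = b - L[lo]
    d_hi = b - L[hi]
    return np.where(np.abs(d_lo) < np.abs(d_hi), d_lo, d_hi)

dtbin = nearest_signed_distances(np.random.random_sample(1_000_000),
                                 np.random.random_sample(10_000))

Sorting plus binary search makes each query logarithmic in $N$, so even $N = 10^6$ is fast.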
  • The exponential may be a reasonable approximation, but your minimal distance will certainly not be truly exponentially distributed: the exponential has unbounded support, but your distance is bounded between 0 and 1. (Commented Dec 15, 2019 at 6:23)
  • I'm pretty confused right now :/ Can you also share the code that you used to generate this picture? (Commented Dec 15, 2019 at 6:33)
  • Okay, I added some Python code that can produce those images. (Commented Dec 15, 2019 at 7:17)
  • That looks to me like it's going to converge to a Dirac delta. Have you tried bigger sample sizes (100k, 1m, 10m...)? Make sure the range of the x-axis is consistent. The only difference between my code and yours is that you allow the distances to be negative. If you take the absolute value of the distances, your plot will look like mine. (Commented Dec 15, 2019 at 7:49)
  • Also, the number of $b$'s (let's call it nSim) doesn't have to be as large as $N$. If you also make nSim = N = 1m, your code will take forever to run. You can try fixing nSim = 10k and observing how the distribution changes shape as $N$ goes from 10k to 100k. (Commented Dec 15, 2019 at 8:03)

5 Answers

Answer 1 (score 15)

If you had been looking for the distance to the next value above, and if you inserted an extra value at $1$ so this always had an answer, then using rotational symmetry the distribution of these distances $D$ would be the same as the distribution of the minimum of $n+1$ independent uniform random variables on $[0,1]$.

That would have $P(D \le d) = 1-(1-d)^{n+1}$ and so density $f(d)=(n+1)(1-d)^n$ when $0 \le d \le 1$. For large $n$ and small $d$ this density can be approximated by $f(d) \approx n e^{-nd}$ (since $(1-d)^n = e^{n\ln(1-d)} \approx e^{-nd}$), explaining the exponential shape you have spotted.

But your question is slightly more complicated, as you are interested in the signed distance to the nearest value above or below. As your Wikipedia link shows, the minimum of two i.i.d. exponential random variables with rate $\lambda$ is an exponential random variable with rate $2\lambda$. So you need to change the approximation to the density to reflect both the doubled rate and the possibility of negative values of $d$. The approximation then becomes a Laplace distribution with $$f(d) \approx n e^{-2n|d|},$$ remembering this holds for large $n$ and small $|d|$ (in particular, the true density is $0$ unless $-\frac12 \lt d \lt \frac12$). As $n$ increases, this concentrates almost all the density at $0$, consistent with Bayequentist's answer that the limit is a Dirac delta distribution.

With $n=10^6$ the approximation to the density would look like this, matching the shape of your simulated data.

[Plot: the Laplace approximation $f(d) = n e^{-2n|d|}$ for $n=10^6$, matching the simulated histogram]
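
This approximation is easy to check numerically; a minimal sketch assuming NumPy and Matplotlib:

import numpy as np
import matplotlib.pyplot as plt

n, n_sim = 10**6, 10**5
L = np.sort(np.random.random_sample(n))
b = np.random.random_sample(n_sim)

# signed distance from each b to the nearest element of L
idx = np.searchsorted(L, b)
lo, hi = np.clip(idx - 1, 0, n - 1), np.clip(idx, 0, n - 1)
d = np.where(np.abs(b - L[lo]) < np.abs(b - L[hi]), b - L[lo], b - L[hi])

x = np.linspace(d.min(), d.max(), 400)
plt.hist(d, bins=200, density=True)          # simulated signed distances
plt.plot(x, n * np.exp(-2 * n * np.abs(x)))  # f(d) = n exp(-2n|d|)
plt.show()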

  • So if you multiply the difference by $n$ (e.g. rescale by the appropriate amount so the expectation stays bounded away from 0 and infinity), then it does converge to a two-sided exponential distribution. – shalop, Dec 16, 2019 at 14:00
Answer 2 (score 8)

When $N \to \infty$, $L_N$ contains all real numbers in $(0,1)$. Thus, the distance from any number in $(0,1)$ to the closest number in $L_N$ will approach 0 as $N \to \infty$. The distribution of distances approaches the Dirac delta distribution as $N \to \infty$.

Here are some simulations: [histograms of the minimum distance, concentrating near 0 as $N$ grows]

Here's a code snippet:

n <- 100000
Ln <- runif(n)  # the list L_N: n uniform draws on (0,1)

nSim <- 10000
distances <- rep(0, nSim)
for (i in 1:nSim){
  b <- runif(1)                     # a fresh query point b
  distances[i] <- min(abs(Ln - b))  # distance to the nearest element of L_N
}
hist(distances, main="N=100000")
  • Thanks for the intuitive answer! I think this answers my question about what happens when $N \to \infty$, but I'm still hoping to have an understanding of why my data looks exponential. I'll upload an edit with a picture so you can see it. (Commented Dec 15, 2019 at 6:16)
  • Also, if my edit is a bit too much of a "moving target" for you, then I can rewrite that particular part as a separate question. I'll happily accept your answer if that's the case. (Commented Dec 15, 2019 at 6:21)
  • "When $N\to\infty$, $L_N$ contains all real numbers in $(0,1)$." That seems doubtful to me, since the set $\{L_1,L_2,L_3,\dots\}$ is countable (we get $N$ as "countable infinity"), while the set of all real numbers in $(0,1)$ is uncountable. So maybe it's better to say it contains a number arbitrarily close to any real number in $(0, 1)$ (i.e. a dense subset of $(0, 1)$)? – trolley813, Dec 16, 2019 at 7:44
  • That is correct: a subset of a countable set cannot be uncountable. This answer was never meant to be rigorous anyway; I was just trying to provide a useful and easy-to-understand intuition. Henry's answer is much more complete and rigorous than mine. (Commented Dec 16, 2019 at 8:07)
  • This was my thought when I first read the question: the answer is obviously 0. The average distance obviously gets smaller with each added element, and there's no obvious way it can reasonably approach anything but 0. – Cruncher, Dec 16, 2019 at 18:28
Answer 3 (score 1)

"Is there a way that I can figure out what this distribution is exactly (for large but finite N)?"

The difference of two standard Uniform random variables is Triangular(-1,0,1) with pdf $1-|x|$ defined on $(-1,1)$.

Distance is the absolute value of the difference, which has pdf, say, $f(x)$:

$$f(x) = 2(1-x), \qquad 0 < x < 1$$

Repeating the exercise $n$ times and taking the minimum distance is equivalent to finding the minimum ($1^{\text{st}}$) order statistic with respect to the parent pdf $f(x)$, which is given by:

$$g(x) = 2n(1-x)^{2n-1}, \qquad 0 < x < 1$$

where I am using the OrderStat function from the mathStatica package for Mathematica to automate the nitty-gritty details, and where the domain of support is $(0,1)$. The solution is a power-function distribution in $1-x$: with $a = 2n$, the pdf has the form $g(x) = a(1-x)^{a-1}$.

The following diagram compares a plot of the exact pdf of the minimum distance just derived, $g(x)$ (red dashed curve), to a Monte Carlo simulation (squiggly blue curve), when the sample size is $n=10$:

[Plot: the derived pdf $g(x)$ (red dashed) overlaid on a simulated histogram (blue), $n=10$]

Simulation: As you are using Mathematica, here is the code I am using for the data simulation (each of the 20000 trials draws one $b$ and a fresh list of 10 points):

  data = Table[Min[Abs[RandomReal[{0, 1}, 10] - RandomReal[]]], 20000];
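
For readers without mathStatica, a rough Python equivalent (a sketch assuming NumPy and Matplotlib) that overlays the derived $g(x)$ on the simulated minimum distances:

import numpy as np
import matplotlib.pyplot as plt

n, trials = 10, 20000
# each row of the simulation: one b against n fresh uniform points
d = np.min(np.abs(np.random.rand(trials, n) - np.random.rand(trials, 1)), axis=1)

x = np.linspace(0, 1, 400)
plt.hist(d, bins=100, density=True)          # Monte Carlo histogram
plt.plot(x, 2 * n * (1 - x) ** (2 * n - 1))  # g(x) = 2n(1-x)^(2n-1)
plt.show()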
Answer 4 (score 1)

For you to get a result larger than $d$, every number in your sample has to be more than $d$ away from $b$. The probability of that for any individual $x_0$ is just the probability mass outside the range $b \pm d$; call it $p_{\text{outside}}$. The probability that this happens for all $x_i$ in your sample is $(p_{\text{outside}})^N$. If the $x_i$ are chosen uniformly from the unit interval, then for $b$ more than $d$ from the boundary, $p_{\text{outside}} = 1-2d$, which gives $p_{\text{outside}}^N = (1-2d)^N$. For large $N$ and small $d$, this can be approximated by $e^{-2Nd}$, since $\ln(1-2d) \approx -2d$.
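
A quick numerical sanity check of that last approximation (a sketch assuming NumPy):

import numpy as np

N, d = 10**6, 10**-6
print((1 - 2 * d) ** N)    # exact (1-2d)^N for interior b
print(np.exp(-2 * N * d))  # the approximation; both print about 0.1353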

Answer 5 (score 0)

Imagine you draw the final number first and denote it $X$. This does not change the problem formulation at all. For each $X_i \in L_N$, $i=1,\dots,N$, we know that $Y_i := |X-X_i|$ has some distribution (you may or may not want to compute this) and that the $Y_i$ are i.i.d. given $X$. From Wikipedia, we know that the CDF of their minimum is $$F_{\min}(y) = 1 - [1-F_Y(y)]^N.$$

For any fixed $y$, we know $F_Y(y) > 0$ for any $y > 0$ and $F_Y(y) = 0$ otherwise. Take $N \to \infty$ and you get a CDF which is identically one for $y > 0$ and identically zero otherwise. This is a delta function centered at zero, as all the simulations above show. This holds for any $X \in (0,1)$, so the convergence always holds (albeit with varying convergence rates, perhaps).
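
Spelled out: for each fixed $y > 0$ we have $0 \le 1-F_Y(y) < 1$, so

$$\lim_{N\to\infty} F_{\min}(y) = 1 - \lim_{N\to\infty}\bigl[1-F_Y(y)\bigr]^N = 1 - 0 = 1.$$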

