43
$\begingroup$

A disc contains $n$ independent uniformly random points. Each point is connected by a line segment to its nearest neighbor, forming clusters of connected points.

For example, here are $20$ random points and $7$ clusters, with an average cluster size of $\frac{20}{7}$.

enter image description here

What does the average cluster size approach as $n\to\infty$ ?

My attempt:

I made a random point generator that generates $20$ random points. The average cluster size is usually approximately $3$.

I considered what happens when we add a new random point to a large set of random points. Adding the point either causes no change in the number of clusters, or it causes the number of clusters to increase by $1$ (Edit: this is not true, as noted by @TonyK in the comments). The probability that adding a new point increases the number of clusters by $1$, is the reciprocal of the answer to my question. (Analogy: Imagine guests arriving to a party; if 25% of guests bring a bottle of wine, then the expectation of the average number of guests per bottle of wine is $4$.) But I haven't worked out this probability.

Context:

This question was inspired by the question Stars in the universe - probability of mutual nearest neighbors.

Edit: Postd on MO.

$\endgroup$
1
  • 6
    $\begingroup$ Adding a new point can actually decrease the number of clusters. For instance, consider points on the $x$-axis at $0,4,9,$ and $13$: they form two clusters, $\{0,4\}$ and $\{9,13\}$. Now add a point at $x=6$: these five points form a single cluster. $\endgroup$
    – TonyK
    Commented Jan 15 at 16:16

3 Answers 3

46
$\begingroup$

This is Stars in the universe - probability of mutual nearest neighbors in disguise. If there are $k$ pairs of mutual nearest neighbours, then there are $n-k$ edges (since $n$ edges are drawn and $k$ of them occur twice). There can’t be any cycles (unless it’s an Escher disk) because the distances would have to decrease all along the cycle. So the graph is a forest of $n-(n-k)=k$ trees. For $n\to\infty$, the fraction of points that are their nearest neighbour’s nearest neighbour goes to the probability for that to happen; $\frac kn$ is half that fraction, and the expected average cluster size is the reciprocal of that. In two dimensions, that yields

$$ 2\left(\frac43+\frac{\sqrt3}{2\pi}\right)\approx3.218\;. $$

$\endgroup$
8
  • 4
    $\begingroup$ @Dan: Most likely, yes. I just saw that the same answer was already given at MO. Perhaps that was a slightly premature crosspost for a question like this after only half a day. $\endgroup$
    – joriki
    Commented Jan 16 at 3:57
  • 2
    $\begingroup$ (+1) Amazing! I didn't expect that OP's question can be answered without knowing cluster-size distribution. $\endgroup$ Commented Jan 16 at 4:39
  • 2
    $\begingroup$ The numerical coincidence amounts to $13/6+\sqrt{3}/\pi\approx e$, with the latter only larger by about one part in 10000. $\endgroup$ Commented Jan 16 at 21:05
  • 1
    $\begingroup$ I suppose there is a possibility that there are 3 points forming a regular triangle, in which case they can form a 3-cycle. But the probability of that is infinitesimal. $\endgroup$
    – rus9384
    Commented Jan 17 at 15:51
  • 1
    $\begingroup$ @Dan: Thanks. I think this is related to some ideas I had about learning more about the degree distribution of the nearest-neighbour graph, but they involve some complicated integration and it'll probably take a while before I work them out properly. $\endgroup$
    – joriki
    Commented Jan 24 at 16:03
15
$\begingroup$

Your question is scratching the surface of the area in probability called percolation theory.

Indeed, noting that the connectivity of a random graph in OP's model is independent of the scale, we may consider $n$ independent points sampled uniformly at random from a disk of radius $\sqrt{n}$. Then, as $n \to \infty$, the thermodynamic limit of this model converges to what is termed as the nearest-neighbor continuum percolation model [1], which can be described as follows:

Limit Model. Let $X$ be a Poisson point process on $\mathbb{R}^2$ with constant intensity. For any two distinct points $x$ and $y$ in $X$, connect $x$ and $y$ if $y$ is a nearest neighbor of $x$.

Q. Can we say anything about the cluster-size distribution in this model?

To my knowledge, it seems that most of the questions regarding this distribution, including OP's one, has never been explored in the literature, and I have a hunch that it is almost impossible to answer this exactly. At least, it is known that all the clusters are finite with probability one, as proved in [1].


Edit. @joriki demonstrated that OP's question can be answered without explicitly knowing the size distribution of the connected clusters, via associating each cluster to the unique "mutual nearest-neighbor pair" contained in it.


[1] Häggström, O. and Meester, R. (1996), Nearest neighbor and hard sphere models in continuum percolation. Random Struct. Alg., 9: 295-315. https://doi.org/10.1002/(SICI)1098-2418(199610)9:3<295::AID-RSA3>3.0.CO;2-S

$\endgroup$
14
$\begingroup$

This is a community wiki answer to provide a graphical demonstration of @joriki's idea. (Feel free to expand this!)

In the image below, we generated $40$ random points. Then, from each point $x$, an arrow is drawn to its nearest neighbor $y$ along with the translucent disk of radius $\overline{xy}$ centered at $x$:

Simulation

It is clear from the picture that each graph contains exactly one "mutual arrow", visually confirming joriki's observation.

Below is the Mathematica code that is used to generate the above image:

n = 40;

(* Generate n points uniformly at random from the unit disk *)
pts = Table[Sqrt[RandomReal[]]*ReIm[Exp[2Pi*I*RandomReal[]]], {n}];

(* Construct the percolation graph *)
edges = Table[i -> Ordering[Table[If[j==i, Infinity, Norm[pts[[j]]-pts[[i]]]], {j,n}]][[1]], {i, n}]

(* Visualize! *)
Graphics[{
    {Directive[Dotted], Circle[{0, 0}, 1]},
    {FaceForm[Directive[Opacity[.1], Orange]], EdgeForm[Directive[Opacity[.5], Orange, Thickness[Thin]]], Table[Disk[pts[[edge[[1]]]], Norm[pts[[edge[[2]]]]-pts[[edge[[1]]]]]], {edge, edges}]},
    {PointSize[Large], Red, Point/@pts},
    {Thickness[Medium], Arrowheads[Medium], Table[Arrow[pts[[List@@edge]]], {edge, edges}]}
}, ImageSize->640]
Clear[n,pts];

Here is a simple JavaScript simulation intended for large $n$. (Note that it will not work on Safari.) For example, for $n=1359356$ it returns an average cluster size of $3.2174$.

Fun fact: As the number of points approaches infinity, the sum of areas of the pink discs approaches the area of the big disc (on which the points are randomly located). Here is an intuitive explanation.

$\endgroup$
0

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .