20
$\begingroup$

On a circle, choose $6n$ $(n\in\mathbb{Z^+})$ uniformly random points and label them $a_0,a_1,a_2,\dots,a_{6n-1}$ going anticlockwise, with $a_0$ chosen randomly.

Draw three chords:

  • Chord $a_0 a_{3n}$
  • Chord $a_n a_{4n}$
  • Chord $a_{2n} a_{5n}$

Here is an example with $n=5$. The rightmost point is $a_0$. The centre of the circle is shown.

enter image description here

Let $P(n)=$ probability that the triangle formed by the three chords contains the centre of the circle.

What is $\lim\limits_{n\to\infty}P(n)$ ?

Intuition

When I came up with this question, I had no intuition as to whether $P(n)$ should increase or decrease, as $n$ increases.

Intuitively, as $n$ increases, the expected area of the triangle should decrease, which tends to make $P(n)$ decrease. But at the same time, the triangle should "get closer" to the circle's centre (e.g. the expectation of the distance between the triangle's centroid and the circle's centre, should decrease), which tends to make $P(n$) increase. It was not clear to me which of these opposing factors should dominate.

Simulations

Simulations, with $5\times10^6$ sets of three chords for each value of $n$, yielded the following estimated probabilities:

$P(1)\approx 0.0625\space$ (I prove that $P(1)=\frac{1}{16}$ below.)
$P(2)\approx 0.0741\overset{?}{=}\frac{2}{27}$
$P(3)\approx 0.0786$
$P(4)\approx 0.0809$
$P(5)\approx 0.0824$
$P(6)\approx 0.0832$
$P(10)\approx 0.0849$

I also got $P(100)\approx 0.0872$ based on a simulation with $10^6$ sets of three chords. I tried to get an approximation for $P(1000)$, but my computer started overheating (I'm using Excel for my simulations).

Here is a plot of estimated $P(n)$ against $n$.

enter image description here

So as $n$ increases, it seems that $P(n)$ is approaching some number strictly between $0$ and $1$.

Due to the naturalness of my question's geometric construction, I suspect that $\lim\limits_{n\to\infty}P(n)$ has a closed form, and it should be an interesting number worth finding. I also suspect that $P(n)$ is a rational sequence.

My attempt

I have only been able to prove that $P(1)=\frac{1}{16}$. Here is my proof.

Suppose that:

  • When we choose the six random points on the circle, instead of having a continuous distribution of points to choose from, we have $2k$ evenly spaced points to choose from, where $k$ is a large integer.
  • The circumference of the circle is $2k$.
  • A point can be chosen more than once.

Let $x=$ distance from $a_0$ to $a_3$ along the circle going anticlockwise.
Let $y=$ distance from $a_2$ to $a_5$ along the circle going anticlockwise.
Let $z=$ distance from $a_1$ to $a_4$ along the circle going anticlockwise.

If the triangle contains the centre of the circle, then either

  • $x\le k$ and $y\le k$ and $z\ge k$, or
  • $x\ge k$ and $y\ge k$ and $z\le k$.

These two configurations are shown respectively below, where $a_0$ is the rightmost point in each diagram.

enter image description here

By symmetry, each of these happens with equal probability, so we have:

$$P(1)=2\times P(x\le k \land y\le k\land z\ge k)$$

To see these distances more clearly, imagine cutting the circle at $a_0$ and straightening the circle into a line segment, so that the points from left to right are $a_0, a_1, a_2, a_3, a_4, a_5$.

enter image description here

To understand the next part, it may be easier to first work with specific numbers, so let's suppose $k=100$, and suppose $\color{red}{x=10}$. Then $90\le y \le 100$.

  • If $y=90$, there is only $\color{blue}{1}$ possible combination of locations for $a_1$ and $a_4$ (because we require $z\ge 100$).
  • If $y=91$ then there are $(1)+(1+2)=\color{blue}{4}$ possible combinations of locations for $a_1$ and $a_4$.
  • If $y=92$ then there are $(1)+(1+2)+(1+2+3)=\color{blue}{10}$ possible combinations of locations for $a_1$ and $a_4$.
  • $\cdots$
  • If $y=100$ then there are $(1)+(1+2)+(1+2+3)+\cdots+(1+2+3+\cdots+11)=\color{blue}{286}$ possible combinations for $a_1$ and $a_4$.

So for $\color{red}{x=10}$, the total number of possible combinations for $a_1$ and $a_4$ is $\color{blue}{1+4+10+\dots+286}=1001$.

The numbers $1,4,10,\dots,286$ are the tetrahedral numbers. The sum of the first $x+1$ tetrahedral numbers is $\binom{x+4}{4}$. So the total number of combinations of $a_1, a_2, a_3, a_4, a_5$ that satisfy $(x\le k \land y\le k\land z\ge k)$ is $\sum\limits_{x=0}^{k}\binom{x+4}{4}$.

So the probability that $(x\le k \land y\le k\land z\ge k)$ is $\sum\limits_{x=0}^{k}\binom{x+4}{4}$ divided by the total number of ways to choose $a_1, a_2, a_3, a_4, a_5$, which is $\binom{2k}{5}$.

To change from a discrete back to a continuous distribution of available points along the circle, we take the limit as $k\to\infty$.

Remembering that $P(1)=2\times P(x\le k \land y\le k\land z\ge k)$, we have:

$\begin{align} P(1)&=2\lim\limits_{k\to\infty}\frac{\sum\limits_{x=0}^k\binom{x+4}{4}}{\binom{2k}{5}}\\ &=2\lim\limits_{k\to\infty}\frac{\sum\limits_{x=0}^k\frac{x^4}{4!}}{\frac{2^5}{5!}k^5}\\ &=2\lim\limits_{k\to\infty}\frac{\frac{1}{4!}\cdot\frac{1}{5}k^5}{\frac{2^5}{5!}k^5}\\ &=\frac{1}{16} \end{align}$

$\endgroup$
5
  • $\begingroup$ There is no chance for to start solving the one problem in the row, before the next one in the same spirit shows up :) In this case, there is a hard first step to order the $6n$ random points on the circle, this looks innocent, is not. After this step, we still need to find the position of the center w.r.t. three chords, and there is no condition that the chords see the center always on the "one side", so they are correlated. For instance, draw two consecutive chords only, the disk is separated in four pieces. Two cases still remain in play, if the third one... So in a second step.. $\endgroup$
    – dan_fulea
    Commented Apr 30 at 16:28
  • $\begingroup$ .. in a second step we need an analytic formula for the condition that the center of the circle is "inside". An integral has to be written, somewhere in it there is an $n$, and now the cherry comes on the top, we need an expression for the limit, the third step. I have nothing against the question, but please understand, it is sooo easy to collect some few decimals by Monte-Carlo simulation, and sooo hard to make even the first step... Please share the effort, and give us your thoughts already about the problem, what did you try, why is it interesting or helpful! $\endgroup$
    – dan_fulea
    Commented Apr 30 at 16:32
  • $\begingroup$ Can you clarify what happens if the 3 chords do not form a triangle? Do we ignore that case? $\endgroup$
    – Calvin Lin
    Commented Apr 30 at 18:05
  • $\begingroup$ @CalvinLin According to the description of how the chords are formed (note that that points are labelled going anticlockwise), each chord must intersect each of the other two chords, so the three chords must form a triangle. (If the three chords all intersect at a single point, then the triangle is a degenerate triangle of perimeter $0$, but this happens with probability $0$.) $\endgroup$
    – Dan
    Commented Apr 30 at 22:15
  • $\begingroup$ (Oh, I misread it as pick $6n$ points uniformly from the circle and construct those chords. In which case $P(1)$ would be "0 or 1" since it intersects at the center. $\quad$ In which case, I'm somewhat surprised that $P(1) \neq P(2)$, which implies there's something funky going on with the ordering of the points.) $\endgroup$
    – Calvin Lin
    Commented May 1 at 14:48

3 Answers 3

12
$\begingroup$

The exact value of the limiting probability is $$ \frac{1}{4} - \frac{3}{2\pi}\arcsin\Bigl(\frac{1}{3}\Bigr) \approx 0.08773983 $$ which agrees nicely with the numerics computed in the other answers.

The point is that this is essentially a question about order statistics, whose limiting distributions are known, and in particular Gaussian.

Specifically, if $U_1, \dotsc, U_{6n} \sim U(0, 1)$ and the corresponding order statistics are $U_{(1)}\leq \dotsc\leq U_{(6n)}$, then the probability that you're looking for is \begin{align*} &\Pr(U_{(3n)} \geq 1/2, U_{(4n)} - U_{(n)} \leq 1/2, U_{(5n)} - U_{(2n)} \geq 1/2) \\ &\qquad+ \Pr(U_{(3n)} \leq 1/2, U_{(4n)} - U_{(n)} \geq 1/2, U_{(5n)} - U_{(2n)} \leq 1/2) \\ &= 2 \Pr(U_{(3n)} \geq 1/2, U_{(n)} - U_{(4n)} \geq 1/2, U_{(5n)} - U_{(2n)} \geq 1/2) \end{align*} by symmetry. This is equivalent to your discussion of the relative arc lengths.

But now, $\sqrt{n}(U_{(n)} - 1/6, U_{(2n)} -1/3, U_{(3n)}-1/2, U_{(4n)}-2/3, U_{(5n)}-5/6)$ is asymptotically Gaussian, converging to a 5-dimensionnal mean-zero Gaussian $Z$ with $\mathrm{Cov}(Z_j, Z_k) = (j/6)(1 - k/6)$ when $j \leq k$. For a reference, see Result 5 here.

Thus, the limiting value that we are looking for is $$ \Pr(Z_3 \geq 0, Z_1 - Z_4\geq 0, Z_5 - Z_2 \geq 0). $$

This is the probability that the 3-dimensional Gaussian $(Z_3, Z_1 - Z_4, Z_5 - Z_2)$ lies in the positive orthant. There is a standard result (eg here) that this probability is given by $$ \frac{1}{8} + \frac{1}{4\pi}[\arcsin \mathrm{corr}(Z_3, Z_1 - Z_4) + \arcsin \mathrm{corr}(Z_3, Z_5 - Z_2)+ \arcsin \mathrm{corr}(Z_1 - Z_4, Z_5 - Z_2)]. $$

Knowing the covariances of each pair of $Z_j, Z_k$, it is straightforward to show that each of the correlations is $-1/3$.

Hence, the probability that we're ultimately interessted in is \begin{align*} 2\Pr(Z_3 \geq 0, Z_1 - Z_4\geq 0, Z_5 - Z_2 \geq 0) &= 2\Bigl(\frac{1}{8} + \frac{3}{4\pi}\arcsin(-1/3)\Bigr) \\ &= \frac{1}{4} - \frac{3}{2\pi}\arcsin\Bigl(\frac{1}{3}\Bigr). \end{align*}

$\endgroup$
10
+50
$\begingroup$

The distance between adjacent points is exponentially distributed. So, if we just take $6n$ independent exponential random variables and place points with those distances, we get uniformly distributed points. Never mind the radius, the problem is scale invariant! Now, the distance between the adjacent ends of chords is a sum of $n$ independent exponential variables (we can take $\lambda = 1$), hence it is gamma distributed with $\alpha = n, \beta = \lambda = 1$.

Denote them $S_1, S_2, S_3, S_4, S_5, S_6$. By symmetry (the other case how the triangle can form)

$$ P(n) = 2\mathbb{P}(S_1+S_2+S_3 > S_4+S_5+S_6 \text{ and } S_2+S_3+S_4 < S_5+S_6+S_1 \text{ and } S_3+S_4+S_5 > S_6+S_1+S_2 ) \\ = \int_0^\infty\int_0^\infty\int_0^\infty\int_0^\infty\int_0^\infty\int_0^\infty g(s) \frac{1}{((n-1)!)^6} \left(\prod s\right)^{n-1} e^{-\sum s} ds $$

where we have the indicator function

$$ g(s)= {\huge\mathcal{1}}_{s_1+s_2+s_3 > s_4+s_5+s_6}{\huge\mathcal{1}}_{s_2+s_3+s_4 < s_5+s_6+s_1} {\huge\mathcal{1}}_{s_3+s_4+s_5 > s_6+s_1+s_2} $$

Calculation of this integral would require some tedious case work and then what comes out of those parts, how to integrate that, I'm not sure.

UPDATE: I checked with computer that the region where $g=1$ is the convex hull of the origin and the rays defined by the vectors $v_j$ (rows of the following matrix)

$$ \left(\begin{array}{rrrrrr} 1 & 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 \end{array}\right) $$

So we should be able to let $t_1,\dots, t_6$ range all freely from $0$ to $\infty$ and $\sum t_j v_j$ travels all over the integration region. The product of $s$'s then has a sum of two $t$'s inside (if I see correctly) but at least the bounds aren't problem and we can start integrating.

Further edit: So we get the following.

$$ P(n) = \frac{4}{\left(2^n(n-1)!\right)^6} \sum_{c, a} c \prod a_j! $$

where $c, a$ are defined like this: Expand the polynomial

$$ ((t_1+t_2+t_3)(t_1+t_5+t_6)(t_3+t_4+t_5)t_4t_2t_6)^{n-1} $$

and then for each term, $c$ is the coefficent, $a$ is the degree-tuple of the monomial. Here's a Sage code that does the calculation:

t1,t2,t3,t4,t5,t6 = polygens(QQ, names='t1,t2,t3,t4,t5,t6')
ts = (t1,t2,t3,t4,t5,t6)
q = ((t1+t2+t3)*(t1+t5+t6)*(t3+t4+t5)*t4*t2*t6)

for n in range(1, 7):
    Pn = 4*sum(c*prod(factorial(a.degree(t)) for t in ts) for c,a in q**(n-1))/(2**n*factorial(n-1))**6
    print ("P(%d) = %s = %f" %(n, Pn, Pn) )

We get

P(1) = 1/16 = 0.062500
P(2) = 19/256 = 0.074219
P(3) = 5149/65536 = 0.078568
P(4) = 331/4096 = 0.080811
P(5) = 5514679/67108864 = 0.082175
P(6) = 356876389/4294967296 = 0.083092

Should also be able to prove the limit from this formula, let's see...

Anyway, this eases the simulation, with for example this Sage-code:

def simu(n, simuN):
    T = RealDistribution('gamma', [n, 1])
    good = 0
    for _ in range(simuN):
        s1,s2,s3,s4,s5,s6 = [T.get_random_element() for j in range(6)]
        if ((s1+s2+s3>s4+s5+s6 and s2+s3+s4<s5+s6+s1 and s3+s4+s5>s6+s1+s2) ):
            #by symmetry the other half suffices
            #or (s1+s2+s3<s4+s5+s6 and s2+s3+s4>s5+s6+s1 and s3+s4+s5<s6+s1+s2) ):
            good += 1
    return 2*float(good/simuN)


print (simu(10**6, 10**7))
#0.087784
$\endgroup$
4
  • $\begingroup$ Does your Sage-code show that a simulation with $10^7$ trials yielded $P(10^6)\approx 0.087784$ ? $\endgroup$
    – Dan
    Commented May 9 at 21:59
  • 1
    $\begingroup$ Maybe $\lim\limits_{n\to\infty}P(n)=\left(\frac23\right)^6\approx 0.087791$ ? $\endgroup$
    – Dan
    Commented May 9 at 22:05
  • 2
    $\begingroup$ @Dan Yes, thanks. My Python version of the above code with (n, simuN)=$(10^6,10^9)$ gave $0.08773 \pm 0.00003=(0.08770, 0.08776)$ as 99.9% CI. (Previous comment deleted.) $\endgroup$
    – r.e.s.
    Commented May 10 at 0:16
  • 1
    $\begingroup$ Yes, that value sounds very likely. I managed to solve the integral with a change of variable and now it's just expanding a certain $6$ variable polynomial and calculating certain sum from it. Let's see if the limit behaviour is clear from that... $\endgroup$
    – ploosu2
    Commented May 10 at 4:23
7
$\begingroup$

Unfortunately, due to my account not having 50 reputation, I cannot comment on this question, I wrote this much more efficient code in python that can be used to do the simulation and it seems like it does have a limit it kind of converges to.

Keep in mind that this code runs everything on parallel so it might take a lot of CPU and hang your system. Here is the python code:

import numpy as np
from numba import njit, prange
import matplotlib.pyplot as plt

@njit
def get_intersect(x1, y1, x2, y2, x3, y3, x4, y4):
    denominator = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if denominator == 0:
        return (np.NaN, np.NAN)  # Lines are parallel or coincident

    px = ((x1 * y2 - y1 * x2) * (x3 - x4) - (x1 - x2) * (x3 * y4 - y3 * x4)) / denominator
    py = ((x1 * y2 - y1 * x2) * (y3 - y4) - (y1 - y2) * (x3 * y4 - y3 * x4)) / denominator

    return (px, py)

@njit
def is_origin_in_triangle(pts):
    def sign(p1, p2):
        return p1[0] * p2[1] - p2[0] * p1[1]

    d1 = sign(pts[0], pts[1])
    d2 = sign(pts[1], pts[2])
    d3 = sign(pts[2], pts[0])
    
    has_neg = (d1 < 0) or (d2 < 0) or (d3 < 0)
    has_pos = (d1 > 0) or (d2 > 0) or (d3 > 0)

    return not (has_neg and has_pos)

@njit(parallel=True)
def simulate_probability(n, trials=1_000_000):
    count = 0
    for _ in prange(trials):
        angles = np.sort(np.random.uniform(0, 2 * np.pi, 6 * n))

        points = np.array([
            [np.cos(angles[0]), np.sin(angles[0])],
            [np.cos(angles[3 * n]), np.sin(angles[3 * n])],
            [np.cos(angles[n]), np.sin(angles[n])],
            [np.cos(angles[4 * n]), np.sin(angles[4 * n])],
            [np.cos(angles[2 * n]), np.sin(angles[2 * n])],
            [np.cos(angles[5 * n]), np.sin(angles[5 * n])]
        ])

        intersect1 = get_intersect(points[0][0], points[0][1], points[1][0], points[1][1],
                                   points[2][0], points[2][1], points[3][0], points[3][1])

        intersect2 = get_intersect(points[2][0], points[2][1], points[3][0], points[3][1],
                                   points[4][0], points[4][1], points[5][0], points[5][1])
        
        intersect3 = get_intersect(points[4][0], points[4][1], points[5][0], points[5][1],
                                   points[0][0], points[0][1], points[1][0], points[1][1])

        # Before creating array, check for NaN values
        if (np.NaN, np.NaN) not in [intersect1, intersect2, intersect3]:
            triangle_points = np.array([intersect1, intersect2, intersect3])
            if is_origin_in_triangle(triangle_points):
                count += 1

    return count / trials

n = 1_000_000

probabilities = []
mult_increment = 10

i = 1
while i <= n:
    probability = simulate_probability(i)
    probabilities.append(probability)
    print(f"Estimated Probability P({i}) =", probability)
    
    i *= mult_increment

plt.figure(figsize=(10, 5))
plt.plot(range(1, len(probabilities) + 1), probabilities, marker='o', linestyle='-', color='b')
plt.title('Estimated Probability vs. n')
plt.xlabel('n')
plt.ylabel('Estimated Probability')
plt.grid(True)

plt.savefig('estimated_probability_vs_n.png', format='png', dpi=300)

plt.show()

Update: A much more efficient code based on the other response (I don't understand a lot of probability so gamma functions are new to me).

import numpy as np
from numba import njit, prange
import matplotlib.pyplot as plt

@njit(parallel=True)
def simulate_probability(n, trials=1_000_000):
    count = 0
    for _ in prange(trials):
        samples = np.random.gamma(n, 1, 6)
        s1, s2, s3, s4, s5, s6 = samples
        if (s1 + s2 + s3 > s4 + s5 + s6 and s2 + s3 + s4 < s5 + s6 + s1 and s3 + s4 + s5 > s6 + s1 + s2):
            count += 2 # Cool answer below shows the symmetrty 
    return count / trials

n = 10**16
probabilities = []
n_values = []
i = 1
while i <= n:
    probability = simulate_probability(i)
    probabilities.append(probability)
    n_values.append(i)
    print(f"Estimated Probability P({i}) =", probability)
    
    if i < 1000:
        i += 1
    elif i < 10000:
        i += 2
    else:
        i *= 2  

plt.figure(figsize=(10, 5))
plt.plot(n_values, probabilities, marker='o', linestyle='-', color='b', label='Estimated Probability')
plt.xscale('log') 
plt.title('Estimated Probability vs. n')
plt.xlabel('n')
plt.ylabel('Estimated Probability')
plt.grid(True)
plt.legend()

plt.savefig('estimated_probability_vs_n.png', format='png', dpi=300)
plt.show()

enter image description here

$\endgroup$
7
  • $\begingroup$ Thanks. Based on this code, have you obtained an estimate of the limit? $\endgroup$
    – Dan
    Commented May 9 at 8:44
  • 1
    $\begingroup$ My laptop is trying its best right now I am running P($10^6$) with $10^6$ sets $\endgroup$ Commented May 9 at 8:45
  • 1
    $\begingroup$ I am rewriting my program in C because it is taking too long to run :< $\endgroup$ Commented May 9 at 9:42
  • 1
    $\begingroup$ @Dan Update: I ran the program for P(10000) with $10^6$ sets and got 0.087658. It seems to converge near 0.876 I think $\endgroup$ Commented May 9 at 10:26
  • 3
    $\begingroup$ Computer numerical results are suspect for values larger than around $10^{14}$ or smaller than $10^{-14}$, since that's when you start to hit things like x+1==x, unless you've carefully designed the algorithm for floating point limitations. $\endgroup$
    – aschepler
    Commented May 9 at 23:22

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .