9
$\begingroup$

Uncle's epic journey

One year ago, my uncle set off from our village on an epic journey, in which every day he travels to the nearest unvisited village and stays there for the night. The villages in our country are randomly located (independently and uniformly) with an average of $1$ village per square mile. Our village is in the middle of a very large country, such that my uncle could not possibly reach the border within one year.

Here is an example of what the first two weeks of his journey could look like.

enter image description here

I haven't heard from my uncle since he left, and I don't have a map. How far away do you reckon he is now? That is,

What is the expectation of his distance (as the crow flies) from our village?

My attempt

I have only been able to get a crude approximation, $\frac14\sqrt{365\pi}\approx8.47$ miles, by making the following two assumptions.

  • Assume his journey is effectively just like a $2D$ random walk with constant step size $\lambda$, where $\lambda$ is defined as the expected step size of his actual journey.
  • Assume $\lambda=$ expected nearest neighbor distance in a $2D$ Poisson process with intensity $1$.

Under the second assumption, $\lambda=\frac12$. Proof: If $X$ is the distance to a village's nearest neighbor, then the density function is $f(x)=2\pi xe^{-\pi x^2}$, so $\mathbb{E}(X)=\int_0^\infty xf(x)\mathrm dx=\frac12$.

Then under the first assumption, my uncle's expected distance from our village is $\frac{\lambda}{2}\sqrt{365\pi}\approx8.47$.

I suspect the first assumption causes an underestimate: my uncle never revisits a village, so his path involves less "back-tracking" than a $2D$ random walk. The second assumption definitely causes an underestimate, because usually my uncle's step size is greater than the nearest neighbor distance. So I suspect the correct answer is considerably larger than $8.47$.

Context

Recently I have been investigating inherent properties of the $2D$ Poisson process, for example here and here.

$\endgroup$
9
  • 1
    $\begingroup$ If the country is unbounded, what does it mean for the villages to be uniformly distributed? $\endgroup$
    – quasi
    Commented Feb 5 at 9:05
  • $\begingroup$ He will make a spiral with radius R. $\endgroup$
    – Bob Dobbs
    Commented Feb 5 at 9:22
  • 4
    $\begingroup$ IMHO, someone should do a simulation and verify as number of day $n$ changes, the average distance of uncle from home do scale like $\sqrt{\frac{n}{\pi\alpha}}$ for some constant $\alpha \in (0,1)$. Personally, I bet $\alpha$ should be somewhat between $0.4 - 0.6$. The first estimate corresponds to a restrict random walk where at each step, the direction of movement is at an angle at most $120^\circ$ from previous movement. The second estimate corresponds to the case in each movement of a typical distance $r$, your uncle cannot revisit a circle of same radius at the current position. $\endgroup$ Commented Feb 6 at 8:33
  • 1
    $\begingroup$ Your derivation of the expected distance for fixed-size steps isn't quite correct. You should be finding $E(N) \approx \sqrt{\pi N} /4 \approx 0.44311 \sqrt{N}$ when the step size is $1/2$. $\endgroup$
    – mjqxxxx
    Commented Feb 8 at 16:39
  • 1
    $\begingroup$ And as @achillehui says, if you slightly modify the process so that each step must be within an angle $2\pi / 3$ of the previous one, then you find $E'(N) \approx 0.69 \sqrt{N}$ instead... an increase of around $60\%$. $\endgroup$
    – mjqxxxx
    Commented Feb 8 at 19:08

1 Answer 1

5
+50
$\begingroup$

It doesn't seem easy to me to even prove that the asymptotic behavior is $E(N) \sim \alpha \sqrt{N}$ for some constant $\alpha$. But we can certainly simulate the problem (and a couple of its variants) to try to find out. Suppose we look at three scenarios:

  • (A) Steps of fixed length $1/2$, each with a randomly chosen direction;
  • (B) Steps of fixed length $1/2$, each with a direction no more than $120^\circ$ from the previous step;
  • (C) (OP's problem) Each step goes to the nearest unvisited point, where points are generated by a Poisson process with average density $1$.

Now, we can simulate each scenario a large number of times for a few different values of $N$, and estimate $E(N)/\sqrt{N}$ for each. Here's what I find (taking $10^5$ or $10^6$ samples for each $N$ for the first two methods, and $10^3$ samples for the third and most computationally intensive method):

\begin{array}{|c|c|c|} \hline N & {\text {Method A}} & {\text {Method B}} & {\text {Method C}} \\ \hline 50 & 0.4440 \pm 0.0002 & 0.6845 \pm 0.0003 & 1.38 \pm 0.02 \\ 100 & 0.4436 \pm 0.0007 & 0.688 \pm 0.001 & 1.45 \pm 0.02 \\ 200 & 0.4432 \pm 0.0007 & 0.688 \pm 0.001 & 1.56 \pm 0.02 \\ 400 & 0.4432 \pm 0.0007 & 0.687 \pm 0.001 & 1.66 \pm 0.02 \\ 1000 & 0.4428 \pm 0.0007 & 0.687 \pm 0.001 & 1.78 \pm 0.03 \\ \hline \end{array}

The first two scenarios converge more or less immediately to a consistent final value for $E(N)/\sqrt{N}$, giving us confidence that we have the correct form for the large-$N$ behavior.

The third scenario, however, appears to have a coefficient that is still increasing with $N$. This suggests that the large-$N$ behavior for OP's problem may in fact be something more like $E(N) \sim \alpha \sqrt{N \log N}$. But I don't have even a heuristic justification for any particular functional form... many (including $\alpha\sqrt{N}$, if $\alpha$ converges slowly enough) are consistent with the data.


Update:

Expanding on the discussion in the comments. The root mean square of the distance traveled after $N$ steps can be expressed in terms of the autocorrelation expectations $\xi_k^{(n)} \equiv E[r_n \cdot r_{n+k}]$, where $r_n$ is the $n$-th step taken. Let $R_N = \sum_{i=1}^{N} r_i$ be the total displacement after $N$ steps; then $$ E[R_N\cdot R_N]=\sum_{i,j=1}^{N}E[r_i \cdot r_j]=\sum_{i=1}^{N}\xi_0^{(i)} + 2\sum_{k=1}^{N-1}\sum_{i=1}^{N-k}\xi_k^{(i)} \\ = N\xi_0 + 2\sum_{k=1}^{N-1}(N-k)\xi_k \\ = N\xi_0 + 2N\sum_{k=1}^{N-1}\xi_k - 2\sum_{k=1}^{N-1}k\xi_k, $$ where we made the assumption that the $\xi_k$ are time-invariant... i.e., $\xi_k^{(i)}$ does not depend on $i$. If $\xi_k$ goes to zero fast enough as $k\rightarrow\infty$ (for instance, if $\sum_{k=1}^{\infty}k\xi_k$ and $\sum_{k=1}^{\infty}\xi_k$ converge), then clearly $E[R_N \cdot R_N]$ is $\Theta(N)$ and the root mean square of the distance traveled is $\Theta(\sqrt{N})$. On the other hand, if, say, $\xi_k \sim \beta / k$, then the middle term dominates, and the RMS of the distance traveled is instead $\sim \sqrt{2\beta N \log N}$.

My estimates for $\xi_k$ based on $10^3$ simulated journeys of length $1000$ are the following:

\begin{array}{|c|c|} \hline k & \xi_k \\ \hline 1 & 0.1989 \pm 0.0005 \\ 2 & 0.1414 \pm 0.0005 \\ 4 & 0.0849 \pm 0.0005 \\ 8 & 0.0425 \pm 0.0005 \\ 16 & 0.0159 \pm 0.0005 \\ 32 & 0.0069 \pm 0.0006 \\ 64 & 0.0048 \pm 0.0006 \\ 128 & 0.0018 \pm 0.0006 \\ \hline \end{array}

These are certainly consistent with $\xi_k \sim \beta / k $, where $\beta \in (0.2, 0.3)$. In which case we would expect the distance traveled to grow as $\alpha \sqrt{N \log N}$, where $\alpha \in (0.63, 0.77)$. And, indeed, this reproduces the observed behavior very well... assuming $E(N)=\alpha\sqrt{N\log N}$, the direct estimates of $\alpha$ (from the first part of this answer) are $\in (0.675, 0.698)$.

$\endgroup$
4
  • $\begingroup$ As to the original question: after $365$ days, by my calculations, the uncle is expected to be about $31.5$ miles away. But you shouldn't be surprised if he's as close as $9$ miles or as far away as $55$ miles... each of these happens about $5\%$ of the time. $\endgroup$
    – mjqxxxx
    Commented Feb 9 at 6:01
  • $\begingroup$ For the third case, one thing that may help is look at correlation of the displacements in each step. Let's say $r_n$ is the $n^{th}$ displacement. If $E[r_n\cdot r_{n+k}]$ falls off fast enough (say exponentially or algebraically $k^{-\beta}$ with large enough $\beta$) as $k$ increases, we should expect $E(N) \sim \alpha\sqrt{N}$ remains to hold. $\endgroup$ Commented Feb 9 at 6:54
  • $\begingroup$ Yes, that's the question... assuming $E[r_n \cdot r_{n+k}]$ depends only on $k$ (which isn't guaranteed either, but no matter) -- call it $\xi_k$ -- you have $E[R_N^2]=N \xi_0 + \sum_{k=1}^{N-1} 2(N-k)\xi_k$. Which is going to be $O(N)$ if $\xi_k$ is summable, e.g., decays exponentially or as $k^{-\beta}$ with $\beta > 1$. $\endgroup$
    – mjqxxxx
    Commented Feb 9 at 15:32
  • $\begingroup$ Cool, it is sort of unexpected that $\xi_k$ falls off that slow. It is too bad I can't upvote again ;-) $\endgroup$ Commented Feb 10 at 4:18

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .