1
$\begingroup$

Suppose I have a 1 Dimensional Random Walk (on the integer line) with the following properties (initial conditions):

  • At time=0 it starts at position=0
  • At each time point, there is a 0.5 probability of moving +1 or -1

This Random Walk keeps going until one of the following things happens:

  • The Random Walk reaches position=10 (successful termination condition)

  • 100,000 Steps are taken and position=10 is still not reached (unsuccessful termination condition)

I wrote the following R simulation to represent random simulations of this Random Walk with these conditions:

library(ggplot2)

num_simulations <- 1000
max_steps <- 100000
target <- 10
steps <- numeric(num_simulations)
trajectories <- list()
terminated <- 0


for (i in 1:num_simulations) {
    position <- 0
    print(i)
    trajectory <- c(0)
    for (step in 1:max_steps) {
        position <- position + sample(c(-1, 1), 1)
        trajectory <- c(trajectory, position)
        if (position == target) {
            steps[i] <- step
            break
        }
        if (step == max_steps) {
            terminated <- terminated + 1
        }
    }
    trajectories[[i]] <- trajectory
}


termination_percentage <- (terminated / num_simulations) * 100


df1 <- data.frame(simulation_index = 1:num_simulations, log_steps = log(steps))
df2 <- data.frame(simulation_index = rep(1:num_simulations, sapply(trajectories, length)),
                  step = unlist(lapply(trajectories, seq_along)) - 1,
                  position = unlist(trajectories))


p1 <- ggplot(df1, aes(x = simulation_index, y = log_steps)) +
    geom_point() +
    geom_line() +
    labs(x = "Simulation Index", y = "Log(Number of Steps)",
         title = paste("Random Walk Simulations (Terminated:", termination_percentage, "%)")) +
    theme_bw()

p2 <- ggplot(df2, aes(x = step, y = position, group = simulation_index)) +
    geom_line(alpha = 0.1) +
    labs(x = "Step", y = "Position",
         title = paste("Trajectories of Random Walk Simulations (Terminated:", termination_percentage, "%)")) +
    theme_bw()

As we can see, roughly 2% of all simulations (with these initial conditions) met the unsuccessful termination condition.

enter image description here

My Question: Given a set of initial conditions and a set of termination conditions, is it possible to derive an expression for the expected number (i.e. expected value) of simulations expected to meet the unsuccessful termination condition?

Thanks!

$\endgroup$
9
  • $\begingroup$ Does "meeting the terminal condition" mean taking 100000 steps and 100 is not reached? That's what I get from your code. Because technically speaking the walk always meet one of the terminal conditions. $\endgroup$
    – ioveri
    Commented Apr 21 at 4:24
  • $\begingroup$ @ioveri : thank you for your reply! my apologies, what I meant was: the termination condition is taking 100000 steps. I have made made an edit in the question to clarify. Think of it this way : if position=100 before 100000 steps then successful termination else unsuccessful termination. I am interested in studying "unsuccessful terminations". Thank you so much! $\endgroup$
    – stats_noob
    Commented Apr 21 at 4:27
  • $\begingroup$ @stats_noobs the target you set is 10, and not 100 so the probability is lower than it should be. Please update your program and image $\endgroup$
    – ioveri
    Commented Apr 21 at 12:47
  • $\begingroup$ @ioveri: thank you for pointing this out. I updated it $\endgroup$
    – stats_noob
    Commented Apr 21 at 14:29
  • 1
    $\begingroup$ I've derived closed form expression for exact probability. It's based on reflection, I will edit the answer when I have time $\endgroup$
    – ioveri
    Commented Apr 22 at 14:54

1 Answer 1

2
$\begingroup$

EDIT: I've added the closed form expression of the probability

The probablity of unsuccessful walk after $n$ steps is given as follows

$$ P(n) = \sum_{k = 0}^{n} 2^{-n}p(k,n)$$ where $$p(k,n)=\begin{cases} \displaystyle\binom{n}{k} & k < x_c-x_0\\ \displaystyle\binom{n}{k} - \binom{n}{k+x_0-x_c}&x_c - x_0\leq k < \dfrac{n + x_c -x_0}{2}\\ 0 &k \geq \dfrac{n + x_c - x_0}{2} \end{cases}$$

or compactly $$ P(n) = \sum_{k = 0}^{\lfloor\frac{1}{2}(n+x_c-x_0)\rfloor} 2^{-n}\binom{n}{k} - \sum_{k = x_c-x_0}^{\lfloor\frac{1}{2}(n+x_c-x_0)\rfloor} 2^{-n}\binom{n}{k+x_0-x_c}$$

The function $p(k,n)$ represents the number of paths that reach position $x = x_0-n+2k$ without reaching the termination target $x_c$. Hence $P(n)$ is the sum of $p(k,n)$ for all $x$, with the weight $2^{-n}$ for each path. In order to derive the formula of $p(n)$ we need to prove the following relation $$\text{TNOW to reach } x\text{ from } x_0 \text{ without reaching }x_c\\= \\ \text{TNOW to reach } x\text{ from } x_0 - \text{TNOW to reach }x \text{ from } 2x_c - x_0$$ TNOW stands for "the number of ways" or the number of random walks. Let $X(i)$ be the position at step $i$ of a random walk that crosses or touches the boundary $x_c$. Let's define the reflection $R$ that maps the random walk $X$ to another random walk $Y_X$ by the following formula

$$Y_X(i) = \begin{cases} 2x_c - X(i) &i < L\\ X(i) &i \geq L \end{cases}$$

where $L$ is the last step that at which the randow walk reached the position $x_c$. The figure below illustrates how the function is defined

enter image description here

This function maps random walk that starts from point $x_0$ and ends at $x$ to a random walk that starts at point $2x_c-x_0$ and also ends at $x$, given they reach the boundary at least once, and vice versa. Furthermore, this reflection is an involution i.e. $R(R(X)) = X$, hence it is bijection between these set of random walks. And therefore their numbers are equal. Since the number of RW that reach $x$ and don't reach $x_c$ = the number of RW that reach $x$ - the number of RW that reach x and reach $x_c$ at some step(s), this gives us the relation aforementioned.

This reflection is the technique that is used to prove the reflection principle

The last thing we need to do is calculate the number of RW that starts from $x_0$ and $2x_c - x_0$, which are given by binomial distribution. This gives the formula of $p(k,n)$ Q.E.D.

Here's the old answer with approximation

The amount of time for a random process to reach a certain point for the first time is known as the first hitting time. For a Brownian motion, the survival probability of particle remains at the posittion $x < x_c$ at time $t$ is given by $$ \DeclareMathOperator{\erf}{erf} S(t) =\erf\left(\frac{x_c-x_0}{2\sqrt{Dt}}\right)$$ where $D$ is the diffusion constant. The expected number of trials until the first hitting time is larger than $t$ is $\dfrac{1}{S(t)}$. A 1D random walk with time step $\delta$ and step size $\epsilon$ is approximately a 1D Brownian motion with $D=\dfrac{\epsilon^2}{2\delta}$, thus $$\DeclareMathOperator{\erf}{erf} S_{\mathrm{RW}}(t)\approx \erf\left(\frac{x_c-x_0}{2\sqrt{\frac{\epsilon^2t}{2\delta}}}\right)$$ Given $\delta = 1, \epsilon = 1, x_c = 100, x_0 = 0$ we have $S(100000) \approx \mathrm{erf}(0.3863) \approx 0.2481\%$ and the expected number of trials is approximately $4.03$

The data given in the program, however, correspond to $x_c = 10$ (targer <- 10) and so $S(100000) \approx \mathrm{erf}(0.03863) \approx 2.52\%$ . $10$ is quite low and the lower $x_c$ leads to higher error in the approximation.

Edit: I've also used Python and in R to test it out (not very efficient though)

import random
t = []
target = 100
trials = 1000
for i in range(trials):
    x = 0
    for j in range(100000):
        x += -1 + random.randint(0,1)*2
        if x == target:
            t.append(j)
            break   
    print(f"Trial {i+1} finished after {j+1} steps")

print(1 - len(t)/trials)

R:

num_simulations <- 1000
max_steps <- 100000
target <- 100
steps <- numeric(num_simulations)
trajectories <- list()
terminated <- 0


for (i in 1:num_simulations) {
  position <- 0
  print(i)
  for (step in 1:max_steps) {
    position <- position + sample(c(-1, 1), 1)
    if (position == target) {
      steps[i] <- step
      break
    }
  }
  if (step == max_steps) {
    terminated <- terminated + 1
  }
}


sprintf("The percentage of unsuccessful attempts is %f %%", (terminated / num_simulations) * 100)

I got $22.6\%$ and $24.4\%$ for $x_c = 100$ and $2.5\%$ and $3.2\%$ for $x_c = 10$. which are quite close to the estimation

$\endgroup$
6
  • $\begingroup$ Thank you so much for this wonderful answer! Really looking forward to your updated answer about the derivation of hitting times involving the reflection principle! :) $\endgroup$
    – stats_noob
    Commented Apr 23 at 1:11
  • $\begingroup$ @stats_noob I've updated my answer $\endgroup$
    – ioveri
    Commented Apr 23 at 5:12
  • $\begingroup$ Wow this is such a beautiful answer ... thank you so much for this! $\endgroup$
    – stats_noob
    Commented Apr 23 at 5:15
  • $\begingroup$ I am currently working on this - do you have any ideas about this question? math.stackexchange.com/questions/4903926/… thank you so much! $\endgroup$
    – stats_noob
    Commented Apr 23 at 5:16
  • $\begingroup$ The weight $2^{-k}$ is replaced by $p^(n-k)(1-p)^k$ in the discrete case. In the continuous you just shift the distribution by $t(2p-1)$ because the limit of a biased random walk is a Brownian motion with drift $\endgroup$
    – ioveri
    Commented Apr 23 at 6:08

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .