4
$\begingroup$

Let's assume we have an array of length $5$ which contains pairwise different integers. The subcript denotes the order of the respective integer, so $i_1<i_2<i_3<i_4<i_5$.

We apply the randomized quicksort algorithm to the unordered array and would like to know the probability $P(A_{24})$ that two integers $s_2$ and $s_4$ are being compared. (Note that every item in the array is chosen as pivot with equal probability)


In our lecture the professor simply says that if we think about it for a while it becomes trivial that $P(A_{24})=\frac{2}{4-2+1}=\frac{2}{3}$. However it didn't....


I would like to have a mathematical frame to argue that $P(A_{24})=\frac{2}{3}$.

My attempt:

So far I have only assumed that we already have a discrete probability space $(\Omega,P)$ where $\Omega=\{i_1,i_2,i_3,i_4,i_5\}$ and $B_i$ denotes the event that $s_i$ has been chosen as pivot element. So I simply collect all the events where $s_2$ or $s_4$ are the pivot at some stage during the algorithm and get \begin{align*} &P(A_{24})=\\ &P(B_2)+P(B_4)+P(B_1)P(B_2\mid B_1)+P(B_5)P(B_2\mid B_5)+P(B_5)P(B_4\mid B_5)+P(B_1)P(B_4\mid B_1)\\ &+P(B_1)P(B_5\mid B_1)P(B_2\mid B_1\cap B_5)+P(B_5)P(B_1\mid B_5)P(B_2\mid B_5\cap B_1)\\ &+P(B_1)P(B_5\mid B_1)P(B_4\mid B_1\cap B_5)+P(B_5)P(B_1\mid B_5)P(B_4\mid B_5\cap B_1)\dots \end{align*} If we use some intuition and the fact that every element has the same probability to be chosen as pivot we get (conditional) probabilities \begin{align*} &\dots=\frac{1}{5}+\frac{1}{5}+\frac{1}{5}\cdot\frac{1}{4}+\frac{1}{5}\cdot\frac{1}{4}+\frac{1}{5}\cdot\frac{1}{4}+\frac{1}{5}\cdot\frac{1}{4}\\ &+\frac{1}{5}\cdot\frac{1}{4}\cdot\frac{1}{3}+\frac{1}{5}\cdot\frac{1}{4}\cdot\frac{1}{3}+\frac{1}{5}\cdot\frac{1}{4}\cdot\frac{1}{3}+\frac{1}{5}\cdot\frac{1}{4}\cdot\frac{1}{3}\\ &=\frac{1}{5}+\frac{1}{5}+\frac{1}{5}+\frac{1}{15}=\frac{2}{3}. \end{align*}

Though this seems much clearer to me why we get $\frac{2}{3}$ I am not sure about the actual definition of the probability space. How would you set up an appropriate probability space in order to derive $P(A_{24})$? Or is there a simpler approach to show why $P(A_{24})=\frac{2}{3}$?

$\endgroup$

1 Answer 1

4
$\begingroup$

Between $i_2$ and $i_4$ there is only one number, $i_3$. If it is chosen as a pivot before $i_2$ or $i_4$ are, then $i_2$ and $i_4$ won't be compared with each other (they will be compared with $i_3$ and placed on the opposite sides of the pivot). So the probability of $i_2$ and $i_4$ not being compared is the probability of $i_3$ being chosen as pivot first out of the set $\{i_2, i_3, i_4\}$, which is $\frac{1}{3}$. By complement, the probability of them being compared is $\frac{2}{3}$.


Edit: here is a more formal version of the argument:

Let $T= \{i_1, i_2 \dots i_5\}$, $\;S=\{i_2,i_3,i_4\}$.

Let $p_1$ be the first pivot chosen.

Denote $P_T(A_{24})$ to be the probability that the comparison $A_{24}$ occurs in the quicksort of a random ordering of $T$. Ie. we consider the quicksort of a random ordering of $T$ as an implicit condition for our events to happen at all.

We will take an inductive approach to the general formula. We know that $P_S(A_{24}) = \frac{2}{3}$ (quicksort of a random ordering of $S$). Additionally, assume we proved $P_M(A_{24}) = \frac{2}{3}$ for any $M$ which has $S\subset M$ and $|M| < |T|$.

We have:

$$P_T(A_{24}) = P_T(p_1 \in S) \; P_T( A_{24} \;|\; p_1 \in S ) + P_T(p_1 \notin S) \;P_T( A_{24} \;|\; p_1 \notin S) $$

Let us compute the terms of this formula.

$P_T(p_1 \in S) = \frac{3}{5}$

$P_T(p_1 \notin S) = (1 - P_T(p_1 \in S)) = \frac{2}{5}$

These only important thing about the above branch condition probabilities is that they always sum to 1. Our main aim is to show that: $$ P_T( A_{24} \;|\; p_1 \in S ) = P_T( A_{24} \;|\; p_1 \notin S) = \frac{2}{3}$$

If we show that these two are equal, the branch conditions $P_T(p_1 \in S)$ and $P_T(p_1 \notin S)$ sum up by common factor.

From the initial argument we can see that $P_T( A_{24} \;|\; p_1 \in S ) = \frac{2}{3}$. (the first pivot being in $\{i_2,i_3,i_4\}$ makes it easy to compute). What remains is to show that $P_T( A_{24} \;|\; p_1 \notin S ) = \frac{2}{3}$ .

We have:

$$P_T( A_{24} \;|\; p_1 \notin S ) =\\ P_T(p_1 < i_2 \;|\; p_1 \notin S ) \;P_T( A_{24} \;|\; p_1< i_2 ) + \\+\;P_T( i_4 < p1 \;|\; p_1 \notin S ) \;P_T( A_{24} \;|\; i_4 < p_1 )$$

As before, $P_T( p_1 < i_2 \;|\; p_1 \notin S)$ and $P_T(i_4 < p1 \;|\; p_1 \notin S)$ are complementary events that sum to 1, we are only interested in the other factors.

$$P_T(A_{24}\;|\; p_1<i_2) = \sum_{j < i_2} P_T( p_1 = j\;|\; p_1 < i_2 ) P_T( A_{24} \;|\; p_1 = j)$$

Same argument, $P_T( p_1 = j\;|\; p_1 < i_2 )$ sum to 1, we are interested in $P_T( A_{24}\;|\; p_1 = j)$. Given $j < i_2$ we have:

$$P_T( A_{24} \;|\; p_1 = j) = P_{\{x\in T \;|\; x > j\}}(A_{24})$$

Ie. if the first pivot is chosen such that $S$ will fall on the right side of the pivot, what remains is the probability of $A_{24}$ appearing in a quicksort of said right side. However, the set $ L = \{x\in T | x > j\} $ obeys $S \subset L$ and $|L| < |T|$ . Therefore, by inductive hypothesis we have $P_{\{x\in T | x > j\}}(A_{24}) = \frac{2}{3}$ for $j< i_{2}$. We can do a similar thing for $$P_T( A_{24} \;|\; p_1 = k) = P_{\{x\in T \;|\; x < k\}}(A_{24})$$ where $k > i_4$ .

These correspond to the cases where all elements of $S$ fall on the right or the left side of the first pivot, respectively. They all reduce to $\frac{2}{3}$, except for the branch condition probabilities, which always sum to one. So the final probability is $\frac{2}{3}$.

$\endgroup$
7
  • $\begingroup$ Why not out of the set $\{i_1,i_2, i_3, i_4\}$? Your answer seems very intuitive but is it possible to put it into a probability space context? $\endgroup$
    – Philipp
    Commented Aug 27, 2022 at 16:20
  • $\begingroup$ What matters is the order of choosing of the pivots, specifically, only those pivots between, and including, $i_2$ and $i_4$. Because only if an "in-between" pivot (in our case, only $i_3$), is chosen first (ie. before the other two), will those two be separated and never be compared. The "in between" pivots act as "walls" between $i_2$ and $i_4$, if chosen first. The other ones (such as $i_1$) do not. $\endgroup$ Commented Aug 27, 2022 at 16:27
  • $\begingroup$ It is clear to what happens if the pivot lies between/is equal to $s_2$ or$s_4$ but I don't understand how this allows us to derive a probability. $\endgroup$
    – Philipp
    Commented Aug 27, 2022 at 16:48
  • $\begingroup$ I've tried to formalize the argument. $\endgroup$ Commented Aug 27, 2022 at 18:10
  • $\begingroup$ Excellent answer! But it feels a bit like "cheating" because we don't know if there exists a probability space that suits - we simply assume that it already exists without knowing how it looks like. Do you think that there is an easy way at all to find a discrete probability space? I am thinking of something like $\Omega:=\Omega_1\times\times\dots\Omega_n$ where the different $\Omega_i$ denote the selection of a pivot. Then we define a probability function on each $\Omega_i$. As it is not clear beforehand how many pivots we will select it seems complicated to find an appropriate sample space. $\endgroup$
    – Philipp
    Commented Aug 28, 2022 at 15:50

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .