1
$\begingroup$

Quicksort has an expected runtime of $\mathcal O(n\log n)$ when choosing a pivot uniformly at random. Now consider that before each iteration of quicksort, we sample $\log n$ elements of the array and compare them with with the pivot (chosen uniformly at random), and only keep the pivot if it is in the middle third of the sampled array, otherwise we sample again (both pivot and the $\log n$ elements). The purpose of this would be to avoid running the partition operation (which takes $\mathcal O(n)$ time) with bad pivots.

I would like to analyze the running time of this algorithm. I know that when choosing the expected runtime in the original case, it satisfies $$ T(n) \leq \mathcal O(n) + \frac{1}{n} \sum^{n-1}_{i=0}(T(i) + T(n-i))$$

What is the expected running time recurrence in this modified version of quicksort?

My guess is that it the expected running time would become, $$ T(n) \leq \left( \frac{1}{3} \right)\log n + \left( \frac{2}{3} \right) ( \mathcal O(n) + \frac{1}{n} \sum^{n-1}_{i=0}(T(i) + T(n-i))),$$ but I'm not sure.

$\endgroup$
4
  • $\begingroup$ Modified Qsort's overall time complexity would still be $O(n \log n)$ because you will only impact the constant factor. $\endgroup$
    – vvg
    Commented Oct 18, 2022 at 17:17
  • $\begingroup$ The minimum complexity of a sorting algorithm is $O(n \log n)$, so this cannot be improved. $\endgroup$ Commented Oct 18, 2022 at 17:58
  • $\begingroup$ Why not do an $O(n)$ Knuth sort first? That's an easy way to avoid unbalanced trees. $\endgroup$
    – John Douma
    Commented Oct 18, 2022 at 21:11
  • $\begingroup$ @vvg: I intuitively can see that, but would like to prove it. Is the recurrence that I wrote correct? $\endgroup$
    – joeren1020
    Commented Oct 18, 2022 at 22:18

1 Answer 1

1
$\begingroup$

The recursion written was imprecise, because it did not take into account the possibility of rejection. Once the pivot is accepted, it is very likely to be in the middle third of all items (up to a small error). This leads to the following recursion:

$$ T(n) \leq 3\log n + ( \mathcal O(n) + \frac{3}{n} \sum^{2n/3}_{i=n/3}(T(i) + T(n-i))).$$

The solution is still $O(n \log n)$ but a better leading constant is obtained. For an analysis of related strategies see https://people.eecs.berkeley.edu/~tygar/papers/Optimal_sampling_strategies_for_quicksort/Journal.pdf and the references therein.

$\endgroup$
7
  • $\begingroup$ Where does the $3$ multiplying the first $\log n$ come from? Also, could do you please elaborate on why the first recurrence I wrote didn't consider the possibility of rejection? $\endgroup$
    – joeren1020
    Commented Oct 28, 2022 at 20:53
  • 1
    $\begingroup$ You need three attempts, on average, to get a pivot in the middle third of the sample. In your recursion the summation was over all $i$, but if the pivot is not near the middle third, it is very likely to be rejected during the comparison to the sample of $\log n$ items. $\endgroup$ Commented Oct 29, 2022 at 5:47
  • $\begingroup$ Shouldn't it be 2 instead of 3? I think the number of tries until a success should follow a geometric distribution. Since the probability of success is $1/3$, then the average of this variable should be $(1- 1/3)/(1/3) = 2$. Moreover, I don't get why the probability of choosing a pivot, once it is accepted, should be $3/n$. Are you considering that all are equally likely to be chosen (once pivot is accepted), or they have different probabilities but can be bounded by $3/n$ (if so, how?)? $\endgroup$
    – joeren1020
    Commented Nov 18, 2022 at 19:46
  • 1
    $\begingroup$ A geometric distribution that starts at 1 and has parameter $p$, has mean $1/p$. Check Wikipedia. The factor of $3/n$ comes from averaging over $n/3$ possibilities. This is an approximation, items on the boundary have a lower probability, but this approximation has a small effect on the final result. $\endgroup$ Commented Nov 19, 2022 at 2:31
  • $\begingroup$ Thanks, that's been really helpful. I'm trying to formalize the idea behind the approximation. For example, I want to convince myself that, in fact, the probability is mostly concentrated between $n/3$ and $2n/3$. I've computed the exact probability which I think should be $\frac{3}{k}\sum^{2k/3}_{j=k/3 + 1}\frac{{i-1\choose j-1}{n-i\choose k-j}}{n\choose k}$, where $k = \log n$. To see how much is the probability between $n/3$ and $2n/3$ we would need to compute $\sum^{2n/3}_{i=n/3}p_i$ which seems challenging because of the combinatorial sums. How is one supposed to deal with them? $\endgroup$
    – joeren1020
    Commented Nov 24, 2022 at 21:42

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .