4

I hope this question isn't too 'soft' for here. It's been a while $\tiny{\text{an eternity by some people's standards}}$ since I've touched this stuff, and I had a convincing answer to this question eight minutes ago but must have misplaced it, so please help me find it again:

Most classical sorting algorithms one reads about are comparison sorts, in which a single comparison produces a binary outcome (let's break ties arbitrarily). The motivation I remember is that they are general: they require only that an ordering be prescribed over the items and are insensitive to how the items are encoded, but they suffer from an $\Omega(n \log n)$-time barrier simply because they need to make enough comparisons to distinguish the $n!$ possible orderings. One then meets algorithms such as radix sort, which is supposed to beat this under caveats such as the space of items being discrete and bounded, and assuming the ability to retrieve any decimal (or other-base) digit in constant time. But as I recently thought about it, this digit query is essentially a $b$-way switch as opposed to the (ordinary) $2$-way kind used in comparison sorts, so radix sort should be subject to the same lower bound on running time, since the foregoing gives an improvement of at most a factor of $(\log b)/(\log 2)$. So how is radix sort ever preferable to one of the standard comparison sorts? I know my reasoning must go wrong somewhere, but I can't see where.

In my experience this kind of paradox typically dissolves when formalised. I would still rather have an intuitive reason that explains it away, if one is available.

2
  • for sorting, this can't be understood very well until one understands very distinctly the difference between average and worst case, which this question seems to gloss over / mix up somewhat.... many sorting algorithms are generally $O(n \log n)$ only in the average case...
    – vzn
    Commented Dec 12, 2014 at 6:08
  • I think he means: why can certain algorithms sort in little-$o$ of $n\log n$ even though sorting supposedly requires $\Omega(n \log n)$?
    – InformedA
    Commented Dec 12, 2014 at 6:16

4 Answers

6

Radix sort cannot be reduced to a comparison-based sort. If the alphabet is binary, then in some sense you are doing 2-way comparisons, since each bit is either 0 or 1. If there are $n$ numbers of width $w$, radix sort takes time $O(nw)$. Therefore as long as $w \ll \log n$, radix sort will be faster than any comparison-based sort.
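To make the $O(nw)$ claim concrete, here is a minimal sketch of least-significant-bit-first radix sort over a binary alphabet, in Python; the function name and the assumption that the keys are non-negative integers are mine, not part of the answer.

```python
def binary_lsd_radix_sort(keys, w):
    """Sort non-negative integers that fit in w bits.

    Makes w stable passes, one per bit position, so the total work is
    O(n * w) -- there is no pairwise comparison anywhere.
    """
    for bit in range(w):                                   # least significant bit first
        zeros, ones = [], []
        for x in keys:                                     # O(n) work per pass
            (ones if (x >> bit) & 1 else zeros).append(x)
        keys = zeros + ones                                # stable: earlier order preserved
    return keys

# 3-bit keys: exactly 3 passes, no matter how many keys there are.
print(binary_lsd_radix_sort([5, 1, 7, 0, 3, 3], w=3))      # [0, 1, 3, 3, 5, 7]
```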

What does the condition $w \ll \log n$ mean? If the numbers have width $w$ then they lie in the range $0$ to $2^w-1$. When $w \ll \log n$, there are fewer than $n$ possible numbers, so some numbers must repeat. It is in this regime that you can beat comparison-based sorting.

Another algorithm that beats comparison-based sorting in the regime in which there are fewer than $n$ possible numbers is counting sort. Suppose that the domain has size $m$. We initialize an array of size $m$ and count how many elements of each value there are. We can then unpack this histogram into the sorted array. This runs in time $O(n+m)$; if $m \ll n$, the algorithm runs in optimal $O(n)$ time.
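A minimal sketch of counting sort as just described, again in Python, assuming the keys are integers drawn from $\{0, \dots, m-1\}$ (the function name is mine):

```python
def counting_sort(keys, m):
    """Sort integers drawn from {0, ..., m-1} in O(n + m) time and O(m) extra space."""
    counts = [0] * m                              # histogram over the domain
    for x in keys:                                # O(n): tally each value
        counts[x] += 1
    out = []
    for value, count in enumerate(counts):        # O(n + m): unpack the histogram
        out.extend([value] * count)
    return out

print(counting_sort([3, 1, 4, 1, 5, 1, 3], m=6))  # [1, 1, 1, 3, 3, 4, 5]
```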

In both of these cases, the algorithms cannot be rephrased in the decision-tree model (try doing it!), and so aren't subject to the limitations of comparison-based sorting. No paradoxes here.

2
  • 4
    If keys repeat, then comparison-based sorting can be more efficient, too. The theoretical complexity of comparison-based sorting is $O(nH)$ where $H$ is the zero-order entropy of the sort keys. If the keys are distinct, $H = \log n$, but if they are not, $H < \log n$.
    – Pseudonym
    Commented Dec 13, 2014 at 9:23
  • I liked all these views, and in some sense the one above this one is my favorite. Still I've decided to accept this one since it helped me diagnose the problem (I had thought of the allowable range being bounded but not quantitatively on the same footing as $n$, and radix-sort queries and comparisons are the same except how they really aren't). Thanks again for all your answers. Commented Dec 13, 2014 at 21:23
5

There is no paradox, because the cost is defined differently in the two cases. Since you want intuition: think of it as measuring the cost of a trip by the money you pay, the distance you travel, or the time it takes. These are different units of measurement.

In comparison sorting, we assume that the cost of comparing two numbers is a fixed constant. This is just a simplifying assumption made for theoretical purposes. Depending on the machine and on the way numbers are encoded, it may take more than a constant amount of cost to compare two numbers.

Think about an implementation where each number is encoded with a lot of bits. The range of numbers representable in such a scheme is very large, and you will clearly need more than one unit of cost to do a comparison. On modern computers I believe this is determined by the number of bits in the machine's registers.

Even if, as you mention, you can do a multi-way comparison, you still need to put the compared numbers into different buckets. If your branching factor $b$ is large, you need correspondingly many buckets on each run to account for all the different buckets a number can fall into.

The work radix sort does is then:

  • Create two sets of $b$ buckets.
  • On each run, distribute all $n$ numbers into the buckets.

CORRECTION: The number of runs is $\frac{N}{n \log b}$, where $N$ is the total input size in your encoding scheme (so each key occupies $N/n$ bits and each run handles $\log b$ of them). The total cost is therefore $n \cdot \frac{N}{n \log b} = \frac{N}{\log b}$. You can see that there is no $n$ in this formula, so you cannot directly compare radix sort with comparison sorts in these terms.

Now you can see where I am going: radix sort is preferable when your numbers are bounded, i.e. when their encoding is not too long. The shorter the encoding, the fewer runs you need, and the faster the sort will be.
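As a rough numerical illustration of that accounting (my own reading of the formula, assuming $N$ is measured in bits, so each key occupies $N/n$ bits and each run consumes $\log_2 b$ of them):

```python
import math

def radix_cost(n, total_bits, b):
    """Rough operation count for radix sort with b buckets:
    each key is total_bits / n bits long, each run handles log2(b) of them,
    so runs = total_bits / (n * log2 b) and total work ~ total_bits / log2(b)."""
    bits_per_key = total_bits / n
    runs = math.ceil(bits_per_key / math.log2(b))
    return runs, runs * n

# One million 32-bit keys (N = 32,000,000 bits) with 256 buckets:
print(radix_cost(1_000_000, 32_000_000, 256))     # (4, 4000000), i.e. ~N / log2(256)
```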

4

It is preferable when the length of the keys is less than the log of the size of the array. For instance, if I have 256 buckets and I break integers up into 4 bytes, then I need 4 passes to sort them properly; since $\log_2 16 = 4$, there is a potential advantage for arrays of length greater than 16. But really, all I've said here, translated into asymptotic terms, is that there is some size beyond which radix sort will overtake merge sort. I once wrote a radix sort algorithm in C++ that did just the above. I found it was comparable for inputs of fewer than 10000 elements, but after that it started to beat the standard libgcc std::sort. On an array of size 1000000 it beat it quite handily. For small inputs (fewer than about 1000 elements) std::sort seemed to cream my radix sort no matter what I did; I believe this had to do with insertion sort being used on the smaller sections.
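For what it's worth, here is roughly what that byte-at-a-time scheme looks like, sketched in Python rather than the original C++: 256 buckets, one stable pass per byte, hence 4 passes for 32-bit keys. The details are my reconstruction, not the answerer's code.

```python
def byte_lsd_radix_sort(keys):
    """Sort non-negative 32-bit integers: 4 stable passes, 256 buckets per pass."""
    for shift in (0, 8, 16, 24):                            # one pass per byte, LSB first
        buckets = [[] for _ in range(256)]
        for x in keys:
            buckets[(x >> shift) & 0xFF].append(x)          # bucket by the current byte
        keys = [x for bucket in buckets for x in bucket]    # stable concatenation
    return keys

print(byte_lsd_radix_sort([70000, 3, 1 << 20, 256, 255]))   # [3, 255, 256, 70000, 1048576]
```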

1
  • Comparison-based sorting assumes that $n$ keys can be arranged in $n!$ different ways. But if you have a million 4-bit keys, then many, many permutations actually give the same array, so the number of distinct arrangements of the keys is much smaller than $n!$.
    – gnasher729
    Commented Jan 3, 2023 at 12:53
4

Let's assume for the moment that the sort keys are all distinct, and think about the information theory of sorting.

An unsorted array is a permutation of the sorted array. There are $n!$ possible permutations. The job of sorting, in an information theoretic sense, is to discover precisely which permutation it is.

To transmit a number between $1$ and $m$ requires transmitting $\log_2 m$ bits of information. To transmit a permutation of $n$ elements therefore requires $\log_2 n!$ bits of information. By Stirling's approximation, this turns out to be $n \log_2 n - O(n)$, i.e. $n \log_2 n$ up to lower-order terms.
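For concreteness, here is a back-of-the-envelope version of that Stirling estimate (my addition, not part of the original answer):

$$\log_2 n! \;=\; \sum_{i=1}^{n} \log_2 i \;\approx\; \int_1^n \log_2 x \,\mathrm{d}x \;=\; n \log_2 n - \frac{n-1}{\ln 2} \;=\; n \log_2 n - O(n).$$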

A binary comparison operation discovers at most one bit of information. It follows that any sorting algorithm that uses only binary comparisons must perform $\Omega(n \log n)$ comparisons.

A radix sort beats this by discovering more than one bit per query. If you can discover $\Theta(\log n)$ bits per query, for example, then you can sort using $O(n)$ queries.

EDIT: The key word in that last sentence is "if".

Suppose that we have $n$ distinct integers, and we wish to radix sort them using $k$ buckets.

Clearly, each query discovers at most $\log_2 k$ bits of information. It follows that any such sort must perform at least $\frac{n \log_2 n}{\log_2 k}$ queries (ignoring lower-order terms).

If $k = 2$, then this has the same complexity as a comparison-based sort. However, if $\log_2 k = \Theta(\log n)$, that is, if $k$ is polynomial in $n$ (e.g. $k = n$), then sorting takes a linear number of queries. In most cases, $k$ will fall somewhere between these two extremes.
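To put rough numbers on the $\frac{n \log_2 n}{\log_2 k}$ bound (my own illustration, for $n = 10^6$ keys):

```python
import math

def query_lower_bound(n, k):
    """Information-theoretic minimum number of k-way queries needed to sort
    n distinct keys: n * log2(n) / log2(k), ignoring lower-order terms."""
    return math.ceil(n * math.log2(n) / math.log2(k))

n = 1_000_000
for k in (2, 256, n):
    print(k, query_lower_bound(n, k))
# k = 2    -> ~19.9 million queries (plain comparison sorting)
# k = 256  -> ~2.5 million queries
# k = n    -> ~1.0 million queries (linear in n)
```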

4
  • I really like the entropy interpretation, and the radix-sort operations look non-redundant so as to simply get $O((n \log n)/(\log n))$ queries, but could you please elaborate on how each query yields on the order of $\log n$ bits as opposed to $\log b$? I'm having a really dense moment here. Commented Dec 13, 2014 at 3:47
  • Ohhh. I thought that was a part of the claim. I get it. Commented Dec 13, 2014 at 21:09
  • It would be nice to see some more elaboration on the running-time part near the end. I don't think I'm sure about it.
    – InformedA
    Commented Dec 14, 2014 at 23:57
  • InformedA, what are you unclear on?
    – Pseudonym
    Commented Dec 15, 2014 at 0:22
