88

I was thinking about sorting algorithms in software, and possible ways one could surmount the O(n log n) roadblock. I don't think it IS possible to sort faster in a practical sense, so please don't think that I do.

With that said, it seems that with almost all sorting algorithms, the software must know the position of each element. That makes sense; otherwise, how would it know where to place each element according to the sorting criteria?

But when I crossed this thinking with the real world, a centrifuge has no idea what position each molecule is in when it 'sorts' the molecules by density. In fact, it doesn't care about the position of each molecule. However, it can sort trillions upon trillions of items in a relatively short period of time, because each molecule simply obeys the laws of density and gravity - which got me thinking.

Would it be possible with some overhead on each node (some value or method tacked on to each of the nodes) to 'force' the order of the list? Something like a centrifuge, where only each element cares about its relative position in space (in relation to other nodes). Or, does this violate some rule in computation?

I think one of the big points brought up here is the quantum mechanical effects of nature and how they apply in parallel to all particles simultaneously.

Perhaps classical computers inherently restrict sorting to the domain of O(n log n), whereas quantum computers may be able to cross that threshold into O(log n) algorithms that act in parallel.

The point that a centrifuge is basically a parallel bubble sort seems to be correct, and that would have a time complexity of O(n).

I guess the next thought is that if nature can sort in O(n), why can't computers?

24
  • 45
    Centrifuge is just a massively parallel bubble sort implementation, nothing fancy. Commented Jan 11, 2017 at 7:41
  • 3
    When you have n processors (cores) to sort an array of just n items, you can easily achieve O(n) complexity. The bitter truth is that we usually have to sort long arrays (thousands and millions of items) on a CPU with only 2..10 cores. Commented Jan 11, 2017 at 8:02
  • 24
    Note that the n log n is the number of comparisons that must be made in a sort that compares pairs of items. There is no requirement that a sort algorithm compare pairs of items; if you can come up with a sort that does not do pairwise comparisons, you can make it faster than n log n. Commented Jan 11, 2017 at 9:58
  • 7
    The thing you're missing is that each of those molecules in the solution are processing units. There's no emulator that counts the molecules - the molecules count themselves. An analogous computer would have as many processor cores and independent memories as you have items to sort. O(n) on its own tells you nothing - it's only useful for comparing algorithms with similar constraints and running on similar architectures; in introductory courses for algorithmic complexity we use a very simplified model "computer" that has little to do with centrifuges or real computers :)
    – Luaan
    Commented Jan 11, 2017 at 10:02
  • 4
    I'm voting to close this question as off-topic because it belongs on cs.stackexchange.com Commented Jan 11, 2017 at 15:25

12 Answers

71

EDIT: I had misunderstood the mechanism of a centrifuge, and it appears that it does do a comparison, a massively parallel one at that. However, there are physical processes that operate on a property of the entity being sorted rather than comparing two entities against each other. This answer covers algorithms of that nature.

A centrifuge applies a sorting mechanism that doesn't really work by means of comparisons between elements, but rather by a property ('centrifugal force') of each individual element in isolation. Some sorting algorithms fall into this theme, especially Radix Sort. When this sorting algorithm is parallelized, it should approach the example of a centrifuge.

Some other non-comparative sorting algorithms are Bucket sort and Counting Sort. You may find that Bucket sort also fits into the general idea of a centrifuge (the radius could correspond to a bin).
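To make that concrete, here is a minimal counting sort sketch in Python (my own illustration; the function name and the assumption that keys are integers in 0..k-1 are mine). Note that it never compares two keys against each other; each key's value is used directly as its "address", much as density decides a particle's radius:

```python
def counting_sort(keys, k):
    """Sort non-negative integer keys in range(k) without comparing pairs.

    Runs in O(n + k): each key is used as an index into a tally array,
    rather than being compared against other keys.
    """
    counts = [0] * k
    for key in keys:           # tally how many times each key occurs
        counts[key] += 1
    result = []
    for value in range(k):     # read the buckets back out in order
        result.extend([value] * counts[value])
    return result

print(counting_sort([3, 1, 4, 1, 5, 2, 0], k=6))  # [0, 1, 1, 2, 3, 4, 5]
```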

Another so-called 'sorting algorithm', in which each element is considered in isolation, is Sleep Sort. Here time, rather than centrifugal force, acts as the magnitude used for sorting.
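A toy sleep sort might look like the sketch below (my own illustration, not a serious algorithm; the 10 ms scale factor is an arbitrary choice): each value schedules its own output after a delay proportional to itself, so the ordering is delegated to the scheduler's clock. It only behaves for small, well-separated, non-negative numbers:

```python
import threading
import time

def sleep_sort(values):
    """Toy 'sleep sort': each value wakes up after value * 10 ms and appends itself."""
    result, lock = [], threading.Lock()

    def worker(v):
        time.sleep(v * 0.01)   # delay proportional to the value
        with lock:
            result.append(v)

    threads = [threading.Thread(target=worker, args=(v,)) for v in values]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result

print(sleep_sort([3, 1, 4, 1, 5]))  # usually [1, 1, 3, 4, 5] -- timing, not comparisons
```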

5
  • This is actually the right answer - bin sorting / radix sorting has O(n) complexity provided the bins and input can be accessed in O(1) time.
    – pjc50
    Commented Jan 11, 2017 at 9:59
  • 5
    I was going to ask "Does anyone else think of Sleep Sort right away?". Apparently, yes :)
    – CompuChip
    Commented Jan 11, 2017 at 16:55
  • Centrifuges do work by comparing elements; the hash function is (primarily) density. For example, if you centrifuge a propane-and-air mixture, you'll get propane sorted to the boundaries; but if you centrifuge propane-and-water, you'll get propane sorted to the center (water's more dense). This process is almost exactly the same as the physical process that a "bubble sort" was named after.
    – Nat
    Commented Jan 11, 2017 at 22:29
  • Isn't the complexity of SleepSort actually relying on that of the scheduler?
    – Morwenn
    Commented Jan 24, 2017 at 12:28
  • @Morwenn the old linux scheduler was O(1) while the new one is O(log n). Both of these are outweighed by the constant factors in sleep Commented Jan 24, 2017 at 17:32
35

Computational complexity is always defined with respect to some computational model. For example, an algorithm that's O(n) on a typical computer might be O(2^n) if implemented in Brainfuck.

The centrifuge computational model has some interesting properties; for example:

  • it supports arbitrary parallelism; no matter how many particles are in the solution, they can all be sorted simultaneously.
  • it doesn't give a strict linear sort of particles by mass, but rather a very close (low-energy) approximation.
  • it's not feasible to examine the individual particles in the result.
  • it's not possible to sort particles by different properties; only mass is supported.

Given that we don't have the ability to implement something like this in general-purpose computing hardware, the model may not have practical relevance; but it can still be worth examining, to see if there's anything to be learned from it. Nondeterministic algorithms and quantum algorithms have both been active areas of research, for example, even though neither is actually implementable today.

1
  • Nature / physics is parallel in general (that's why it's so computationally expensive to simulate on our serial computers), so yeah, the OP's analogy has a major flaw. It still takes time for particles / molecules to move along the length of a test tube or whatever, though, so a longer test tube is like more work per thread, but a wider test tube is more parallelism. (And note that a centrifuge doesn't sort across the area of a test tube, so it's many parallel sorts with no merging but maybe some interaction along the way. Unlike an actual sort on a parallel computer, with final merging) Commented May 6, 2018 at 1:39
29

The trick there is that you only have a probability of having sorted your list using a centrifuge. As with other real-world sorts [citation needed], you can change the probability that your list has been sorted, but you can never be certain without checking all the values (atoms).

Consider the question: "How long should you run your centrifuge for?"
If you only ran it for a picosecond, your sample might be less sorted than the initial state, and if you ran it for a few days, it might be completely sorted. However, you wouldn't know without actually checking the contents.

6
  • That's a pretty good point. How do you know? Then again, if the rules in place are good enough, would you even care to know? (ie. if you make the probability so low that it becomes negligible).
    – Mmm Donuts
    Commented Jan 11, 2017 at 7:13
  • You can always calculate how long it would take for a particle to reach the end of the centrifuge. You know the acceleration (w^2 * r where w is angular velocity) and you can calculate the time. Commented Jan 11, 2017 at 7:37
  • 1
    True, but as that is confounded by brownian motion, other atomic forces, and quantum physics (thanks, tiny things!), you still cannot be completely certain you have sorted your list until you check the state.
    – ti7
    Commented Jan 11, 2017 at 7:45
  • 1
    If you don't have extremely small particles, you can ignore quantum effects. If you have extremely small particles, the sorting algorithm need not work, and in fact, you can't depend on it to work due to quantum effects. And you cannot check state reliably due to the uncertainty principle (checking one particle will lead to other particles getting moved). Commented Jan 11, 2017 at 7:50
  • 1
    @Kris Well, we do know that the centrifuge doesn't sort perfectly. We just keep doing it until the difference no longer matters for the practical purpose - like preventing blood clotting in your blood centrifuge. But look at uranium centrifuges - those need to sort items that are much "closer" (harder to separate), and require a huge facility that keeps sorting over and over and over again at huge expense to produce tiny amounts of the desired material. And the centrifuge has a certain size, and the separation time is proportional to the width of the tubes, and... You can't just say O(n), yay!
    – Luaan
    Commented Jan 11, 2017 at 9:54
5

A real-world example of computer-based "ordering" would be autonomous drones that cooperate with one another, known as "drone swarms". The drones act and communicate both as individuals and as a group, and can track multiple targets. The drones collectively decide which drones will follow which targets, while handling the obvious need to avoid collisions between drones. Early versions of this were drones that moved through waypoints while staying in formation, but the formation could change.

For a "sort", the drones could be programmed to form a line or pattern in a specific order, initially released in any permutation or shape, and collectively and in parallel they would quickly form the ordered line or pattern.

Getting back to a computer based sort, one issue is that there's one main memory bus, and there's no way for a large number of objects to move about in memory in parallel.

know the position of each element

In the case of a tape sort, the position of each element (record) is only "known" to the "tape", not to the computer. A tape-based sort only needs to work with two elements at a time, plus a way to denote run boundaries on a tape (a file mark, or a record of a different size).

4

IMHO, people overthink log(n). O(n log(n)) IS practically O(n). And you need O(n) just to read the data.

Many algorithms such as quicksort do provide a very fast way to sort elements. You could implement variations of quicksort that would be very fast in practice.
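For reference, a minimal quicksort sketch of my own (not in-place and not tuned; a genuinely fast variant would partition in place, pick pivots more carefully, and switch to insertion sort on tiny partitions, but the core idea is this):

```python
import random

def quicksort(xs):
    """Minimal quicksort sketch: expected O(n log n), not in-place."""
    if len(xs) <= 1:
        return xs
    pivot = random.choice(xs)                      # random pivot avoids worst-case inputs
    less = [x for x in xs if x < pivot]
    equal = [x for x in xs if x == pivot]
    greater = [x for x in xs if x > pivot]
    return quicksort(less) + equal + quicksort(greater)

print(quicksort([5, 3, 8, 1, 9, 2]))  # [1, 2, 3, 5, 8, 9]
```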

Inherently all physical systems are infinitely parallel. You might have a buttload of atoms in a grain of sand; nature has enough computational power to figure out where each electron in each atom should be. So if you had enough computational resources (O(n) processors), you could sort n numbers in log(n) time.

From comments:

  1. Given a physical processor that has k elements, it can achieve a parallelism of at most O(k). If you process n numbers arbitrarily, it would still process them at a rate related to k. Also, you could formulate this problem physically. You could create n steel balls with weights proportional to the numbers you want to encode, which could in theory be sorted by a centrifuge. But here the number of atoms you are using is proportional to n, whereas in the standard case you have a limited number of atoms in a processor.

  2. Another way to think about this: say you have a small processor attached to each number and each processor can communicate with its neighbors; then you could sort all those numbers in O(log(n)) time.

6
  • But isn't computation just that - using the physical properties of nature to do some work? I might be crossing into quantum computing here, but if it can be done physically, it should be able to be done computationally? Perhaps classical computation is the roadblock between O(n log n) and O(log n).
    – Mmm Donuts
    Commented Jan 11, 2017 at 7:32
  • 2
    @Kris Not exactly. Given a physical processor that has k number of elements, it can achieve a parallelness of at most O(k). If you process n numbers arbitrarily, it would still process it at a rate related to k. Also, you could formulate this problem physically. You could create n steel balls with weights proportional to the number you want to encode, which could be solved by a centrifuge in a theory. But here the amount of atoms you are using is proportional to n. Whereas in a standard case you have a limited number of atoms in a processor.
    – ElKamina
    Commented Jan 11, 2017 at 7:36
  • Does that limit also apply to QM objects? Just out of curiosity
    – Mmm Donuts
    Commented Jan 11, 2017 at 7:39
  • 1
    @Kris I do not understand QM in enough depth to answer it.
    – ElKamina
    Commented Jan 11, 2017 at 7:40
  • No worries! I'm just very curious and can't seem to sleep haha. Thank you for the interesting answers.
    – Mmm Donuts
    Commented Jan 11, 2017 at 7:42
4

I worked in an office during the summers after high school, around when I started college. In AP Computer Science I had studied, among other things, sorting and searching.

I applied this knowledge in several physical systems that I can recall:

Natural merge sort to start…

A system printed multipart forms including a file-card-sized tear off, which needed to be filed in a bank of drawers.

I started with a pile of them and sorted the pile first. The first step is picking up 5 or so, few enough to be easily placed in order in your hand. Place each sorted packet down, criss-crossing the stacks to keep them separate.

Then, merge each pair of stacks, producing a larger stack. Repeat until there is only one stack.
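In code, that physical procedure corresponds roughly to a bottom-up merge sort: order small packets by hand, then repeatedly merge pairs of stacks. A minimal sketch of my own (function and parameter names are mine; the packet size of 5 mirrors the card piles):

```python
def merge(a, b):
    """Merge two sorted stacks into one, always taking the smaller front card."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out

def card_sort(cards, packet_size=5):
    """Bottom-up merge sort: sort handfuls, then merge pairs of stacks until one remains."""
    stacks = [sorted(cards[i:i + packet_size])       # order each small packet "in hand"
              for i in range(0, len(cards), packet_size)]
    while len(stacks) > 1:
        stacks = [merge(stacks[i], stacks[i + 1]) if i + 1 < len(stacks) else stacks[i]
                  for i in range(0, len(stacks), 2)]  # merge each pair of stacks
    return stacks[0] if stacks else []

print(card_sort([9, 2, 7, 4, 1, 8, 3, 6, 5, 0]))  # [0, 1, 2, ..., 9]
```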

…Insertion sort to complete

It is easier to file the sorted cards, as each next one is a little farther down the same open drawer.

Radix sort

Nobody else understood how I did this one so fast, despite my repeated attempts to teach it.

A large box of check stubs (the size of punch cards) needs to be sorted. It looks like playing solitaire on a large table—deal out, stack up, repeat.

In general

30 years ago, I did notice what you’re asking about: the ideas transfer to physical systems quite directly because there are relative costs of comparisons and handling records, and levels of caching.

Going beyond well-understood equivalents

I recall an essay about your topic, and it brought up the spaghetti sort. You trim a length of dried noodle to indicate the key value, and label it with the record ID. This is O(n), simply processing each item once.

Then you grab the bundle and tap one end on the table. They align on the bottom edges, and they are now sorted. You can trivially take off the longest one, and repeat. The read-out is also O(n).

There are two things going on here in the “real world” that don’t correspond to algorithms. First, aligning the edges is a parallel operation. Every data item is also a processor (the laws of physics apply to it). So, in general, you scale the available processing with n, essentially dividing your classic complexity by a factor of n.

Second, how does aligning the edges accomplish a sort? The real sorting is in the read-out, which lets you find the longest in one step, even though you did compare all of them to find it. Again, divide by a factor of n, so finding the largest is now O(1).
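A serial simulation makes the divide-by-n point concrete: without the parallel physics, "find the longest" costs O(n) per pull, so the read-out degenerates into an O(n²) selection sort. A sketch of my own (the function name is mine):

```python
def spaghetti_sort_serial(lengths):
    """Serial simulation of the spaghetti read-out.

    Physically, the palm/plane finds the tallest rod in O(1) because every rod
    'compares itself' in parallel; a serial computer pays O(n) per pull, so this
    loop is just a selection sort, O(n^2) overall.
    """
    remaining = list(lengths)
    sorted_desc = []
    while remaining:
        tallest = max(remaining)   # the O(n) step that physics does "for free"
        remaining.remove(tallest)
        sorted_desc.append(tallest)
    return sorted_desc

print(spaghetti_sort_serial([12.0, 3.5, 7.2, 9.9]))  # [12.0, 9.9, 7.2, 3.5]
```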

Another example is analog computing: a physical model solves the problem “instantly” and the prep work is O(n). In principle the computation scales with the number of interacting components, not the number of prepped items, so the computation scales with n². The example I'm thinking of is a weighted multi-factor computation, which was done by drilling holes in a map, hanging weights from strings passing through the holes, and gathering all the strings on a ring.

3
  • The spaghetti sort was a fun read. I enjoyed thinking about it, but I criticize the action of scanning for the longest noodle. This isn't really an O(1) operation, since you scan the noodles. Imagine ten thousand noodles and a few that are similar in length... it's not an O(1) "eyeball it" operation. In reality, one must scan all the unsorted noodles to find the longest.
    – ThisClark
    Commented Jan 12, 2017 at 1:17
  • You can “scan” all the noodles by laying your palm across the whole bunch and pulling off the one tallest noodle that comes in contact with your hand. If the noodles are very close in length, use a more precise “hand” surface to grab the tallest noodle. The noodles are not selected serially like with a selection sort, they are selected all at once so there is O(n) “computing” power available. Commented Jan 12, 2017 at 1:47
  • 1
    @ThisClark you need a more precise jig: a flat plane parallel to the stop on the bottom that aligns the noodles. Carefully lower it until one noodle (the tallest) is touched and placed under compression. The comparison of the height of the plane against every noodle is done in parallel by that noodle. You are suggesting that a higher coefficient is needed, but that argument does not change the Big-O.
    – JDługosz
    Commented Jan 12, 2017 at 7:47
3

Sorting is still O(n) total work; that it finishes faster than that in wall-clock time is because of parallelization.

You could view a centrifuge as a bucket sort of n atoms, parallelized over n cores (each atom acts as a processor).

You can make sorting faster through parallelization, but only by a constant factor, because the number of processors is limited: O(n/C) is still O(n). (CPUs usually have < 10 cores and GPUs < 6000.)

2

The centrifuge is not sorting the nodes; it applies a force to them, and they react to it in parallel. So if you were to implement a bubble sort where each node moves itself up or down in parallel based on its "density", you'd have a centrifuge implementation.
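One way to simulate that idea on a single core is odd-even transposition sort, the usual formalization of a "parallel bubble sort": in each round every element compares itself with one neighbor, and since the pairs in a round don't overlap, a machine with one processor per element could perform them all simultaneously, finishing after n parallel rounds. A minimal sketch of my own, with the rounds simulated sequentially:

```python
def odd_even_transposition_sort(xs):
    """Parallel-bubble-sort simulation: n rounds of neighbor compare-and-swap.

    With one processor per element, every pair in a round could swap at the same
    time, giving O(n) parallel time; simulated serially it is still O(n^2) work.
    """
    xs = list(xs)
    n = len(xs)
    for round_no in range(n):
        start = round_no % 2               # alternate even and odd pairings
        for i in range(start, n - 1, 2):   # these swaps are independent -> parallelizable
            if xs[i] > xs[i + 1]:
                xs[i], xs[i + 1] = xs[i + 1], xs[i]
    return xs

print(odd_even_transposition_sort([5, 1, 4, 2, 8, 0]))  # [0, 1, 2, 4, 5, 8]
```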

Keep in mind that in the real world you can run a very large number of parallel tasks, whereas in a computer the maximum number of truly parallel tasks equals the number of physical processing units.

In the end, you would also be limited by access to the list of elements, because it cannot be modified by two nodes simultaneously...

1

Would it be possible with some overhead on each node (some value or method tacked on to each of the nodes) to 'force' the order of the list?

When we sort using computer programs, we select a property of the values being sorted. That's commonly the magnitude of the number or its alphabetical order.

Something like a centrifuge, where only each element cares about its relative position in space (in relation to other nodes)

This analogy aptly reminds me of a simple bubble sort: smaller numbers bubble up in each iteration, much like your centrifuge logic.

So to answer this: don't we actually do something of that sort in software-based sorting?

1
  • 1
    I think you're right. I think where I lost my analogy here is that I forgot that each molecule acts in parallel. So, it would be like a parallel bubble sort...
    – Mmm Donuts
    Commented Jan 11, 2017 at 7:17
1

First of all, you are comparing two different contexts: one is logic (computers) and the other is physics. Physics is something we have (so far) shown we can partly model with mathematical formulas, and as programmers we can use those formulas to simulate (some parts of) physics in the logic world (e.g. the physics engine in a game engine).

Second, we have some capabilities in the computer (logic) world that are nearly impossible in physics; for example, we can access memory and find the exact location of each entity at any time, whereas in physics that is a huge problem (Heisenberg's uncertainty principle).

Third, if you want to map a centrifuge and its real-world operation onto the computer world, it is as if someone had given you a super-computer with all the rules of physics built in, and you were doing your small sort inside it (using the centrifuge). By saying that your sorting problem was solved in O(n), you are ignoring the huge physics simulation going on in the background...

1

Consider: is "centrifuge sort" really scaling better? Think about what happens as you scale up.

  • The test tubes have to get longer and longer.
  • The heavy stuff has to travel further and further to get to the bottom.
  • The moment of inertia increases, requiring more power and longer times to accelerate up to sorting speed.

It's also worth considering other problems with centrifuge sort. For example, you can only operate on a narrow size scale. A computer sorting algorithm can handle integers from 1 to 2^1024 and beyond, no sweat. Put something that weighs 2^1024 times as much as a hydrogen atom into a centrifuge and, well, that's a black hole and the galaxy has been destroyed. The algorithm failed.

Of course the real answer here is that computational complexity is relative to some computational model, as mentioned in another answer. And "centrifuge sort" doesn't make sense in the context of common computational models, such as the RAM model, the I/O model, or multitape Turing machines.

0

Another perspective is that what you're describing with the centrifuge is analogous to what's been called the "spaghetti sort" (https://en.wikipedia.org/wiki/Spaghetti_sort). Say you have a box of uncooked spaghetti rods of varying lengths. Hold them in your fist, and loosen your hand to lower them vertically so the ends are all resting on a horizontal table. Boom! They're sorted by height. O(constant) time. (Or O(n) if you include picking the rods out by height and putting them in a . . . spaghetti rack, I guess?)

You can note there that it's O(constant) in the number of pieces of spaghetti, but, due to the finite speed of sound in spaghetti, it's O(n) in the length of the longest strand. So nothing comes for free.

1
    That’s the same thing I said 11 hours earlier. And I went on to explain how physical systems let you divide by n or by n² and keep the model of algorithms and computation.
    – JDługosz
    Commented Jan 12, 2017 at 7:47
