It's much harder to develop really nefarious data races with a single CPU. I mean, sure, you can pull off tearing between words if you interrupt a single CPU, but can you build exotic scenarios where the result you observe could not have been produced by any single interleaving of the threads?
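The classic example of such a scenario is the store-buffering litmus test. On a single CPU, every execution is just some interleaving of the two threads, so you can enumerate them all. The toy model below is a sketch that assumes a single, sequentially consistent memory; the thread encodings and helper names are hypothetical. It shows that no interleaving ever yields `r1 == r2 == 0`. On multi-core x86 hardware, store buffers do allow that outcome, which is exactly the "no single interleaving explains it" kind of race.

```python
from itertools import combinations

# Store-buffering litmus test, as a hypothetical toy model:
#   Thread A: x = 1; r1 = y
#   Thread B: y = 1; r2 = x
THREAD_A = [("store", "x"), ("load", "y", "r1")]
THREAD_B = [("store", "y"), ("load", "x", "r2")]

def interleavings(a, b):
    """Yield every merge of a and b that keeps each thread's own program order."""
    n = len(a) + len(b)
    for a_slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in a_slots else next(ib) for i in range(n)]

def run(schedule):
    """Execute one schedule against a single shared memory, one step at a time."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for op in schedule:
        if op[0] == "store":
            mem[op[1]] = 1
        else:
            regs[op[2]] = mem[op[1]]
    return regs["r1"], regs["r2"]

outcomes = {run(s) for s in interleavings(THREAD_A, THREAD_B)}
print(sorted(outcomes))  # [(0, 1), (1, 0), (1, 1)]: never (0, 0)
```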
Okay, maybe making insidious bugs doesn't count as a valid use of multi-core advancements. As it turns out, there's not much that multi-core can do that single-core cannot, given time. The reason is simple. If you want to avoid those evil data races, you need synchronization points in your code. If you model your code as a lattice of computations, where a block's inputs must be complete and synchronized before it can calculate and produce its outputs, it's easy to see that a single CPU can simply work its way along the lattice, calculating the next available block of work.
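Here is a minimal sketch of that lattice idea, using Python's standard `graphlib.TopologicalSorter` (Python 3.9+). The node names and the work each node does are hypothetical. Every batch returned by `get_ready()` could in principle run on separate cores, but a single core can just take them one at a time and still produce the same results.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical lattice: each node lists the nodes whose outputs it needs.
deps = {
    "load_a": set(),
    "load_b": set(),
    "sum": {"load_a", "load_b"},
    "square_a": {"load_a"},
    "report": {"sum", "square_a"},
}

# Hypothetical work for each node; each reads its inputs from `r`.
work = {
    "load_a": lambda r: 3,
    "load_b": lambda r: 4,
    "sum": lambda r: r["load_a"] + r["load_b"],
    "square_a": lambda r: r["load_a"] ** 2,
    "report": lambda r: f"sum={r['sum']}, square={r['square_a']}",
}

def run_on_one_core(deps, work):
    """Walk the lattice one node at a time, always picking work whose inputs are ready."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    results, order = {}, []
    while sorter.is_active():
        for node in sorter.get_ready():           # these nodes could run in parallel...
            results[node] = work[node](results)   # ...but one core simply runs them in turn
            order.append(node)
            sorter.done(node)
    return results, order

results, order = run_on_one_core(deps, work)
print(results["report"])  # sum=7, square=9
```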
In fact, if your algorithm can be solved by a Turing machine (which covers virtually every algorithm we care about), then it can be carried out not only by a single-core CPU, but by a single state machine with a very long piece of tape for memory, because that is all a Turing machine is!
The CHESS race detector actually leverages this to find race conditions. It runs everything single-threaded and systematically explores all possible interleavings between threads, trying to find cases where a test fails because of a race. CHESS depends on the fact that you can run any multithreaded application on a single core.
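To make the idea concrete, here is a toy model of that kind of systematic exploration. It is not CHESS and does not use its API: the real tool controls the scheduling of an actual program, while this sketch just enumerates every schedule of two hypothetical threads that each perform a non-atomic `counter += 1` (split into a load step and a store step), and reports the first schedule whose result breaks the test.

```python
from itertools import combinations

def increment_thread(tid):
    """Hypothetical non-atomic `counter += 1`, split into a load step and a store step."""
    return [("load", tid), ("store", tid)]

def schedules(a, b):
    """Yield every interleaving of a and b that keeps each thread's own order."""
    n = len(a) + len(b)
    for a_slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in a_slots else next(ib) for i in range(n)]

def execute(schedule):
    """Run one schedule on a single core; `local` holds each thread's private register."""
    counter = 0
    local = {}
    for op, tid in schedule:
        if op == "load":
            local[tid] = counter
        else:
            counter = local[tid] + 1
    return counter

def find_race(a, b, expected):
    """Return the first schedule whose final counter differs from `expected`, else None."""
    for s in schedules(a, b):
        if execute(s) != expected:
            return s
    return None

bad = find_race(increment_thread("T1"), increment_thread("T2"), expected=2)
print(bad)  # [('load', 'T1'), ('load', 'T2'), ('store', 'T1'), ('store', 'T2')]
```

The failing schedule is the classic lost update: both threads read 0, so both write 1.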
The cases where you need multi-core appear when you start stretching the limits of hardware. The obvious one is time constraints. Some problems with real-time constraints are impossible to solve on a single core because nobody can drive a single core's clock fast enough. There's a reason CPUs climbed up to 4 GHz and then settled down a bit, preferring more cores at lower speeds.
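A quick back-of-the-envelope calculation shows how this plays out. The numbers below are purely hypothetical: a job that needs $C = 1.6 \times 10^{8}$ cycles and must finish within a deadline of $D = 16$ ms. The per-core figure assumes the work splits perfectly across $p = 4$ cores with no synchronization overhead, which real workloads rarely achieve.

```latex
f_{\text{required}} = \frac{C}{D}
  = \frac{1.6 \times 10^{8}\ \text{cycles}}{16 \times 10^{-3}\ \text{s}}
  = 10^{10}\ \text{Hz} = 10\ \text{GHz}

f_{\text{per core}} = \frac{C}{p\,D} = \frac{10\ \text{GHz}}{4} = 2.5\ \text{GHz}
```

A 10 GHz single core is out of reach, while four cores at 2.5 GHz are ordinary parts.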
A more exotic version of this timing constraint appears in hard real-time systems. In some of them, interrupt servicing is so demanding that you actually have to pick a multi-core CPU that lets you divvy the interrupts up across the cores, or you run into timing limitations.
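One common first check here is processor utilization: an interrupt source that needs C µs of service every T µs consumes C/T of a core, and if the total exceeds 1, no single core can keep up no matter how you schedule it. The sketch below uses hypothetical interrupt sources and divvies them up with first-fit decreasing bin packing, which is just one simple heuristic and not the method of any particular RTOS. Keep in mind that utilization at or below 1 per core is only a necessary condition; a real hard real-time analysis also has to account for priorities, interrupt latency, and jitter.

```python
from fractions import Fraction

# Hypothetical interrupt sources: (name, worst-case service time in µs, period in µs).
SOURCES = [
    ("adc", 6, 10),
    ("motor", 5, 10),
    ("can_bus", 3, 10),
    ("timer", 2, 10),
]

def utilization(cost, period):
    """Fraction of one core's time this source consumes (exact, no float rounding)."""
    return Fraction(cost, period)

def partition(sources, n_cores):
    """First-fit decreasing: give each source to the first core whose load stays <= 1.

    Returns a {core: [source names]} mapping, or None if n_cores is not enough.
    """
    loads = [Fraction(0)] * n_cores
    assignment = {core: [] for core in range(n_cores)}
    by_load = sorted(sources, key=lambda s: utilization(s[1], s[2]), reverse=True)
    for name, cost, period in by_load:
        u = utilization(cost, period)
        for core in range(n_cores):
            if loads[core] + u <= 1:
                loads[core] += u
                assignment[core].append(name)
                break
        else:
            return None  # no core had room for this source
    return assignment

total = sum(utilization(c, p) for _, c, p in SOURCES)
print(total)                  # 8/5, i.e. 160% of one core
print(partition(SOURCES, 1))  # None
print(partition(SOURCES, 2))  # {0: ['adc', 'can_bus'], 1: ['motor', 'timer']}
```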
Another limit arises with data buses. Consider the Blue Gene/P as an example. JUGENE, a particular Blue Gene/P supercomputer, has 144 terabytes of memory. Nobody makes a single-CPU computer that can access all of that memory.