This largely depends on the program.
Some common programming languages I'm familiar with (e.g. C, C++) default to single-threaded execution, requiring explicit steps to make a program capable of running on multiple cores / CPUs.
I believe there are other programming languages designed so that programs are easily parallelizable (able to run on multiple cores / CPUs) by default.
You can run more than one instance of a single-threaded program on a multi-core system at the same time, with each one on a different core, but that doesn't make the program multi-threaded. It doesn't speed anything up to, e.g., encrypt a single file multiple times at the same time; you just end up with multiple copies of the same data.
If you want to perform the same operation on multiple different inputs, then it can make sense to do something like this, and it definitely takes advantage of multiple CPUs / cores for increased performance. The individual instances are still single-threaded, but because they are completely independent, you can run as many as you like at one time.
You are right about some of the limitations of parallelism. Whenever a step depends on the output of a previous step, those two steps have to run in sequence. You could still write a program that uses multiple threads for such a chain, but you wouldn't get any performance gain, so the extra effort generally isn't worth it.
Another item worth noting is that a single core can typically run only a single process / thread at a time (hyper-threading is a clever stretch of this rule). This is because each thread requires the core to maintain context that is swapped in when the thread starts executing and swapped out when execution switches to another thread. This is different from the inverse relationship you described above.