
I am currently designing an engine at the core level. However, I cannot decide whether input handling should share a thread with rendering (or some other thread) or get a thread of its own. Would making it a separate thread, with all the locks that requires, be slower? If it is the same or faster, would it be at least marginally faster?

[Consider the above only an example of the concept.]

There are a couple of questions you need to address here:

  1. How much work does the input thread need to do?
  2. Are there a lot of input-rendering iterations?

If there are tons of iterations and the input process takes a long time, then pipelining the tasks as two separate threads may speed up your application.

If there isn't a massive influx of data to handle, or if the input process isn't very taxing, then splitting the operations probably won't justify the synchronization costs.


Basically, when parallelizing, the optimum number of threads == number of cores. This is because then each core runs exactly one thread; all cores are doing work, but there's no need to switch the threads all the time (scheduling).

In practice, this is of course a gross over-simplification. To start with, your program is not alone; a number of other processes and threads are around to share the CPU. Then, your problem may not be (at least not completely) CPU bound. And how you divide the work between threads will make a huge difference. So this boils down to measuring. Make the number of threads adjustable, if possible. Then try it out. On different computers. It will be interesting :-)
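To make the "make it adjustable, then measure" advice concrete, here is a rough sketch of a timing harness. The names `simulated_task`, `time_with_threads` and `candidate_thread_counts` are made up for illustration, and the task just sleeps to mimic work that waits on I/O. Note that in CPython, CPU-bound pure-Python code is serialized by the GIL, so treat this as a sketch of the measuring approach rather than a benchmark of your engine.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def simulated_task(_):
    # Hypothetical stand-in for one unit of work; sleeping mimics a task
    # that waits on I/O rather than burning CPU.
    time.sleep(0.01)
    return 1


def time_with_threads(num_threads, num_tasks=32):
    """Run num_tasks tasks on num_threads workers; return (seconds, completed)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        completed = sum(pool.map(simulated_task, range(num_tasks)))
    return time.perf_counter() - start, completed


def candidate_thread_counts():
    # Start from the core count, then try fewer and more threads around it.
    cores = os.cpu_count() or 1
    return sorted({1, max(1, cores // 2), cores, cores * 2})


if __name__ == "__main__":
    for n in candidate_thread_counts():
        elapsed, _ = time_with_threads(n)
        print(f"{n:3d} threads: {elapsed:.3f} s")
```

Swap `simulated_task` for a slice of your real workload and run it on a few different machines; the best thread count will likely differ between them.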

In general, problems with fine-grained locking need many threads (on many cores) to break even, while problems with coarse-grained locking will scale about linearly (up to a point).

Locking is the biggest overhead with multiple threads, so problems with no locks, like rendering if done right, benefit most from extra threads.

In summary, it depends. Try to reduce the number of times you need to lock a mutex to get work done, and the number of threads you need to break even will decrease.
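One way to cut down on lock acquisitions is to batch. The sketch below is only an illustration: `EventBuffer` and its `lock_acquisitions` counter are hypothetical names, and the counter exists purely to show how many times the mutex is taken in each style.

```python
import threading


class EventBuffer:
    """Hypothetical shared buffer between a producer and a consumer thread."""

    def __init__(self):
        self._lock = threading.Lock()
        self._events = []
        self.lock_acquisitions = 0  # counted only to illustrate the cost

    def push_each(self, events):
        # Fine-grained: one lock acquisition per event.
        for event in events:
            with self._lock:
                self.lock_acquisitions += 1
                self._events.append(event)

    def push_batch(self, events):
        # Coarser: one lock acquisition for the whole batch.
        with self._lock:
            self.lock_acquisitions += 1
            self._events.extend(events)

    def drain(self):
        # Swap the list out under the lock, then process it without the lock.
        with self._lock:
            self.lock_acquisitions += 1
            drained, self._events = self._events, []
        return drained
```

Pushing 100 events with `push_each` takes the lock 100 times, while `push_batch` takes it once. The `drain` method uses the same idea on the consumer side: hold the lock only long enough to grab the pending work.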


You may consider a simple queuing system to allow the input/output operations to work in a disconnected way. Then your render thread would be able to update and paint while still waiting on user/system input. This would also allow you to push data processing to yet another queue to reduce contention further.
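A minimal sketch of that queuing idea, assuming a single input thread and a single render thread. The function names, the `SHUTDOWN` sentinel and the `raw_inputs` list are all invented for the example; a real engine would poll the OS for input and draw a frame where the comments indicate.

```python
import queue
import threading
import time

SHUTDOWN = object()  # sentinel telling the render loop to stop


def input_thread(events, raw_inputs):
    # Stand-in for polling the OS; raw_inputs is a hypothetical list of events.
    for item in raw_inputs:
        events.put(item)
    events.put(SHUTDOWN)


def render_loop(events, frame_time=0.001):
    """Drain pending input without blocking, then 'render' one frame."""
    handled, frames, running = [], 0, True
    while running:
        while True:
            try:
                item = events.get_nowait()  # never blocks the render thread
            except queue.Empty:
                break
            if item is SHUTDOWN:
                running = False
                break
            handled.append(item)
        frames += 1
        time.sleep(frame_time)  # a real engine would update and draw here
    return handled, frames


def run_demo(raw_inputs):
    events = queue.Queue()  # thread-safe FIFO; the locking lives inside it
    producer = threading.Thread(target=input_thread, args=(events, raw_inputs))
    producer.start()
    handled, frames = render_loop(events)
    producer.join()
    return handled, frames
```

The render loop never waits for input: if the queue is empty it just draws the next frame. The only synchronization is the queue's internal lock, taken briefly on each `put` and `get_nowait`.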


The number of optimal threads depends on the application and load. Too many threads and you could run into an issue with deadlocking and switching contention. Too few and you become I/O bound.

  • Deadlock - Threads fighting over limited resources (I/O or data) slow the application down; in the worst case, each thread holds a resource another one needs and none of them can proceed.

  • Switching Contention - The time the application (or operating system) spends swapping/preempting threads is greater than the time spent processing I/O or data.

  • I/O Bound - Threads end up in a blocked/wait state while resources are in use.

On modern hardware, switching contention probably isn't an issue unless you prioritize your threads badly (for example, by giving long-blocking threads higher priority).

Being I/O bound is typically less of an issue than deadlock and can usually be reduced by increasing the number of available threads.

Deadlocks are by far the hardest to solve. The simplest way would probably be to use immutable data structures and preempt a call to shared resources that would cause a deadlock. The reason they are so hard to fix is that different parts of the resources are blocked by different threads and could require each of the threads to toss out their current data and restart. (These restarts also cause the application to waste time reprocessing data.)
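As a rough illustration of the immutable-data approach, the sketch below shares a frozen snapshot between threads. `WorldState` and `StateHolder` are hypothetical names. Readers never take a lock, and writers take exactly one, so no cycle of threads waiting on each other's locks can form. It assumes CPython, where reading or rebinding an attribute is safe across threads; other languages would typically need an atomic pointer for the swap.

```python
import threading
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class WorldState:
    """Hypothetical immutable snapshot shared between threads."""
    frame: int = 0
    player_x: float = 0.0


class StateHolder:
    def __init__(self):
        self._write_lock = threading.Lock()  # the only lock; writers only
        self._current = WorldState()

    def snapshot(self):
        # Readers take the current reference and never hold a lock, so they
        # cannot take part in a lock cycle.
        return self._current

    def advance(self, dx):
        # Writers build a new snapshot from the old one and publish it by
        # swapping the reference; the old snapshot is left untouched.
        with self._write_lock:
            old = self._current
            self._current = replace(
                old, frame=old.frame + 1, player_x=old.player_x + dx
            )
            return self._current
```

A render thread can call `snapshot()` once per frame and work from that value for as long as it likes; updates made in the meantime show up in the next snapshot rather than changing the one it is holding.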

