Why does hardware division take so much longer than multiplication on a microcontroller? E.g., on a dsPIC, a division takes 19 cycles, while multiplication takes only one clock cycle.
I went through some tutorials, including the Wikipedia articles *Division algorithm* and *Multiplication algorithm*. Here is my reasoning.
A division algorithm, like the slow restoring division method described on Wikipedia, is a recursive algorithm: the (intermediate) results from step k are used as inputs to step k+1, which means these steps cannot be parallelized. Therefore, the division takes at least n cycles to complete, where n is the number of bits in the dividend. For a 16-bit dividend, this is at least 16 cycles.
A multiplication algorithm doesn't need to be recursive, which means that it is possible to parallelize it. However, there are many different multiplication algorithms, and I don't have a clue which one may be used by microcontrollers. How does multiplication work in hardware on a microcontroller?
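From what I've read (this is my assumption, not something from a datasheet), a single-cycle multiplier generates all partial products at once and then sums them. In software that looks like two sequential loops, but in hardware no partial product depends on another, so step 1 can be done entirely in parallel by AND gates:

```c
#include <stdint.h>
#include <stdio.h>

/* Software model of a 16x16 combinational multiplier.
 * Each partial product depends only on the inputs a and b,
 * never on another partial product, so in hardware all 16
 * of them can be generated simultaneously; only the final
 * summation network (an adder tree in hardware) adds delay. */
static uint32_t parallel_mul(uint16_t a, uint16_t b)
{
    uint32_t pp[16];

    /* Step 1: generate partial products - all independent. */
    for (int i = 0; i < 16; i++)
        pp[i] = ((b >> i) & 1u) ? ((uint32_t)a << i) : 0;

    /* Step 2: sum them. */
    uint32_t result = 0;
    for (int i = 0; i < 16; i++)
        result += pp[i];

    return result;
}

int main(void)
{
    printf("%u\n", parallel_mul(50000, 37)); /* 1850000 */
    return 0;
}
```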
I've found the Dadda multiplier algorithm, which is supposed to take only one clock cycle to finish. What I don't get here is that Dadda's algorithm proceeds in three steps, where the results from step 1 are used in step 2, and so on. By my reasoning above, this should take at least three clock cycles to finish.
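Here is a toy 4x4 C model of what I understand the three Dadda stages to be (partial-product generation, carry-save reduction, final carry-propagate addition). The function name and the 4-bit width are mine, just to illustrate what I mean by "steps":

```c
#include <stdint.h>
#include <stdio.h>

/* Toy model of the three Dadda stages for a 4x4 multiply.
 * Stage 1: generate partial-product rows (parallel AND gates).
 * Stage 2: carry-save reduction - compress three rows at a time
 *          into a sum row and a carry row with bitwise
 *          full-adder logic, until only two rows remain.
 * Stage 3: one final carry-propagate addition. */
static uint8_t dadda4x4(uint8_t a, uint8_t b)
{
    /* Stage 1: four partial-product rows, all independent. */
    uint8_t rows[4];
    int n = 4;
    for (int i = 0; i < 4; i++)
        rows[i] = ((b >> i) & 1u) ? (uint8_t)((a & 0xF) << i) : 0;

    /* Stage 2: (3,2) carry-save compression until two rows remain. */
    while (n > 2) {
        uint8_t x = rows[0], y = rows[1], z = rows[2];
        uint8_t sum   = x ^ y ^ z;                    /* full-adder sum bits   */
        uint8_t carry = (uint8_t)(((x & y) | (x & z) | (y & z)) << 1);
        rows[0] = sum;
        rows[1] = carry;
        for (int i = 3; i < n; i++)  /* shift remaining rows down */
            rows[i - 1] = rows[i];
        n--;
    }

    /* Stage 3: the only carry-propagating addition. */
    return (uint8_t)(rows[0] + rows[1]);
}

int main(void)
{
    printf("%u\n", dadda4x4(13, 11)); /* 143 */
    return 0;
}
```

My confusion, restated with this model: stage 2 needs the output of stage 1, and stage 3 needs the output of stage 2, so why does the whole thing still fit in a single clock cycle?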