
First of all, what is the maximum theoretical speed/speedup?

Can anyone explain why pipelining cannot operate at its maximum theoretical speed?

  • Hazards like data dependencies through instructions with latency > 1, such as a load, are one major reason. en.wikipedia.org/wiki/Hazard_(computer_architecture) Especially if a load misses in cache. Control dependencies (branches) also create bubbles in the front end. A superscalar pipeline (more than one instruction per cycle) also needs to find instruction-level parallelism to max out.
    – Peter Cordes (Dec 31, 2018)
  • The maximum theoretical speed of a scalar pipelined processor is 1 instruction per cycle (IPC), where a cycle is the latency of a pipe stage. This assumes that the memory subsystem can deliver at least one instruction per cycle. The hazards mentioned in Peter's comment cause the performance of the processor to degrade below 1 IPC. In addition to the hazards, an instruction that loads a value from memory or otherwise requires multiple cycles to execute (in the EX stage) may take more cycles to complete than there are pipe stages.
    – Hadi Brais (Jan 1, 2019)

1 Answer


The maximum theoretical speedup is equal to the increase in pipeline depth. In a scalar (one-instruction-wide) design, the ideal throughput is one instruction per cycle; ideally, the clock frequency could increase by a factor equal to the increase in pipeline depth.
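As a toy back-of-the-envelope sketch in Python (the 10 ns latency and 5-stage depth are made-up numbers for illustration, not from the question):

    # Ideal pipelining: splitting a single-cycle datapath into `depth`
    # equal stages divides the cycle time by `depth`, and at the ideal
    # 1 instruction per cycle this multiplies throughput by `depth`.
    unpipelined_ns = 10.0  # assumed single-cycle datapath latency
    depth = 5              # assumed pipeline depth

    cycle_ns = unpipelined_ns / depth          # 2.0 ns per stage
    ideal_speedup = unpipelined_ns / cycle_ns  # equals depth: 5.0x
    print(f"cycle: {cycle_ns} ns, ideal speedup: {ideal_speedup}x")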

The actual frequency increase will be less than this ideal due to latching overheads, clock skew, and imbalanced division of work/latency. (While one can theoretically place latches at any point, the amount of state latched, its position, and other factors make certain points friendlier for stage divisions.)
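One common way to model this is a fixed per-stage overhead added to the divided logic delay. A minimal sketch, assuming a 0.2 ns latch/skew overhead per stage (an arbitrary figure, not from the answer):

    # Real cycle time = logic/depth + per-stage overhead, so the
    # achieved speedup falls short of the depth increase and shows
    # diminishing returns as the pipeline deepens.
    logic_ns = 10.0    # total logic delay, as in the sketch above
    overhead_ns = 0.2  # assumed latch + clock-skew overhead per stage

    for depth in (1, 5, 10, 20):
        cycle_ns = logic_ns / depth + overhead_ns
        speedup = (logic_ns + overhead_ns) / cycle_ns
        print(f"depth {depth:2d}: cycle {cycle_ns:5.2f} ns, "
              f"speedup {speedup:5.2f}x")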

Manufacturing variation also means that work designed to take an equal amount of time will not do so for all stages in the pipeline. A designer can provide more slack so that more chips will meet minimum timing in all stages. Another technique for handling such variation is to accept that not all chips will meet the target frequency (whether one exclusively uses the "golden samples" or uses lower-frequency chips as well is a marketing decision).

As one might expect, with shallow pipelines, variation within a stage is spread out over more logic and so is less likely to affect frequency.
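A toy Monte Carlo sketch of that effect (the 5% per-stage delay variation and the chip count are assumptions for illustration only): the clock must cover the slowest stage, and the expected maximum over more, shorter stages sits further above the nominal stage delay.

    import random

    # Toy model: total logic is split into `depth` stages whose delays
    # vary independently. The critical (slowest) stage sets the clock,
    # and with more stages it lands further above the nominal delay.
    random.seed(0)
    logic_ns, sigma_frac, chips = 10.0, 0.05, 10_000

    for depth in (2, 5, 20):
        nominal = logic_ns / depth
        avg_critical = sum(
            max(random.gauss(nominal, sigma_frac * nominal)
                for _ in range(depth))
            for _ in range(chips)) / chips
        print(f"depth {depth:2d}: critical stage is "
              f"{avg_critical / nominal:.3f}x nominal")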

Wave pipelining, where multiple signal waves (corresponding to pipeline stages) can be passing through a block of logic at the same time, provides a limited method of avoiding latch overhead. However, besides other design issues, such a design is more sensitive to variation both from manufacturing and from run-time conditions such as temperature and voltage (which one might wish to vary intentionally to target different power/performance behaviors).

Even if one did have incredible hardware that provided a perfect frequency increase, hazards (as mentioned in Peter Cordes' comment) would prevent the perfect utilization of available execution resources.
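To put rough numbers on the hazard cost, here is a simple textbook-style CPI model (the hazard frequencies and penalties below are assumed for illustration, not measured):

    # Scalar in-order pipeline: each hazard adds bubble cycles on top
    # of the ideal 1 cycle per instruction (CPI = 1).
    instructions = 1_000_000
    load_use_frac, load_use_stall = 0.10, 1  # assumed load-use pairs
    mispred_frac, flush_cycles = 0.02, 2     # assumed branch flushes

    stall_cycles = instructions * (load_use_frac * load_use_stall
                                   + mispred_frac * flush_cycles)
    cpi = (instructions + stall_cycles) / instructions
    print(f"CPI = {cpi:.2f} (ideal 1.00), IPC = {1 / cpi:.2f}")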
