First of all, what is the maximum theoretical speed/speed up?
Can anyone explain why pipelining cannot operate at its maximum theoretical speed?
The maximum theoretical speedup is equal to the increase in pipeline depth. In a scalar (one-instruction-wide execution) design, the ideal instructions per cycle is one, so the gain has to come from the clock: ideally, the clock frequency could increase by a factor equal to the increase in pipeline depth.
The actual frequency increase will be less than this ideal due to latching overheads, clock skew, and imbalanced division of work/latency. (While one can theoretically place latches at any point, the amount of state latched, its position, and other factors make certain points better suited to stage divisions.)
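As a rough illustration of the point above, here is a minimal sketch of how latch overhead and imbalanced stages cut into the ideal speedup. The function name, the stage delays, and the 0.5 ns overhead (standing in for latch delay plus clock skew) are made-up values for illustration, not measurements from any real design, and hazards are ignored entirely.

```python
def pipelined_speedup(stage_delays_ns, overhead_ns):
    """Clock-frequency gain of a pipeline over the same logic unpipelined.

    stage_delays_ns: logic delay of each stage (hypothetical numbers).
    overhead_ns: per-stage latch delay plus clock skew (assumed constant).
    """
    # Unpipelined, one instruction takes the sum of all the logic delays.
    unpipelined_period = sum(stage_delays_ns)
    # Pipelined, the clock must fit the slowest stage plus the overhead.
    pipelined_period = max(stage_delays_ns) + overhead_ns
    return unpipelined_period / pipelined_period


# Five perfectly balanced 2 ns stages with no overhead: the ideal case.
ideal = pipelined_speedup([2.0] * 5, 0.0)
# Same stages, but each latch costs 0.5 ns.
with_overhead = pipelined_speedup([2.0] * 5, 0.5)
# Same total work, unevenly divided: the 3 ns stage sets the clock.
imbalanced = pipelined_speedup([1.5, 2.0, 3.0, 2.0, 1.5], 0.5)

print(ideal, with_overhead, round(imbalanced, 2))
```

With these numbers the five-stage pipeline goes from the ideal 5x to 4x once latch overhead is included, and to below 3x when the work is also unevenly divided.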
Manufacturing variation also means that work designed to take an equal amount of time will not do so for all stages in the pipeline. A designer can provide more slack so that more chips will meet minimum timing in all stages. Another technique to handle such variation is to accept that not all chips will meet the target frequency (whether one exclusively uses the "golden samples" or uses lower frequency chips as well is a marketing decision).
As one might expect, with shallower pipelines the variation within a stage is spread out over more logic and so is less likely to affect frequency.
Wave pipelining, where multiple signal waves (corresponding to pipeline stages) can be passing through a block of logic at the same time, provides a limited method to avoid latch overhead. However, besides other design issues, wave pipelining is more sensitive to variation both from manufacturing and from run-time conditions such as temperature and voltage (which one might wish to intentionally vary to target different power/performance behaviors).
Even if one did have incredible hardware that provided a perfect frequency increase, hazards (as mentioned in Peter Cordes' comment) would prevent the perfect utilization of available execution resources.
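To put a number on the effect of hazards, here is a minimal sketch that assumes each hazard simply adds stall cycles on top of the ideal one cycle per instruction. The function names, the hazard rates, and the penalties are hypothetical values chosen for illustration, and the baseline is assumed to be an unpipelined design that finishes one instruction per (long) cycle.

```python
def cycles_per_instruction(hazards):
    """Average CPI of a scalar pipeline with stalls.

    hazards: list of (events per instruction, stall cycles per event).
    """
    # Ideal scalar CPI is 1; every hazard adds its average stall cost.
    return 1.0 + sum(rate * penalty for rate, penalty in hazards)


def effective_speedup(frequency_gain, hazards):
    """Throughput gain once stalls are charged against the faster clock."""
    return frequency_gain / cycles_per_instruction(hazards)


# Made-up workload: a 1-cycle stall on 25% of instructions (say, load-use)
# and a 2-cycle stall on 12.5% of instructions (say, mispredicted branches).
hazards = [(0.25, 1), (0.125, 2)]

cpi = cycles_per_instruction(hazards)
speedup = effective_speedup(6.0, hazards)  # assumes a perfect 6x clock gain

print(cpi, speedup)
```

Under these assumptions, a pipeline whose clock really did run 6x faster would still deliver only 4x the throughput, because the average instruction takes 1.5 cycles instead of 1.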