Per A. Fog's instruction tables, Ivy Bridge has a 3-cycle latency for a MOV instruction.

So the following will take 3 cycles to store RAX to the address held in RCX:

  mov               [rcx], rax

My question is, does this imply that RAX, which is being read, cannot be modified for the next 2 clocks? Specifically, would the following cause an execution delay:

  mov               [rcx], rax
  inc               rax

1 Answer


In short, yes. That will cause the processor to stall while it waits for that instruction to complete and data to be available before the next instruction can be run. There is no way to easily predict what data will arrive and so that inc instruction simply cannot run until the mov is complete.

That may not be a big problem, though, as the processor may well be able to schedule instructions that are not dependent on the result of that mov instruction in order to keep the core working.

This is known as out-of-order execution, and it can help mitigate the cost of processor stalls when waiting for long instructions such as these.


A further clarification...

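I should have read your example more carefully. I do not believe that the mov [rcx], rax instruction will cause a stall on the inc rax instruction, because the store only reads RAX and RCX. What does have to wait is anything dependent on the data being stored at [rcx], such as a later load from that address.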

The page you linked also lists reciprocal throughput, which is the interval after which another instruction of that type can be issued. I would assume that, in that same length of time, any instruction with similar dependencies could be issued as well.
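To make the difference between latency and reciprocal throughput concrete, here is a rough back-of-the-envelope model in Python. The function names and the numbers used (latency 3, reciprocal throughput 1) are illustrative assumptions, not measurements of Ivy Bridge, and the model ignores port contention, the store buffer and every other real-world effect.

```python
def independent_cycles(n, latency, recip_throughput):
    """Cycles to complete n independent instructions in a pipelined unit.

    A new instruction can start every recip_throughput cycles, and the
    last one still needs its full latency to complete.
    """
    if n == 0:
        return 0
    return (n - 1) * recip_throughput + latency


def dependent_cycles(n, latency):
    """Cycles for n instructions where each one needs the previous result."""
    return n * latency


# Illustrative numbers only: latency 3, reciprocal throughput 1.
print(independent_cycles(10, 3, 1))  # 12
print(dependent_cycles(10, 3))       # 30
```

In this simplified picture, latency only adds up when each instruction has to wait for the result of the one before it.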

Thus I would assume that the RAX register is either renamed as the instruction is sent for execution or its value is encoded in the u-ops for the instruction. The next instruction can work on that register so long as it is not dependent on the result of a previous operation being stored in that register.

So in your question's example, what I believe should happen is that the CPU effectively has two instructions whose only dependency is the current value of the RAX register, and that value is only modified by the second instruction. The first instruction should be dispatched, and execution of the second (inc) instruction can begin almost immediately.
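The renaming idea can be sketched with a toy model in Python. This is only an illustration of the general technique, not a description of how Ivy Bridge actually implements it; the `Renamer` class, its method names and the `p0`, `p1`, ... physical register names are all made up for the example, and the flags written by inc are ignored.

```python
class Renamer:
    """Toy register renamer: maps architectural names to physical registers."""

    def __init__(self, arch_regs):
        self.next_phys = 0
        self.table = {}
        for reg in arch_regs:
            self.table[reg] = self._alloc()

    def _alloc(self):
        phys = f"p{self.next_phys}"
        self.next_phys += 1
        return phys

    def rename(self, reads, writes):
        """Return (source physical regs, destination physical regs)."""
        srcs = [self.table[r] for r in reads]  # sources use the current mapping
        dsts = []
        for r in writes:
            self.table[r] = self._alloc()      # each write gets a fresh register
            dsts.append(self.table[r])
        return srcs, dsts


r = Renamer(["rax", "rcx"])
store = r.rename(reads=["rcx", "rax"], writes=[])  # mov [rcx], rax
inc = r.rename(reads=["rax"], writes=["rax"])      # inc rax (flags ignored)
print(store)  # (['p1', 'p0'], [])
print(inc)    # (['p0'], ['p2'])
```

Both instructions read the old value of RAX from `p0`, while inc writes its result to a fresh register `p2`. In this model the store still sees the original value no matter when inc executes, so inc has no reason to wait for the store.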

  • I take it that in {movq xmm1, [r10+rax*8]} RAX would also be tied up for 3 cycles, or would this be different since RAX would be used on cycle 1 to determine the address and the next 2 cycles is the fetch & store?
    – IamIC
    Commented Sep 3, 2015 at 7:06
  • @IanC I think I see where you are going. Whether RAX is tied up (and thus holds up the inc) for the entire period is unknowable without some rather specific architectural knowledge, which I don't have. The dispatcher could easily encode the value stored in RAX in the u-ops for the movq instruction and thus free up the register for (nearly) immediate use by the inc. The page also lists reciprocal throughput, the interval after which another instruction of that type can be issued, and I would assume that in that same length of time an instruction with similar dependencies could be issued.
    – Mokubai
    Commented Sep 3, 2015 at 14:27
  • It's certainly a tricky question. The relationship between latency and reciprocal throughput is complex. At this point, I'd say testing is the only way to really know. But testing something this low-level is likely to be nontrivial.
    – IamIC
    Commented Sep 3, 2015 at 14:39
  • @IanC As you say, it is not exactly trivial to find out. There are a lot of highly advanced features at play that would determine whether the instruction would block further execution and, if so, for how long. I've edited my answer to reflect what I believe is the most logical outcome, but the only examples I can find on this concern operations on differently named registers, not two instructions that only rely on the current state of the same register.
    – Mokubai
    Commented Sep 3, 2015 at 15:08
