Difference between MOVDQA and MOVAPS x86 instructions?

Question

I'm looking at Intel's documentation, the Intel® 64 and IA-32 Architectures Software Developer's Manual, and I can't find the difference between

MOVDQA: Move Aligned Double Quadword
MOVAPS: Move Aligned Packed Single-Precision

In the manual, I find the same description for both instructions:

This instruction can be used to load an XMM register from a 128-bit memory location, to store the contents of an XMM register into a 128-bit memory location, or to move data between two XMM registers.

The only difference is:

To move a double quadword to or from unaligned memory locations, use the MOVDQU instruction.

and

To move packed single-precision floating-point values to or from unaligned memory locations, use the MOVUPS instruction.

But I can't find the reason why there are two different instructions.

So can anybody explain the difference?

Note that movaps has smaller machine code (3 bytes minimum): movdqa needs an extra prefix, so it's at least 4 bytes. – Peter Cordes, Commented Sep 13, 2022 at 16:13
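To make the size remark concrete, here is a small C sketch that spells out the legacy-SSE encodings of the register-to-register forms `movaps xmm0, xmm1` and `movdqa xmm0, xmm1`. The array and function names are made up for this illustration. The bytes follow the opcode tables in the Intel manual (`0F 28 /r` for MOVAPS, `66 0F 6F /r` for MOVDQA), with ModRM byte `C1` selecting xmm0 as destination and xmm1 as source.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>

/* Legacy-SSE encodings; xmm0..xmm7 need no REX prefix.
 * The array names are hypothetical, chosen for this example only. */
static const unsigned char MOVAPS_XMM0_XMM1[] = { 0x0F, 0x28, 0xC1 };       /* movaps xmm0, xmm1 */
static const unsigned char MOVDQA_XMM0_XMM1[] = { 0x66, 0x0F, 0x6F, 0xC1 }; /* movdqa xmm0, xmm1 */

/* The only size difference is the mandatory 0x66 prefix on movdqa. */
static_assert(sizeof MOVDQA_XMM0_XMM1 == sizeof MOVAPS_XMM0_XMM1 + 1,
              "movdqa carries one extra prefix byte");

size_t movaps_len(void) { return sizeof MOVAPS_XMM0_XMM1; }
size_t movdqa_len(void) { return sizeof MOVDQA_XMM0_XMM1; }
```

Memory operands and registers xmm8 and above make both encodings longer, which is why the comment speaks of minimum sizes.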

Stephen Canon · Accepted Answer · 2011-07-13 13:19:06Z

62

In functionality, they are identical.

On some (but not all) microarchitectures, there are timing differences due to "domain crossing penalties". For this reason, one should generally use movdqa when the data is being used with integer SSE instructions, and movaps when the data is being used with floating-point instructions. For more information on this subject, consult the Intel Optimization Manual, or Agner Fog's excellent microarchitecture guide. Note that these delays are most often associated with register-register moves rather than with loads or stores.
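As a rough illustration of the "identical in functionality" point, here is a minimal C sketch using the SSE intrinsics from `<emmintrin.h>`. It assumes an x86 target with SSE2, and the function names are invented for this example. `_mm_load_ps`/`_mm_store_ps` usually compile to movaps and `_mm_load_si128`/`_mm_store_si128` to movdqa, although a compiler may pick either instruction, since the result is the same.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics; also provides the SSE1 ones */
#include <stdint.h>
#include <string.h>

/* Copy 16 aligned bytes through the "float" intrinsics (typically movaps). */
void copy16_float_domain(void *dst, const void *src)
{
    /* Both aligned forms fault on addresses that are not 16-byte aligned. */
    assert(((uintptr_t)dst & 15) == 0 && ((uintptr_t)src & 15) == 0);
    __m128 v = _mm_load_ps((const float *)src);
    _mm_store_ps((float *)dst, v);
}

/* Copy 16 aligned bytes through the "integer" intrinsics (typically movdqa). */
void copy16_int_domain(void *dst, const void *src)
{
    assert(((uintptr_t)dst & 15) == 0 && ((uintptr_t)src & 15) == 0);
    __m128i v = _mm_load_si128((const __m128i *)src);
    _mm_store_si128((__m128i *)dst, v);
}

/* Hypothetical check: returns 1 if both copies reproduce the source bytes. */
int copies_match(void)
{
    unsigned char pattern[16];
    __m128i src, via_float, via_int;  /* __m128i objects are 16-byte aligned */

    for (int i = 0; i < 16; i++)
        pattern[i] = (unsigned char)(i * 17 + 3);
    memcpy(&src, pattern, sizeof pattern);
    memset(&via_float, 0, sizeof via_float);
    memset(&via_int, 0, sizeof via_int);

    copy16_float_domain(&via_float, &src);
    copy16_int_domain(&via_int, &src);

    return memcmp(&via_float, pattern, 16) == 0 &&
           memcmp(&via_int, pattern, 16) == 0;
}
```

Both functions move the same 128 bits unchanged. Choosing between them only matters for the bypass delays described above, so the usual habit is to match the move to the instructions that consume the data.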

edited Jul 13, 2011 at 13:19

answered Jul 13, 2011 at 11:54


2

Could you link to specific manual entries? I'm having a hard time believing this because SSE registers don't have a type associated with them (the type is encoded in the instructions), so I don't think there are different float and integer paths. They do, however, have different opcodes and were introduced in different instruction sets: MOVAPS is SSE1 while MOVDQA is SSE2. They also both have the same latency and throughput according to intel.com/Assets/PDF/manual/248966.pdf
– Jasper Bekkers
Commented Jul 13, 2011 at 13:27
25

@Jasper Bekkers: You can disbelieve it all you like, but it's still true. For a general discussion of domains and the bypass delays between them, see the Intel Optimization Manual (2.2.3 discusses domains on the Nehalem microarchitecture, for example). For a concrete, specific example of the hazard, see pages 86 and 87 of Agner Fog's excellent reference agner.org/optimize/microarchitecture.pdf
– Stephen Canon
Commented Jul 13, 2011 at 14:00
3

The pages in Agner's manual seem to have changed. It's best to just search for "Data bypass delays"; there's a section for each uArch.
– Leeor
Commented Nov 2, 2015 at 7:15
2

What about movaps vs. movapd? They are both in the floating point domain so I don't see why there are two instructions.
– Z boson
Commented Dec 18, 2015 at 7:47
8

@Zboson: to reserve the possibility of introducing separate float / double domains in the future. This will almost certainly never happen, but some architect thought it might many years ago.
– Stephen Canon
Commented Dec 18, 2015 at 17:42
