
Looking at some assembly code for x86_64 on my Mac, I see the following instruction:

48 c7 c0 01 00 00 00  movq    $0x1,%rax

But nowhere can I find a reference that breaks down the opcode. It seems like 48c7 is a move instruction, c0 defines the %rax register, etc.

So, where can I find a reference that tells me all that?

I am aware of http://ref.x86asm.net/, but looking at opcode 48, I don't see anything that resembles a move.

  • I've seen similar questions here. If I could find this on Google, I wouldn't have asked. The fact that I am aware of the reference I posted in my question also shows that I am not just too lazy to search myself.
    – Christoph
    Commented Jun 24, 2012 at 19:58
  • @Oded, googling for "x86 0x48 instruction prefix" is quite tricky if you don't know what you are looking for...
    – Griwes
    Commented Jun 24, 2012 at 19:59
  • @Oded I reworded my question to be more developer specific. Given the (really good!) reference at x86asm.net, I guess I just need to understand how that opcode is broken up. Griwes helped with that.
    – Christoph
    Commented Jun 24, 2012 at 20:01
  • If you didn't find 0x48 at x86asm.net, that's because you didn't look in the right place: ref.x86asm.net/coder64.html#x48 . -1. Commented Jun 24, 2012 at 22:50
  • I was looking for a mov. I know better now, thanks.
    – Christoph
    Commented Jun 25, 2012 at 15:35

2 Answers


Actually, mov is 0xc7 there; 0x48 is, in this case, a long mode REX prefix with the W bit set (REX.W), which selects a 64-bit operand size.
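To make the prefix concrete, here is a minimal Python sketch (the function name decode_rex is just for illustration) that splits a REX byte into its four flag bits. A REX prefix is any byte 0x40-0x4F: the high nibble 0100 marks it as REX and the low nibble holds W, R, X and B.

```python
def decode_rex(byte: int) -> dict:
    """Split a REX prefix byte (0x40-0x4F) into its W, R, X and B bits."""
    if (byte & 0xF0) != 0x40:
        raise ValueError(f"0x{byte:02x} is not a REX prefix")
    return {
        "W": (byte >> 3) & 1,  # 1 = 64-bit operand size
        "R": (byte >> 2) & 1,  # extends ModR/M.reg
        "X": (byte >> 1) & 1,  # extends SIB.index
        "B": byte & 1,         # extends ModR/M.rm (or SIB.base)
    }


print(decode_rex(0x48))  # {'W': 1, 'R': 0, 'X': 0, 'B': 0}
```

So 0x48 is 0100 1000: only W is set, which is what turns the plain 32-bit mov into the 64-bit movq.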

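Answering also the question in the comments: 0xc0 is 0b11000000, which splits into the ModR/M fields mod = 11, reg = 000 and rm = 000. mod = 11 means the operand is a register, and reg = 000 is the /0 opcode extension that C7 requires. Here you can find out that with REX.B = 0 (the REX prefix is 0x48, so the .B bit is unset), 0xc0 means "RAX is the first operand" (in Intel syntax, mov rax, 1, RAX is the first operand, which for mov is the output operand). You can find out how to read ModR/M here.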
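The ModR/M split described above can be sketched in Python like this. The helper names and the REG64 list are illustrative; the register order (rax, rcx, rdx, rbx, rsp, rbp, rsi, rdi, then r8-r15) is the standard encoding order, and the 4-bit register number is REX.B followed by the 3-bit rm field.

```python
# 64-bit registers indexed by the 4-bit register number (REX.B bit, then the 3-bit rm field)
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_modrm(modrm: int) -> tuple[int, int, int]:
    """Split a ModR/M byte into mod (2 bits), reg (3 bits) and rm (3 bits)."""
    return (modrm >> 6) & 0b11, (modrm >> 3) & 0b111, modrm & 0b111


def rm_register(rm: int, rex_b: int) -> str:
    """Register selected by ModR/M.rm when mod == 0b11, extended by REX.B."""
    return REG64[(rex_b << 3) | rm]


mod, reg, rm = decode_modrm(0xC0)
print(mod, reg, rm)        # 3 0 0
print(rm_register(rm, 0))  # rax (REX.B = 0, as with the 0x48 prefix)
print(rm_register(rm, 1))  # r8  (what a 0x49 prefix would select instead)
```

This also shows why REX.B matters: the same c0 byte names r8 instead of rax once B is set.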

  • Thanks, that helps! Maybe I should reword my question.
    – Christoph
    Commented Jun 24, 2012 at 19:59
  • What about the c0? Where does that come in?
    – Christoph
    Commented Jun 24, 2012 at 20:12
  • @Christoph, added explanation in answer.
    – Griwes
    Commented Jun 24, 2012 at 20:19

When you look at the binary

 48 c7 c0 01 00 00 00

you need to disassemble it in order to understand its meaning.

The algorithm for disassembling is not difficult, but it is complex: it involves looking up multiple tables.
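To give a flavour of those table lookups, here is a minimal Python sketch that decodes exactly the bytes above. The one-entry OPCODES table and the function disassemble_one are hypothetical and only handle REX.W + C7 /0 (MOV r/m64, imm32) with a register destination; a real decoder needs the full opcode maps from the manuals. Note that the immediate 01 00 00 00 is a little-endian imm32 that the CPU sign-extends to 64 bits.

```python
import struct

# 64-bit register names indexed by (REX.B << 3) | ModR/M.rm
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]

# Hypothetical one-entry opcode table: opcode -> (AT&T mnemonic, required ModR/M.reg extension)
OPCODES = {0xC7: ("movq", 0)}  # REX.W + C7 /0 id: MOV r/m64, imm32


def disassemble_one(code: bytes) -> str:
    """Decode one register-direct REX.W + C7 /0 instruction (sketch only)."""
    i = 0
    rex = 0
    if 0x40 <= code[i] <= 0x4F:  # optional REX prefix
        rex = code[i]
        i += 1
    if not (rex & 0x08):
        raise NotImplementedError("sketch only handles REX.W = 1")
    mnemonic, ext = OPCODES[code[i]]  # opcode table lookup
    i += 1
    modrm = code[i]
    i += 1
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    if mod != 0b11 or reg != ext:
        raise NotImplementedError("sketch only handles register-direct C7 /0")
    dest = REG64[((rex & 1) << 3) | rm]            # REX.B extends rm
    imm32 = struct.unpack_from("<i", code, i)[0]   # little-endian, signed
    imm64 = imm32 & 0xFFFF_FFFF_FFFF_FFFF          # sign-extended to 64 bits
    return f"{mnemonic} ${imm64:#x},%{dest}"


print(disassemble_one(bytes.fromhex("48 c7 c0 01 00 00 00")))  # movq $0x1,%rax
```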

The algorithm is described in volume 2 of the Intel Software Developer's Manual,

Intel® 64 and IA-32 Architectures
Software Developer’s Manual
Volume 2 (2A, 2B & 2C):
Instruction Set Reference, A-Z

Start reading from the chapter called INSTRUCTION FORMAT.

Alternatively, there are good books that dedicate whole chapters to this topic, such as

  x86 Instruction Set Architecture, MindShare, by Tom Shanley.

It dedicates a whole chapter to disassembling binary x86 code.

Or you can start by reading about the general algorithm in AMD's manual for the same instruction set:

AMD64 Architecture
Programmer’s Manual
Volume 3:
General-Purpose and System Instructions

There, in the chapter Instruction Encoding, you will find the automaton that defines this instruction language, and from this graphical scheme you can easily write the decoder.

After you do this you can come back to the Intel Manual, 2nd volume, and use it as a reference book.
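The automaton idea can be sketched as a small state machine that walks prefixes, REX, opcode, ModR/M and immediate, labelling each field. This is a simplified illustration under stated assumptions, not the AMD diagram itself: OPCODE_INFO is a hypothetical three-entry table of one-byte opcodes, memory operands (SIB/displacement) and two-byte 0F opcodes are not handled, and the only prefix effect modelled is 0x66 shrinking C7's immediate to 16 bits when REX.W is clear.

```python
from enum import Enum, auto

# Legacy prefixes: lock/rep, segment overrides, operand-size (0x66), address-size (0x67)
LEGACY_PREFIXES = {0xF0, 0xF2, 0xF3, 0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65, 0x66, 0x67}

# Hypothetical mini opcode table: opcode -> (has ModR/M byte, immediate size in bytes)
OPCODE_INFO = {0x90: (False, 0), 0x89: (True, 0), 0xC7: (True, 4)}


class State(Enum):
    PREFIX = auto()
    OPCODE = auto()
    MODRM = auto()
    IMMEDIATE = auto()
    DONE = auto()


def split_fields(code: bytes) -> list[tuple[str, str]]:
    """Label the fields of one instruction by walking a small state machine."""
    fields: list[tuple[str, str]] = []
    state, i = State.PREFIX, 0
    opsize_override = rex_w = False
    imm_size = 0
    while state is not State.DONE:
        if state is State.PREFIX:
            b = code[i]
            if b in LEGACY_PREFIXES:
                opsize_override = opsize_override or b == 0x66
                fields.append(("prefix", f"{b:02x}"))
                i += 1
            else:
                if 0x40 <= b <= 0x4F:  # REX must come directly before the opcode
                    rex_w = bool(b & 0x08)
                    fields.append(("rex", f"{b:02x}"))
                    i += 1
                state = State.OPCODE
        elif state is State.OPCODE:
            b = code[i]
            has_modrm, imm_size = OPCODE_INFO[b]
            if imm_size == 4 and opsize_override and not rex_w:
                imm_size = 2  # 0x66 without REX.W shrinks a 32-bit immediate to 16 bits
            fields.append(("opcode", f"{b:02x}"))
            i += 1
            state = State.MODRM if has_modrm else State.IMMEDIATE
        elif state is State.MODRM:
            b = code[i]
            if (b >> 6) != 0b11:
                raise NotImplementedError("memory operands (SIB/displacement) not handled")
            fields.append(("modrm", f"{b:02x}"))
            i += 1
            state = State.IMMEDIATE
        else:  # State.IMMEDIATE
            if imm_size:
                fields.append((f"imm{imm_size * 8}", code[i:i + imm_size].hex(" ")))
                i += imm_size
            state = State.DONE
    return fields


for field in split_fields(bytes.fromhex("48 c7 c0 01 00 00 00")):
    print(field)
```

For the bytes in the question this yields rex 48, opcode c7, modrm c0 and imm32 01 00 00 00. Keeping the decoder as explicit states mirrors the graphical scheme and makes it easy to grow one table at a time.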

I also found the reverse engineering class from http://opensecuritytraining.info/ useful. The site was created by a PhD student from CMU; most of it is well done, but it takes a long time to study and apply.

After you understand the basic ideas, you can look at a free project that implements the algorithm. I found the distorm project useful. At the beginning, it is important not to look at generic projects (like qemu or objdump) that implement disassemblers for many instruction sets in the same code base, as you will get lost. Distorm focuses only on x86 and implements it correctly and exhaustively. It expresses the definition of the x86 instruction language formally, while the Intel and AMD manuals define it in natural language.

Another project that works well is udis86.

  • You mean the algorithm for disassembling? It sounds at first like you're calling mov an algorithm. Commented Aug 11, 2016 at 17:31
