How does decompiling work?

Question

I have heard the term "decompiling" used a few times before, and I am starting to get very curious about how it works.

I have only a very general idea of how it works: reverse engineering an application to see what functions it uses. I don't know much beyond that.

I have also heard the term "disassembler". What is the difference between a disassembler and a decompiler?

So to sum up my question(s): What exactly is involved in the process of decompiling something? How is it usually done? How complicated or easy a process is it? Can it produce the exact code? And what is the difference between a decompiler and a disassembler?

possible duplicate of What is a de-compiler how does it work? — Greg Bacon, Commented Apr 25, 2012 at 19:42

SlowLearner · Accepted Answer · 2018-05-07 08:38:11Z

Ilfak Guilfanov, the author of the Hex-Rays Decompiler, gave a talk at a conference about the internal workings of his decompiler; here are the white paper and a presentation. They give a nice overview of the difficulties in building a decompiler and how to make it all work.

Apart from that, there are some quite old papers, e.g. the classic PhD thesis of Cristina Cifuentes.

As for the complexity, how hard decompiling is depends on the language and runtime of the binary. For example, decompiling .NET and Java is considered "done": there are free decompilers with a very high success rate (they produce code very close to the original source). But that is a consequence of the very specific nature of the virtual machines these runtimes use.

As for truly compiled languages, like C, C++, Obj-C, Delphi, Pascal, ... the task gets much more complicated. Read the above papers for details.

what is the difference between a disassembler and a decompiler?

When you have a binary program (executable, DLL library, ...), it consists of processor instructions. The human-readable notation for these instructions is called assembly (or assembler). In a binary, the instructions are binary encoded, so that the processor can execute them directly. A disassembler takes this binary code and translates it into a text representation. This translation is usually 1-to-1, meaning one instruction is shown as one line of text. The task involves a lot of detail but is conceptually straightforward: the program just needs to know all the different instructions and how each is represented in binary.

On the other hand, a decompiler does a much harder task. It takes either the binary code or the disassembler output (which is basically the same, because it's 1-to-1) and produces high-level code. Let me show you an example. Say we have this C function:

int twotimes(int a) {
    return a * 2;
}

When you compile it, the compiler first generates assembly for that function, which might look something like this:

_twotimes:
    SHL EAX, 1
    RET

(The first line is just a label, not a real instruction. SHL does a shift-left, which is a quick way to multiply by two, and RET means the function is done. This listing assumes the argument arrives in EAX and the result is returned there.) In the resulting binary, it looks like this:

08 6A CF 45 37 1A

(I made those bytes up; they are not real instructions.) So a disassembler takes you from the binary form to the assembly form, and a decompiler takes you from the assembly form to C code (or some other higher-level language).

The links on this answer are dead, does anyone have up to date references? — ProdigySim, Commented Apr 25, 2020 at 19:52
@ProdigySim 1) web.archive.org/web/20200410182642/https://www.hex-rays.com/… 2) web.archive.org/web/20181014083034/https://www.hex-rays.com/… 3) web.archive.org/web/20181222035256/https://www.hex-rays.com/… 4) web.archive.org/web/20130407233420/http://itee.uq.edu.au/… — Rain, Commented Jun 14, 2021 at 2:46

Nick · Accepted Answer · 2012-04-25 07:32:07Z

Decompiling is essentially the reverse of compiling: taking the object code (binary) and trying to recreate the source code from it.

Decompilation depends on artefacts being left in the object code which can be used to ascertain the structure of the source code.

With C/C++, there isn't much left to help the decompilation process, so it's very difficult. However, with Java, C# and other languages which target virtual machines, it can be easier to decompile because the compiler leaves many more hints within the object code.

Everyone is saying that it's "difficult" - but is it even always possible?
– Marco Prins
Commented Mar 16, 2017 at 13:46

@MarcoPrins: Hex-Rays says that, in general, no: it is not always possible to do automatically. Assumptions about how the code was compiled have to be made (for example, that a "usual" popular compiler was used rather than some odd, non-standardized hack implementation or "evil" hand-crafted assembly).
– BullyWiiPlaza
Commented Dec 14, 2017 at 0:55

"Decompiling is essentially the reverse of compiling" - what a marvelous insight.
– theMayer
Commented Apr 6, 2018 at 16:55
