
I'm reverse-engineering a particular dylib on Mac OS X. The dylib is highly obfuscated, but I suspect the obfuscation comes from some well-known technique that I'm not aware of.

I'd like to describe it in the hope that someone here could identify it:

  1. There are junk instructions inserted everywhere, and every basic block is split by a lot of unconditional jumps (but no opaque predicates are used).

  2. At its core, it looks like an obfuscated VM. It behaves like this:

At entry, it pushes a starting value on the stack, then calls an entry point:

000000010013E070 68 5C 98 42 11                          push    1142985Ch
000000010013E075 E8 B5 B4 0E 00                          call    sub_10022952F

The entry point (e.g. sub_10022952F) isn't an ordinary function. It saves all registers and points rsi at an embedded data location determined by the pushed value (i.e. the push 1142985Ch above), then starts reading the array at rsi and jumping accordingly:

000000010013EED7 8B 06                                   mov     eax, [rsi]
000000010013EED9 F5                                      ;; cmc
000000010013EEDA 45 84 FD                                ;; test    r13b, r15b
000000010013EEDD 48 81 C6 04 00 00 00                    add     rsi, 4
000000010013EEE4 66 41 81 FA 1C 3B                       ;; cmp     r10w, 3B1Ch
000000010013EEEA 33 C3                                   xor     eax, ebx
000000010013EEEC D1 C0                                   rol     eax, 1
000000010013EEEE E9 FD 8E 15 00                          jmp     loc_100297DF0

0000000100297DF0 FF C0                                   inc     eax
0000000100297DF2 0F C8                                   bswap   eax
0000000100297DF4 F8                                      ;; clc
0000000100297DF5 E9 F2 D6 E4 FF                          jmp     loc_1000E54EC

00000001000E54EC C1 C0 03                                rol     eax, 3
00000001000E54EF 0F C8                                   bswap   eax
00000001000E54F1 53                                      push    rbx
00000001000E54F2 31 04 24                                xor     [rsp], eax
00000001000E54F5 0F B7 D8                                ;; movzx   ebx, ax
00000001000E54F8 0F BA F3 82                             ;; btr     ebx, 82h
00000001000E54FC 5B                                      pop     rbx
00000001000E54FD F5                                      ;; cmc
00000001000E54FE 49 F7 C7 E5 0F 9B 74                    ;; test    r15, 749B0FE5h
00000001000E5505 F9                                      ;; stc
00000001000E5506 48 63 C0                                movsxd  rax, eax
00000001000E5509 48 03 F8                                add     rdi, rax
00000001000E550C E9 D0 49 08 00                          jmp     loc_100169EE1

0000000100169EE1 FF E7                                   jmp     rdi

I've commented out the junk code. The code fetches a value from [rsi] into eax and applies some bit-level operations involving rbx; it then advances rdi <- rdi + rax and jumps to rdi.
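To make the decode step concrete, here is a minimal Python sketch of the non-junk instructions in the listing above. It only models this one chain (other chains may use different bit operations), and the names `decode_step` and `step` are mine, not anything found in the dylib.

```python
import struct

MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def rol32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK32

def bswap32(x):
    """Reverse the byte order of a 32-bit value."""
    return int.from_bytes(x.to_bytes(4, "little"), "big")

def sign_extend32(x):
    """Interpret a 32-bit value as signed (movsxd rax, eax)."""
    return x - (1 << 32) if x & 0x80000000 else x

def decode_step(operand, rbx, rdi):
    """Model one link of the chain; returns (new_rbx, new_rdi)."""
    eax = (operand ^ rbx) & MASK32        # xor eax, ebx
    eax = rol32(eax, 1)                   # rol eax, 1
    eax = (eax + 1) & MASK32              # inc eax
    eax = bswap32(eax)                    # bswap eax
    eax = rol32(eax, 3)                   # rol eax, 3
    eax = bswap32(eax)                    # bswap eax
    rbx ^= eax                            # push rbx / xor [rsp], eax / pop rbx
    rdi = (rdi + sign_extend32(eax)) & MASK64  # movsxd rax, eax / add rdi, rax
    return rbx, rdi

def step(bytecode, rsi, rbx, rdi):
    """Fetch a dword at offset rsi, advance rsi by 4, then decode."""
    (operand,) = struct.unpack_from("<I", bytecode, rsi)  # mov eax, [rsi]
    rbx, rdi = decode_step(operand, rbx, rdi)
    return rsi + 4, rbx, rdi
```

Note that rbx acts as a rolling key: each decoded value is XORed back into it, so an operand cannot be decoded without knowing the state left by all earlier operands in the chain.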

This structure is everywhere in the dylib. Within a chain connected by jmp rdi, the bit-level operations are the same, but chains from different entry points may apply different bit operations to rax and rbx.

The VM doesn't have any central structure (like a dispatcher), and the jumps go everywhere rather than being restricted to one region.

To call an external function such as pthread_mutex_lock, it first leaves the VM loop, then calls the external function, then goes to another entry point. There are many entry points (push xxxx / call xxxx) into the VM loop.

I believe this is some well-known technique, because when the dylib is modified, it shows this message:

File corrupted! This program has been manipulated and maybe it's infected by a Virus or cracked. This file won't work anymore.

It's not a standard Mac OS X message, and a Google search for this exact message gives a lot of results, but none of them explains the technique itself.

Some additional information:

  • It has sections named UPX0/UPX1, but I think they are a disguise and the technique has nothing to do with UPX, because the code only does self-checking, not self-modification. As a result, once fully loaded it is still just as obfuscated as the original file.

  • I don't yet know exactly how it works, but I think it's some sort of VM: when I traced the invocation of one of the dylib's functions, I found only the following normal function calls:

pthread_mutex_lock --> 3 calls to operator new --> pthread_mutex_unlock.

Everything else is done by the jmp rdi structures mentioned above and by switching among the many entry points. Therefore the rest of the code logic resides in the VM loop.

  • @perror I'm aware of "control-flow flattening"; that's the technique behind the jmp rdi. But the actual difficulty is how the VM code (pointed to by rsi) behaves; the junk code and unconditional jumps aren't essential here. I believe this obfuscation was generated by some well-known specific tool, as the "file corrupted" message suggests, so I'd rather not have to deobfuscate it from scratch by myself.
    – user27283
    Commented Jan 29, 2019 at 16:54

2 Answers

I could be wrong, but it looks like VMProtect v3. This version of the obfuscator inlines all handlers, so it's normal that you don't find any dispatcher.

  • +1 -- looks like VMProtect to me also, though I'm not sure about the version. Commented Feb 2, 2019 at 21:52

Yes, this is VMProtect v3, a virtualization-based protector. It works by disassembling the x86 machine code of the target executable and compiling it into a proprietary, polymorphic bytecode, which is executed in a custom interpreter at run time.

VMProtect is a stack machine. Each handler, though consisting of only a few instructions, performs several tasks, e.g. popping several values, performing multiple operations, and pushing one or more values.

Here is some analysis of its dispatcher:

push edi; push all registers
push ecx
push edx
push esi
push ebp
push ebx
push eax
push edx
pushf
push 0 ; imagebase fixup
mov esi, [esp+8+arg_0] ; esi = pointer to VM bytecode
mov ebp, esp ; ebp = VM's "stack" pointer
sub esp, 0C0h
mov edi, esp ; edi = "scratch" data area

VM__FOLLOW__Update:
add esi, [ebp+0]

VM__FOLLOW__Regular:
mov al, [esi]; read a byte from EIP
movzx eax, al
sub esi, -1; increment EIP
jmp ds:VM__HandlerTable[eax*4] ; execute instruction handler
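As a rough illustration of the fetch/dispatch loop in that listing, here is a small Python sketch. The three opcodes and their numbering are made up for the example and are not VMProtect's real instruction set; only the loop shape (read a byte at the VM instruction pointer, advance it, index a handler table) mirrors the disassembly.

```python
HALT = -1  # sentinel returned by a handler to stop the loop (assumption)

def op_push_imm8(opcode, vip, code, stack):
    """Hypothetical handler: push the next bytecode byte."""
    stack.append(code[vip])
    return vip + 1

def op_add(opcode, vip, code, stack):
    """Hypothetical handler: pop two values, push their 32-bit sum."""
    b = stack.pop()
    a = stack.pop()
    stack.append((a + b) & 0xFFFFFFFF)
    return vip

def op_halt(opcode, vip, code, stack):
    """Hypothetical handler: leave the VM loop."""
    return HALT

# Plays the role of VM__HandlerTable; the numbering is illustrative only.
HANDLER_TABLE = {0: op_push_imm8, 1: op_add, 2: op_halt}

def run(code, table=HANDLER_TABLE):
    """Central dispatcher: fetch opcode, bump VIP, call the handler."""
    stack = []
    vip = 0
    while vip != HALT:
        opcode = code[vip]        # mov al, [esi] / movzx eax, al
        vip += 1                  # sub esi, -1
        vip = table[opcode](opcode, vip, code, stack)  # jmp HandlerTable[eax*4]
    return stack

if __name__ == "__main__":
    print(run(bytes([0, 2, 0, 3, 1, 2])))  # push 2, push 3, add, halt
```

The handler receives the opcode byte itself because, as the handlers below show, some of them reuse bits of the opcode as an operand.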

Here is a disassembly of some instruction handlers:

#00:x = [EIP-1] & 0x3C; y = popd; [edi+x] = y

.text:00427251 and al, 3Ch; al = instruction number
.text:00427254 mov edx, [ebp+0] ; grab a dword off the stack
.text:00427257 add ebp, 4 ; pop the stack
.text:0042725A mov [edi+eax], edx ; store the dword in the scratch space

#01:x = [EIP-1] & 0x3C; y = [edi+x]; pushd y

.vmp0:0046B0EB and al, 3Ch; al = instruction number
.vmp0:0046B0EE mov edx, [edi+eax] ; grab a dword out of the scratch space
.vmp0:0046B0F1 sub ebp, 4 ; subtract 4 from the stack pointer
.vmp0:0046B0F4 mov [ebp+0], edx ; push the dword onto the stack

#02:x = popw, y = popw, z = x + y, pushw z, pushf

.text:004271FB mov ax, [ebp+0] ; pop a word off the stack
.text:004271FF sub ebp, 2
.text:00427202 add [ebp+4], ax ; add it to another word on the stack
.text:00427206 pushf
.text:00427207 pop dword ptr [ebp+0] ; push the flags

#03:x = [EIP++]; w = popw; [edi+x] = Byte(w)

.vmp0:0046B02A movzx eax, byte ptr [esi] ; read a byte from EIP
.vmp0:0046B02D mov dx, [ebp+0] ; pop a word off the stack
.vmp0:0046B031 inc esi ; EIP++
.vmp0:0046B032 add ebp, 2; adjust stack pointer
.vmp0:0046B035 mov [edi+eax], dl ; write a byte into the scratch area

#04:x = popd, y = popw, z = x >> y, pushd z, pushf

.vmp0:0046B095 mov eax, [ebp+0]; pop a dword off the stack
.vmp0:0046B098 mov cl, [ebp+4] ; pop a word off the stack
.vmp0:0046B09B sub ebp, 2
.vmp0:0046B09E shr eax, cl ; shr the dword by the word
.vmp0:0046B0A0 mov [ebp+4], eax; push the result
.vmp0:0046B0A3 pushf
.vmp0:0046B0A4 pop dword ptr [ebp+0] ; push the flags

#05:x = popd, pushd ss:[x]

.vmp0:0046B5F7 mov eax, [ebp+0]; pop a dword off the stack
.vmp0:0046B5FA mov eax, ss:[eax] ; read a dword from ss
.vmp0:0046B5FD mov [ebp+0], eax; push that dword
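For readers who prefer to see the semantics outside of assembly, here is a minimal Python sketch of handlers #00 and #01 under my reading of the listings above. The `VMState` class, its sizes and the helper names are assumptions made for illustration; the scratch area is sized 0xC0 to match the `sub esp, 0C0h` in the dispatcher prologue.

```python
import struct

class VMState:
    """Toy model: a downward-growing VM stack (ebp) and a scratch area (edi)."""

    def __init__(self, stack_size=256):
        self.stack = bytearray(stack_size)
        self.ebp = stack_size            # empty stack; pushes decrement ebp
        self.scratch = bytearray(0xC0)

    def pushd(self, value):
        self.ebp -= 4                                    # sub ebp, 4
        struct.pack_into("<I", self.stack, self.ebp, value & 0xFFFFFFFF)

    def popd(self):
        (value,) = struct.unpack_from("<I", self.stack, self.ebp)
        self.ebp += 4                                    # add ebp, 4
        return value

def handler_00(vm, opcode):
    """#00: x = opcode & 0x3C; y = popd; scratch[x] = y."""
    x = opcode & 0x3C                                    # and al, 3Ch
    struct.pack_into("<I", vm.scratch, x, vm.popd())     # mov [edi+eax], edx

def handler_01(vm, opcode):
    """#01: x = opcode & 0x3C; y = scratch[x]; pushd y."""
    x = opcode & 0x3C                                    # and al, 3Ch
    (y,) = struct.unpack_from("<I", vm.scratch, x)       # mov edx, [edi+eax]
    vm.pushd(y)

if __name__ == "__main__":
    vm = VMState()
    vm.pushd(0xDEADBEEF)
    handler_00(vm, 0x04)          # store into scratch slot at offset 4
    handler_01(vm, 0x47)          # 0x47 & 0x3C == 0x04, so the same slot
    print(hex(vm.popd()))
```

Because of the `& 0x3C` mask, the scratch offset is always a multiple of 4 between 0 and 0x3C, so these two handlers behave like moves between the VM stack and sixteen dword-sized virtual registers.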
