I'm reverse-engineering a particular dylib on Mac OS X. The dylib is highly obfuscated, but I suspect it's from some well-known technique that I'm not aware of.
I'd like to describe it in the hope that someone here could identify it:
Junk instructions are inserted everywhere, and every basic block is split up by many unconditional jumps (but no opaque predicates are used).
At its core, it looks like an obfuscated VM. It behaves like this:
At entry, it pushes a starting value onto the stack, then calls an entry point:
000000010013E070 68 5C 98 42 11 push 1142985Ch
000000010013E075 E8 B5 B4 0E 00 call sub_10022952F
The entry (e.g. sub_10022952F) isn't a usual function. It saves all registers and points rsi to an embedded data location that is determined by the pushed value (i.e. the push 1142985Ch above), then starts reading the array at rsi and jumping accordingly:
000000010013EED7 8B 06 mov eax, [rsi]
000000010013EED9 F5 ;; cmc
000000010013EEDA 45 84 FD ;; test r13b, r15b
000000010013EEDD 48 81 C6 04 00 00 00 add rsi, 4
000000010013EEE4 66 41 81 FA 1C 3B ;; cmp r10w, 3B1Ch
000000010013EEEA 33 C3 xor eax, ebx
000000010013EEEC D1 C0 rol eax, 1
000000010013EEEE E9 FD 8E 15 00 jmp loc_100297DF0
0000000100297DF0 FF C0 inc eax
0000000100297DF2 0F C8 bswap eax
0000000100297DF4 F8 ;; clc
0000000100297DF5 E9 F2 D6 E4 FF jmp loc_1000E54EC
00000001000E54EC C1 C0 03 rol eax, 3
00000001000E54EF 0F C8 bswap eax
00000001000E54F1 53 push rbx
00000001000E54F2 31 04 24 xor [rsp], eax
00000001000E54F5 0F B7 D8 ;; movzx ebx, ax
00000001000E54F8 0F BA F3 82 ;; btr ebx, 82h
00000001000E54FC 5B pop rbx
00000001000E54FD F5 ;; cmc
00000001000E54FE 49 F7 C7 E5 0F 9B 74 ;; test r15, 749B0FE5h
00000001000E5505 F9 ;; stc
00000001000E5506 48 63 C0 movsxd rax, eax
00000001000E5509 48 03 F8 add rdi, rax
00000001000E550C E9 D0 49 08 00 jmp loc_100169EE1
0000000100169EE1 FF E7 jmp rdi
I've commented out the junk code. The code fetches a value from [rsi] into rax and does some bit-level operations on it with rbx; then it advances rdi <- rdi + rax; then it jumps to rdi.
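To make the decoding step concrete, here is a minimal Python sketch of what one such block appears to compute, as far as it can be read from the listing above. The names (step, key, handler, bytecode) are illustrative and do not come from the binary. It assumes that the push rbx / xor [rsp], eax / pop rbx sequence only updates the low 32 bits of rbx, and that the lines marked as junk really have no effect.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def rol32(x, n):
    """Rotate a 32-bit value left by n bits (like `rol eax, n`)."""
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK32


def bswap32(x):
    """Reverse the byte order of a 32-bit value (like `bswap eax`)."""
    return int.from_bytes(x.to_bytes(4, "little"), "big")


def sign_extend32(x):
    """Interpret a 32-bit value as signed (like `movsxd rax, eax`)."""
    return x - (1 << 32) if x & 0x80000000 else x


def step(bytecode, pc, key, handler):
    """Emulate one fetch/decode/jump block from the listing.

    bytecode: the data rsi points into (hypothetical name)
    pc:       offset of the next dword to read (models rsi)
    key:      rolling key (models rbx)
    handler:  address of the current handler (models rdi)
    Returns the updated (pc, key, handler).
    """
    eax = int.from_bytes(bytecode[pc:pc + 4], "little")  # mov eax, [rsi]
    pc += 4                                              # add rsi, 4
    eax ^= key & MASK32                                  # xor eax, ebx
    eax = rol32(eax, 1)                                  # rol eax, 1
    eax = (eax + 1) & MASK32                             # inc eax
    eax = bswap32(eax)                                   # bswap eax
    eax = rol32(eax, 3)                                  # rol eax, 3
    eax = bswap32(eax)                                   # bswap eax
    key ^= eax              # push rbx / xor [rsp], eax / pop rbx
    handler = (handler + sign_extend32(eax)) & MASK64    # add rdi, rax
    return pc, key, handler                              # jmp rdi


if __name__ == "__main__":
    # An all-zero dword with a zero key decodes to a displacement of 8.
    pc, key, handler = step(bytes(4), 0, 0, 0x1000)
    print(pc, hex(key), hex(handler))  # 4 0x8 0x1008
```

Since the decoded value is both the jump displacement and the update to the key, each dword can only be decoded if the key state left by the previous step is known. A static decoder would therefore have to replay the chain from its entry point.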
This structure is everywhere in the dylib, and within a chain connected by jmp rdi the bit-level operations are always the same. But chains starting from different entry points may use different bit operations on rax and rbx.
The VM doesn't have any central structure (like a dispatcher or something), and the jumps go everywhere, not restricted to some location.
To call an external function like pthread_mutex_lock, it first leaves the VM loop, then calls the external function, then goes to another entry point. There are many entry points (push xxxx / call xxxx) into the VM loop.
I believe this is some well-known technique, because when the dylib is modified, it shows this message:
File corrupted! This program has been manipulated and maybe it's infected by a Virus or cracked. This file won't work anymore.
It's not a standard Mac OS X message, and a Google search for this exact message gives a lot of results, but none of them explains the technique itself.
Some additional information:
It has sections named UPX0/UPX1, but I think they are a disguise and the technique has nothing to do with UPX, because the code only does self-checking, not self-modification. As a result, it is still just as obfuscated as the original file after it is fully loaded.
I don't know exactly how it works yet. But I think it's some sort of VM because, when I traced an invocation of one of the dylib's functions, I found only the following normal function calls: pthread_mutex_lock --> three calls to operator new --> pthread_mutex_unlock.
Everything else is done inside the jmp rdi structures mentioned above, switching among many entry points. Therefore the rest of the code logic resides in the VM loop.
The actual difficulty is how the VM code (pointed to by rsi) behaves; the junk code and unconditional jumps aren't essential here. I believe this obfuscation is generated by some specific, well-known technique, as the "file corrupted" message suggests. If someone can identify it, I won't have to deobfuscate it all over again by myself.