There are some nice tutorials about malicious Office documents around the web, but this sample uses a method of hiding the shellcode that I had not seen before. Great tools like OfficeMalScanner and others are unable to handle this particular scenario, so here is the story of my adventure inside this RTF file.
The first bytes of the file tell me something about the content: it’s the common header of an RTF document. First of all I tried RTFScan (which is part of OfficeMalScanner), without luck. The scanner recognizes an OLE document followed by object data, but it fails to retrieve any shellcode from it.
I quickly decided to get my hands dirty and inspect the content of the document with a hex editor. The aim of the analysis is to find the shellcode that is executed once the exploit triggers. Looking at the byte sequences I tried to locate some clues (e.g. a run of 0x90 bytes), but I failed miserably. There seems to be no trace of any piece of code resembling shellcode.
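The hunt for a run of 0x90 bytes doesn’t have to be done by eye. Here is a minimal sketch in C of that kind of check. The function name longest_run is my own, made up for illustration (it is not taken from any of the tools mentioned), and it assumes the file has already been read into a buffer. A long run of 0x90 is only a hint of a NOP sled, nothing more, and as said it didn’t help with this file.

```c
#include <stddef.h>

/* Returns the length of the longest run of `value` in buf[0..len) and,
   if run_start is not NULL, stores the offset where that run begins.
   Hypothetical helper: the name and signature are illustrative only. */
size_t longest_run(const unsigned char *buf, size_t len,
                   unsigned char value, size_t *run_start)
{
    size_t best = 0, best_start = 0, cur = 0;

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == value) {
            cur++;
            if (cur > best) {
                best = cur;
                best_start = i + 1 - cur;   /* first byte of this run */
            }
        } else {
            cur = 0;                        /* run broken, start over */
        }
    }
    if (run_start != NULL)
        *run_start = best_start;
    return best;
}
```

Calling longest_run(buf, len, 0x90, &offset) on the raw file bytes gives the length of the longest candidate sled and where it starts; what counts as "long enough" to be suspicious is a judgment call.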
Shellcode should be visible and not encrypted
With this concept in mind I started to cut parts out of the RTF file; the idea was to isolate blocks of bytes, which should make the shellcode easier to recognize. I started by cutting the header of the file and all the parts containing sequences of strings, since they probably don’t represent executable code:
A few bytes further down I reached the objdata definition, which contains the exploit type. Converting the sequence “4D53436F6D63746C4C69622E4C697374566965774374” to bytes gives the string “MSComctlLib.ListViewCt”. I’m almost sure the exploit takes advantage of an old vulnerability in mscomctl.ocx (CVE-2012-0158) to execute arbitrary code, but as far as I remember the string should be “MSComctlLib.ListViewCtrl.2”. Where are the other characters? I need ‘r’, ‘l’, ‘.’ and ‘2’.
The answer is inside the “\bin” definition following the incomplete string. The keyword introduces raw binary data, and its numeric parameter gives the number of bytes that follow: 4 in this specific case. The 0x20 byte (a space) that follows the keyword is just a delimiter and is not part of the 4 bytes. So the defined bytes are 72, 6c, 2e, 32, and they form the substring I need: “rl.2”. After these bytes and before the next “\bin” definition there are some more characters. They are ordinary hex-encoded text, so they don’t represent the eight-character string “00000000”; they are simply the byte sequence 00, 00, 00, 00. This mix of raw binary and hex-encoded bytes is what the entire idea used to hide the shellcode relies on.
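To make the mixing concrete, here is a small sketch in C of a decoder that understands both forms. It is only an illustration under simplifying assumptions: the input is just the objdata payload (no other RTF control words or groups), and the names hex_value and decode_objdata are mine, not part of any existing tool. Hex text is turned into bytes two digits at a time, while each “\binN” is followed by N raw bytes that are copied untouched.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decodes objdata-style content: hex text becomes bytes, and each
   "\binN" keyword is followed by N raw bytes that are copied as-is.
   Returns the number of bytes written to out (at most out_cap).
   Assumption: src holds only the payload, no other control words. */
size_t decode_objdata(const char *src, size_t src_len,
                      unsigned char *out, size_t out_cap)
{
    size_t i = 0, n = 0;
    int high = -1;                      /* pending high nibble, if any */

    while (i < src_len && n < out_cap) {
        if (src_len - i >= 5 && memcmp(src + i, "\\bin", 4) == 0 &&
            isdigit((unsigned char)src[i + 4])) {
            size_t count = 0;
            i += 4;
            while (i < src_len && isdigit((unsigned char)src[i]))
                count = count * 10 + (size_t)(src[i++] - '0');
            if (i < src_len && src[i] == ' ')
                i++;                    /* one space ends the keyword */
            while (count-- > 0 && i < src_len && n < out_cap)
                out[n++] = (unsigned char)src[i++];   /* raw copy */
            continue;
        }
        int v = hex_value((unsigned char)src[i++]);
        if (v < 0)
            continue;                   /* skip whitespace and the like */
        if (high < 0) {
            high = v;
        } else {
            out[n++] = (unsigned char)((high << 4) | v);
            high = -1;
        }
    }
    return n;
}
```

Fed with the hex sequence above followed by “\bin4 rl.2” and “00000000”, this sketch produces the 26 bytes of “MSComctlLib.ListViewCtrl.2” followed by four zero bytes. A decoder that only handles the hex text would see “MSComctlLib.ListViewCt” and then garbage.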
The obfuscation
With this particular behaviour in mind I looked at the file from another perspective, and that helped a lot. In the middle of the file there are a lot of “\bin” definitions. Here is the obfuscation in action:
I don’t know how automatic recognition tools work internally, but I can now imagine why they are not able to identify suspicious instructions using specific signatures.
The shellcode
The first instruction of the shellcode is inside the above picture, it’s a call to the procedure starting with the next lines of code:
It gets the addresses of the functions used by the shellcode. They are all from kernel32.dll; it doesn’t need anything else.
The content of the RTF file is loaded in its entirety into a dynamically allocated buffer. To better understand this snippet, imagine the real scenario: a vulnerable machine opens the RTF, the exploit triggers and the shellcode is executed. The RTF is already open, so a handle to the file already exists in the process. The shellcode author tries to guess the right value of that handle. That’s why there are some checks inside the snippet: the author wants to be sure the right file is being loaded.
It’s pretty easy to understand this part of the code directly from the dead listing, but if you want to go on reversing the shellcode a debugger is almost a necessity. Why? Well, the decrypted malicious file contains a snippet that is called directly from the shellcode.
To debug the shellcode like in a real environment I wrote a little piece of code:
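#include <windows.h>

// Shellcode bytes taken from the RTF file (elided here)
char shellcode[] = "\xE8\x8B\x00...\x04\x89\xF2\xC3";

int main(int argc, char **argv)
{
    HANDLE hFile;

    // Open the RTF file, necessary to simulate a real scenario:
    // the shellcode expects a handle to the document to exist already
    hFile = CreateFileA("rtf.zai", GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE)
        return 1;

    // Jump to the shellcode (MSVC inline assembly, 32-bit build)
    __asm {
        mov eax, offset shellcode;
        push eax;
        ret;
    }

    CloseHandle(hFile);
    return 0;
}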
Now I can debug everything like in a real environment.
Back to the analysis: the RTF file in memory is decrypted using an algorithm (it’s not a single xor operation, but it’s not interesting per se). There’s not much to say about the last part of the shellcode; the most interesting thing is that part of the shellcode is inside the decrypted RTF file. It’s really hard to get the entire shellcode from the malicious document: you may be able to use a static tool, but you will certainly have to decrypt the file first.
To sum up:
1: exploit triggers
2: shellcode starts
3: RTF file loaded into dynamically allocated memory
4: decryption of the RTF file in memory
5: shellcode continues its execution from the decrypted RTF
6: machine infection
Machine infection
Here is the last task of the malicious document, and the most important one for a malware author: the infection. Two files are created: the first is the malware and the other is just a clean document. Both are created inside the temp directory; the malware has a random temporary file name and the document has the fixed name “cv.doc”.
Looking at the list of functions obtained from kernel32.dll by the shellcode you can predict the sequence of functions used to create the two files (GetTempPathA – GetTempFileNameA/lstrcat – CreateFileA – WriteFile – CloseHandle and WinExec).
Once created, the malware is immediately started using WinExec. The same function is used to show the content of the cv.doc file, calling winword as a reader. The doc file is a clean version of the malicious RTF file: it does contain the OLE part, but the “\object\objocx” section is no longer in the file (the exploit/shellcode part has been cut out).
That’s the content of the fake file, the one used to show something on the screen. According to Google Translate it should be the word “Instructions”, but I don’t care much about its meaning.
That’s all for now, I’ll blog about the malware analysis in a future post, stay tuned!