Can I blog an incomplete solution or an incomplete analysis? Why not! That’s the spirit of this blog entry!
More than a year ago I started a project with Kayaker: we decided to write a tool able to show hidden callbacks. If I remember correctly, the idea was born while we were putting our hands on a rootkit. I bet many reversers were thinking the same thing in those days, because the same kind of tool was developed by others. As you can imagine our tool never saw the light, not because similar tools are available online but mostly because we are two old lazy reversers!
I bet you are thinking: why the hell are you writing this stupid intro? Well, the tools I mentioned before were buggy, and when I checked again some months ago they were still buggy (I don’t know whether they have been fixed by now…). Strange that no one else has noticed it yet.
Anyway, we won’t complete the tool, but with this blog post I would like to share some notes about our investigations. At the beginning I wanted to write a detailed and complete article about the subject, but I don’t know when I’ll be able to finish this project, so I decided to publish some of my notes instead.
It was a joint effort, so credit goes to Kayaker too!
The idea is to retrieve hidden callbacks that have been installed via CmRegisterCallback, PsSetCreateProcessNotifyRoutine, PsSetCreateThreadNotifyRoutine and PsSetLoadImageNotifyRoutine. After that it would be good to deregister one or more of them.
Where to start?
First of all you have to understand what’s behind functions like CmRegisterCallback and the others; then you’ll have something to work on. I’ll start with CmRegisterCallback (from XP SP2). The function is used to register a RegistryCallback routine, and I think the XP version is the simplest one for fully understanding the principles behind it. There are some differences between the XP and 7 versions, but I think you’ll be able to understand the 7 structure too! Here is the disassembled function (without the useless parts, of course):
487E6B  push  'bcMC'                             ; Pool Tag: "CMcb"
487E70  xor   ebx, ebx
487E72  push  38h                                ; NumberOfBytes: 0x38
487E74  inc   ebx
487E75  push  ebx                                ; PoolType: PAGEDPOOL
487E76  call  ExAllocatePoolWithTag              ; ExAllocatePoolWithTag(x,x,x): allocates pool memory
487E7B  mov   esi, eax                           ; eax is the pointer to the allocated pool memory, PCM_CALLBACK_CONTEXT_BLOCK
487E7D  xor   edi, edi
487E7F  cmp   esi, edi                           ; Is PCM_CALLBACK_CONTEXT_BLOCK a NULL pointer?
487E81  jz    cmRegisterCallback_fails           ; yes: function fails...
487E87  push  esi
487E88  push  [ebp+Function]                     ; PEX_CALLBACK_FUNCTION, pointer to callback function
487E8B  call  _ExAllocateCallBack                ; allocates and fill EX_CALLBACK_ROUTINE_BLOCK structure (more on this later...)
487E90  cmp   eax, edi                           ; ExAllocateCallback success or not?
487E92  mov   [ebp+PEX_CALLBACK_ROUTINE_BLOCK], eax ; store the pointer to the allocated pool memory
487E95  jnz   short _ExAllocateCallBack_success
...                                              ; fill CM_CALLBACK_CONTEXT_BLOCK fields
487EDC  mov   ebx, offset CmpCallBackVector
487EE1  mov   [ebp+i], edi                       ; i = 0
487EE4 try_next_slot:
487EE4  push  edi                                ; OldBlock: NULL
487EE5  push  [ebp+PEX_CALLBACK_ROUTINE_BLOCK]   ; NewBlock with information to add
487EE8  push  ebx                                ; CmpCallbackVector[i]
487EE9  call  _ExCompareExchangeCallBack         ; try to *insert* the new callback inside CmpCallBack vector
487EEE  test  al, al                             ; check the result...
487EF0  jnz   short free_slot_has_been_found     ; jump if the vector has an empty space for the new entry
487EF2  add   [ebp+i], 4                         ; i++, increase the counter
487EF6  add   ebx, 4                             ; shift to the next item of the vector to check
487EF9  cmp   [ebp+i], 190h                      ; is the end of the vector?
487F00  jb    short try_next_slot                ; no: try another one. yes: no free slot!
...
487F11 cmRegisterCallback_fails:
487F11  mov   eax, STATUS_INSUFFICIENT_RESOURCES
487F16 end_CmRegisterCallback:
...
487F1A  retn  0Ch
...
487F1D free_slot_has_been_found:
487F1D  mov   eax, 1
487F22  mov   ecx, offset _CmpCallBackCount      ; CmpCallBackCount: number of not NULL item inside the vector
487F27  xadd  [ecx], eax                         ; there's a new callback, it increases the number of item inside the vector
487F2A  xor   eax, eax
487F2C  jmp   short end_CmRegisterCallback
As you can see the idea behind the function is really simple!
Basically, it tries to add a new entry inside a vector named CmpCallBackVector, and when the entry is correctly inserted the registration process will end with a success.
How do I know it is using a vector? The add instruction at 0x487EF6 is a clear clue, and the cmp at 0x487EF9 reveals the fixed length of the vector: 100 items (0x190/4). Now that I have this information I’m going to try to explain the entire procedure in detail. The algorithm can be divided into 5 big blocks:
1: try to allocate 0x38 bytes for a structure named CM_CALLBACK_CONTEXT_BLOCK
2: try to allocate 0x0C bytes for a structure named EX_CALLBACK_ROUTINE_BLOCK
3: fill CM_CALLBACK_CONTEXT_BLOCK fields
4: look for an empty slot, insert a sort of PEX_CALLBACK_ROUTINE_BLOCK in it and update CmpCallBackCount
5: notify success or error and exit
Point #1 is pretty simple to understand, it’s only a call to ExAllocatePoolWithTag.
To understand point #2 you have to see what’s going on behind the ExAllocateCallBack procedure. Let’s start by taking a look at it:
52AB35  push  'brbC'                             ; Pool Tag: Cbrb
52AB3A  push  0Ch                                ; NumberOfBytes: 0x0C
52AB3C  push  1                                  ; PoolType: PAGED_POOL
52AB3E  call  ExAllocatePoolWithTag              ; alloc a EX_CALLBACK_ROUTINE_BLOCK structure
52AB43  test  eax, eax                           ; ExAllocatePoolWithTag success or not?
52AB45  jz    short _ExAllocateCallBack_fails
52AB47  mov   ecx, [ebp+_pex_callback_function]  ; pointer to callback function (PEX_CALLBACK_FUNCTION)
52AB4A  and   dword ptr [eax], 0                 ; 1° field: 0
52AB4D  mov   [eax+4], ecx                       ; 2° field: _pex_callback_function
52AB50  mov   ecx, [ebp+_pool_allocated_memory]  ; PCM_CALLBACK_CONTEXT_BLOCK
52AB53  mov   [eax+8], ecx                       ; 3° field: _pcm_callback_context_block
52AB56 _ExAllocateCallBack_fails:
...
The procedure is used to allocate and fill a special structure:
typedef struct _EX_CALLBACK_ROUTINE_BLOCK
{
EX_RUNDOWN_REF RundownProtect;
PEX_CALLBACK_FUNCTION Function;
PCM_CALLBACK_CONTEXT_BLOCK Context;
} EX_CALLBACK_ROUTINE_BLOCK, *PEX_CALLBACK_ROUTINE_BLOCK;
As you can see from the lines above, the first field is set to 0 while the other two fields are filled with pointers: the function to register and the context containing info about the callback.
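To make the layout concrete, here is a minimal user-mode sketch of what the allocation routine appears to do. It is an illustration under assumptions, not the kernel code: malloc stands in for ExAllocatePoolWithTag, the callback prototype and the opaque context type are placeholders, and alloc_callback_block and demo_callback are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical user-mode stand-ins: the names mirror the ones in the text,
 * the definitions are assumptions made for this sketch only. */
typedef struct CM_CALLBACK_CONTEXT_BLOCK CM_CALLBACK_CONTEXT_BLOCK; /* opaque */
typedef long (*PEX_CALLBACK_FUNCTION)(void *context, void *arg1, void *arg2);

typedef struct EX_CALLBACK_ROUTINE_BLOCK {
    uintptr_t                  RundownProtect; /* 1st field, zeroed on allocation */
    PEX_CALLBACK_FUNCTION      Function;       /* 2nd field, routine to register  */
    CM_CALLBACK_CONTEXT_BLOCK *Context;        /* 3rd field, per-callback context */
} EX_CALLBACK_ROUTINE_BLOCK;

/* Placeholder callback, only here so the sketch has something to register. */
long demo_callback(void *context, void *arg1, void *arg2)
{
    (void)context; (void)arg1; (void)arg2;
    return 0;
}

/* Mirrors the three stores at 52AB4A, 52AB4D and 52AB53. */
EX_CALLBACK_ROUTINE_BLOCK *alloc_callback_block(PEX_CALLBACK_FUNCTION function,
                                                CM_CALLBACK_CONTEXT_BLOCK *context)
{
    assert(function != NULL);      /* sketch-only precondition */
    EX_CALLBACK_ROUTINE_BLOCK *block = malloc(sizeof *block);
    if (block == NULL)
        return NULL;               /* same outcome as _ExAllocateCallBack_fails */
    block->RundownProtect = 0;     /* and dword ptr [eax], 0 */
    block->Function = function;    /* mov [eax+4], ecx       */
    block->Context = context;      /* mov [eax+8], ecx       */
    return block;
}
```

On 32-bit x86 the three fields are 4 bytes each, which matches the 0x0C bytes requested in the disassembly; on a 64-bit build of this sketch the structure is larger.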
While point #3 is just a series of mov instructions used to fill the CM_CALLBACK_CONTEXT_BLOCK structure, point #4 gives us some useful information: CmpCallBackVector has 100 elements, and this part of the code scans the vector until an empty element is found. If no empty element exists, the callback is not registered. What happens when there’s an empty slot inside the vector? The new entry is added to it. Most of the job is done by the function named ExCompareExchangeCallBack; here is the core of the function:
52AB81  mov   eax, [ebp+CmpCallbackVector]       ; vector at the current position
52AB84  mov   ebx, [eax]                         ; ebx is a PEX_CALLBACK_ROUTINE_BLOCK, the item could be NULL or not
52AB86  mov   eax, ebx
52AB88  xor   eax, [ebp+OldBlock]                ; OldBlock is NULL for a registration process
52AB8B  mov   [ebp+current_pex_callback_routine_block], ebx
52AB8E  cmp   eax, 7                             ; check used to see if the current item is NULL or not
52AB91  ja    short loc_52ABB5                   ; jump if not NULL
52AB93  test  esi, esi                           ; is NewBlock NULL?
52AB95  jz    short loc_52ABA1                   ; jump if it's NULL
52AB97  mov   eax, esi                           ; esi, NewBlock pointer (changed...)
52AB99  or    eax, 7                             ; PAY ATTENTION HERE: or 7 !?!
52AB9C  mov   [ebp+NewBlock], eax                ; change NewBlock pointer: NewBlock = NewBlock OR 7
52AB9F  jmp   short loc_52ABA5
...
52ABA5  mov   eax, [ebp+var_4]                   ; here if CmpCallbackVector's item is null
52ABA8  mov   ecx, [ebp+CmpCallbackVector]       ; current empty slot
52ABAB  mov   edx, [ebp+NewBlock]                ; new pointer to insert
52ABAE  cmpxchg [ecx], edx                       ; insert the new pointer inside the empty slot!
52ABB1  cmp   eax, ebx
52ABB3  jnz   short loc_52AB81
52ABB5  and   ebx, not 7                         ; PAY ATTENTION HERE!
52ABB8  cmp   ebx, [ebp+OldBlock]                ; here if CmpCallbackVector's item is not null
52ABBB  jnz   short loc_52AC19
52ABBD  test  ebx, ebx
52ABBF  jz    short loc_52AC15
The routine contains some more things inside, but we can stop here with the analysis because we have everything we need. If the pointer to the NewBlock to insert is not NULL and there’s an available empty slot the pointer is inserted inside the vector; after that CmpCallBackCount value will be updated (remember the snippet at the beginning of this blog entry?).
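The slot search plus the compare-exchange can be imitated in portable C11. The following is only a sketch of the idea: callback_vector and register_block are hypothetical names, the atomics come from the standard stdatomic.h header rather than from the kernel, and the real routine does more (it tolerates low bits in the comparison and deals with rundown protection), which is deliberately left out here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define VECTOR_SLOTS 100             /* 0x190 / 4, from the cmp at 487EF9  */
#define LOW_BITS     ((uintptr_t)7)  /* bits ORed into every stored pointer */

typedef struct {
    atomic_uintptr_t slots[VECTOR_SLOTS]; /* stand-in for CmpCallBackVector */
    atomic_long      count;               /* stand-in for CmpCallBackCount  */
} callback_vector;

/* Try to place `block` in the first empty slot.
 * Returns the slot index, or -1 when every slot is taken. */
int register_block(callback_vector *v, void *block)
{
    /* the low three bits must be free, i.e. the block is 8-byte aligned */
    assert(block != NULL && ((uintptr_t)block & LOW_BITS) == 0);
    uintptr_t tagged = (uintptr_t)block | LOW_BITS;   /* the "or eax, 7" */

    for (int i = 0; i < VECTOR_SLOTS; i++) {
        uintptr_t expected = 0;                       /* OldBlock: NULL */
        /* like cmpxchg: store only if the slot still holds NULL */
        if (atomic_compare_exchange_strong(&v->slots[i], &expected, tagged)) {
            atomic_fetch_add(&v->count, 1);           /* the xadd at 487F27 */
            return i;
        }
    }
    return -1;                                        /* no free slot */
}
```

With a zero-initialised callback_vector, the first registration lands in slot 0, the second in slot 1, and so on; once all 100 slots are in use the function reports failure, which corresponds to the STATUS_INSUFFICIENT_RESOURCES path in CmRegisterCallback.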
The last part of the algorithm (point #5) is a simple return with a success or failure value:
52AC15  mov   al, 1                              ; 1 means success, new item has been added to CmpCallbackVector
52AC17  jmp   short loc_52AC29
52AC19  test  esi, esi                           ; esi -> NewBlock
52AC1B  jz    short loc_52AC27
52AC1D  push  8
52AC1F  pop   edx
52AC20  mov   ecx, esi
52AC22  call  ExReleaseRundownProtectionEx       ; if esi is not null something went wrong...
52AC27  xor   al, al                             ; 0 means insuccess, new item has not been added to CmpCallbackVector
Ok, I think we have a general idea about the vector; each entry contains a *sort* of pointer to an EX_CALLBACK_ROUTINE_BLOCK, and to reveal all of them you only have to scan the entire vector!
To sum up, I have 3 possible scenarios:
1. CmpCallbackVector’s item is empty:
the new block will be inserted inside the vector. The stored value is not the one passed to ExCompareExchangeCallBack, but that value modified by an “OR 7” logic operation.
2. CmpCallbackVector’s item is full:
ExCompareExchangeCallBack simply returns 0 (failure) and CmRegisterCallback tries the next item of the vector
3. Someone is working on the CmpCallbackVector’s item:
the registration process reveals an interesting behaviour: to be sure it is the only one accessing the resource, the system uses a lock mechanism. The OR and AND operations are the core of that mechanism (0x52AB99 and 0x52ABB5, commented with “PAY ATTENTION HERE”). If the current item of the vector is not NULL, the comparison at 0x52AB8E fails and the code flow continues from 0x52ABB5. At this point the real address of the item is extracted (stored_value AND NOT 7) and compared with OldBlock, which is NULL during a registration; the address is obviously not NULL, and as you can see around 0x52AC22 the resource is released because someone else is working on it. Now you should understand why the system ORs the value with 7 before adding it to the vector.
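Whatever the exact purpose of those three bits, the practical consequence for anyone reading the vector is that the low bits are not part of the address. A small sketch of the encode and decode steps, with hypothetical helper names and assuming the block is at least 8-byte aligned (otherwise the low bits would not be free to reuse):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOW_BITS_MASK ((uintptr_t)7)

/* What ends up in the slot: the block address with its low three bits set. */
uintptr_t encode_slot(const void *block)
{
    assert(((uintptr_t)block & LOW_BITS_MASK) == 0); /* needs 8-byte alignment */
    return (uintptr_t)block | LOW_BITS_MASK;         /* or eax, 7 */
}

/* What a reader must do before dereferencing: clear the low three bits. */
void *decode_slot(uintptr_t stored)
{
    return (void *)(stored & ~LOW_BITS_MASK);        /* and ebx, not 7 */
}
```

Any stored value from 0 to 7 decodes to NULL, which is consistent with the cmp eax, 7 test used to recognise an empty slot.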
With all this kind of information I can finally write a routine able to read all the stored callbacks:
cells = 0x64;                   // cells inside CmpCallbackVector
nMod = *(DWORD*)_sysmodBuffer;  // _sysmodBuffer filled by "ZwQuerySystemInformation(SystemModuleInformation..."
for (i = 0; i < cells; i++)
{
    // take current item from CmpCallbackVector (look at the "& ~7" operation)
    pCBRB = (PEX_CALLBACK_ROUTINE_BLOCK)((*(DWORD*)(_CmpCallbackVectorAddress + 4*i)) & ~7);
    if (pCBRB != 0)
    {
        sysmodTmp = (PSYSTEM_MODULE_INFORMATION)((DWORD)_sysmodBuffer + 4);
        j = 0;
        while (j < nMod)
        {
            // is the callback address inside the image of the current module?
            if (((DWORD)pCBRB->Function) < ((DWORD)sysmodTmp->Base + (DWORD)sysmodTmp->Size) &&
                ((DWORD)pCBRB->Function) > ((DWORD)sysmodTmp->Base))
            {
                // Callback has been found
                DbgPrint("Result: %LX: %s\r\n", pCBRB->Function, sysmodTmp->ImageName);
                break;
            }
            // get the next module
            sysmodTmp = (PSYSTEM_MODULE_INFORMATION)((DWORD)sysmodTmp + sizeof(SYSTEM_MODULE_INFORMATION));
            j = j + 1;
        }
    }
}
It’s important to scan all the cells inside the vector! One of the tools available on the web fails to retrieve callbacks stored after an empty element of the vector.
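The difference between a correct scan and the buggy one comes down to a single keyword: an empty cell must be skipped, not treated as the end of the list. A minimal sketch over a plain snapshot of the vector (collect_blocks is a hypothetical helper, and the snapshot array is an assumption made so the example can run outside the kernel):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Walk every slot of a vector snapshot and collect the decoded block
 * pointers. Returns how many non-empty slots were seen; at most out_cap
 * of them are written to `out`. */
size_t collect_blocks(const uintptr_t *slots, size_t n_slots,
                      void **out, size_t out_cap)
{
    assert(slots != NULL && out != NULL);
    size_t found = 0;
    for (size_t i = 0; i < n_slots; i++) {
        void *block = (void *)(slots[i] & ~(uintptr_t)7);
        if (block == NULL)
            continue;   /* a gap: keep going, later cells may still be in use */
        if (found < out_cap)
            out[found] = block;
        found++;
    }
    return found;
}
```

Replacing the continue with a break reproduces the failure described above: everything registered after the first hole would be missed.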
Well, the only thing left to explain about the code above is CmpCallbackVectorAddress, the address of CmpCallBackVector. How can I locate the exact address of CmpCallBackVector? IMHO, that’s the hardest part of the entire process!
How to find CmpCallbackVector address
Developing a tool for a specific OS is pretty easy because the vector’s address is hardcoded; it would be nice to find an OS-independent technique.
I think the most used approach is a byte search based on a specific sequence of bytes; it’s a nice idea, but I don’t want to list every OS version known to man inside my source code. We (Kayaker and I) spent a lot of time on this point; we both wanted to develop something that is not tied to a specific OS version, something that doesn’t require a series of “if OS == xxx” statements inside the code. It’s almost impossible to write fully OS-independent code, but I believe it’s possible to remove some of the OS checks.
We finally came up with two ideas, a practical one and a theoretical one. I hate theory, so mine is the practical solution of course. I think both ideas are valid, and to be sure of finding the right vector address we decided to combine them inside a hypothetical tool: four eyes are always better than two!
The practical approach
My idea is really simple: since the vector’s address is hardcoded, you’ll surely find it in two different places:
PAGE:005392D0  BB 20 05 48 00   mov ebx, offset _CmpCallBackVector
.data:00480520                  _CmpCallBackVector db 0
The address is inside two sections, PAGE and data. An *xref-search* is the core of the idea! It’s pretty stupid indeed, but from what I’ve seen so far it works!
The pseudo code of my xref search is explained here, basically it scans the entire PAGE section trying to locate the right address:
callbackAddress = CmUnregisterCallback address in memory
pagePointer = pointer_to_PAGE_section
while (pagePointer < pointer_to_PAGE_section + size_of_PAGE_section)
{
    value = get dword pointed by pagePointer
    if (value is inside DATA section)
        if ((pagePointer > callbackAddress) && (pagePointer < callbackAddress + range))
        {
            CmpCallbackVector = value
            exit!
        }
    pagePointer++
}
As you can imagine, a simple xref search is not enough to find the right value on its own; you need one more check. That’s why I added the line:
if ((pagePointer > callbackAddress) && (pagePointer < callbackAddress + range))
where callbackAddress is the address of CmUnregisterCallback. What does it mean? Well, ‘pagePointer’ should be inside the first “range” bytes of CmUnregisterCallback function. If both “if” statements are satisfied I’m pretty sure about the vector’s address value.
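The same search can be written as a small, testable C function that works on a copy of the PAGE section instead of live kernel memory. This is a sketch under assumptions: find_vector_xref and all of its parameters are hypothetical names, addresses are treated as 32-bit values as on x86, and the section bounds are expected to come from the PE headers of the kernel image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scan `page` (a copy of the PAGE section whose first byte lives at virtual
 * address page_va) for a dword that points into
 * [data_va, data_va + data_size) and that is located less than `range`
 * bytes after func_va (the address of CmUnregisterCallback).
 * Returns the dword found, or 0 when nothing matches. */
uint32_t find_vector_xref(const uint8_t *page, size_t page_size, uint32_t page_va,
                          uint32_t data_va, uint32_t data_size,
                          uint32_t func_va, uint32_t range)
{
    assert(page != NULL);
    for (size_t off = 0; off + 4 <= page_size; off++) {
        /* little-endian dword read, safe at any alignment */
        uint32_t value = (uint32_t)page[off]
                       | (uint32_t)page[off + 1] << 8
                       | (uint32_t)page[off + 2] << 16
                       | (uint32_t)page[off + 3] << 24;
        uint32_t here = page_va + (uint32_t)off;

        int in_data   = value >= data_va && value - data_va < data_size;
        int near_func = here > func_va && here - func_va < range;
        if (in_data && near_func)
            return value;
    }
    return 0;
}
```

Fed with the XP prologue bytes of CmUnregisterCallback (8B FF 55 8B EC 51 83 65 FC 00 53 56 57 BB 20 05 48 00, starting at 0x5392C3), a made-up .data range that contains 0x480520 and a range of 16, the function returns 0x480520; with a range of 8 it returns 0, because the operand sits 14 bytes into the function.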
There are still 2 points to clarify:
– what is the range variable?
– why CmUnregisterCallback?
range is just a numerical value; you only have to decide what to assign to it. Under XP the first bytes of the CmUnregisterCallback function are:
PAGE:005392C3  8B FF            mov edi, edi
PAGE:005392C5  55               push ebp
PAGE:005392C6  8B EC            mov ebp, esp
PAGE:005392C8  51               push ecx
PAGE:005392C9  83 65 FC 00      and [ebp+var_4], 0
PAGE:005392CD  53               push ebx
PAGE:005392CE  56               push esi
PAGE:005392CF  57               push edi
PAGE:005392D0  BB 20 05 48 00   mov ebx, offset _CmpCallBackVector
In this specific case 16 could be a possible value, since the operand of the mov starts 14 bytes into the function… What about the other OSs? Well, as I said before I think it's hard to write a universal piece of code, but as far as I have seen it's possible to adjust the "range" to cover some more OSs. I don't have Vista and 7 running on my system and I'm working on the dead listing only, but I think 148 could be a nice value to set and it should cover all the OSs. If you are still reading and you have Vista or 7, can you confirm that?
One more thing about the search pattern: I use CmUnregisterCallback because (inspecting all the OSs) CmRegisterCallback doesn't always reference CmpCallbackVector inside the main routine; sometimes it hides the reference under some calls. For example, look at CmRegisterCallback from 7:
PAGE:0065712A  mov   edi, edi
PAGE:0065712C  push  ebp
PAGE:0065712D  mov   ebp, esp
PAGE:0065712F  push  [ebp+Cookie]
PAGE:00657132  mov   eax, offset stru_4FFDF0
PAGE:00657137  push  1
PAGE:00657139  push  [ebp+Context]
PAGE:0065713C  push  [ebp+Function]
PAGE:0065713F  call  sub_657153                  ; It's everything inside this call!!!
PAGE:00657144  pop   ebp
PAGE:00657145  retn  0Ch
It’s much more complex to attack a procedure with sub-routines, don't you think? That's why I did opt for CmUnregisterCallback.
What about the PsSet* functions?
At the beginning of this blog post I mentioned some more functions; it's time to spend a few words on them too.
The functions are:
PsSetCreateProcessNotifyRoutine
PsSetCreateThreadNotifyRoutine
PsSetLoadImageNotifyRoutine
There are some similarities between CmRegisterCallback and these three functions: they all register something, they all use a vector to store the information, and they all use the same function! YES, to register a function they use the same scheme:
1. get the address of a specific vector
2. try to insert the new item inside the vector calling ExCompareExchangeCallBack
Just to clarify everything look at this snippet, taken from PsSetCreateThreadNotifyRoutine:
4ED7C4  mov   esi, offset _threadVector          ; the vector
4ED7C9  push  0
4ED7CB  push  ebx
4ED7CC  push  esi
4ED7CD  call  _ExCompareExchangeCallBack         ; the function
4ED7D2  test  al, al
4ED7D4  jnz   short loc_4ED7F3
4ED7D6  add   edi, 4
4ED7D9  add   esi, 4
4ED7DC  cmp   edi, 20h                           ; the check over the number of items inside the vector
4ED7DF  jb    short loc_4ED7C9
The only different thing is the length of the vector:
_callbackVector: 0x64 slots
_processVector: 0x8 slots
_threadVector: 0x8 slots
_imageVector: 0x8 slots
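The slot counts above tie back to the constants seen in the disassembly: each slot is one 32-bit pointer, so the loop limit is simply slots * 4. A tiny sketch of that arithmetic, where vector_desc, k_vectors and loop_limit_bytes are hypothetical names and the vector labels are the ones used in this post rather than exported symbols:

```c
#include <assert.h>
#include <stddef.h>

#define SLOT_SIZE_X86 4u   /* one 32-bit pointer per slot */

typedef struct {
    const char *name;   /* label used in the text, not an exported symbol */
    unsigned    slots;  /* number of entries in the vector */
} vector_desc;

const vector_desc k_vectors[] = {
    { "_callbackVector", 0x64 },
    { "_processVector",  0x8  },
    { "_threadVector",   0x8  },
    { "_imageVector",    0x8  },
};

/* Byte offset at which the registration loop gives up,
 * i.e. the operand of the cmp that closes the loop. */
unsigned loop_limit_bytes(const vector_desc *v)
{
    assert(v != NULL);
    return v->slots * SLOT_SIZE_X86;
}
```

For the registry vector this gives 0x190, the value compared at 487EF9, and for the thread vector it gives 0x20, the value compared at 4ED7DC. A scanner can therefore share one routine for all four vectors and only change the base address and the slot count.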
Well, you can use all the info I gave you about CmRegisterCallback for these three functions too! I think you'll be able to retrieve all the hidden callbacks, and -just in case- unregister a callback. There are so many ways from the dirty one (put NULL inside the vector's slot) to the right one (calling the right unregister function)… you only have to decide!