SlideShare a Scribd company logo
Finding Xori
Malware Analysis Triage with Automated Disassembly
Amanda Rousseau
Rich Seymour
About Us
Amanda Rousseau Rich Seymour
Sr. Malware Researcher,
Endgame, Inc.
@malwareunicorn
Sr. Data Scientist,
Endgame, Inc.
@rseymour
Quick Overview
The Current State of
Disassemblers
Functionality & Features
Brief overview of pros and cons with current popular
open source PE disassemblers.
Overview how we pulled together the different
aspects of disassemblers and emulator
Usage & Demo
How the output is used for automation. Applying the
tool on various malware samples and shellcode.
The Problem
There are millions of malware samples to look at and a few
reverse engineers.
We need to change the way we are going about this if we are
going to keep up.
How to leverage large scale disassembly in an automated way
with many samples?
● Improve the scalability in malware analysis
● Integration and automation
Present Day Common Disassemblers
Capstone Radare2 IDA Pro Hopper Binary
Ninja
Size small small large medium large
Stability ✔ ✖ ✔ ✔ ✔
Price - - $$$ $ $$
Cross
Platform
✔ ~ ✔ ✖ ✔
Usability ~ ~ ✔ ~ ~
Accuracy ~ ~ ✔ ~ ~
Integration ✔ ~ ✖ ✖ ✖
Requirements
● Fast development
● Stability and resilience
● Cross platform
● Output can be easily integrated
● Ease of use
● Core feature set
● Output accuracy
Evaluating Disassemblers
The first step - Diving into the code:
● Verifying the accuracy of various disassemblers
● Understand each of their strengths and limitations
We adopted different aspects of disassemblers and emulator modules.
● Much of Capstone is also based on the LLVM & GDB repositories
● QEMU is the emulation is straightforward, easy to understand
● Converted some of the logic into Rust, while also fixing a few bugs along the way.
Evaluating Example
x66x90
XCHG AX, AX [Objdump]✔
X86 32bit:
NOP [Capstone]✖
NOP [Distorm]✖
XCHG AX, AX [IDA Pro]✔
OpSize Opcode
Developed in Rust
Why Rust?
● Same capabilities in CC++
● Stack protection
● Proper memory handling (guaranteed memory safety)
● Provides stability and speed (minimal runtime)
● Faster development
● Helpful compiler
Current Features
● Open source
● Supports i386 and x86-64 architecture only at
the moment
● Displays strings based on referenced memory
locations
● Manages memory
● Outputs Json
● 2 modes: with or without emulation
○ Light Emulation - meant to enumerate all
paths (Registers, Stack, Some
Instructions)
○ Full Emulation - only follows the code’s
path (Slow performance)
● Simulated TEB & PEB structures
● Evaluates functions based on DLL exports
Design Memory Manager
Image
TEB
PEB
DLL headers
Analysis
Functions
Disasm
Imports
PE Loader
State
CPU Registers & Flags
Stack
Loop Tracking
Analysis
This structure contains the CPU state of the registers &
flags, a new copy of the stack, and short circuiting for looping
during emulation.
State
Handles the loading of the PE image into memory and sets
up the TEB/PEB as well as initializing the offsets to loaded
DLLs and import table.
PE Loader
This structure contains all of the mmap memory for the
Image, TEB/PEB, and DLL headers. Accessors for Read &
Write to avoid errors in inaccessible memory.
Memory Manager
The core container for the disassembly, functions, and
imports.
Roll your own PE Parser
● Although a few Rust PE parsers exist: goblin, pe-rs we
decided to create our own.
● Chose to write it using the nom parser combinator
framework
● Ideally less error prone due to safe macro constructions
● Many lessons learned
● From a historical perspective a PE parser start reading a
16 bit DOS file
● Then optionally switches to a PE32 or a PE32+
● This is like a history of DOS and Microsoft Windows in a
single parser.
Analysis Enrichment
● The header is used to build the memory sections of the
PE Image
● Similar to the PE loader in windows, it will load the image
similar to how it would be loaded in the addressable
memory. Where the imports are given memory address,
rewritten in the image.
Image
.text
.data
.idata
.rsrc
Stack
DLLs
TEB
PEB
Symbols
● We needed a way to load DLL exports and header
information without doing it natively.
● Built a parser that would generate json files for
consumption called pesymbols.
● Instead of relying on the Import Table of the PE, it
generates virtual addresses of the DLL and API in the
Image’s Import Table. This way you can track the actual
address of the function being pushed into various registers.
● The virtual address start is configurable as well as the json
location.
{
"name": "kernel32.dll",
"exports": [
{
"address": 696814,
"name": "AcquireSRWLockExclusive",
"ordinal": 1,
"forwarder": true,
"forwarder_name": "NTDLL.RtlAcquireSRWLockExclusive"
},
{
"address": 696847,
"name": "AcquireSRWLockShared",
"ordinal": 2,
"forwarder": true,
"forwarder_name": "NTDLL.RtlAcquireSRWLockShared"
},
...
"dll_address32": 1691680768, 0x64D50000
"dll_address64": 8789194768384, 0x7FE64D50000
"function_symbol32":
"./src/analysis/symbols/generated_user_syswow64.json",
"function_symbol64":
"./src/analysis/symbols/generated_user_system32.json",
...
Configurable in xori.json
Example
generated_user_syswow64.json
Dealing with Dynamic API Calls
The Stack
The TEB and PEB structures are simulated based on the the
imports and known dlls in a windows 7 environment.
TEB/PEB
Segregated memory for the local memory storage
such as the stack.
Memory Management
If references to functions are pushed into a register
or stack will be able to be tracked.
Dealing with Dynamic API Calls
0x4010ed A3 00 10 40 00 mov [0x401000], eax
0x4010f2 68 41 10 40 00 push 0x401041 ; LoadLibraryA
0x4010f7 FF 35 00 10 40 00 push [0x401000]
0x4010fd E8 C9 01 00 00 call 0x4012cb
0x401102 83 F8 00 cmp eax, 0x0
0x401105 0F 84 CF 02 00 00 je 0x4013da
0x40110b A3 04 10 40 00 mov [0x401004], eax ; wI
0x401110 68 4E 10 40 00 push 0x40104e ; VirtualProtect
0x401115 FF 35 00 10 40 00 push [0x401000]
0x40111b E8 AB 01 00 00 call 0x4012cb
0x401120 83 F8 00 cmp eax, 0x0
0x401123 0F 84 B1 02 00 00 je 0x4013da
0x401129 A3 08 10 40 00 mov [0x401008], eax
0x40112e 6A 00 push 0x0
0x401130 6A 00 push 0x0
0x401132 68 1C 10 40 00 push 0x40101c ; shell32.dll
0x401137 FF 15 04 10 40 00 call [0x401004] ; kernel32.dll!LoadLibraryA
0x40113d A3 0C 10 40 00 mov [0x40100c], eax
0x401142 68 33 10 40 00 push 0x401033 ; ShellExecuteA
0x401147 FF 35 0C 10 40 00 push [0x40100c]
0x40114d E8 79 01 00 00 call 0x4012cb
0x401152 A3 10 10 40 00 mov [0x401010], eax
Stores the address
into ptr [0x401004]
Loads LoadLibrary
from the PEB
Calls the new ptr
Header Imports
"ExitProcess"
"GetLastError"
"GetLocalTime"
"GetModuleHandleA"
Dynamic Imports
"LoadLibraryA"
"VirtualProtect"
"ShellExecuteA"
TEB & PEB
#[derive(Serialize, Deserialize)]
struct ThreadInformationBlock32
{
// reference: https://en.wikipedia.org/wiki/Win32_Thread_Information_Block
seh_frame: u32, //0x00
stack_base: u32, //0x04
stack_limit: u32, //0x08
subsystem_tib: u32, //0x0C
fiber_data: u32, //0x10
arbitrary_data: u32, //0x14
self_addr: u32, //0x18
//End of NT subsystem independent part
environment_ptr: u32, //0x1C
process_id: u32, //0x20
thread_id: u32, //0x24
active_rpc_handle: u32, //0x28
tls_addr: u32, //0x2C
peb_addr: u32, //0x30
last_error: u32, //0x34
critical_section_count: u32, //0x38
csr_client_thread: u32, //0x3C
win32_thread_info: u32, //0x40
win32_client_info: [u32; 31], //0x44
...
let teb_binary: Vec<u8> =
serialize(&teb_struct).unwrap();
In Rust, you can serialize structs into vectors
of bytes. This way you can allow the assembly
emulation to access them natively while also
managing the access.
PEB
peb_ldr_data
Entry 0: NTDLL
Entry 1: Kernel32
Entry N
Handling Branches & Calls
● Branches and calls have 2 directions
○ Left & Right
● In light emulation mode, both the left and right
directions are followed
● Each direction is placed onto a queue with it’s
own copy of the state.
● Any assembly not traversed will not be
analyzed.
● All function calls are tracked for local and
import table mapping.
Queue
State
Jump/
Call/
Branch
StateLEFT
RIGHT
Back
Front
Handling Looping
● Infinite loops are hard to avoid
● Built a way to configure the maximum amount
of loops a one can take
○ Forward
○ Backward
○ Standard Loop
● The state contains the looping information
● Once the maximum is reached, it will disable
the loop
"loop_default_case": 4000,
...
Configurable in xori.json
Automation for Bulk Analysis
● 4904 samples processed at 7.7 samples per second on dual 8-core E5-2650 Xeon w/ 2 threads per core
● Creates JSON output of important PE features from binary files allowing bulk data analysis: clustering, outlier detection and
visualization.
● You can then easily throw Xori output into a database, document store or do a little data science at the command line
$ jq '.import_table|map(.import_address_list)|map(.[].func_name)' *header.json |sort|uniq -c|sort -n
1662 "ExitProcess",
1697 "Sleep",
1725 "CloseHandle",
1863 "GetProcAddress",
1902 "GetLastError",
Examples
Cd ./xori
Cargo build --release
./target/release/xori -f wanacry.exe
Simplest Way to Run Xori
extern crate xori;
use std::fmt::Write;
use xori::disasm::*;
use xori::arch::x86::archx86::X86Detail;
fn main()
{
let xi = Xori { arch: Arch::ArchX86, mode: Mode::Mode32 };
let start_address = 0x1000;
let binary32 = b"xe9x1ex00x00x00xb8x04
x00x00x00xbbx01x00x00x00x59xbax0f
x00x00x00xcdx80xb8x01x00x00x00xbb
x00x00x00x00xcdx80xe8xddxffxffxff
x48x65x6cx6cx6fx2cx20x57x6fx72x6c
x64x21x0dx0a";
let mut vec: Vec<Instruction<X86Detail>> = Vec::new();
xi.disasm(binary32, binary32.len(),
start_address, start_address, 0, &mut vec);
if vec.len() > 0
{
//Display values
for instr in vec.iter_mut()
{
let addr: String = format!("0x{:x}", instr.address);
println!("{:16} {:20} {} {}", addr,
hex_array(&instr.bytes, instr.size),
instr.mnemonic, instr.op_str);
}
}
}
Basic Disassembler
extern crate xori;
extern crate serde_json;
use serde_json::Value;
use std::path::Path;
use xori::analysis::analyze::analyze;
use xori::disasm::*;
fn main()
{
let mut binary32 = b"xe9x1ex00x00x00xb8x04
x00x00x00xbbx01x00x00x00x59xbax0f
x00x00x00xcdx80xb8x01x00x00x00xbb
x00x00x00x00xcdx80xe8xddxffxffxff
x48x65x6cx6cx6fx2cx20x57x6fx72x6c
x64x21x0dx0a".to_vec();
let mut config_map: Option<Value> = None;
if Path::new("xori.json").exists()
{
config_map = read_config(&Path::new("xori.json"));
}
match analyze(&Arch::ArchX86, &mut binary32, &config_map)
{
Some(analysis)=>{
if !analysis.disasm.is_empty(){
println!("{}", analysis.disasm);
}
},
None=>{},
}
}
Binary File Disassembler
WanaCry Ransomware
Xori IDA Pro
WanaCry Ransomware
Xori Radare2
Demo
github.com/endgameinc/xori
@malwareunicorn
@rseymour

More Related Content

Finding Xori: Malware Analysis Triage with Automated Disassembly

  • 1. Finding Xori Malware Analysis Triage with Automated Disassembly Amanda Rousseau Rich Seymour
  • 2. About Us Amanda Rousseau Rich Seymour Sr. Malware Researcher, Endgame, Inc. @malwareunicorn Sr. Data Scientist, Endgame, Inc. @rseymour
  • 3. Quick Overview The Current State of Disassemblers Functionality & Features Brief overview of pros and cons with current popular open source PE disassemblers. Overview how we pulled together the different aspects of disassemblers and emulator Usage & Demo How the output is used for automation. Applying the tool on various malware samples and shellcode.
  • 4. The Problem There are millions of malware samples to look at and a few reverse engineers. We need to change the way we are going about this if we are going to keep up. How to leverage large scale disassembly in an automated way with many samples? ● Improve the scalability in malware analysis ● Integration and automation
  • 5. Present Day Common Disassemblers Capstone Radare2 IDA Pro Hopper Binary Ninja Size small small large medium large Stability ✔ ✖ ✔ ✔ ✔ Price - - $$$ $ $$ Cross Platform ✔ ~ ✔ ✖ ✔ Usability ~ ~ ✔ ~ ~ Accuracy ~ ~ ✔ ~ ~ Integration ✔ ~ ✖ ✖ ✖
  • 6. Requirements ● Fast development ● Stability and resilience ● Cross platform ● Output can be easily integrated ● Ease of use ● Core feature set ● Output accuracy
  • 7. Evaluating Disassemblers The first step - Diving into the code: ● Verifying the accuracy of various disassemblers ● Understand each of their strengths and limitations We adopted different aspects of disassemblers and emulator modules. ● Much of Capstone is also based on the LLVM & GDB repositories ● QEMU is the emulation is straightforward, easy to understand ● Converted some of the logic into Rust, while also fixing a few bugs along the way.
  • 8. Evaluating Example x66x90 XCHG AX, AX [Objdump]✔ X86 32bit: NOP [Capstone]✖ NOP [Distorm]✖ XCHG AX, AX [IDA Pro]✔ OpSize Opcode
  • 9. Developed in Rust Why Rust? ● Same capabilities in CC++ ● Stack protection ● Proper memory handling (guaranteed memory safety) ● Provides stability and speed (minimal runtime) ● Faster development ● Helpful compiler
  • 10. Current Features ● Open source ● Supports i386 and x86-64 architecture only at the moment ● Displays strings based on referenced memory locations ● Manages memory ● Outputs Json ● 2 modes: with or without emulation ○ Light Emulation - meant to enumerate all paths (Registers, Stack, Some Instructions) ○ Full Emulation - only follows the code’s path (Slow performance) ● Simulated TEB & PEB structures ● Evaluates functions based on DLL exports
  • 11. Design Memory Manager Image TEB PEB DLL headers Analysis Functions Disasm Imports PE Loader State CPU Registers & Flags Stack Loop Tracking Analysis This structure contains the CPU state of the registers & flags, a new copy of the stack, and short circuiting for looping during emulation. State Handles the loading of the PE image into memory and sets up the TEB/PEB as well as initializing the offsets to loaded DLLs and import table. PE Loader This structure contains all of the mmap memory for the Image, TEB/PEB, and DLL headers. Accessors for Read & Write to avoid errors in inaccessible memory. Memory Manager The core container for the disassembly, functions, and imports.
  • 12. Roll your own PE Parser ● Although a few Rust PE parsers exist: goblin, pe-rs we decided to create our own. ● Chose to write it using the nom parser combinator framework ● Ideally less error prone due to safe macro constructions ● Many lessons learned ● From a historical perspective a PE parser start reading a 16 bit DOS file ● Then optionally switches to a PE32 or a PE32+ ● This is like a history of DOS and Microsoft Windows in a single parser.
  • 13. Analysis Enrichment ● The header is used to build the memory sections of the PE Image ● Similar to the PE loader in windows, it will load the image similar to how it would be loaded in the addressable memory. Where the imports are given memory address, rewritten in the image. Image .text .data .idata .rsrc Stack DLLs TEB PEB
  • 14. Symbols ● We needed a way to load DLL exports and header information without doing it natively. ● Built a parser that would generate json files for consumption called pesymbols. ● Instead of relying on the Import Table of the PE, it generates virtual addresses of the DLL and API in the Image’s Import Table. This way you can track the actual address of the function being pushed into various registers. ● The virtual address start is configurable as well as the json location. { "name": "kernel32.dll", "exports": [ { "address": 696814, "name": "AcquireSRWLockExclusive", "ordinal": 1, "forwarder": true, "forwarder_name": "NTDLL.RtlAcquireSRWLockExclusive" }, { "address": 696847, "name": "AcquireSRWLockShared", "ordinal": 2, "forwarder": true, "forwarder_name": "NTDLL.RtlAcquireSRWLockShared" }, ... "dll_address32": 1691680768, 0x64D50000 "dll_address64": 8789194768384, 0x7FE64D50000 "function_symbol32": "./src/analysis/symbols/generated_user_syswow64.json", "function_symbol64": "./src/analysis/symbols/generated_user_system32.json", ... Configurable in xori.json Example generated_user_syswow64.json
  • 15. Dealing with Dynamic API Calls The Stack The TEB and PEB structures are simulated based on the the imports and known dlls in a windows 7 environment. TEB/PEB Segregated memory for the local memory storage such as the stack. Memory Management If references to functions are pushed into a register or stack will be able to be tracked.
  • 16. Dealing with Dynamic API Calls 0x4010ed A3 00 10 40 00 mov [0x401000], eax 0x4010f2 68 41 10 40 00 push 0x401041 ; LoadLibraryA 0x4010f7 FF 35 00 10 40 00 push [0x401000] 0x4010fd E8 C9 01 00 00 call 0x4012cb 0x401102 83 F8 00 cmp eax, 0x0 0x401105 0F 84 CF 02 00 00 je 0x4013da 0x40110b A3 04 10 40 00 mov [0x401004], eax ; wI 0x401110 68 4E 10 40 00 push 0x40104e ; VirtualProtect 0x401115 FF 35 00 10 40 00 push [0x401000] 0x40111b E8 AB 01 00 00 call 0x4012cb 0x401120 83 F8 00 cmp eax, 0x0 0x401123 0F 84 B1 02 00 00 je 0x4013da 0x401129 A3 08 10 40 00 mov [0x401008], eax 0x40112e 6A 00 push 0x0 0x401130 6A 00 push 0x0 0x401132 68 1C 10 40 00 push 0x40101c ; shell32.dll 0x401137 FF 15 04 10 40 00 call [0x401004] ; kernel32.dll!LoadLibraryA 0x40113d A3 0C 10 40 00 mov [0x40100c], eax 0x401142 68 33 10 40 00 push 0x401033 ; ShellExecuteA 0x401147 FF 35 0C 10 40 00 push [0x40100c] 0x40114d E8 79 01 00 00 call 0x4012cb 0x401152 A3 10 10 40 00 mov [0x401010], eax Stores the address into ptr [0x401004] Loads LoadLibrary from the PEB Calls the new ptr Header Imports "ExitProcess" "GetLastError" "GetLocalTime" "GetModuleHandleA" Dynamic Imports "LoadLibraryA" "VirtualProtect" "ShellExecuteA"
  • 17. TEB & PEB #[derive(Serialize, Deserialize)] struct ThreadInformationBlock32 { // reference: https://en.wikipedia.org/wiki/Win32_Thread_Information_Block seh_frame: u32, //0x00 stack_base: u32, //0x04 stack_limit: u32, //0x08 subsystem_tib: u32, //0x0C fiber_data: u32, //0x10 arbitrary_data: u32, //0x14 self_addr: u32, //0x18 //End of NT subsystem independent part environment_ptr: u32, //0x1C process_id: u32, //0x20 thread_id: u32, //0x24 active_rpc_handle: u32, //0x28 tls_addr: u32, //0x2C peb_addr: u32, //0x30 last_error: u32, //0x34 critical_section_count: u32, //0x38 csr_client_thread: u32, //0x3C win32_thread_info: u32, //0x40 win32_client_info: [u32; 31], //0x44 ... let teb_binary: Vec<u8> = serialize(&teb_struct).unwrap(); In Rust, you can serialize structs into vectors of bytes. This way you can allow the assembly emulation to access them natively while also managing the access. PEB peb_ldr_data Entry 0: NTDLL Entry 1: Kernel32 Entry N
  • 18. Handling Branches & Calls ● Branches and calls have 2 directions ○ Left & Right ● In light emulation mode, both the left and right directions are followed ● Each direction is placed onto a queue with it’s own copy of the state. ● Any assembly not traversed will not be analyzed. ● All function calls are tracked for local and import table mapping. Queue State Jump/ Call/ Branch StateLEFT RIGHT Back Front
  • 19. Handling Looping ● Infinite loops are hard to avoid ● Built a way to configure the maximum amount of loops a one can take ○ Forward ○ Backward ○ Standard Loop ● The state contains the looping information ● Once the maximum is reached, it will disable the loop "loop_default_case": 4000, ... Configurable in xori.json
  • 20. Automation for Bulk Analysis ● 4904 samples processed at 7.7 samples per second on dual 8-core E5-2650 Xeon w/ 2 threads per core ● Creates JSON output of important PE features from binary files allowing bulk data analysis: clustering, outlier detection and visualization. ● You can then easily throw Xori output into a database, document store or do a little data science at the command line $ jq '.import_table|map(.import_address_list)|map(.[].func_name)' *header.json |sort|uniq -c|sort -n 1662 "ExitProcess", 1697 "Sleep", 1725 "CloseHandle", 1863 "GetProcAddress", 1902 "GetLastError",
  • 22. Cd ./xori Cargo build --release ./target/release/xori -f wanacry.exe Simplest Way to Run Xori
  • 23. extern crate xori; use std::fmt::Write; use xori::disasm::*; use xori::arch::x86::archx86::X86Detail; fn main() { let xi = Xori { arch: Arch::ArchX86, mode: Mode::Mode32 }; let start_address = 0x1000; let binary32 = b"xe9x1ex00x00x00xb8x04 x00x00x00xbbx01x00x00x00x59xbax0f x00x00x00xcdx80xb8x01x00x00x00xbb x00x00x00x00xcdx80xe8xddxffxffxff x48x65x6cx6cx6fx2cx20x57x6fx72x6c x64x21x0dx0a"; let mut vec: Vec<Instruction<X86Detail>> = Vec::new(); xi.disasm(binary32, binary32.len(), start_address, start_address, 0, &mut vec); if vec.len() > 0 { //Display values for instr in vec.iter_mut() { let addr: String = format!("0x{:x}", instr.address); println!("{:16} {:20} {} {}", addr, hex_array(&instr.bytes, instr.size), instr.mnemonic, instr.op_str); } } } Basic Disassembler
  • 24. extern crate xori; extern crate serde_json; use serde_json::Value; use std::path::Path; use xori::analysis::analyze::analyze; use xori::disasm::*; fn main() { let mut binary32 = b"xe9x1ex00x00x00xb8x04 x00x00x00xbbx01x00x00x00x59xbax0f x00x00x00xcdx80xb8x01x00x00x00xbb x00x00x00x00xcdx80xe8xddxffxffxff x48x65x6cx6cx6fx2cx20x57x6fx72x6c x64x21x0dx0a".to_vec(); let mut config_map: Option<Value> = None; if Path::new("xori.json").exists() { config_map = read_config(&Path::new("xori.json")); } match analyze(&Arch::ArchX86, &mut binary32, &config_map) { Some(analysis)=>{ if !analysis.disasm.is_empty(){ println!("{}", analysis.disasm); } }, None=>{}, } } Binary File Disassembler
  • 27. Demo