Practical Malware Analysis
Ch 13: Data Encoding
Revised 4-25-16
The Goal of Analyzing
Encoding Algorithms
Reasons Malware Uses Encoding
• Hide configuration information
– Such as C&C domains
• Save information to a staging file
– Before stealing it
• Store strings needed by malware
– Decode them just before they are needed
• Disguise malware as a legitimate tool
– Hide suspicious strings
Simple Ciphers
Why Use Simple Ciphers?
• They are easily broken, but
– They are small, so they fit into space-constrained
environments like exploit shellcode
– Less obvious than more complex ciphers
– Low overhead, little impact on performance
• These are obfuscation, not encryption
– They make it difficult to recognize the data,
but can't stop a skilled analyst
Caesar Cipher
• Move each letter forward 3 spaces in the
alphabet
ABCDEFGHIJKLMNOPQRSTUVWXYZ
DEFGHIJKLMNOPQRSTUVWXYZABC
• Example
ATTACK AT NOON
DWWDFN DW QRRQ
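The shift above can be sketched in a few lines of Python. This is a minimal illustration, not code from the book; the function name `caesar` and the choice to leave spaces and other non-letters unchanged are assumptions.

```python
import string

def caesar(text, shift=3):
    """Shift uppercase A-Z forward by `shift`; leave other characters alone."""
    upper = string.ascii_uppercase
    rotated = upper[shift % 26:] + upper[:shift % 26]
    return text.translate(str.maketrans(upper, rotated))

print(caesar("ATTACK AT NOON"))        # DWWDFN DW QRRQ
print(caesar("DWWDFN DW QRRQ", -3))    # ATTACK AT NOON
```

Decoding is the same operation with the opposite shift.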
XOR
• Uses a key to encrypt data
• Uses one bit of data and one bit of the
key at a time
• Example: Encode HI with a key of 0x3c
HI = 0x48 0x49 (ASCII encoding)
Data: 0100 1000 0100 1001
Key: 0011 1100 0011 1100
Result: 0111 0100 0111 0101
0 xor 0 = 0
0 xor 1 = 1
1 xor 0 = 1
1 xor 1 = 0
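The HI example can be reproduced with a short sketch. The helper name `xor_single_byte` is an assumption made for illustration.

```python
def xor_single_byte(data: bytes, key: int) -> bytes:
    """XOR every byte of data with the same one-byte key."""
    return bytes(b ^ key for b in data)

encoded = xor_single_byte(b"HI", 0x3C)
print(encoded.hex())                    # 7475
print(xor_single_byte(encoded, 0x3C))   # b'HI'
```

Applying the same key a second time restores the original bytes, so one routine serves as both encoder and decoder.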
XOR Reverses Itself
• Example: Encode HI with a key of 0x3c
HI = 0x48 0x49 (ASCII encoding)
Data: 0100 1000 0100 1001
Key: 0011 1100 0011 1100
• Encode it again
Result: 0111 0100 0111 0101
Key: 0011 1100 0011 1100
Data: 0100 1000 0100 1001
0 xor 0 = 0
0 xor 1 = 1
1 xor 0 = 1
1 xor 1 = 0
Brute-Forcing XOR Encoding
• If the key is a single byte, there are only
256 possible keys
– Error in book; this should be "a.exe"
– PE files begin with MZ
MZ = 0x4d 0x5a
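A brute-force search over all 256 keys can be sketched as follows. The function name `find_mz_keys` and the sample bytes are hypothetical; the only fact taken from the slides is that a decoded PE file must start with 0x4d 0x5a.

```python
def find_mz_keys(encoded: bytes):
    """Return every single-byte key that decodes the first two bytes to 'MZ'."""
    if len(encoded) < 2:
        return []
    return [key for key in range(256)
            if encoded[0] ^ key == 0x4D and encoded[1] ^ key == 0x5A]

# Hypothetical sample: the start of a PE header encoded with key 0x12
sample = bytes(b ^ 0x12 for b in b"MZ\x90\x00\x03")
print([hex(k) for k in find_mz_keys(sample)])   # ['0x12']
```

Because the first plaintext byte is known, at most one key can match, and an empty result suggests the file is not a single-byte-XOR-encoded executable.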
Link Ch 13a
Brute-Forcing Many Files
• Look for a
common
string, like
"This Program"
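One way to search many files quickly is to encode the known string with each key and look for the result, instead of decoding every file 256 times. The sketch below assumes that approach; `find_xor_string` is a hypothetical name, and the needle must match the target text exactly (the standard DOS stub message is spelled "This program cannot be run in DOS mode.").

```python
def find_xor_string(data: bytes, needle: bytes):
    """Return (key, offset) pairs where needle appears XOR-encoded with one byte."""
    hits = []
    for key in range(256):
        encoded_needle = bytes(b ^ key for b in needle)
        offset = data.find(encoded_needle)
        if offset != -1:
            hits.append((key, offset))
    return hits

# Hypothetical sample: a fake header plus the DOS stub text, encoded with 0x5A
plain = b"MZ" + b"\x90" * 6 + b"This program cannot be run in DOS mode."
blob = bytes(b ^ 0x5A for b in plain)
print(find_xor_string(blob, b"This program"))   # [(90, 8)]
```

A hit with key 0 means the string is present unencoded.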
XOR and Nulls
• A null byte reveals the key, because
– 0x00 xor KEY = KEY
• Obviously the key here is 0x12
NULL-Preserving Single-Byte XOR
Encoding
• Algorithm:
– Use XOR encoding, EXCEPT
– If the plaintext is NULL or the key itself, skip
the byte
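The algorithm above can be sketched as follows; the function name and sample bytes are assumptions for illustration.

```python
def null_preserving_xor(data: bytes, key: int) -> bytes:
    """XOR each byte with key, but leave 0x00 and the key byte itself unchanged."""
    return bytes(b if b in (0x00, key) else b ^ key for b in data)

plain = b"MZ\x00\x00\x12AB"
encoded = null_preserving_xor(plain, 0x12)
print(encoded)                              # b'_H\x00\x00\x12SP'
print(null_preserving_xor(encoded, 0x12))   # b'MZ\x00\x00\x12AB'
```

The scheme is still its own inverse: a byte that is neither 0x00 nor the key never encodes to 0x00 or the key, so the skipped bytes are the same in both directions. Nulls in the plaintext no longer expose the key, and the key byte never turns into a null.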
Identifying XOR Loops in IDA Pro
• Small loops with an XOR instruction inside
1. Start in "IDA View" (seeing code)
2. Click Search, Text
3. Enter xor and Find all occurrences
Three Forms of XOR
• XOR a register with itself, like xor edx, edx
– Innocent, a common way to zero a register
• XOR a register or memory reference with a
constant
– May be an encoding loop, and key is the
constant
• XOR a register or memory reference with a
different register or memory reference
– May be an encoding loop, key less obvious
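The three forms can be told apart mechanically when triaging a long list of search hits. The sketch below is a rough text classifier, not an IDA Pro API; the function name, the category labels, and the assumed operand spellings (for example 12h or 0x12 for constants) are all illustrative assumptions.

```python
import re

# Immediate values written as 12h / 0FFh, 0x12, or plain decimal.
_CONSTANT = re.compile(r"(0x[0-9a-f]+|[0-9][0-9a-f]*h|[0-9]+)", re.IGNORECASE)

def classify_xor(instruction: str) -> str:
    """Classify one line of disassembly text containing an xor instruction."""
    mnemonic, _, operands = instruction.strip().partition(" ")
    if mnemonic.lower() != "xor" or "," not in operands:
        raise ValueError("expected 'xor dest, src'")
    dest, src = [part.strip() for part in operands.split(",", 1)]
    if dest.lower() == src.lower():
        return "zeroing"      # register XORed with itself: usually innocent
    if _CONSTANT.fullmatch(src):
        return "constant"     # possible encoding loop; the constant may be the key
    return "variable"         # possible encoding loop; key is less obvious

for line in ("xor edx, edx", "xor eax, 12h", "xor al, [ebx+esi]"):
    print(line, "->", classify_xor(line))
```

Only the last two categories are worth inspecting for a surrounding loop.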
Base64
• Converts each group of 6 bits into one character
from a 64-character alphabet
• There are a few versions, but all use these
62 characters:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
0123456789
• MIME uses + and /
– Also = to indicate padding
Transforming Data to Base64
• Use 3-byte chunks (24 bits)
• Break into four 6-bit fields
• Convert each to Base64
base64encode.org

base64decode.org
• 3 bytes encode to 4
Base64 characters
Padding
• If input had only 2
characters, an = is
appended
Padding
• If input had only 1
character, == is
appended
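The chunking and padding rules can be sketched by hand and checked against Python's standard `base64` module. The function name `b64encode_manual` is an assumption; the alphabet is the MIME one listed earlier.

```python
import base64

ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")

def b64encode_manual(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack up to 3 bytes into 24 bits, zero-filling a short final chunk.
        bits = int.from_bytes(chunk.ljust(3, b"\x00"), "big")
        # Split into four 6-bit fields, most significant first.
        chars = [ALPHABET[(bits >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # A chunk of n bytes yields n + 1 real characters; the rest become '='.
        keep = len(chunk) + 1
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)

for sample in (b"ATT", b"AT", b"A"):
    print(sample, b64encode_manual(sample))   # QVRU, QVQ=, QQ==
```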
Example
• URL and cookie are Base64-encoded
Cookie: Ym90NTQxNjQ
• This has 11 characters, which is
not a multiple of 4, so the
padding was omitted
• Some Base64
decoders will fail,
but this one just
automatically adds
the missing padding
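Python's `base64.b64decode` is one of the decoders that rejects input with missing padding, so the padding has to be restored first. A minimal sketch, with `b64decode_lenient` as a hypothetical helper name:

```python
import base64

def b64decode_lenient(text: str) -> bytes:
    """Decode Base64 text, restoring any '=' padding that was stripped."""
    return base64.b64decode(text + "=" * (-len(text) % 4))

print(b64decode_lenient("Ym90NTQxNjQ"))   # b'bot54164'
```

The expression `-len(text) % 4` gives the number of characters needed to reach the next multiple of 4, which is 1 for the 11-character cookie.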
Finding the Base64 Function
• Look for this "indexing string"
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
• Look for a lone padding character
(typically =) hard-coded into the encoding
function
Decoding the URLs
• Custom indexing string
aABCDEFGHIJKLMNOPQRSTUVWXYZbcdefghijklmnopqrstuvwxyz0123456789+/
• Look for a lone padding character (typically
=) hard-coded into the encoding function
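Data encoded with a custom indexing string can be decoded by translating each character back to the standard alphabet and then using an ordinary decoder. The sketch below assumes the custom string shown above and that only the alphabet was changed; the function names are hypothetical.

```python
import base64

STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
CUSTOM = "aABCDEFGHIJKLMNOPQRSTUVWXYZbcdefghijklmnopqrstuvwxyz0123456789+/"

def custom_b64decode(text: str, alphabet: str = CUSTOM) -> bytes:
    """Map custom-alphabet characters to standard ones, fix padding, decode."""
    translated = text.translate(str.maketrans(alphabet, STANDARD))
    return base64.b64decode(translated + "=" * (-len(translated) % 4))

def custom_b64encode(data: bytes, alphabet: str = CUSTOM) -> str:
    """Encode with the standard alphabet, then map to the custom one."""
    standard = base64.b64encode(data).decode("ascii")
    return standard.translate(str.maketrans(STANDARD, alphabet))

token = custom_b64encode(b"bot54164")
print(token, custom_b64decode(token))
```

Feeding custom-alphabet text to a standard decoder without the translation step yields garbage, which is a useful hint that the indexing string was modified.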
Common Cryptographic
Algorithms
Strong Cryptography
• Strong enough to resist brute-force attacks
– Ex: SSL, AES, etc.
• Disadvantages of strong encryption
– Large cryptographic libraries required
– May make code less portable
– Standard cryptographic libraries are easily detected
• Via function imports, function matching, or identification of
cryptographic constants
– Symmetric encryption requires a way to hide the key
Recognizing Strings and Imports
• Strings found in malware encrypted with
OpenSSL
Recognizing Strings and Imports
• Microsoft crypto functions usually start
with Crypt, CP, or Cert
Searching for Cryptographic Constants
• IDA Pro's FindCrypt2 Plug-in (Link Ch 13c)
– Finds magic constants (binary signatures of
crypto routines)
– Cannot find RC4 or IDEA routines because
they don't use a magic constant
– RC4 is commonly used in malware because it's
small and easy to implement
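To show how little code RC4 needs, and why a constant scanner has nothing to latch onto, here is a minimal sketch of the textbook algorithm. It is an illustration only, not code recovered from any sample; the function name is an assumption.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4: key scheduling, then XOR the data with the keystream."""
    # Key scheduling: start from 0..255 and shuffle using the key.
    # The table is built at run time, so no magic constant sits in the binary.
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    # Keystream generation and XOR.
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(byte ^ S[(S[i] + S[j]) % 256])
    return bytes(out)

ciphertext = rc4(b"Key", b"Plaintext")
print(ciphertext.hex())          # bbf316e8d940af0ad3
print(rc4(b"Key", ciphertext))   # b'Plaintext'
```

In disassembly, the telltale signs are two loops that run 256 times over a 256-byte array, followed by a loop that XORs data with bytes drawn from that array.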
FindCrypt2
• Runs automatically on any new analysis
• Can be run manually from the Plug-In
Menu
Krypto ANALyzer (PEiD Plug-in)
• Download from link Ch 13d
• Has wider range of constants than FindCrypt2
– More false positives
• Also finds Base64 tables and crypto function
imports
Entropy
• Entropy measures disorder
• To calculate it, just count the number of
occurrences of each byte from 0 to 255
– Calculate Pi = probability of byte value i
– Entropy is the negative of the sum of Pi log2(Pi) for i = 0 to 255 (Link 13e)
• If all the bytes are equally likely, the
entropy is 8 (maximum disorder)
• If all the bytes are the same, the entropy is
zero
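The calculation can be sketched directly from that description. The function name `shannon_entropy` is an assumption; the result is in bits per byte, using a base-2 logarithm.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: the sum of p * log2(1/p) over observed byte values."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())

print(shannon_entropy(bytes(range(256))))   # 8.0
print(shannon_entropy(b"A" * 1000))         # 0.0
```

Byte values that never occur contribute nothing to the sum, so only the observed values are counted.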
Entropy Demo
• Put output in a file
• Use binwalk -E to analyze the file
• Multiply vertical axis by 8
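#!/usr/bin/env python3
import base64, random, sys

# 10,000 random bytes (entropy near 8 bits per byte)
a = bytes(random.randint(0, 255) for _ in range(10000))
b = base64.b64encode(a)   # 64-character alphabet
c = base64.b32encode(a)   # 32-character alphabet
d = base64.b16encode(a)   # 16-character alphabet
e = b'A' * 10000          # one repeated byte (entropy 0)
# Write raw bytes so the output can be redirected to a file
sys.stdout.buffer.write(a + b + c + d + e)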
Entropy Demo
• Concatenate three images in different
formats
Searching for High-Entropy Content
• IDA Pro Entropy Plugin
• Finds regions of high entropy, indicating
encryption (or compression)
Recommended Parameters
• Chunk size: 64 Max. Entropy: 5.95
– Good for finding many constants,
– Including Base64-encoded strings (entropy about 6)
• Chunk size: 256 Max. Entropy: 7.9
– Finds very random regions
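The effect of the two parameter sets can be explored with a small chunk scanner. This is a sketch under stated assumptions, not the plugin's code: it uses non-overlapping chunks and flags a chunk when its entropy is at or above the threshold, and the plugin may scan and compare differently.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())

def high_entropy_chunks(data: bytes, chunk_size: int, threshold: float):
    """Return (offset, entropy) for each full chunk at or above the threshold."""
    hits = []
    for offset in range(0, len(data) - chunk_size + 1, chunk_size):
        value = shannon_entropy(data[offset:offset + chunk_size])
        if value >= threshold:
            hits.append((offset, value))
    return hits

# Hypothetical test data: 256 identical bytes, then 256 distinct bytes
data = b"A" * 256 + bytes(range(256))
print(high_entropy_chunks(data, 256, 7.9))    # [(256, 8.0)]
print(high_entropy_chunks(data, 64, 5.95))
```

A chunk of N bytes can never exceed log2(N) bits of entropy, so a 64-byte chunk tops out at 6 and a 256-byte chunk at 8. That is why the threshold has to drop to 5.95 when the chunk size is 64.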
Entropy Graph
• IDA Pro Entropy Plugin
– Download from link Ch 13g
– Use StandAlone version
– Double-click region, then Calculate, Draw
– Lighter regions have high entropy
– Hover over graph to see numerical value
Custom Encoding
Homegrown Encoding Schemes
• Examples
– One round of XOR, then Base64
– Custom algorithm, possibly similar to a
published cryptographic algorithm
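The first example, one round of XOR followed by Base64, can be sketched as a pair of functions. The names and the key value 0x3C are hypothetical; a real sample's key has to be recovered from its code.

```python
import base64

def encode_xor_b64(data: bytes, key: int) -> bytes:
    """Layer 1: single-byte XOR. Layer 2: standard Base64."""
    return base64.b64encode(bytes(b ^ key for b in data))

def decode_xor_b64(blob: bytes, key: int) -> bytes:
    """Undo the layers in reverse order: Base64 first, then XOR."""
    return bytes(b ^ key for b in base64.b64decode(blob))

blob = encode_xor_b64(b"http://example.com/", 0x3C)
print(blob)
print(decode_xor_b64(blob, 0x3C))   # b'http://example.com/'
```

Decoding only the Base64 layer produces bytes that still look random, which is what makes even a simple combination less obvious than either technique alone.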
Identifying Custom Encoding
• This sample makes a bunch of 700 KB files
• Figure out the encoding from the code
• Find CreateFileA and WriteFile
– In function sub_4011A9
• Uses XOR with a pseudorandom stream
Advantages of Custom Encoding to the
Attacker
• Can be small and nonobvious
• Harder to reverse-engineer
Decoding
Two Methods
• Reprogram the functions
• Use the functions in the malware itself
Self-Decoding
• Stop the malware in a debugger with data
decoded
• Isolate the decryption function and set a
breakpoint directly after it
• BUT sometimes you can't figure out how
to stop it with the data you need decoded
Manual Programming of Decoding
Functions
• Standard functions may be available
PyCrypto Library
• Good for standard algorithms
How to Decrypt Using Malware