13

When examining bin firmware files, Binwalk is an extremely helpful tool. There are times, though, when Binwalk comes up empty and a lot more digging is required to make sense of the data.

Are there any alternatives to Binwalk that might work better in certain cases, or possibly a commercial version of such a tool?

5
  • 3
    Binwalk's weakness is that it is signature based. What kind of functionality do you seek? Code segment detection? Visualization? ISA identification? One needs to utilize a combination of tools to make sense of an unknown binary. There is no single tool that does everything, and sometimes there is so little information available in the binary that not much can be done with any tool.
    – julian
    Commented Jan 23, 2018 at 23:24
  • @SYS_V Thanks for bringing up those different stages. If you would elaborate on them and mention tools that can help at each stage, I would select that as the correct answer.
    – pzirkind
    Commented Jan 24, 2018 at 2:00
  • 1
    This showed up in my Twitter feed today: github.com/attify/firmware-analysis-toolkit
    – julian
    Commented Jan 29, 2018 at 22:33
  • @SYS_V "Binwalk's weakness is that it is signature based." Out of curiosity, if you were to rewrite binwalk, which features do you think it lacks and would you implement? And what is the opposite of a signature-based method?
    – user22363
    Commented Jul 4, 2018 at 12:44
  • 1
    @user22363 To answer your question, I designed a tool called centrifuge that performs analysis using statistics and machine learning algorithms rather than scanning for signatures.
    – julian
    Commented Aug 18, 2020 at 23:14

4 Answers

20

2020-08: For more up-to-date information, see the answer below discussing ISAdetect and Centrifuge


2024-06:


The tools themselves are less important than the approach to the analysis. Instead of looking for better or more tools, seek to develop a sound methodology to employ when analyzing binaries.

I'm an amateur (a student) and can't claim to know much, having started experimenting with firmware analysis around March 2017, so take what I write here with a grain of salt. But by basing the way I approach firmware analysis challenges on how professionals do it and drawing on methods employed in data science when analyzing new and unfamiliar data sets, the results have generally been good, even with simple tools. You don't have to take my word for it; feel free to look at the firmware analyses contributed here and make your own determination.

Here are 2 exemplars:

  1. lzma: File format not recognized [Details enclosed]
  2. Approach to extract useful information from binary file

Here is a summary of a possible approach:

1. Visualization

Visualization is the fastest way to determine if a binary is compressed or encrypted. If a binary is compressed or encrypted, not much else can be done until it is decompressed/decrypted. See this question for an example of how someone reasonably skilled and experienced wasted time analyzing an encrypted binary and getting nowhere, simply because they did not realize that the binary was in fact encrypted: Disassembling VxWorks Firmware

Use binvis.io and binwalk -E to visualize the structure of the binary and its entropy levels. This alone will often reveal how the binary is organized and whether it is compressed or encrypted, since compressed and encrypted regions sit close to the maximum of 8 bits per byte. Areas containing code typically have higher entropy than plain data, which is often repetitive, but lower entropy than compressed or encrypted regions, and this shows up in an entropy scan. Entropy visualization is also very useful because it can reveal that there is no object code in a binary whatsoever.
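To make the idea of an entropy scan concrete, here is a minimal Python sketch that computes Shannon entropy over fixed-size blocks. It is not how binwalk -E or binvis.io are implemented; the function names and the 1024-byte block size are my own assumptions for illustration.

```python
import math
import os
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Return the Shannon entropy of a block in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)
    # Sum p * log2(1/p) over the byte values that actually occur.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def entropy_profile(data: bytes, block_size: int = 1024):
    """Yield (offset, entropy) for each consecutive block of the input."""
    for offset in range(0, len(data), block_size):
        yield offset, shannon_entropy(data[offset:offset + block_size])


# Synthetic demo: a run of zero padding followed by random bytes.
# Real input would be the bytes of a firmware image read from disk.
sample = bytes(1024) + os.urandom(1024)
for offset, entropy in entropy_profile(sample):
    print(f"0x{offset:08x}  {entropy:.2f} bits/byte")
```

Blocks that stay near 8 bits per byte across the whole file suggest compression or encryption, while long runs near 0 are usually padding.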

2. Exploration

In general, it is only after it has been established that there is at least some accessible information available in a binary that it makes sense to go further. How long is it reasonable to stare at an encrypted blob? Anyway, at this juncture several things can be done:

  1. Perform a signature scan using binwalk.

  2. Perform an opcode scan using binwalk -A. Most malware targets x86 or x86-64, but most firmware binaries target MIPS or ARM CPUs as far as I can tell. Embedded devices use many other architectures as well, such as PowerPC, AVR, Xtensa, s390, sh4, SPARC, and so on. On top of that, there may be no object code present at all. Since binwalk only scans for a handful of architectures, an opcode scan will only get you so far.

    Note that no publicly available tool currently exists that can, with a high level of accuracy, not only identify the presence of object code within a binary and contiguous regions of code but also identify the instruction set architecture (ISA) of the code. This is the subject of research and part of the Praetorian Machine Learning Challenge. In lieu of such a tool, binwalk -A is just about it.

  3. strings will often turn up interesting data that a signature scan will not.

  4. If I have reason to believe that the firmware was developed by developers whose machines use a Unicode-encoded character set, I supplement strings with radare2's search functionality.

  5. hexdump -C can be used to quickly explore a header structure, if present, as well as seek to interesting structures elsewhere in the binary
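As a rough illustration of steps 3 and 4, the sketch below pulls printable runs out of a byte string in two encodings: plain ASCII, and UTF-16LE restricted to ASCII-range characters. It is a simplified stand-in written for this answer, not the way strings or radare2 search; the function name and the minimum length of 4 are assumptions.

```python
import re


def find_strings(data: bytes, min_len: int = 4):
    """Return sorted (offset, encoding, text) tuples for printable runs."""
    # Printable ASCII: space (0x20) through tilde (0x7e).
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # UTF-16LE: each printable ASCII character followed by a zero byte.
    utf16_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)

    hits = []
    for match in ascii_re.finditer(data):
        hits.append((match.start(), "ascii", match.group().decode("ascii")))
    for match in utf16_re.finditer(data):
        hits.append((match.start(), "utf-16le", match.group().decode("utf-16-le")))
    return sorted(hits)


# Synthetic demo: one ASCII string and one UTF-16LE string in binary noise.
blob = b"\x00\x01BusyBox v1.2\x00\xff" + "admin".encode("utf-16-le") + b"\x00\x00"
for offset, encoding, text in find_strings(blob):
    print(f"0x{offset:04x}  {encoding:9s} {text}")
```

An ASCII-only scan would miss the second string entirely, which is why it is worth searching for wide-character encodings when the firmware may contain them.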

3. Analysis

At this point it has been established that the binary contains accessible information that merits analysis. This can include interesting data structures such as headers as well as extracted data such as kernels and file systems and/or object code that can be disassembled.

For situations in which there is a clear-text header structure followed by a compressed block for which binwalk does not detect a signature, a hex editor such as wxHexEditor can be very useful. Good examples of how a hex editor can aid in analysis are provided by @ebux, a professional security researcher:

If it is believed that object code is present but the CPU/architecture of the device is not known, the architecture will need to be identified before the code can be disassembled. While not very exciting, if the developer provides technical documentation, this is the point at which it will need to be read, not just to identify the CPU but also to discover the base address of the firmware image, so that once the ISA is identified the image can be correctly disassembled using IDA or radare2.

Approaches to identifying binary ISAs range from simple statistical methods, such as examining byte n-gram frequencies, to more sophisticated machine learning-based methods that are discussed in detail here:
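Separately from the material referenced above, the simple end of that range can be sketched in a few lines of Python. This only counts overlapping byte n-grams; deciding which ISA a profile points to would require comparing it against profiles built from binaries of known architecture, which is an assumption on my part and not shown here. The function names are hypothetical.

```python
from collections import Counter


def ngram_frequencies(data: bytes, n: int = 2) -> Counter:
    """Count every overlapping n-byte sequence in data."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def top_ngrams(data: bytes, n: int = 2, k: int = 5):
    """Return the k most common n-grams as (hex string, relative frequency)."""
    counts = ngram_frequencies(data, n)
    total = sum(counts.values())
    return [(gram.hex(), count / total) for gram, count in counts.most_common(k)]


# Synthetic demo on a few bytes; real input would be a suspected code region.
print(top_ngrams(b"\x00\x00\x00\x01", n=2, k=2))
```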

Summary

Arsenal:

  • binwalk + plugins
  • binvis.io
  • strings
  • hexdump
  • wxHexEditor
  • radare2
  • IDA
  • technical reference manuals
  • statistics and machine learning
2
  • 1
    thank you! i hope this helps lots of people that are starting out
    – pzirkind
    Commented Jan 24, 2018 at 13:57
  • 2
    @pzirkind no problem, I hope so too. The fastest way to develop is through hands on experience. Good luck.
    – julian
    Commented Jan 24, 2018 at 13:58
3

You can try binaryanalysis; maybe it can help.

3
  • How can it 'maybe' help? What are the advantages over binwalk?
    – Jongware
    Commented Jan 24, 2018 at 10:10
  • The advantage is that it has its own "magic bytes" list.
    – Vido
    Commented Jan 24, 2018 at 11:52
  • 1
    Dead as of 2023-11-05. Some checking via the Internet Archive says that the work continues in github.com/armijnhemel/binaryanalysis-ng
    – Adam
    Commented Nov 5, 2023 at 8:54
3

The original answer I posted in 2018 is somewhat out of date now. Two tools have been released in the meantime that can help with understanding what is in a binary file. One tool, ISAdetect, focuses specifically on identifying the CPU architecture that the code in an executable binary targets. It accomplishes this using machine learning.

Another tool, Centrifuge, also uses machine learning, but does not focus on machine code specifically. Rather, this tool was designed to help an analyst identify what kinds of data are encoded in binary files (full disclosure, I am the creator of this tool). To that end, it provides many functions for visualizing the data in a binary file using Python plotting libraries, and finds clusters of statistically-similar data by using scikit-learn's implementation of the DBSCAN algorithm. Centrifuge also uses ISAdetect's web API to identify any machine code found in a binary file.

Here are some examples of visualizations Centrifuge can create from data in a binary file:

[Image: readelf clusters]

[Image: firmware machine code]

[Image: AVR clusters boxplot]

As you can see from these images, the approach taken by the tool is statistical. It is through statistical analysis of the data in a file that Centrifuge is able to identify what types of data may be present. At the time of writing, 3 different data types can be identified: machine code, UTF-8 English text, and compressed/encrypted data.

As an example of this, here is the output for a firmware binary analyzed by Centrifuge:

Searching for machine code
--------------------------------------------------------------------

[+] Checking Cluster 0 for possible match
[+] Closely matching CPU architecture reference(s) found for Cluster 0
[+] Sending sample to https://isadetect.com/
[+] response:

{
    "prediction": {
        "architecture": "mips",
        "endianness": "little",
        "wordsize": 32
    },
    "prediction_probability": 0.93
}


Searching for utf8-english data
-------------------------------------------------------------------

[+] UTF-8 (english) detected in Cluster 1
    Wasserstein distance to reference: 7.861589780632858


Searching for high entropy data
-------------------------------------------------------------------

[+] High entropy data found in Cluster 2
    Wasserstein distance to reference: 0.4625352842771307
[*] This distance suggests the data in this cluster could be
    a) encrypted
    b) compressed via LZMA with maximum compression level
    c) something else that is random or close to random.

For context, here is a visualization of the information of the same binary:

[Image: firmware clusters]
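The output above reports a Wasserstein distance between each cluster and a reference. As a hedged illustration of what such a number measures, here is a pure-Python sketch of the first Wasserstein distance between two byte-value distributions, where a smaller value means the distributions are more alike. How Centrifuge builds its references and computes its distances is not shown in this answer, so the function names and the choice of byte-value histograms as input are assumptions.

```python
from collections import Counter
from itertools import accumulate


def byte_distribution(data: bytes):
    """Return the relative frequency of each byte value 0..255."""
    if not data:
        raise ValueError("data must not be empty")
    counts = Counter(data)
    total = len(data)
    return [counts.get(value, 0) / total for value in range(256)]


def wasserstein_1d(p, q):
    """First Wasserstein distance between two distributions over byte values.

    For one-dimensional distributions on the same unit-spaced support, the
    distance equals the summed absolute difference between the two CDFs.
    """
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


# Synthetic demo: all mass on byte 0x00 versus all mass on byte 0x02.
zeros = byte_distribution(b"\x00" * 4)
twos = byte_distribution(b"\x02" * 4)
print(wasserstein_1d(zeros, twos))
```

In this toy case all of the probability mass has to move two byte values, so the distance is 2.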

For those who are interested, here is a notebook explaining how to use it: Introduction to Centrifuge.

1
  • 1
    appreciate the updates to this question, this is very useful
    – pzirkind
    Commented Aug 19, 2020 at 13:34
2

There's a cloud version of binwalk (binwalk pro) where you just upload the firmware and it unpacks. Supports more file systems than the open source version. Less buggy too. Developed by Craig Heffner, creator of binwalk.
