Using Hadoop for Malware, 
Network, Forensics and Log 
analysis 
● Michael Boman 
● michael@michaelboman.org 
● http://blog.michaelboman.org 
● @mboman
Background 
● 44CON 2012 – Malware analysis as a 
hobby 
● DEEPSEC 2012 – Malware analysis on a 
shoe-string budget 
● DEEPSEC 2013 - Malware Datamining and 
Attribution
VirusShare Malware Collection 
[Chart: growth of the VirusShare collection, Total Size (GB), from 2012-01-01 to 2014-07-21, rising to about 5.8 TByte]
VirusShare Latest Releases
What is Hadoop? 
● Distributed processing of large data sets 
(“Big Data”) 
● Runs on off-the-shelf hardware 
● Runs from a single node to thousands of 
machines 
● High failure tolerance 
– “Hardware is crappy and will fail”
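To make the processing model concrete, here is a minimal Hadoop Streaming sketch (my own illustration, not material from the talk): a single Python script that acts as mapper or reducer and counts log lines per source host. Hadoop splits the input across nodes, runs the mapper on every split, shuffles and sorts by key, and runs the reducer over the grouped output. File paths and the field layout are assumptions.

#!/usr/bin/env python
# Minimal Hadoop Streaming sketch (illustrative only): count lines per host.
# Example invocation (paths and jar location are placeholders):
#   hadoop jar hadoop-streaming.jar \
#     -input /logs/raw -output /logs/counts \
#     -mapper "linecount.py map" -reducer "linecount.py reduce" -file linecount.py
import sys

def mapper():
    # Emit "host<TAB>1" for every record; assume the host is the first field.
    for line in sys.stdin:
        fields = line.split()
        if fields:
            print("%s\t1" % fields[0])

def reducer():
    # Input arrives grouped and sorted by key, so counts accumulate per host.
    current, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current:
            if current is not None:
                print("%s\t%d" % (current, count))
            current, count = key, 0
        count += int(value)
    if current is not None:
        print("%s\t%d" % (current, count))

if __name__ == "__main__":
    mapper() if sys.argv[-1] == "map" else reducer()
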
Hadoop components 
● Base: operating system (Red Hat, Ubuntu, Windows etc.) and the Java Virtual Machine 
● Data Storage: HDFS (distributed storage) 
● Execution: MapReduce (distributed processing), YARN (distributed scheduling) 
● Data Integration: Sqoop, Flume, Chukwa 
● Data Access / Interaction / Visualization / Development: Pig, Hive, HBase, Cassandra, HCatalog, Lucene, Hama, Crunch 
● Data Serialization: Avro, Thrift 
● Data Intelligence: Drill, Mahout 
● Management, Monitoring, Orchestration: Ambari, Zookeeper, Oozie
How to obtain your Hadoop 
infrastructure (examples) 
● Pre-packaged “distributions” 
– Cloudera 
– Hortonworks 
● Rent 
– Amazon Web Services 
● Roll your own 
– Compile from source
Malware Analysis - BinaryPig 
● Creates large archives of individual 
samples on HDFS as key/value sets 
(samples are small, HDFS likes them big) 
● Static analysis is done in batch 
● Results are stored in ElasticSearch for 
easy access/further analysis
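As a hedged sketch of that last step: once a batch job has produced per-sample results, each result document can be pushed to Elasticsearch over its REST API. The host, index name and document fields below are assumptions for illustration, not BinaryPig's actual output schema.

# Hypothetical sketch: index one sample's static-analysis result in Elasticsearch.
# Host, index name and document layout are assumptions, not BinaryPig's schema.
import requests

result = {
    "sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",  # placeholder
    "filesize": 0,
    "yara_matches": [],
    "strings_sample": [],
}

resp = requests.post(
    "http://localhost:9200/binarypig/_doc",  # Elasticsearch 7+ style endpoint (assumed)
    json=result,
    timeout=10,
)
resp.raise_for_status()
print(resp.json()["_id"])
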
Malware Analysis - BinaryPig 
● Extracting resource information 
● AV-(re)scanning 
● Scanning samples with new/updated Yara 
signatures
How does it work? 
ZIP archive / local directory → BinaryPig → sequence file
How does it work? 
Sequence files stored 
in HDFS
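Getting the packed samples onto the cluster is an ordinary HDFS upload; a small sketch using the standard hdfs dfs commands (paths are placeholders):

# Upload a locally built sequence file to HDFS; all paths are placeholders.
import subprocess

subprocess.run(["hdfs", "dfs", "-mkdir", "-p", "/user/analyst/samples"], check=True)
subprocess.run(["hdfs", "dfs", "-put", "-f", "samples.seq", "/user/analyst/samples/"], check=True)
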
How does it work? 
Pig scripts for: 
Hashes 
ClamAV 
Yara 
Strings
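To give a feel for what such a script computes per sample, here is a small stand-alone Python sketch (my own illustration, not one of BinaryPig's Pig scripts) that produces hashes and printable strings for one file; ClamAV or Yara scanning would typically be added by shelling out to clamdscan/yara or using their Python bindings.

# Stand-alone illustration of per-sample static analysis (hashes + strings);
# not BinaryPig's actual Pig/Python code.
import hashlib
import re
import sys

def analyse(path):
    data = open(path, "rb").read()
    report = {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
        "size": len(data),
        # ASCII strings of 6+ printable characters, like the strings(1) tool
        "strings": [s.decode("ascii") for s in re.findall(rb"[ -~]{6,}", data)][:20],
    }
    # ClamAV / Yara results would be added here, e.g. via subprocess calls
    # to clamdscan/yara or the pyclamd / yara-python bindings.
    return report

if __name__ == "__main__":
    print(analyse(sys.argv[1]))
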
Network Analysis - PacketPig 
● PCAP in HDFS 
● Detecting anomalies and intrusion 
signatures 
● Learn time frame and identity of attacker 
● Triage incidents 
● “Show me packet captures I’ve never seen 
before.”
How does it work? 
PCAPs are created locally 
and uploaded to HDFS
How does it work? 
PCAP uploaded to HDFS
How does it work? 
Pig scripts for: 
Snort signatures 
p0f 
User-Agent extraction (sketched below) 
Whatever you want
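As an example of the kind of field extraction such a script performs, here is a hedged Python sketch (using the third-party dpkt library, not PacketPig's own loaders) that pulls HTTP User-Agent headers out of a PCAP file:

# Illustrative User-Agent extraction from a PCAP with dpkt (pip install dpkt);
# not PacketPig's own loader/UDFs.
import sys
import dpkt

def user_agents(pcap_path):
    with open(pcap_path, "rb") as f:
        for ts, buf in dpkt.pcap.Reader(f):
            eth = dpkt.ethernet.Ethernet(buf)
            ip = eth.data
            if not isinstance(ip, dpkt.ip.IP) or not isinstance(ip.data, dpkt.tcp.TCP):
                continue
            tcp = ip.data
            if tcp.dport != 80 or not tcp.data:
                continue
            try:
                req = dpkt.http.Request(tcp.data)
            except (dpkt.dpkt.NeedData, dpkt.dpkt.UnpackError):
                continue  # not the start of an HTTP request
            ua = req.headers.get("user-agent")
            if ua:
                yield ts, ua

if __name__ == "__main__":
    for ts, ua in user_agents(sys.argv[1]):
        print(ts, ua)
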
Computer Forensics - Sleuth Kit 
Hadoop Framework 
● Uses both HDFS and HBase to store file 
information 
● Ingest 
● Analysis 
● Reporting
How does it work? 
fsrip dumps information about 
the disk image and about the 
files it contains
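The same kind of file metadata can be pulled from an image with The Sleuth Kit's Python bindings; a hedged sketch using pytsk3 (standing in for fsrip, whose actual JSON output format differs; the image path is a placeholder):

# Illustrative file-metadata walk of a raw disk image with pytsk3
# (The Sleuth Kit bindings); fsrip's real output format differs.
import pytsk3

img = pytsk3.Img_Info("image.dd")   # raw image path is a placeholder
fs = pytsk3.FS_Info(img)            # open the file system inside the image

for entry in fs.open_dir(path="/"):
    name = entry.info.name.name.decode("utf-8", "replace")
    meta = entry.info.meta
    if meta is None:
        continue
    print({"name": name, "size": meta.size, "mtime": meta.mtime, "inode": meta.addr})
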
How does it work? 
RAW disk image file is 
uploaded to HDFS
How does it work? 
Populates the HBase entries table with 
information about the files on the 
disk image
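A hedged sketch of what populating such a table could look like from Python via the happybase client; the table name, column family and row-key scheme are assumptions, not the framework's actual schema.

# Illustrative HBase write via happybase/Thrift; table name, column family,
# row-key layout and values are placeholders, not the real TSK-Hadoop schema.
import happybase

connection = happybase.Connection("localhost")  # assumes an HBase Thrift server
table = connection.table("entries")

table.put(
    b"image001/WINDOWS/system32/cmd.exe",
    {
        b"file:size": b"389120",
        b"file:mtime": b"1405900800",
        b"file:md5": b"d41d8cd98f00b204e9800998ecf8427e",  # placeholder hash
    },
)
connection.close()
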
How does it work? 
Extract raw file data 
Keyword search 
Extract text 
Tokenize 
Cluster similar objects 
Compare with other images (tokenize/compare sketched below)
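A minimal sketch of the tokenize/compare steps, assuming text has already been extracted from two files; in the real framework these steps run at scale as MapReduce jobs over every file in the image.

# Minimal illustration of keyword search, tokenization and a Jaccard
# similarity comparison between two extracted-text documents.
import re

def tokenize(text):
    return set(re.findall(r"[a-z0-9]{3,}", text.lower()))

def jaccard(a, b):
    return len(a & b) / float(len(a | b)) if (a or b) else 0.0

doc1 = tokenize("Extracted text from a file in image A ...")
doc2 = tokenize("Extracted text from a similar file in image B ...")

print("keyword hit:", "extracted" in doc1)       # simple keyword search
print("similarity: %.2f" % jaccard(doc1, doc2))  # basis for clustering/comparison
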
How does it work? 
Build a report from 
previous steps
Log Analysis 
● Flume agents push local logs to HDFS. 
● Pig scripts process data on schedule. 
Results from Pig are stored in HDFS / 
HBase. 
● HBase will have the data processed by Pig 
ready for reporting or further analysis. 
● Data interaction/extraction using REST 
services.
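As a hedged example of the last bullet, processed results can be read back over HBase's REST gateway (Stargate); the gateway host/port, table name and row key below are assumptions.

# Illustrative read of a processed-log row via the HBase REST gateway;
# gateway host/port, table name and row key are assumptions.
import base64
import requests

resp = requests.get(
    "http://hbase-rest:8080/weblogs_daily/2014-09-11",  # /<table>/<row-key>
    headers={"Accept": "application/json"},
    timeout=10,
)
resp.raise_for_status()

# The REST gateway returns row keys, column names and values base64-encoded.
for row in resp.json()["Row"]:
    for cell in row["Cell"]:
        column = base64.b64decode(cell["column"]).decode()
        value = base64.b64decode(cell["$"]).decode()
        print(column, "=", value)
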
How does it work? 
Flume agents push 
local logs to HDFS
How does it work? 
Pig scripts extract data 
and put it into HBase
How does it work? 
Pig scripts can perform 
additional analysis on 
HBase data
How do I do it? 
● Store malware samples locally 
● Upload samples to analyze to S3 
● Run EMR on samples on S3 
● Download the results from S3 to local storage (see sketch below)
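A hedged sketch of the S3 legs of that workflow using boto3; bucket names and keys are placeholders, and the EMR job runs between the two steps (spot/EMR details are sketched under "Saving money").

# Upload samples to S3 and fetch the EMR results back; bucket and keys
# are placeholders.
import boto3

s3 = boto3.client("s3")

# 1. Upload a batch of samples for analysis.
s3.upload_file("samples.seq", "my-malware-bucket", "incoming/samples.seq")

# 2. ...run the EMR job over s3://my-malware-bucket/incoming/ here...

# 3. Download the results the job wrote back to S3.
s3.download_file("my-malware-bucket", "results/part-r-00000", "results.txt")
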
Saving money 
● Samples stored locally and backed up on 
Amazon Glacier. 
● Use reduced redundancy storage on S3 
– 99.99% instead of 99.999999999% 
● Spot-bid on EC2 instances for EMR 
– ~$0.011 instead of $0.052 
● My AWS cost is expected to be about 
$20/month
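Both savings can be expressed directly in boto3 calls; a sketch with assumed bucket, roles, instance types and prices (the bid price mirrors the figure above and is purely illustrative).

# Illustrative cost-saving knobs: reduced-redundancy S3 storage and
# spot-priced EMR core nodes. Bucket, roles, release label, instance types
# and prices are assumptions; check current values before use.
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    "samples.seq", "my-malware-bucket", "incoming/samples.seq",
    ExtraArgs={"StorageClass": "REDUCED_REDUNDANCY"},  # 99.99% durability tier
)

emr = boto3.client("emr")
emr.run_job_flow(
    Name="malware-static-analysis",
    ReleaseLabel="emr-5.36.0",
    Instances={
        "InstanceGroups": [
            {"Name": "master", "Market": "ON_DEMAND", "InstanceRole": "MASTER",
             "InstanceType": "m4.large", "InstanceCount": 1},
            {"Name": "core", "Market": "SPOT", "BidPrice": "0.011",  # spot bid
             "InstanceRole": "CORE", "InstanceType": "m4.large", "InstanceCount": 4},
        ],
        "KeepJobFlowAliveWhenNoSteps": False,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
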
Conclusions 
● Malware Analysis 
● Network Analysis 
● Computer Forensics 
● Log Analysis
Questions? 
● michael@michaelboman.org 
● @mboman 
● http://blog.michaelboman.org


Editor's Notes

  1. Hadoop Distributed File System (HDFS): HDFS, the storage layer of Hadoop, is a distributed, scalable, Java-based file system adept at storing large volumes of unstructured data.
     MapReduce: MapReduce is a software framework that serves as the compute layer of Hadoop. MapReduce jobs are divided into two (obviously named) parts: the “Map” function divides a query into multiple parts and processes data at the node level, and the “Reduce” function aggregates the results of the “Map” function to determine the “answer” to the query.
     Hive: Hive is a Hadoop-based data-warehousing-like framework originally developed by Facebook. It allows users to write queries in a SQL-like language called HiveQL, which are then converted to MapReduce. This allows SQL programmers with no MapReduce experience to use the warehouse and makes it easier to integrate with business intelligence and visualization tools such as MicroStrategy, Tableau, Revolution Analytics, etc.
     Pig: Pig Latin is a Hadoop-based language developed by Yahoo. It is relatively easy to learn and is adept at very deep, very long data pipelines (a limitation of SQL).
     HBase: HBase is a non-relational database that allows for low-latency, quick lookups in Hadoop. It adds transactional capabilities to Hadoop, allowing users to conduct updates, inserts and deletes. eBay and Facebook use HBase heavily.
     Flume: Flume is a framework for populating Hadoop with data. Agents are placed throughout one's IT infrastructure – inside web servers, application servers and mobile devices, for example – to collect data and integrate it into Hadoop.
     Oozie: Oozie is a workflow processing system that lets users define a series of jobs written in multiple languages – such as MapReduce, Pig and Hive – and then intelligently link them to one another. Oozie allows users to specify, for example, that a particular query is only to be initiated after specified previous jobs on which it relies for data are completed.
     Ambari: Ambari is a web-based set of tools for deploying, administering and monitoring Apache Hadoop clusters. Its development is led by engineers from Hortonworks, which includes Ambari in its Hortonworks Data Platform.
     Avro: Avro is a data serialization system that allows for encoding the schema of Hadoop files. It is adept at parsing data and performing remote procedure calls.
     Mahout: Mahout is a data mining library. It takes the most popular data mining algorithms for performing clustering, regression testing and statistical modeling and implements them using the MapReduce model.
     Sqoop: Sqoop is a connectivity tool for moving data from non-Hadoop data stores – such as relational databases and data warehouses – into Hadoop. It allows users to specify the target location inside of Hadoop and instruct Sqoop to move data from Oracle, Teradata or other relational databases to the target.
     HCatalog: HCatalog is a centralized metadata management and sharing service for Apache Hadoop. It allows for a unified view of all data in Hadoop clusters and allows diverse tools, including Pig and Hive, to process any data elements without needing to know physically where in the cluster the data is stored.
     BigTop: BigTop is an effort to create a more formal process or framework for packaging and interoperability testing of Hadoop's sub-projects and related components, with the goal of improving the Hadoop platform as a whole.