Malicious Client Detection Using Machine Learning

SATYAM SAXENA

Threats
• There are many types of malware, targeting all types of devices and operating systems.
• Most if not all malware relies on a support system: command and control (C&C) infrastructure.
• Bad guys use DNS to scale and hide their C&C infrastructure.
• Bad guys use DNS as a C&C channel to bypass corporate security (tunneling).
• Bad guys use cloud providers to roll out, scale, manage, and quickly move their C&C infrastructure.
Without relying on any particular endpoint operating system or configuration, we can use big data analytics on network data to detect malware.

Malware use of DNS
[Diagram labels: End point device, DNS Server, DNS Server, Command and Control Infrastructure. Example lookup: rndruppbakyokv[.]com resolves to 1.2.3.4.]
• Communication channel with C&C is established.
• Compromised device receives updates, instructions, targets.

Machine Learning Pipeline
[Diagram: raw pDNS data feeds the classifiers below.]
• Domain Name classifier → Malicious Domains
• DNS Resolver classifier → Malicious Resolvers
• Device Behavior classifier → Behavior Anomalies
• Compromised Device (Security Event) classifier
Model labels shown in the diagram: DGA, Network, Time, Tunnel

DGA Model
• Detects randomly generated (DGA) domains in the pDNS data.
• The model is trained on 6 malware families, such as Zeus, Tinba, and Pushdo.
• 29 features are extracted from each domain.
• The 29 features are reduced to 16 dimensions using PCA.
• The reduced feature set is then used to train a GBM classifier.

Domain Features
[Charts: Common Letter Score, Entropy]
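
The two features named on this slide can be sketched in a few lines of Python. The entropy function is the standard Shannon entropy over the characters of the domain label. The slides do not define the common letter score, so the version here (the fraction of letters that belong to a small set of frequent English letters) is an assumption for illustration only, as are the names `COMMON_LETTERS`, `domain_label`, `shannon_entropy`, and `common_letter_score`.

```python
import math
from collections import Counter

# Assumption: a small set of frequent English letters. The slides do not
# say which letters the original feature treats as "common".
COMMON_LETTERS = frozenset("etaoinshr")


def domain_label(domain: str) -> str:
    """Return the first label of a domain, e.g. 'example' for 'example.com'."""
    return domain.lower().split(".")[0]


def shannon_entropy(label: str) -> float:
    """Shannon entropy, in bits per character, of the characters in label."""
    if not label:
        return 0.0
    counts = Counter(label)
    n = len(label)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def common_letter_score(label: str) -> float:
    """Fraction of alphabetic characters that are in COMMON_LETTERS (assumed definition)."""
    letters = [ch for ch in label.lower() if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch in COMMON_LETTERS for ch in letters) / len(letters)


if __name__ == "__main__":
    for name in ("rndruppbakyokv.com", "google.com"):
        label = domain_label(name)
        print(name, round(shannon_entropy(label), 3), round(common_letter_score(label), 3))
```

Randomly generated labels tend to spread their characters more evenly, so they usually score higher on entropy than short dictionary-like names.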

Domain Features(2)
[Charts: Length of largest meaningful string, Mean length of dictionary words]
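
A minimal sketch of the two dictionary-based features, assuming that a "meaningful string" is a substring found in a word list. The tiny `WORDS` set, the minimum word length of 3, and the function names are placeholders; the slides do not say which dictionary or matching rule was used.

```python
# Assumption: a toy word list standing in for a real dictionary.
WORDS = frozenset({"face", "book", "mail", "cloud", "news", "shop", "net", "work"})


def dictionary_words(label, words=WORDS, min_len=3):
    """Return every substring of label (length >= min_len) that is in words."""
    label = label.lower()
    found = []
    for start in range(len(label)):
        for end in range(start + min_len, len(label) + 1):
            if label[start:end] in words:
                found.append(label[start:end])
    return found


def longest_word_length(label, words=WORDS):
    """Length of the longest dictionary word inside label, or 0 if none."""
    return max((len(w) for w in dictionary_words(label, words)), default=0)


def mean_word_length(label, words=WORDS):
    """Mean length of the dictionary words inside label, or 0.0 if none."""
    found = dictionary_words(label, words)
    return sum(len(w) for w in found) / len(found) if found else 0.0


if __name__ == "__main__":
    for label in ("facebook", "network", "rndruppbakyokv"):
        print(label, longest_word_length(label), mean_word_length(label))
```

The substring scan is quadratic in the label length, which is acceptable here because domain labels are short.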

DGA Classification Performance
Overall model performance (Random Forest)
Metric       Performance
Accuracy     98.738%
Precision    99.288%
Recall       98.181%
AUC          99.801%
Performance per malware family
Malware Family    % Detection
Conficker         86.309%
Cryptolocker      98.348%
Pushdo            95.515%
Ramdo             99.823%
Tinba             96.715%
Zeus              100.0%
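
For readers who want to see how accuracy, precision, and recall relate to raw classification counts, here is a short sketch. The confusion-matrix counts in the usage example are made up for illustration and are not the counts behind the figures above; the function name `classification_metrics` is also hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, and recall from confusion-matrix counts.

    tp: malicious domains flagged as malicious
    fp: benign domains flagged as malicious
    fn: malicious domains missed
    tn: benign domains correctly passed
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    # Illustrative counts only, not taken from the slides.
    print(classification_metrics(tp=90, fp=10, fn=30, tn=870))
```

The per-family detection rate in the second table corresponds to recall computed on the samples of a single malware family.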

Network Model
• Uses the WHOIS record to determine whether a domain is malicious or benign.
• A WHOIS record contains rich information about a domain.
• Age-based features.
• Registration features.
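
One way the age-based features might be derived from WHOIS dates is sketched below. The feature names and the `age_features` function are assumptions; the slides only name the underlying fields (creation, update, and expiration dates), not the exact features computed from them.

```python
from datetime import date


def age_features(creation, updated, expiration, observed):
    """Derive day-count features from WHOIS dates (assumed feature set).

    All arguments are datetime.date values; observed is the day the
    domain was seen in the pDNS data.
    """
    return {
        "age_days": (observed - creation).days,
        "days_since_update": (observed - updated).days,
        "days_to_expiry": (expiration - observed).days,
        "registration_span_days": (expiration - creation).days,
    }


if __name__ == "__main__":
    # Illustrative dates only.
    features = age_features(
        creation=date(2016, 1, 1),
        updated=date(2016, 1, 11),
        expiration=date(2017, 1, 1),
        observed=date(2016, 1, 31),
    )
    print(features)
```

A very young domain registered for a short span is a pattern this kind of feature can expose, although the slides do not state which thresholds, if any, the model learned.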

Network Features – WHOIS Server
[Charts: Malicious Domains, Benign Domains]

Network Features – Creation Date

Network Model Performance
• Final set of features: creation date, update date, expiration date, admin country, registrant country, tech country, status, WHOIS server
Metric              Performance
Error               0.00450864127
Area Under Curve    0.96615884041

Compromised Client Detection
[Diagram labels: pDNS Data, Hadoop HDFS, Spark Compute, Group By. The output is a table of counts per client IP.]
IP     DGA    WHOIS    NX    SERVER
ip1    #10    #3       #4    #5
ip2    #8     #1       #2    #3
ip3    #5     #2       #0    #0
ip4    #3     #3       #0    #0
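
The group-by step can be illustrated in plain Python as a stand-in for the Spark job, which the slides do not show. The sketch assumes each input event is a `(client_ip, flag)` pair where the flag is one of the four column names in the table; the exact meaning of each column and the event format are assumptions.

```python
from collections import defaultdict

# Column names taken from the table on the slide.
COLUMNS = ("DGA", "WHOIS", "NX", "SERVER")


def count_flags_per_client(events):
    """Group (client_ip, flag) events by client and count each flag type.

    Events whose flag is not one of COLUMNS are ignored.
    """
    counts = defaultdict(lambda: dict.fromkeys(COLUMNS, 0))
    for client_ip, flag in events:
        if flag in COLUMNS:
            counts[client_ip][flag] += 1
    return dict(counts)


if __name__ == "__main__":
    # Illustrative events only.
    events = [("ip1", "DGA"), ("ip1", "DGA"), ("ip1", "NX"), ("ip2", "WHOIS")]
    for ip, row in sorted(count_flags_per_client(events).items()):
        print(ip, row)
```

Each resulting row has the same shape as a line of the table, giving one count vector per client that a downstream classifier could consume.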
