Malicious Client Detection Using Machine Learning
- 2. Threats
• There are many types of malware for all types of devices and operating systems
• Most, if not all, malware relies on a support system: command and control (C&C) infrastructure
• Bad guys use DNS to scale and hide their C&C infrastructure
• Bad guys use DNS for C&C to bypass corporate security (tunneling)
• Bad guys use cloud providers to roll out, scale, manage, and quickly move their C&C infrastructure
Without relying on any particular endpoint operating system or configuration, we can use big data analytics on network data to detect malware.
- 3. Malware use of DNS
[Diagram: an endpoint device queries DNS servers for rndruppbakyokv[.]com, which resolves to 1.2.3.4, the command and control infrastructure. A communication channel with C&C is established, and the compromised device receives updates, instructions, and targets.]
- 6. DGA Model
• Detects randomly generated domains in the passive DNS (pDNS) data.
• The model is trained on six malware families, including Zeus, Tinba, and Pushdo.
• 29 features are extracted from each domain.
• The 29 features are reduced to 16 dimensions using PCA.
• The reduced feature set is then used to train a GBM classifier.
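The slide does not list the 29 features, so the following is only a minimal sketch of the kind of lexical features that could be extracted from a domain name before the PCA and GBM steps. The feature names and the choice of features (`length`, `entropy`, `digit_ratio`, `vowel_ratio`, `longest_consonant_run`) are assumptions for illustration, not the presenters' feature set.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def longest_consonant_run(text: str) -> int:
    """Length of the longest run of consecutive consonant letters."""
    longest = current = 0
    for ch in text:
        if ch.isalpha() and ch not in VOWELS:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def domain_features(domain: str) -> dict:
    """Hypothetical lexical features for one domain (illustrative only)."""
    # Undo "[.]" defanging and keep only the leftmost label.
    # Assumption: a real pipeline would need proper public-suffix handling.
    label = domain.lower().replace("[.]", ".").split(".")[0]
    length = len(label)
    letters = [c for c in label if c.isalpha()]
    return {
        "length": length,
        "entropy": shannon_entropy(label),
        "digit_ratio": sum(c.isdigit() for c in label) / length if length else 0.0,
        "vowel_ratio": sum(c in VOWELS for c in letters) / len(letters) if letters else 0.0,
        "longest_consonant_run": longest_consonant_run(label),
    }


if __name__ == "__main__":
    print(domain_features("rndruppbakyokv[.]com"))
```

Randomly generated names tend to show higher character entropy and longer consonant runs than dictionary-based names, which is why features of this kind are a plausible input to the classifier.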
- 10. DGA Classification Performance
Overall model performance (Random Forest)
Metric       Performance
Accuracy     98.738%
Precision    99.288%
Recall       98.181%
AUC          99.801%
Performance per malware family
Malware Family    % Detection
Conficker         86.309%
Cryptolocker      98.348%
Pushdo            95.515%
Ramdo             99.823%
Tinba             96.715%
Zeus              100.0%
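As a reminder of how metrics like these are computed, here is a minimal sketch using toy labels (1 = DGA domain, 0 = benign). It assumes that "% Detection" means the share of each family's domains that the model flags (per-family recall); the slide does not define it. The labels and family tags below are made up for illustration and are not the presenters' data.

```python
from collections import defaultdict


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, and recall from binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def detection_rate_by_family(families, y_pred):
    """Share of each family's DGA samples that were flagged as malicious."""
    total = defaultdict(int)
    flagged = defaultdict(int)
    for family, pred in zip(families, y_pred):
        total[family] += 1
        flagged[family] += int(pred == 1)
    return {family: flagged[family] / total[family] for family in total}


if __name__ == "__main__":
    # Toy data, for illustration only.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
    print(binary_metrics(y_true, y_pred))

    # Family tags for the four malicious samples above (hypothetical).
    families = ["zeus", "zeus", "tinba", "tinba"]
    print(detection_rate_by_family(families, y_pred[:4]))
```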
- 11. Network Model
• Uses WHOIS records to determine whether a domain is malicious or benign.
• A WHOIS record contains rich information about a domain.
• Age-based features.
• Registration features.
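To make "age-based features" concrete, here is a small sketch that derives day counts from WHOIS creation, update, and expiration dates. The function name, the feature names, and the idea of measuring against an observation date are assumptions for illustration; the slides do not specify how the dates are turned into features.

```python
from datetime import date


def whois_age_features(creation: date, updated: date,
                       expiration: date, observed: date) -> dict:
    """Hypothetical age-based features, all expressed in days."""
    return {
        # How long the domain had existed when it was observed.
        "age_days": (observed - creation).days,
        # Time since the WHOIS record was last updated.
        "days_since_update": (observed - updated).days,
        # Remaining registration time at observation.
        "days_to_expiration": (expiration - observed).days,
        # Total registered lifetime.
        "registration_span_days": (expiration - creation).days,
    }


if __name__ == "__main__":
    # Made-up dates, for illustration only.
    print(whois_age_features(
        creation=date(2020, 1, 1),
        updated=date(2020, 6, 1),
        expiration=date(2021, 1, 1),
        observed=date(2020, 7, 1),
    ))
```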
- 14. Network Model Performance
• Final set of features: creation date, update date, expiration date, admin country, registrant country, tech country, status, WHOIS server
Metric              Performance
Error               0.00450864127
Area Under Curve    0.96615884041
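For reference, the two metrics can be computed as sketched below. This assumes "Error" is the misclassification rate, which the slide does not state, and it computes AUC as the probability that a randomly chosen malicious domain scores higher than a randomly chosen benign one. The scores and labels are toy values, not the presenters' data.

```python
def error_rate(labels, predictions):
    """Fraction of samples whose predicted label differs from the true label."""
    return sum(l != p for l, p in zip(labels, predictions)) / len(labels)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the positive/negative pairs in which the positive sample scores
    higher, with ties counted as half. Quadratic time; fine for a small demo.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, for illustration only (1 = malicious, 0 = benign).
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.5, 0.1]
    print(auc(labels, scores))
    print(error_rate(labels, [1, 0, 0, 0]))
```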