Hitbkl 2012
- 1. Messing up with Kids playground:
Eradicating easy targets
Yarochkin Fyodor @fygrave
Vladimir Kropotov @vbkropotov
Presented at HITBKL 2012
- 3. DCCrime-2012: Brief Introduction
● Bots and Botnets – still popular :)
● Monetization schemes vary.
● DbD (drive-by download) is one of the most common attack vectors
– We also have email
– We also have stupid users downloading sh*t
– Mobile is a lucrative target (all your money is there)
- 5. Main “components” to deal with
● Callback nodes (aka C&C)
● Traffic:
– Compromised machines/or manipulated content
– Banner networks
– SEO (doorways)
- 6. What's new this year?
● Automated detection gets difficult (anti-sandboxing, anti-crawler tricks)
function() {
var url = 'http://yyzola.gpbbsdhmjm.shacknet.nu/g/';
…
document.onmousemove = function() {
…
● In some cases of idiocy, human interaction is a
must..
● Mobile phone as the most common means of
funds transfer
- 9. ● So the focus of this research:
– Identifying “bad kids” playground – mapping
infrastructure, identifying potential targets,
attempting to fix the problems, before “things hit
hard”
- 11. DNS: (did u see this morning
passive DNS talk? ;-))
With the spike in domain-generating botnets, this
seems like an interesting research project.
DGAs produce a very specific pattern in DNS
traffic.
- 14. Domain generative bots
● C&C is not hardcoded to maintain flexibility in
cases when C&C is taken down.
● Some sort of algorithm is used to generate
domain names
● Domains are tested for validity. IP address is
obtained.
● Sometimes obfuscation is involved (for example,
manipulations applied to the resolved IP address)
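The four bullets above can be sketched with a toy generator. This is a hypothetical illustration only, not the algorithm of Carberp or any other real bot: the seed, the SHA-256 derivation, the 11-letter label, the `.example` zone, and the XOR key in `recover_ip` are all assumptions made up for this sketch.

```python
import hashlib
import ipaddress
from datetime import date


def toy_dga(seed: str, day: date, count: int = 5, zone: str = "example") -> list:
    """Derive `count` pseudo-random 11-letter names from a seed and a date (made-up scheme)."""
    domains = []
    for i in range(count):
        digest = hashlib.sha256(f"{seed}-{day.isoformat()}-{i}".encode()).digest()
        # Map the first 11 digest bytes onto the letters a-z.
        label = "".join(chr(ord("a") + b % 26) for b in digest[:11])
        domains.append(f"{label}.{zone}")
    return domains


def recover_ip(resolved: str, key: int = 0x01020304) -> str:
    """Undo an assumed obfuscation: the real address is the resolved one XOR a fixed key."""
    value = int(ipaddress.IPv4Address(resolved)) ^ key
    return str(ipaddress.IPv4Address(value))


candidates = toy_dga("demo-seed", date(2012, 10, 10))
print(candidates)              # five names the bot would try to resolve in turn
print(recover_ip("10.0.0.1"))  # 11.2.3.5
```

Since bot and operator share the seed and the date, both sides compute the same candidate list, and most candidates never resolve. Those failed lookups are what the detection described later keys on.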
- 16. C&C/generative domains and
pattern mining
● Bots that use generated domain names
produce very specific, voluminous DNS traffic
● Our research is primarily focused on picking up
these patterns. Example: Carberp (details
provided by Vladimir Kropotov)
- 17. Carberp
● Bot Infection: Drive-By-HTTP
● Payload and intermediate malware domains: normal,
recent registration dates or DynDNS
● Distributed via: many, many compromised websites; top
score > 100 compromised resources detected during 1
week.
● C&C domains usually generated, but some special cases
below ;-).
● C&C and Malware domains located on the same AS (from
bot point of view). Easy to detect.
● Typical bot activity: Mass HTTP Post
- 18. Size | Payload | Referrer | URL | Domain
9414 | javascript | www.*****press.ru | /g/18418362672595167.js | beatshine.is-saved.org
45443 | html | www.*****press.ru | /index.php?28d9000e56c2a63080ff89c6f5357591 | activatedreplacing.is-very-evil.org
4135 | application/x-jar | - | //images/r/785cee8be7f1da9a9d60820cbf8b1840.jar | activatedreplacing.is-very-evil.org
155529 | application/executable | - | /server_privileges.php?91370f5f009a815950578cb539f28b58=3 | activatedreplacing.is-very-evil.org
- 19. Size | Payload | Referrer | URL | Domain
997 | html | Infected site | /1/s.html | 3645455029
4923 | javascript | 3645455029 | /js/deployJava.js | Java.com
18046 | application/x-jar | - | /1/exp.jar | 3645455029
138352 | application/executable | - | /file1.dat | 3645455029
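The Domain value 3645455029 in the table above looks like an IPv4 address written as a single 32-bit decimal number, a notation browsers accept. The slide does not say so, so treat this as an assumption. Under that assumption, the standard library converts it to dotted form:

```python
import ipaddress


def dword_to_ip(value: int) -> str:
    """Convert a 32-bit integer to dotted-quad IPv4 notation."""
    return str(ipaddress.IPv4Address(value))


print(dword_to_ip(3645455029))  # 217.73.58.181
```

Normalising such hosts before indexing keeps both notations of the same address in one bucket.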
- 20. Detection: related works
M. Antonakakis, R. Perdisci, et al. From Throw-Away
Traffic to Bots: Detecting the Rise of DGA-Based
Malware. In Proceedings of USENIX Security, 2012
L. Bilge, E. Kirda, C. Kruegel, and M. Balduzzi.
EXPOSURE: Finding Malicious Domains Using
Passive DNS Analysis. In Proceedings of NDSS,
2011
etc.
- 21. What we do differently:
● “lazy” WHOIS lookups, Team Cymru IP-to-ASN
lookups
● Our own passive DNS index
● Sandbox farm (mainly to detect compromised
websites automagically and study behavior)
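As a rough sketch of the IP-to-ASN step: Team Cymru offers a DNS interface where the reversed octets of an address are queried as a TXT record under `origin.asn.cymru.com`. The helpers below only build the query name and parse a reply string, so they run offline. The function names are made up for this sketch, and the sample reply is an illustrative string in the service's pipe-separated format, not a lookup result for any address in these slides.

```python
import ipaddress


def cymru_origin_qname(ip: str) -> str:
    """Build the TXT query name for an IPv4 address (octets reversed)."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + ".origin.asn.cymru.com"


def parse_cymru_txt(txt: str) -> dict:
    """Parse a reply of the form 'ASN | prefix | country | registry | allocated'."""
    asn, prefix, country, registry, allocated = [f.strip() for f in txt.split("|")]
    return {"asn": asn, "prefix": prefix, "country": country,
            "registry": registry, "allocated": allocated}


print(cymru_origin_qname("66.175.210.173"))  # 173.210.175.66.origin.asn.cymru.com
sample_reply = "23028 | 216.90.108.0/24 | US | arin | 1998-09-25"  # illustrative only
print(parse_cymru_txt(sample_reply)["asn"])  # 23028
```

Wrapping the real lookup in a cache would give the “lazy” behaviour: an address is only resolved to an ASN the first time it shows up in a suspicious cluster.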
- 22. Dealing with false positives: filtering
● Generated sequences: n-gram analysis
● WHOIS cross-ref (if available)
● IPs belonging to a malicious ASN
● Public domain lists (Alexa top 100k) work well
as a whitelist
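A minimal sketch of the first and last filters, assuming a bigram model trained on a handful of well-known names and a tiny stand-in for the whitelist. The training list, the 0.5 threshold, and the function names are assumptions for illustration; a real deployment would train on a large corpus and load the full public list.

```python
from collections import Counter


def bigrams(label: str) -> list:
    """Return the overlapping two-character sequences of a label."""
    return [label[i:i + 2] for i in range(len(label) - 1)]


def train(names) -> Counter:
    """Count bigrams seen in known-benign names."""
    model = Counter()
    for name in names:
        model.update(bigrams(name))
    return model


def known_bigram_ratio(label: str, model: Counter) -> float:
    """Fraction of the label's bigrams that also occur in the model."""
    grams = bigrams(label)
    if not grams:
        return 0.0
    return sum(1 for g in grams if g in model) / len(grams)


def is_suspicious(domain: str, model: Counter, whitelist: set, threshold: float = 0.5) -> bool:
    """Whitelisted domains pass; others are flagged when few bigrams look natural."""
    if domain in whitelist:
        return False
    return known_bigram_ratio(domain.split(".")[0], model) < threshold


model = train(["google", "facebook", "wikipedia", "youtube", "twitter", "amazon"])
whitelist = {"wikipedia.org"}  # stand-in for a public top-sites list
print(is_suspicious("xuqohyxeqak.eu", model, whitelist))  # True
print(is_suspicious("wikipedia.org", model, whitelist))   # False
```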
- 23. Cat and mouse game
● Of course, all of this is easy to evade once you
know the method. But security is always a
'cat-n-mouse' game ;-)
- 25. Are we using signatures?
Yes and No..
● We don't have signatures for C&C domains..
● But we maintain patterns for suspicious whois
data (registration date, registrar, email, ..)
● Historical DNS and AS association (bad IP)
● Generic patterns for generative domains (high,
similarly distributed pattern of failed lookups
within the same zone)
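The generic pattern in the last bullet can be sketched as a grouping step: collect distinct failed names per zone and flag zones where there are many of them and their label lengths are nearly uniform. The thresholds and the sample records (under reserved test zones) are assumptions for this sketch, not values from the talk.

```python
from collections import defaultdict

NXDOMAIN = 3  # DNS rcode 3: the name does not exist


def suspicious_zones(records, min_names=3, max_distinct_lengths=2):
    """Return zones with many distinct failed names of similar label length.

    `records` is an iterable of (qname, rcode) pairs.
    """
    failed = defaultdict(set)
    for qname, rcode in records:
        if rcode != NXDOMAIN:
            continue
        name = qname.rstrip(".").lower()
        zone = name.rsplit(".", 1)[-1]
        failed[zone].add(name)

    flagged = {}
    for zone, names in failed.items():
        lengths = {len(n.split(".")[0]) for n in names}
        if len(names) >= min_names and len(lengths) <= max_distinct_lengths:
            flagged[zone] = sorted(names)
    return flagged


records = [  # synthetic sample data
    ("qwertyuiopa.example", 3),
    ("zxcvbnmasdf.example", 3),
    ("plokmijnuhb.example", 3),
    ("www.example.org", 0),
    ("typo-domain.test", 3),
]
print(suspicious_zones(records))
```

Only the `example` zone is reported here: the single typo under `test` stays below the count threshold, and the successful lookup is ignored.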
- 26. A walk through automated detection
● In this example we will show how automated
detection works step by step. We will show
redis queries in form of interactive session:
- 27. Detection starting point: rcode: 3
(Non-existing domains)
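For reference, rcode is the low four bits of the flags word in the 12-byte DNS header, and the top bit of that word marks a response. A small sketch of picking NXDOMAIN responses out of raw packets follows; the function names are ours and the sample headers are hand-built.

```python
import struct

NXDOMAIN = 3


def dns_rcode(packet: bytes) -> int:
    """Return the rcode from a raw DNS message."""
    if len(packet) < 12:
        raise ValueError("DNS header is 12 bytes")
    _msg_id, flags = struct.unpack("!HH", packet[:4])
    return flags & 0x000F


def is_nxdomain_response(packet: bytes) -> bool:
    """True if the message is a response (QR bit set) with rcode 3."""
    _msg_id, flags = struct.unpack("!HH", packet[:4])
    return bool(flags & 0x8000) and dns_rcode(packet) == NXDOMAIN


# Hand-built headers: id, flags, then four zeroed section counts.
nx_header = struct.pack("!HHHHHH", 0x1234, 0x8183, 1, 0, 0, 0)
ok_header = struct.pack("!HHHHHH", 0x1234, 0x8180, 1, 1, 0, 0)
print(is_nxdomain_response(nx_header), is_nxdomain_response(ok_header))  # True False
```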
- 30. Sample analysis (two)
● Get the cluster ID: (eu_11_14)
Clustering is based on domain similarity. Currently used characteristics:
- f(zone, pattern (length, depth))
- additional characteristics (building up): natural language domain vs. generated string
(occurrence of two-character sequences - n-grams)
- domain registration parameters (obtained via WHOIS [ problematic! ] )
- cross-reference with existing malicious IP and AS reputation database (incrementally
built by us)
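The exact format of the cluster ID is not given on the slide. One guess that reproduces `eu_11_14` for the 11-letter `.eu` names listed later in the deck is zone, first-label length, and full name length joined by underscores. The sketch below encodes only that guess; the real key may also use label depth or other features.

```python
def cluster_id(domain: str) -> str:
    """Hypothetical cluster key: <zone>_<first label length>_<full name length>."""
    name = domain.rstrip(".").lower()
    labels = name.split(".")
    return f"{labels[-1]}_{len(labels[0])}_{len(name)}"


print(cluster_id("cihunemyror.eu"))  # eu_11_14
print(cluster_id("foxivusozuc.eu"))  # eu_11_14
```

Names of the same shape in the same zone land in one bucket, which is then refined with the n-gram, WHOIS, and reputation checks listed above.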
- 32. Sample analysis
● Find common members (notice avatarmaker.eu
could be a false positive, easily filtered out
through common denominator filtering: IP,
WHOIS information)
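Common denominator filtering can be sketched as keeping the cluster members that share the most frequent value of some attribute, here the resolved IP. The domain-to-IP mapping below is illustrative sample data: the address assigned to avatarmaker.eu is a documentation address, not a real resolution.

```python
from collections import Counter


def split_by_common_ip(members: dict):
    """Split a cluster (domain -> IP) into members sharing the most common IP and the rest."""
    common_ip, _count = Counter(members.values()).most_common(1)[0]
    kept = sorted(d for d, ip in members.items() if ip == common_ip)
    dropped = sorted(d for d, ip in members.items() if ip != common_ip)
    return common_ip, kept, dropped


members = {  # illustrative sample data
    "cihunemyror.eu": "66.175.210.173",
    "foxivusozuc.eu": "66.175.210.173",
    "ryqecolijet.eu": "66.175.210.173",
    "avatarmaker.eu": "192.0.2.10",  # hypothetical address
}
common_ip, kept, dropped = split_by_common_ip(members)
print(common_ip, dropped)  # 66.175.210.173 ['avatarmaker.eu']
```

The same split works for WHOIS attributes such as registrar or registrant email.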
- 33. Sample analysis
● So we have C&C IP 66.175.210.173
● we can continue mining to see if we get any
other domain names:
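The mining step is a pivot over the passive DNS index: from an IP to every domain seen on it, from those domains to any other IPs, and so on. The sketch below uses in-memory sets as a stand-in for set-valued keys in a store such as redis. The class, its method names, and the sample data (reserved documentation names and addresses) are assumptions for illustration.

```python
from collections import defaultdict


class PassiveDNSIndex:
    """Two-way index of observed domain <-> IP resolutions."""

    def __init__(self):
        self.ip_to_domains = defaultdict(set)
        self.domain_to_ips = defaultdict(set)

    def add(self, domain: str, ip: str) -> None:
        self.ip_to_domains[ip].add(domain)
        self.domain_to_ips[domain].add(ip)

    def pivot(self, seed_ip: str):
        """Return all domains and IPs reachable from seed_ip through shared resolutions."""
        seen_ips, seen_domains = set(), set()
        queue = [seed_ip]
        while queue:
            ip = queue.pop()
            if ip in seen_ips:
                continue
            seen_ips.add(ip)
            for domain in self.ip_to_domains.get(ip, ()):
                if domain not in seen_domains:
                    seen_domains.add(domain)
                    queue.extend(self.domain_to_ips[domain] - seen_ips)
        return seen_domains, seen_ips


index = PassiveDNSIndex()
index.add("a.example", "192.0.2.1")
index.add("b.example", "192.0.2.1")
index.add("b.example", "192.0.2.2")
index.add("c.example", "192.0.2.2")
index.add("d.example", "192.0.2.9")
domains, ips = index.pivot("192.0.2.1")
print(sorted(domains), sorted(ips))
```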
- 36. Mapping C&C (easily automated)
● http://cihunemyror.eu/login.php
● http://foxivusozuc.eu/login.php
● http://ryqecolijet.eu/login.php
● http://xuqohyxeqak.eu/login.php
● http://foqaqehacew.eu/login.php
● http://jecijyjudew.eu/login.php
● http://voworemoziv.eu/login.php
● http://mamixikusah.eu/login.php
● http://qebahilojam.eu/login.php
● http://foqaqehacew.eu/search.php
● http://foqaqehacew.eu/search.php
● http://foqaqehacew.eu/LMvg9Ng1d.php
- 40. Performance
● On a single machine (32 GB RAM) we run up to
2000 pkt/sec without significant performance
loss
● Average load:
- 41. Other Interesting numbers
● Packets per day: ~130M filtered.
● Mal. Domains/day: ~30k DNS queries (varies)
● Avg. 30-50 req/minute for single domain
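A quick sanity check on the first number, assuming the ~130M packets are spread evenly over the day. The comparison with the 2000 pkt/sec figure from the performance slide is our own reading, not something the slides state.

```python
PACKETS_PER_DAY = 130_000_000
SECONDS_PER_DAY = 24 * 60 * 60

average_rate = PACKETS_PER_DAY / SECONDS_PER_DAY
print(round(average_rate))  # 1505 packets per second on average
```

An average of roughly 1500 pkt/sec sits below the 2000 pkt/sec a single machine is said to handle, although peaks will be higher than the daily average.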
- 42. Uses of the data
● Obvious: blacklists
● Botnet takeovers (cost 11 USD or less ;)
● Sinkholing
- 44. What could be more flux than
fastflux? ;-)
● WHOIS fastflux … HOW?!
Domain ID:D166393631-LROR
Domain Name:FOOTBALL-SECURITY-WETRLSGPIEO.ORG
Created On:21-Aug-2012 01:23:52 UTC
Last Updated On:21-Aug-2012 01:23:53 UTC
Expiration Date:21-Aug-2013 01:23:52 UTC
Sponsoring Registrar:Click Registrar, Inc. d/b/a publicdomainregistry.com (R1935-LROR)
Status:CLIENT TRANSFER PROHIBITED
Status:TRANSFER PROHIBITED
Status:ADDPERIOD
Registrant ID:PP-SP-001
Registrant Name:Domain Admin
Registrant Organization:PrivacyProtect.org
Registrant Street1:ID#10760, PO Box 16
Registrant Street2:Note - All Postal Mails Rejected, visit Privacyprotect.org
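A record like the one above can be reduced to the fields that the WHOIS patterns mentioned earlier rely on (registration date, registrar, status). The sketch below parses the `Key:Value` lines and checks how fresh the registration is. The 30-day threshold and the observation date are assumptions, and the parser only handles this flat format.

```python
from datetime import datetime

WHOIS_TEXT = """\
Domain Name:FOOTBALL-SECURITY-WETRLSGPIEO.ORG
Created On:21-Aug-2012 01:23:52 UTC
Last Updated On:21-Aug-2012 01:23:53 UTC
Expiration Date:21-Aug-2013 01:23:52 UTC
Status:CLIENT TRANSFER PROHIBITED
Status:TRANSFER PROHIBITED
Status:ADDPERIOD
Registrant Organization:PrivacyProtect.org
"""


def parse_whois(text: str) -> dict:
    """Collect 'Key:Value' lines into a dict of lists (keys such as Status repeat)."""
    record = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            record.setdefault(key.strip(), []).append(value.strip())
    return record


def whois_time(value: str) -> datetime:
    """Parse timestamps of the form '21-Aug-2012 01:23:52 UTC'."""
    return datetime.strptime(value, "%d-%b-%Y %H:%M:%S UTC")


def is_recently_registered(record: dict, observed_at: datetime, max_age_days: int = 30) -> bool:
    """True if the domain was created at most max_age_days before it was observed."""
    created = whois_time(record["Created On"][0])
    return (observed_at - created).days <= max_age_days


record = parse_whois(WHOIS_TEXT)
print(record["Status"])
print(is_recently_registered(record, datetime(2012, 8, 25)))  # True (observation date assumed)
```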
- 46. In short, it is all about quick ways of finding idiots
who have no clue what they are doing with
WordPress, osCommerce, OpenX, [put yer fave]
and forcing them to update before they get owned
;)
And hmm.. doing it country-wide
- 47. disclaimer
Just another “small data” project we play with.
A Solr cluster of around 4 machines.
Largely inspired by “Fruit: why so low?” by Adam
MetlStorm (hack.lu 2011)
- 49. Architecture
● Network port discovery (agents)
● Banner collection (agents)
● Backend Store: SOLR
● Collectibles: services and ports, OS fingerprints,
ASN/owner/netblock/country, geographical
location, app data
- 51. Approach
● Scan slow (avoid abuse reports)
● Index time
● Passive “mapper” (simple sniffer + browser
fingerprinting at the moment)
● Larger range of ports (taking into account port numbers
that are actively being scanned, based on firewall log
analysis, honeypot machines, etc.)
● For web apps – (wafp fingerprinting) + index banner
(noisy, cause of most of the abuse complaints)
- 53. Features
● Scriptable via restful API (think of solr) (cuz UI
is for sissies ;-))
● Query by any combination of:
– software version/banner regex (solr/lucene style)
– geospatial search (via geohash)
– ASN or regex on ASN owner
– Country code
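For the geospatial item: a geohash interleaves longitude and latitude bits and encodes them in base 32, so nearby points share a prefix and a prefix match approximates "within this cell". Below is a minimal encoder for the standard geohash scheme. How the index actually stores and queries geohashes is not described in the slides.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, nbits = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = (bits << 1) | int(bit)
        nbits += 1
        use_lon = not use_lon
        if nbits == 5:  # five bits make one base-32 character
            chars.append(_BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744))  # u4pruy
```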
- 54. Uses
CERT team: automated notification of idiots
running old WordPress within a particular range,
geographic location or organization is a
one-liner script
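A sketch of what such a script might look like against Solr's standard `select` handler. The host, the core name `hosts`, and the field names `app`, `country`, and `asn_owner` are hypothetical, since the slides do not show the schema. Only the URL is built here, so the example runs without a server.

```python
from urllib.parse import urlencode


def solr_select_url(base: str, core: str, query: str, rows: int = 100) -> str:
    """Build a Solr select URL that returns JSON for a Lucene-style query."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base}/solr/{core}/select?{urlencode(params)}"


# Hypothetical schema: application name, country code, regex on ASN owner.
query = "app:wordpress AND country:MY AND asn_owner:/.*telecom.*/"
url = solr_select_url("http://localhost:8983", "hosts", query)
print(url)
```

Fetching that URL and mailing each result's abuse contact would complete the notification loop.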