Hitbkl 2012

Messing up with Kids playground:
Eradicating easy targets

Yarochkin Fyodor @fygrave
Vladimir Kropotov @vbkropotov

Presented at HITBKL 2012

Agenda

Introduction (cybercrime 2012 – Russian style :)
Detecting malicious network infrastructure
Getting one step ahead
Conclusions

DCCrime-2012: Brief Introduction
● Bots and Botnets – still popular :)
● Monetization schemes vary.
● DbD (drive-by download) is one of the most common attack vectors
– We also have email
– We also have stupid users downloading sh*t
– Mobile is a lucrative target (all your money is there)

DCCrime-2012: Introduction

“Traffic” - is still an important component in the process :)

Main “components” to deal with
● Callback nodes (aka C&C)
● Traffic:
– Compromised machines/or manipulated content
– Banner networks
– SEO (doorways)

What's new this year?
● Automated detection gets difficult (anti-sandboxing, anti-crawler tricks)
function() {
var url = 'http://yyzola.gpbbsdhmjm.shacknet.nu/g/';
…
document.onmousemove = function() {
…
● In some cases of idiocy, human interaction is a
must..
● Mobile phone as the most common means of
funds transfer

Mobile scams
● Fake apps are still big
● Android apps available :)

So really, how easy is it to get pwned in Russia? :)

● So the focus of this research:
– Identifying “bad kids” playground – mapping
infrastructure, identifying potential targets,
attempting to fix the problems, before “things hit
hard”

Detecting malicious network
infrastructure

DNS (did you see this morning's
passive DNS talk? ;-))

With the spike in domain-generating botnets, this
seems like an interesting research project

DGAs produce a very specific pattern in DNS
traffic

Is this the only method to call back?

Nope..

Domain generative bots
● C&C is not hardcoded to maintain flexibility in
cases when C&C is taken down.
● Some sort of algorithm is used to generate
domain names
● Domains are tested for validity. IP address is
obtained.
● Sometimes obfuscation involved. (for example:
manipulations applied to resolved IP address)
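
The bullets above can be illustrated with a toy generator. This is a minimal sketch, not the algorithm of any real bot: the seed string, the hashing scheme, the reserved `.example` zone and both function names are assumptions made for illustration. The resolver is injectable so the logic can be exercised without touching the network.

```python
import hashlib
import socket
from datetime import date


def generate_domains(seed, day, count=5, zone="example"):
    """Toy DGA: derive `count` pseudo-random names from a seed and a date."""
    domains = []
    for i in range(count):
        material = f"{seed}:{day.isoformat()}:{i}".encode()
        digest = hashlib.sha256(material).hexdigest()
        # Map the first 11 hex digits to letters a-p to form the label.
        label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:11])
        domains.append(f"{label}.{zone}")
    return domains


def first_live(domains, resolve=socket.gethostbyname):
    """Return (domain, ip) for the first name that resolves, else None."""
    for name in domains:
        try:
            return name, resolve(name)
        except OSError:
            # Lookup failed (e.g. NXDOMAIN): try the next candidate.
            continue
    return None
```

Every candidate that fails to resolve leaves a failed lookup behind, which is the kind of DNS noise the detection described in the following slides looks for.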

C&C/generative domains and
pattern mining
● Bots that call back via generated domain names
produce very specific, voluminous DNS traffic
● Our research is primarily focused on picking up
these patterns. Example: Carberp (details
provided by Vladimir Kropotov)

Carberp
● Bot infection: Drive-By-HTTP
● Payload and intermediate malware domains: normal domains,
recent registration dates, or DynDNS
● Distributed via many, many compromised websites; top
score: more than 100 compromised resources detected in one
week.
● C&C domains are usually generated, but see some special cases
below ;-).
● C&C and malware domains are located in the same AS (from
the bot's point of view). Easy to detect.
● Typical bot activity: mass HTTP POST

Size    Payload                 Referrer           URL                                                        Domain
9414    javascript              www.*****press.ru  /g/18418362672595167.js                                    beatshine.is-saved.org
45443   html                    www.*****press.ru  /index.php?28d9000e56c2a63080ff89c6f5357591                activatedreplacing.is-very-evil.org
4135    application/x-jar       -                  //images/r/785cee8be7f1da9a9d60820cbf8b1840.jar            activatedreplacing.is-very-evil.org
155529  application/executable  -                  /server_privileges.php?91370f5f009a815950578cb539f28b58=3  activatedreplacing.is-very-evil.org

Size    Payload                 Referrer       URL                Domain
997     html                    Infected site  /1/s.html          3645455029
4923    javascript              3645455029     /js/deployJava.js  Java.com
18046   application/x-jar       -              /1/exp.jar         3645455029
138352  application/executable  -              /file1.dat         3645455029
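
The numeric "domain" 3645455029 in the second table appears to be an IPv4 address written as a single 32-bit integer, a form that browsers generally accept in URLs. A minimal sketch of decoding it with Python's standard `ipaddress` module (the helper name `dword_to_ip` is ours, not from the talk):

```python
import ipaddress


def dword_to_ip(value):
    """Convert a 32-bit integer host (as seen in a URL) to dotted-quad form."""
    return str(ipaddress.IPv4Address(int(value)))


print(dword_to_ip("3645455029"))  # 217.73.58.181
```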

Detection: related works
M. Antonakakis, R. Perdisci, et al.
From Throw-Away Traffic to Bots: Detecting the
Rise of DGA-Based Malware. In Proceedings of
USENIX Security, 2012
L. Bilge, E. Kirda, C. Kruegel, and M. Balduzzi.
EXPOSURE: Finding malicious domains using
passive DNS analysis. In Proceedings of NDSS,
2011
etc..

What we do differently:
● “Lazy” WHOIS lookups, Team Cymru IP-to-ASN
lookups
● Our own passive DNS index
● Sandbox farm (mainly to detect compromised
websites automagically and study behavior)
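
For the IP-to-ASN step, Team Cymru offers a DNS interface: a TXT query for the reversed octets under `origin.asn.cymru.com` returns a pipe-separated record. The sketch below only builds the query name and parses an answer, with no network access; the function names are ours, and the sample answer in the usage lines is made up from documentation-reserved values, not a real lookup result.

```python
import ipaddress


def cymru_origin_qname(ip):
    """Build the TXT query name for Team Cymru's DNS-based IP-to-ASN service."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + ".origin.asn.cymru.com"


def parse_cymru_txt(txt):
    """Split a TXT answer of the form 'ASN | prefix | CC | registry | date'."""
    fields = [field.strip() for field in txt.strip('"').split("|")]
    keys = ("asn", "prefix", "country", "registry", "allocated")
    return dict(zip(keys, fields))


print(cymru_origin_qname("66.175.210.173"))
# 173.210.175.66.origin.asn.cymru.com

# Made-up sample answer (reserved ASN and prefix), for illustration only.
print(parse_cymru_txt('"64496 | 192.0.2.0/24 | ZZ | example | 2012-01-01"'))
```

Caching the parsed records per prefix would be one way to keep such lookups "lazy", so that each prefix is queried only when first needed.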

Dealing with false positives: filtering
● Generated sequences: n-gram analysis
● WHOIS cross-reference (if available)
● IPs belonging to a malicious ASN
● Public domain lists (Alexa top 100k) work well
as a whitelist
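
A minimal sketch of how the n-gram and whitelist filters might be combined. The scoring rule (share of two-character sequences never seen in a reference word list), the tiny word list, the threshold and all names are assumptions for illustration; a real deployment would train on a much larger corpus.

```python
from collections import Counter

# Tiny illustrative reference corpus (assumption, not from the talk).
REFERENCE_WORDS = ["google", "facebook", "twitter", "avatar", "maker", "mail", "yandex"]


def bigrams(label):
    """Return the overlapping two-character sequences of a label."""
    return [label[i:i + 2] for i in range(len(label) - 1)]


def train_bigrams(words):
    """Count bigrams over a list of natural-language words."""
    counts = Counter()
    for word in words:
        counts.update(bigrams(word.lower()))
    return counts


def unseen_ratio(label, model):
    """Fraction of the label's bigrams never seen in the reference words."""
    grams = bigrams(label.lower())
    if not grams:
        return 0.0
    return sum(1 for g in grams if g not in model) / len(grams)


def is_suspicious(domain, model, whitelist, threshold=0.5):
    """Flag a domain unless it is whitelisted or looks like natural language."""
    if domain in whitelist:
        return False
    label = domain.split(".")[0]
    return unseen_ratio(label, model) > threshold


model = train_bigrams(REFERENCE_WORDS)
print(is_suspicious("xuqohyxeqak.eu", model, whitelist={"google.com"}))
```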

Cat and mouse game
● Of course, all of this is easy to evade once you
know the method. But security is always a
cat-and-mouse game ;-)

Architecture
● What we are building ;)

Are we using signatures?
Yes and No..
● We don't have signatures for C&C domains..
● But we maintain patterns for suspicious whois
data (registration date, registrar, email, ..)
● Historical DNS and AS association (bad IP)
● Generic patterns for generative domains (high,
similarly distributed pattern of failed lookups
within the same zone)

A walk through automated detection
● In this example we show how automated
detection works, step by step. The Redis
queries are shown in the form of an interactive session:

Detection starting point: rcode: 3
(Non-existing domains)

Rcode:2 domains
Detection: rcode:2 (server failure)
(failed servers)
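
A minimal sketch of this starting point, assuming DNS responses have already been parsed into `(qname, rcode)` pairs. The input format, the use of the TLD as the "zone", the threshold and the function names are all assumptions; the sample names in the test data would be placeholders, not observed traffic.

```python
from collections import defaultdict

NXDOMAIN = 3  # rcode 3: non-existing domain
SERVFAIL = 2  # rcode 2: server failure


def failed_lookups_by_zone(responses, rcodes=(NXDOMAIN, SERVFAIL)):
    """Group distinct failed query names by zone (simplified here to the TLD)."""
    zones = defaultdict(set)
    for qname, rcode in responses:
        if rcode in rcodes:
            name = qname.rstrip(".").lower()
            zones[name.rsplit(".", 1)[-1]].add(name)
    return zones


def noisy_zones(responses, min_names=3):
    """Return zones with at least `min_names` distinct failed names."""
    grouped = failed_lookups_by_zone(responses)
    return {zone: sorted(names) for zone, names in grouped.items()
            if len(names) >= min_names}
```

Zones that pass the threshold are candidates for the clustering step, not verdicts on their own.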

Sample analysis (step by step)
● Start looking for a failed pattern and cluster id:

Sample analysis (two)
● Get the cluster ID: (eu_11_14)

Clustering is based on domain similarity. Currently used characteristics:
- f(zone, pattern (length, depth))
- additional characteristics (building up): natural language domain vs. generated string
(occurrence of two-character sequences - n-grams)
- domain registration parameters (obtained via WHOIS [ problematic! ] )
- cross-reference with existing malicious IP and AS reputation database (incrementally
built by us)
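
The first characteristic, f(zone, pattern (length, depth)), can be sketched as a simple grouping key. The exact format behind IDs such as eu_11_14 is not given in the slides, so the `zone_length_depth` key below is a hypothetical stand-in and does not try to reproduce it.

```python
from collections import defaultdict


def cluster_key(domain):
    """Hypothetical key: zone, length of the leftmost label, number of labels."""
    labels = domain.rstrip(".").lower().split(".")
    zone = labels[-1]
    length = len(labels[0])
    depth = len(labels)
    return f"{zone}_{length}_{depth}"


def cluster(domains):
    """Group domains that share the same key."""
    groups = defaultdict(list)
    for domain in domains:
        groups[cluster_key(domain)].append(domain)
    return dict(groups)


print(cluster_key("cihunemyror.eu"))  # eu_11_2
```

With a key this coarse, a legitimate name of the same shape (avatarmaker.eu also has an 11-character label) lands in the same cluster, which is why the additional characteristics listed above matter.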

Sample analysis
● Get other members of the cluster

Sample analysis
● Find common members (notice that avatarmaker.eu
could be a false positive; it is easily filtered out
through common-denominator filtering on IP and
WHOIS information)

Sample analysis
● So we have C&C IP 66.175.210.173
● We can continue mining to see if we get any
other domain names:
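
The pivot from a known C&C IP to further domain names is a reverse lookup in the passive DNS index. The slides use Redis for this; the in-memory class below is only a stand-in sketch with assumed names, where the two dictionaries of sets play the role that per-domain and per-IP sets would play in a key-value store.

```python
from collections import defaultdict


class PassiveDNSIndex:
    """Minimal in-memory passive DNS index with forward and reverse maps."""

    def __init__(self):
        self._by_domain = defaultdict(set)
        self._by_ip = defaultdict(set)

    def observe(self, domain, ip):
        """Record that `domain` was seen resolving to `ip`."""
        domain = domain.rstrip(".").lower()
        self._by_domain[domain].add(ip)
        self._by_ip[ip].add(domain)

    def domains_for_ip(self, ip):
        """All domains ever seen on this IP."""
        return sorted(self._by_ip.get(ip, ()))

    def related_domains(self, domain):
        """Other domains sharing at least one IP with `domain`."""
        domain = domain.rstrip(".").lower()
        related = set()
        for ip in self._by_domain.get(domain, ()):
            related |= self._by_ip[ip]
        related.discard(domain)
        return sorted(related)
```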

Sample analysis
● Look! We just met an old friend!!

Mapping C&C (easily automated)

● http://cihunemyror.eu/login.php
● http://foxivusozuc.eu/login.php
● http://ryqecolijet.eu/login.php
● http://xuqohyxeqak.eu/login.php
● http://foqaqehacew.eu/login.php
● http://jecijyjudew.eu/login.php
● http://voworemoziv.eu/login.php
● http://mamixikusah.eu/login.php
● http://qebahilojam.eu/login.php
● http://foqaqehacew.eu/search.php
● http://foqaqehacew.eu/search.php
● http://foqaqehacew.eu/LMvg9Ng1d.php
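
One way to automate this mapping is to group the observed URLs by path, so that hosts exposing the same panel path stand out. A small sketch using only the standard library; the function name is ours.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def hosts_by_path(urls):
    """Map each URL path to the sorted list of hosts it was seen on."""
    groups = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        groups[parts.path].add(parts.hostname)
    return {path: sorted(hosts) for path, hosts in groups.items()}


observed = [
    "http://cihunemyror.eu/login.php",
    "http://foxivusozuc.eu/login.php",
    "http://foqaqehacew.eu/login.php",
    "http://foqaqehacew.eu/search.php",
]
print(hosts_by_path(observed)["/login.php"])
```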

Sample analysis
● Finding more relevant domains:

Performance
● On a single machine (32 GB RAM) we handle up to
2000 pkt/sec without significant performance
loss
● Average load:

Other Interesting numbers
● Packets per day: ~130M filtered.
● Mal. Domains/day: ~30k DNS queries (varies)
● Avg. 30-50 req/minute for single domain

Uses of the data
● Obvious: blacklists
● Botnet takeovers (cost 11 USD or less ;)
● Sinkholing

Detection
● (demos, let's look at some videos :)

What could be more flux than
fastflux? ;-)
● WHOIS fastflux … HOW?!

Domain ID:D166393631-LROR
Domain Name:FOOTBALL-SECURITY-WETRLSGPIEO.ORG
Created On:21-Aug-2012 01:23:52 UTC
Last Updated On:21-Aug-2012 01:23:53 UTC
Expiration Date:21-Aug-2013 01:23:52 UTC
Sponsoring Registrar:Click Registrar, Inc. d/b/a publicdomainregistry.com (R1935-LROR)
Status:CLIENT TRANSFER PROHIBITED
Status:TRANSFER PROHIBITED
Status:ADDPERIOD
Registrant ID:PP-SP-001
Registrant Name:Domain Admin
Registrant Organization:PrivacyProtect.org
Registrant Street1:ID#10760, PO Box 16
Registrant Street2:Note - All Postal Mails Rejected, visit Privacyprotect.org

Moving ahead:
Finding easy targets before they do :)

In short, it is all about quick ways of finding idiots
who have no clue what they are doing with
WordPress, osCommerce, OpenX, [put yer fave],
and forcing them to update before they get owned
;)
And hmm.. doing it country-wide

Disclaimer

Just another “small data” project we play with.
A Solr cluster of around 4 machines.

Largely inspired by “Fruit: why so low?” by Adam
MetlStorm (hack.lu 2011)

Scanning the internet is not new..
but it is pretty realistic

Architecture
● Network port discovery (agents)
● Banner collection (agents)
● Backend Store: SOLR
● Collectibles: services and ports, OS fingerprints,
ASN/owner/netblock/country, geographical
location, app data

Architecture(2)
● Roughly something like that

Approach
● Scan slow (avoid abuse reports)
● Index time
● Passive “mapper” (simple sniffer + browser
fingerprinting at the moment)
● Larger range of ports (taking into account port numbers
that are actively being scanned, according to firewall log
analysis, honeypot machines, etc.)
● For web apps – (wafp fingerprinting) + index banner
(noisy, cause of most of the abuse complaints)

Features
● Scriptable via restful API (think of solr) (cuz UI
is for sissies ;-))
● Query by any combination of:
– software version/banner regex (solr/lucene style)
– geospatial search (via geohash)
– ASN or regex on ASN owner
– Country code
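
A hedged sketch of what such a scripted query could look like against a Solr `select` handler. The field names (`banner`, `country`, `asn`, `location`), the core URL and the function name are hypothetical, since the actual schema is not shown in the slides. For the geospatial part this sketch uses Solr's `geofilt` filter on a point and radius rather than a raw geohash, which is a substitution on our side.

```python
from urllib.parse import urlencode


def build_query(base_url, banner_regex=None, country=None, asn=None,
                near=None, radius_km=50):
    """Build a Solr select URL; `near` is an optional (lat, lon) tuple."""
    # Lucene regex syntax: field:/pattern/
    params = [("q", f"banner:/{banner_regex}/" if banner_regex else "*:*")]
    if country:
        params.append(("fq", f"country:{country}"))
    if asn:
        params.append(("fq", f"asn:{asn}"))
    if near:
        lat, lon = near
        geo = f"{{!geofilt sfield=location pt={lat},{lon} d={radius_km}}}"
        params.append(("fq", geo))
    params.append(("wt", "json"))
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical core name and schema; prints the URL without sending it.
print(build_query("http://localhost:8983/solr/hosts",
                  banner_regex="wordpress 3\\.[0-3].*", country="RU"))
```

Fetching that URL and mailing the owners of the returned hosts is the sort of short script a CERT team could wrap around the API.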

Uses
CERT team: automated notification of idiots
running old WordPress within a particular range,
geographic location, or organization is a
one-liner script

Questions

@fygrave
@vbkropotov
