Beyond Blacklists: Malicious URL Detection Using Machine Learning
Who am I?
• Information security investigator at Cisco.
• Completed M.Tech. at IIT Jodhpur in 2014.
• Areas of interest include machine learning, computer vision and A.I.
• Email : satyamiitj89@gmail.com
Malicious websites
Phishing: which one is real?
Visiting Malicious Websites
What do we want?
Problem in a Nutshell
• URL features to identify malicious Web sites
• No context, no content: classification uses only the URL
• Different classes of URLs
    • Benign, spam, phishing, exploits, scams...
    • For now, distinguish benign vs. malicious
Example: facebook.com vs. fblight.com
Information about new websites
State of the Practice
• Current approaches
    • Blacklists [SORBS, URIBL, SURBL, Spamhaus]
    • Learning on hand-tuned features [Garera et al., 2007]
• Limitations
    • Cannot predict unlisted sites
    • Cannot account for new features
    • Arms race: a fast feedback cycle is critical
A more automated approach?
URL Classification System
[Figure: pipeline from labeled examples to a learned hypothesis (label, example, hypothesis)]
Data Sets
• Malicious URLs
    • 5,000 from PhishTank (phishing)
    • 15,000 from Spamscatter (spam, phishing, etc.)
• Benign URLs
    • 15,000 from Yahoo Web directory
    • 15,000 from DMOZ directory
• Malicious × Benign → 4 data sets
• 30,000–55,000 features per data set
Algorithms
• Logistic regression with L1-norm regularization
    • Implicit feature selection
    • Easier to interpret
• Other models
    • Naive Bayes
    • Support vector machines (linear, RBF kernels)
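The "implicit feature selection" of L1-regularized logistic regression can be illustrated with a small sketch. It assumes proximal gradient descent (ISTA) as the solver, which is one standard choice and not necessarily the one used in the original work; the toy data is made up, with column 0 tracking the label and columns 1 and 2 acting as noise. The soft-thresholding step is what sets uninformative weights to exactly zero, which also makes the remaining weights easier to read.

```python
import math

def soft_threshold(w, t):
    """Proximal operator of t*|w|: shrink w toward 0 and clip small values to exactly 0."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_l1_logreg(X, y, lam=0.1, lr=0.5, epochs=500):
    """Minimize mean log-loss + lam * ||w||_1 by proximal gradient descent (ISTA).

    X: list of feature rows; y: labels (1 = malicious, 0 = benign).
    The bias b is not penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Gradient step on the smooth log-loss, then the L1 proximal (shrinkage) step
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that feature row x is malicious."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Hypothetical toy data: column 0 tracks the label, columns 1 and 2 are noise
X = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 0, 0],
     [0, 1, 0], [0, 0, 1], [0, 1, 1], [0, 0, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
w, b = train_l1_logreg(X, y)
print(w)  # the noise columns' weights are driven to exactly 0.0
```

Raising `lam` prunes more features; with tens of thousands of URL features, this is what keeps the learned model sparse.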
Feature vector construction
Features to consider?
1) Blacklists
2) Simple heuristics
3) Domain name registration
4) Host properties
5) Lexical
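One way to combine the five feature groups into a single vector is to give every distinct feature name a column index and store each URL as a sparse vector. The sketch below assumes features arrive as name → value dictionaries (names such as `blacklist:SURBL` or `num_dots` are hypothetical) and simply drops names never seen during training; that is one simple policy among several.

```python
def build_vocabulary(training_feature_dicts):
    """Assign a column index to every distinct feature name seen in training."""
    vocab = {}
    for feats in training_feature_dicts:
        for name in sorted(feats):  # sorted for deterministic indices
            if name not in vocab:
                vocab[name] = len(vocab)
    return vocab

def to_sparse_vector(feats, vocab):
    """Return sorted (column index, value) pairs for the known, non-zero features."""
    return sorted(
        (vocab[name], value)
        for name, value in feats.items()
        if name in vocab and value != 0
    )

# Hypothetical feature dictionaries for two training URLs
train = [{"blacklist:SURBL": 1, "host_token:mobi": 1},
         {"host_token:com": 1, "num_dots": 5}]
vocab = build_vocabulary(train)
print(to_sparse_vector({"host_token:com": 1, "num_dots": 2, "host_token:new": 1}, vocab))
```

Because most URLs activate only a handful of the 30,000+ columns, a sparse representation like this keeps memory proportional to the number of active features.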
(1) Blacklist Queries
• List of known malicious sites
• Providers: SORBS, URIBL, SURBL, Spamhaus
[Figure: http://www.bfuduuioo1fp.mobi → in blacklist? Yes; http://fblight.com → in blacklist? No]
Blacklist queries as features
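Blacklists of this kind are commonly queried over DNS (the DNSBL convention): domain lists are queried as `<domain>.<zone>`, IP lists with the IPv4 octets reversed, and an answer means "listed". The sketch below turns those lookups into one binary feature per provider. The zone names are illustrative assumptions (providers change zones and usage policies, and some require registration), and the resolver is injectable so the logic can be exercised without network access. Real deployments should also inspect the returned 127.0.0.x code, since some codes signal errors rather than listings; this sketch treats any answer as a listing.

```python
import ipaddress
import socket

# Illustrative configuration (assumed): provider -> (DNS zone, key type)
BLACKLIST_ZONES = {
    "SURBL": ("multi.surbl.org", "domain"),
    "URIBL": ("multi.uribl.com", "domain"),
    "Spamhaus": ("zen.spamhaus.org", "ip"),
    "SORBS": ("dnsbl.sorbs.net", "ip"),
}

def dnsbl_query_name(value, zone, kind):
    """Build the DNS name to look up for a domain or IPv4 address."""
    if kind == "ip":
        # IP lists use reversed octets: 72.23.5.122 -> 122.5.23.72.<zone>
        octets = str(ipaddress.IPv4Address(value)).split(".")
        return ".".join(reversed(octets)) + "." + zone
    return value.rstrip(".") + "." + zone

def default_resolver(name):
    """True if the name resolves (i.e. the key is listed), False otherwise."""
    try:
        socket.gethostbyname(name)
        return True
    except socket.gaierror:
        return False

def blacklist_features(domain, ip, resolver=default_resolver):
    """One binary feature per provider: 1 if the domain/IP is listed."""
    feats = {}
    for provider, (zone, kind) in BLACKLIST_ZONES.items():
        key = ip if kind == "ip" else domain
        feats["blacklist:" + provider] = int(resolver(dnsbl_query_name(key, zone, kind)))
    return feats
```

Only four such features exist, which is why blacklists alone cannot say anything about a URL that no provider has listed yet.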
(2) Manually-Selected Features
• Considered by previous studies
    • IP address in hostname?
    • Number of dots in URL
    • WHOIS (domain name) registration date
Examples: http://72.23.5.122/www.bankofamerica.com/ and http://www.bankofamerica.com.qytrpbcw.stopgap.cn/ (stopgap.cn registered 28 June 2009)
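The first two manually selected features can be computed from the URL string alone. This is a minimal sketch using only the standard library; the feature names are hypothetical, and the registration-date feature needs a WHOIS lookup, so it is left out here.

```python
import ipaddress
from urllib.parse import urlsplit

def manual_features(url):
    """'IP address in hostname?' and 'number of dots in URL' as features."""
    host = urlsplit(url).hostname or ""
    try:
        ipaddress.ip_address(host)  # raises ValueError for ordinary host names
        ip_in_host = 1
    except ValueError:
        ip_in_host = 0
    return {"ip_in_hostname": ip_in_host, "num_dots": url.count(".")}

print(manual_features("http://72.23.5.122/www.bankofamerica.com/"))
print(manual_features("http://www.bankofamerica.com.qytrpbcw.stopgap.cn/"))
```

Both example URLs contain five dots, so the dot count alone does not separate them from long but benign URLs; this is one reason hand-picked features are combined with richer feature sets.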
(3) WHOIS Features
• Domain name registration
    • Dates of registration, update, and expiration
    • Registrant: who registered the domain?
    • Registrar: who manages the registration?
Example: http://sleazysalmon.com, http://angryalbacore.com, http://mangymackerel.com and http://yammeringyellowtail.com were all registered on 29 June 2009 by SpamMedia
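WHOIS output formats vary widely between registries, so the sketch below assumes the record has already been parsed into a dictionary with hypothetical keys `created`, `updated`, `expires`, `registrar` and `registrant`. Dates become numeric "days ago" features, and registrar/registrant names become binary tokens, so a registrant such as SpamMedia can pick up a learned weight shared by all of its domains.

```python
from datetime import date

def whois_features(record, today):
    """Turn an already-parsed WHOIS record (hypothetical dict layout) into features."""
    feats = {}
    for field in ("created", "updated", "expires"):
        when = record.get(field)
        if when is not None:
            # Positive for past dates (age), negative for future dates (time left)
            feats[field + "_days_ago"] = (today - when).days
    for field in ("registrar", "registrant"):
        if record.get(field):
            feats[field + ":" + record[field].lower()] = 1
    return feats

record = {"created": date(2009, 6, 29), "registrant": "SpamMedia"}  # hypothetical
print(whois_features(record, today=date(2009, 7, 1)))
# {'created_days_ago': 2, 'registrant:spammedia': 1}
```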
(4) Host-Based Features
• Blacklisted? (SORBS, URIBL, SURBL, Spamhaus)
• WHOIS: registrar, registrant, dates
• IP address: which ASes/IP prefixes?
• DNS: TTL? Does a PTR record exist and resolve?
• Geography-related: locale? Connection speed?
Example IP prefixes: 75.102.60.0/22 and 69.63.176.0/20 (facebook.com vs. fblight.com)
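The "which IP prefix?" feature can be sketched as a longest-prefix match of the site's resolved IP address against a list of announced prefixes. The prefix list is an assumed input (for example, extracted from a BGP routing table dump); the extra `75.102.0.0/16` entry below is hypothetical and only shows that the most specific prefix wins.

```python
import ipaddress

def ip_prefix_feature(ip, announced_prefixes):
    """Binary feature naming the most specific announced prefix containing ip."""
    addr = ipaddress.ip_address(ip)
    best = None
    for prefix in announced_prefixes:
        net = ipaddress.ip_network(prefix)
        # Membership is False across IPv4/IPv6, so mixed lists are safe
        if addr in net and (best is None or net.prefixlen > best.prefixlen):
            best = net
    return {"ip_prefix:" + str(best): 1} if best is not None else {}

prefixes = ["69.63.176.0/20", "75.102.60.0/22", "75.102.0.0/16"]  # assumed input
print(ip_prefix_feature("75.102.61.7", prefixes))
```

Sites hosted in the same prefix share a feature, which is how the "guilt by association" effect discussed under Limitations arises.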
(5) Lexical Features
• Tokens in URL hostname + path
• Length of URL
• Entropy of the domain name
Example: http://www.bfuduuioo1fp.mobi/ws/ebayisapi.dll
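The three lexical features can be sketched as follows. Assumptions: tokens are split on any non-alphanumeric character, hostname and path tokens are kept in separate namespaces, and "entropy of the domain name" is taken as the Shannon entropy (bits per character) of the full hostname; other definitions, such as using only the registered domain label, are equally plausible.

```python
import math
import re
from collections import Counter
from urllib.parse import urlsplit

def shannon_entropy(s):
    """Shannon entropy, in bits per character, of the characters in s."""
    if not s:
        return 0.0
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())

def lexical_features(url):
    parts = urlsplit(url)
    host = parts.hostname or ""
    feats = {"url_length": len(url), "host_entropy": shannon_entropy(host)}
    # Bag of tokens, split on delimiters such as '.', '/', '?', '=', '-'
    for prefix, text in (("host_token:", host), ("path_token:", parts.path)):
        for tok in re.split(r"[^A-Za-z0-9]+", text):
            if tok:
                feats[prefix + tok.lower()] = 1
    return feats

feats = lexical_features("http://www.bfuduuioo1fp.mobi/ws/ebayisapi.dll")
print(sorted(k for k in feats if "token" in k))
```

A token such as `ebayisapi` in the path of a non-eBay host is exactly the kind of signal a bag-of-tokens model can learn without any hand-written rule.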
Which feature sets?
[Figure: number of features (# Features) per feature set: Blacklist, Manual, WHOIS, Host-based, Lexical, Full, w/o WHOIS/Blacklist; values shown range from 3 to 30,000]
Beyond Blacklists
[Figure: Yahoo-PhishTank data set, Blacklist vs. Full features]
• Full features: higher detection rate for a given false positive rate
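"Higher detection rate for a given false positive rate" compares operating points: fix a budget of false alarms on benign URLs and ask how many malicious URLs are caught. A minimal sketch, assuming the classifier outputs one score per URL (higher = more suspicious); the scores and labels are made up, and the quadratic loop favors clarity over speed.

```python
def detection_rate_at_fpr(scores, labels, max_fpr):
    """Best true-positive (detection) rate over score thresholds whose
    false-positive rate is <= max_fpr. labels: 1 = malicious, 0 = benign."""
    pos = sum(labels)
    neg = len(labels) - pos
    best = 0.0
    for t in sorted(set(scores)):
        # Flag a URL as malicious when its score is >= t
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        if fp / neg <= max_fpr:
            best = max(best, tp / pos)
    return best

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]  # hypothetical classifier scores
labels = [1, 1, 0, 1, 0, 1, 0, 0]
for budget in (0.0, 0.25, 0.5):
    print(budget, detection_rate_at_fpr(scores, labels, budget))
```

A blacklist yields a single such operating point, whereas a scored classifier can be tuned along the whole curve.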
Limitations
• False positives
    • Sites hosted by disreputable ISPs
    • Guilt by association
• False negatives
    • Compromised sites
    • Free hosting sites
    • Sites hosted by reputable ISPs
• Future work: Web page content
Conclusion
• Detects malicious URLs with high accuracy
    • Using only the URL
• A diverse feature set helps: 86.5% with 18,000+ features
• Proof of concept working in the lab
• Future work
    • Scaling up for deployment
References
• Ma, Justin, et al. "Beyond blacklists: learning to detect malicious web sites from suspicious URLs." Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2009.
Q & A

More Related Content

What's hot

Seminar on Cyber Crime
Seminar on Cyber CrimeSeminar on Cyber Crime
Seminar on Cyber Crime
Likan Patra
 
Sms spam-detection
Sms spam-detectionSms spam-detection
Sms spam-detection
Tanvirul Islam
 
Spoofing
SpoofingSpoofing
Spoofing
Sanjeev
 
Final spam-e-mail-detection
Final  spam-e-mail-detectionFinal  spam-e-mail-detection
Final spam-e-mail-detection
Partnered Health
 
Face Recognition Methods based on Convolutional Neural Networks
Face Recognition Methods based on Convolutional Neural NetworksFace Recognition Methods based on Convolutional Neural Networks
Face Recognition Methods based on Convolutional Neural Networks
Elaheh Rashedi
 
Detection of Phishing Websites
Detection of Phishing Websites Detection of Phishing Websites
Detection of Phishing Websites
Nikhil Soni
 
Detection of phishing websites
Detection of phishing websitesDetection of phishing websites
Detection of phishing websites
m srikanth
 
Basics of Denial of Service Attacks
Basics of Denial of Service AttacksBasics of Denial of Service Attacks
Basics of Denial of Service Attacks
Hansa Nidushan
 
Attendance system based on face recognition using python by Raihan Sikdar
Attendance system based on face recognition using python by Raihan SikdarAttendance system based on face recognition using python by Raihan Sikdar
Attendance system based on face recognition using python by Raihan Sikdar
raihansikdar
 
Malware Detection using Machine Learning
Malware Detection using Machine Learning	Malware Detection using Machine Learning
Malware Detection using Machine Learning
Cysinfo Cyber Security Community
 
Phishing attacks ppt
Phishing attacks pptPhishing attacks ppt
Phishing attacks ppt
Aryan Ragu
 
Face recognition ppt
Face recognition pptFace recognition ppt
Face recognition ppt
Santosh Kumar
 
Digital watermarking
Digital watermarkingDigital watermarking
Digital watermarking
Ankush Kr
 
Face recognition technology
Face recognition technologyFace recognition technology
Face recognition technology
ranjit banshpal
 
Presentation-Detecting Spammers on Social Networks
Presentation-Detecting Spammers on Social NetworksPresentation-Detecting Spammers on Social Networks
Presentation-Detecting Spammers on Social Networks
Ashish Arora
 
CYBERBULLYING DETECTION USING MACHINE LEARNING-1 (1).pdf
CYBERBULLYING DETECTION USING              MACHINE LEARNING-1 (1).pdfCYBERBULLYING DETECTION USING              MACHINE LEARNING-1 (1).pdf
CYBERBULLYING DETECTION USING MACHINE LEARNING-1 (1).pdf
KumbidiGaming
 
Unit 2
Unit 2Unit 2
Unit 2
Jigarthacker
 
PPT on Phishing
PPT on PhishingPPT on Phishing
PPT on Phishing
Pankaj Yadav
 
Phishing technology
Phishing technologyPhishing technology
Phishing technology
harpinderkaur123
 
Cyber attack
Cyber attackCyber attack
Cyber attack
Manjushree Mashal
 

What's hot (20)

Seminar on Cyber Crime
Seminar on Cyber CrimeSeminar on Cyber Crime
Seminar on Cyber Crime
 
Sms spam-detection
Sms spam-detectionSms spam-detection
Sms spam-detection
 
Spoofing
SpoofingSpoofing
Spoofing
 
Final spam-e-mail-detection
Final  spam-e-mail-detectionFinal  spam-e-mail-detection
Final spam-e-mail-detection
 
Face Recognition Methods based on Convolutional Neural Networks
Face Recognition Methods based on Convolutional Neural NetworksFace Recognition Methods based on Convolutional Neural Networks
Face Recognition Methods based on Convolutional Neural Networks
 
Detection of Phishing Websites
Detection of Phishing Websites Detection of Phishing Websites
Detection of Phishing Websites
 
Detection of phishing websites
Detection of phishing websitesDetection of phishing websites
Detection of phishing websites
 
Basics of Denial of Service Attacks
Basics of Denial of Service AttacksBasics of Denial of Service Attacks
Basics of Denial of Service Attacks
 
Attendance system based on face recognition using python by Raihan Sikdar
Attendance system based on face recognition using python by Raihan SikdarAttendance system based on face recognition using python by Raihan Sikdar
Attendance system based on face recognition using python by Raihan Sikdar
 
Malware Detection using Machine Learning
Malware Detection using Machine Learning	Malware Detection using Machine Learning
Malware Detection using Machine Learning
 
Phishing attacks ppt
Phishing attacks pptPhishing attacks ppt
Phishing attacks ppt
 
Face recognition ppt
Face recognition pptFace recognition ppt
Face recognition ppt
 
Digital watermarking
Digital watermarkingDigital watermarking
Digital watermarking
 
Face recognition technology
Face recognition technologyFace recognition technology
Face recognition technology
 
Presentation-Detecting Spammers on Social Networks
Presentation-Detecting Spammers on Social NetworksPresentation-Detecting Spammers on Social Networks
Presentation-Detecting Spammers on Social Networks
 
CYBERBULLYING DETECTION USING MACHINE LEARNING-1 (1).pdf
CYBERBULLYING DETECTION USING              MACHINE LEARNING-1 (1).pdfCYBERBULLYING DETECTION USING              MACHINE LEARNING-1 (1).pdf
CYBERBULLYING DETECTION USING MACHINE LEARNING-1 (1).pdf
 
Unit 2
Unit 2Unit 2
Unit 2
 
PPT on Phishing
PPT on PhishingPPT on Phishing
PPT on Phishing
 
Phishing technology
Phishing technologyPhishing technology
Phishing technology
 
Cyber attack
Cyber attackCyber attack
Cyber attack
 

Similar to Malicious Url Detection Using Machine Learning

Understanding Web Bots and How They Hurt Your Business
Understanding Web Bots and How They Hurt Your BusinessUnderstanding Web Bots and How They Hurt Your Business
Understanding Web Bots and How They Hurt Your Business
Imperva Incapsula
 
Hitbkl 2012
Hitbkl 2012Hitbkl 2012
Hitbkl 2012
F _
 
Case Study on Property Portal Data Security
Case Study on Property Portal Data SecurityCase Study on Property Portal Data Security
Case Study on Property Portal Data Security
Property Portal Watch
 
Ensuring Property Portal Listing Data Security
Ensuring Property Portal Listing Data SecurityEnsuring Property Portal Listing Data Security
Ensuring Property Portal Listing Data Security
Distil Networks
 
How to clean up travel website traffic from bots and spammers?
How to clean up travel website traffic from bots and spammers?How to clean up travel website traffic from bots and spammers?
How to clean up travel website traffic from bots and spammers?
tnooz
 
Cleaning up website traffic from bots & spammers
Cleaning up website traffic from bots & spammersCleaning up website traffic from bots & spammers
Cleaning up website traffic from bots & spammers
Distil Networks
 
Rtp rsp16-distil networks-final-deck
Rtp rsp16-distil networks-final-deckRtp rsp16-distil networks-final-deck
Rtp rsp16-distil networks-final-deck
G3 Communications
 
Are Bot Operators Eating Your Lunch?
Are Bot Operators Eating Your Lunch?Are Bot Operators Eating Your Lunch?
Are Bot Operators Eating Your Lunch?
Distil Networks
 
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
IRJET Journal
 
PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...
PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...
PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...
IJCNCJournal
 
PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...
PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...
PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...
IJCNCJournal
 
IRJET - Detection and Prevention of Phishing Websites using Machine Learning ...
IRJET - Detection and Prevention of Phishing Websites using Machine Learning ...IRJET - Detection and Prevention of Phishing Websites using Machine Learning ...
IRJET - Detection and Prevention of Phishing Websites using Machine Learning ...
IRJET Journal
 
IRJET- Detecting Malicious URLS using Machine Learning Techniques: A Comp...
IRJET-  	  Detecting Malicious URLS using Machine Learning Techniques: A Comp...IRJET-  	  Detecting Malicious URLS using Machine Learning Techniques: A Comp...
IRJET- Detecting Malicious URLS using Machine Learning Techniques: A Comp...
IRJET Journal
 
Malware detection-using-machine-learning
Malware detection-using-machine-learningMalware detection-using-machine-learning
Malware detection-using-machine-learning
Security Bootcamp
 
IRJET- Phishing Website Detection System
IRJET- Phishing Website Detection SystemIRJET- Phishing Website Detection System
IRJET- Phishing Website Detection System
IRJET Journal
 
DETECTION OF PHISHING WEBSITES USING MACHINE LEARNING
DETECTION OF PHISHING WEBSITES USING MACHINE LEARNINGDETECTION OF PHISHING WEBSITES USING MACHINE LEARNING
DETECTION OF PHISHING WEBSITES USING MACHINE LEARNING
IRJET Journal
 
State of the Art Analysis Approach for Identification of the Malignant URLs
State of the Art Analysis Approach for Identification of the Malignant URLsState of the Art Analysis Approach for Identification of the Malignant URLs
State of the Art Analysis Approach for Identification of the Malignant URLs
IOSRjournaljce
 
17 00 distil rami
17 00 distil rami17 00 distil rami
17 00 distil rami
Property Portal Watch
 
Scraping in 60 minutes (CIJ Summer School 2019)
Scraping in 60 minutes (CIJ Summer School 2019)Scraping in 60 minutes (CIJ Summer School 2019)
Scraping in 60 minutes (CIJ Summer School 2019)
Paul Bradshaw
 
Wsdm yu
Wsdm yuWsdm yu
Wsdm yu
Anarkii
 

Similar to Malicious Url Detection Using Machine Learning (20)

Understanding Web Bots and How They Hurt Your Business
Understanding Web Bots and How They Hurt Your BusinessUnderstanding Web Bots and How They Hurt Your Business
Understanding Web Bots and How They Hurt Your Business
 
Hitbkl 2012
Hitbkl 2012Hitbkl 2012
Hitbkl 2012
 
Case Study on Property Portal Data Security
Case Study on Property Portal Data SecurityCase Study on Property Portal Data Security
Case Study on Property Portal Data Security
 
Ensuring Property Portal Listing Data Security
Ensuring Property Portal Listing Data SecurityEnsuring Property Portal Listing Data Security
Ensuring Property Portal Listing Data Security
 
How to clean up travel website traffic from bots and spammers?
How to clean up travel website traffic from bots and spammers?How to clean up travel website traffic from bots and spammers?
How to clean up travel website traffic from bots and spammers?
 
Cleaning up website traffic from bots & spammers
Cleaning up website traffic from bots & spammersCleaning up website traffic from bots & spammers
Cleaning up website traffic from bots & spammers
 
Rtp rsp16-distil networks-final-deck
Rtp rsp16-distil networks-final-deckRtp rsp16-distil networks-final-deck
Rtp rsp16-distil networks-final-deck
 
Are Bot Operators Eating Your Lunch?
Are Bot Operators Eating Your Lunch?Are Bot Operators Eating Your Lunch?
Are Bot Operators Eating Your Lunch?
 
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
 
PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...
PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...
PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...
 
PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...
PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...
PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...
 
IRJET - Detection and Prevention of Phishing Websites using Machine Learning ...
IRJET - Detection and Prevention of Phishing Websites using Machine Learning ...IRJET - Detection and Prevention of Phishing Websites using Machine Learning ...
IRJET - Detection and Prevention of Phishing Websites using Machine Learning ...
 
IRJET- Detecting Malicious URLS using Machine Learning Techniques: A Comp...
IRJET-  	  Detecting Malicious URLS using Machine Learning Techniques: A Comp...IRJET-  	  Detecting Malicious URLS using Machine Learning Techniques: A Comp...
IRJET- Detecting Malicious URLS using Machine Learning Techniques: A Comp...
 
Malware detection-using-machine-learning
Malware detection-using-machine-learningMalware detection-using-machine-learning
Malware detection-using-machine-learning
 
IRJET- Phishing Website Detection System
IRJET- Phishing Website Detection SystemIRJET- Phishing Website Detection System
IRJET- Phishing Website Detection System
 
DETECTION OF PHISHING WEBSITES USING MACHINE LEARNING
DETECTION OF PHISHING WEBSITES USING MACHINE LEARNINGDETECTION OF PHISHING WEBSITES USING MACHINE LEARNING
DETECTION OF PHISHING WEBSITES USING MACHINE LEARNING
 
State of the Art Analysis Approach for Identification of the Malignant URLs
State of the Art Analysis Approach for Identification of the Malignant URLsState of the Art Analysis Approach for Identification of the Malignant URLs
State of the Art Analysis Approach for Identification of the Malignant URLs
 
17 00 distil rami
17 00 distil rami17 00 distil rami
17 00 distil rami
 
Scraping in 60 minutes (CIJ Summer School 2019)
Scraping in 60 minutes (CIJ Summer School 2019)Scraping in 60 minutes (CIJ Summer School 2019)
Scraping in 60 minutes (CIJ Summer School 2019)
 
Wsdm yu
Wsdm yuWsdm yu
Wsdm yu
 

More from securityxploded

Fingerprinting healthcare institutions
Fingerprinting healthcare institutionsFingerprinting healthcare institutions
Fingerprinting healthcare institutions
securityxploded
 
Hollow Process Injection - Reversing and Investigating Malware Evasive Tactics
Hollow Process Injection - Reversing and Investigating Malware Evasive TacticsHollow Process Injection - Reversing and Investigating Malware Evasive Tactics
Hollow Process Injection - Reversing and Investigating Malware Evasive Tactics
securityxploded
 
Buffer Overflow Attacks
Buffer Overflow AttacksBuffer Overflow Attacks
Buffer Overflow Attacks
securityxploded
 
Malicious Client Detection Using Machine Learning
Malicious Client Detection Using Machine LearningMalicious Client Detection Using Machine Learning
Malicious Client Detection Using Machine Learning
securityxploded
 
Understanding CryptoLocker (Ransomware) with a Case Study
Understanding CryptoLocker (Ransomware) with a Case StudyUnderstanding CryptoLocker (Ransomware) with a Case Study
Understanding CryptoLocker (Ransomware) with a Case Study
securityxploded
 
Linux Malware Analysis using Limon Sandbox
Linux Malware Analysis using Limon SandboxLinux Malware Analysis using Limon Sandbox
Linux Malware Analysis using Limon Sandbox
securityxploded
 
Introduction to SMPC
Introduction to SMPCIntroduction to SMPC
Introduction to SMPC
securityxploded
 
Breaking into hospitals
Breaking into hospitalsBreaking into hospitals
Breaking into hospitals
securityxploded
 
Bluetooth [in]security
Bluetooth [in]securityBluetooth [in]security
Bluetooth [in]security
securityxploded
 
Basic malware analysis
Basic malware analysisBasic malware analysis
Basic malware analysis
securityxploded
 
Automating Malware Analysis
Automating Malware AnalysisAutomating Malware Analysis
Automating Malware Analysis
securityxploded
 
Reverse Engineering Malware
Reverse Engineering MalwareReverse Engineering Malware
Reverse Engineering Malware
securityxploded
 
DLL Preloading Attack
DLL Preloading AttackDLL Preloading Attack
DLL Preloading Attack
securityxploded
 
Partial Homomorphic Encryption
Partial Homomorphic EncryptionPartial Homomorphic Encryption
Partial Homomorphic Encryption
securityxploded
 
Hunting Rootkit From the Dark Corners Of Memory
Hunting Rootkit From the Dark Corners Of MemoryHunting Rootkit From the Dark Corners Of Memory
Hunting Rootkit From the Dark Corners Of Memory
securityxploded
 
Return Address – The Silver Bullet
Return Address – The Silver BulletReturn Address – The Silver Bullet
Return Address – The Silver Bullet
securityxploded
 
Defeating public exploit protections (EMET v5.2 and more)
Defeating public exploit protections (EMET v5.2 and more)Defeating public exploit protections (EMET v5.2 and more)
Defeating public exploit protections (EMET v5.2 and more)
securityxploded
 
Hunting Ghost RAT Using Memory Forensics
Hunting Ghost RAT Using Memory ForensicsHunting Ghost RAT Using Memory Forensics
Hunting Ghost RAT Using Memory Forensics
securityxploded
 
Anatomy of Exploit Kits
Anatomy of Exploit KitsAnatomy of Exploit Kits
Anatomy of Exploit Kits
securityxploded
 
MalwareNet Project
MalwareNet ProjectMalwareNet Project
MalwareNet Project
securityxploded
 

More from securityxploded (20)

Fingerprinting healthcare institutions
Fingerprinting healthcare institutionsFingerprinting healthcare institutions
Fingerprinting healthcare institutions
 
Hollow Process Injection - Reversing and Investigating Malware Evasive Tactics
Hollow Process Injection - Reversing and Investigating Malware Evasive TacticsHollow Process Injection - Reversing and Investigating Malware Evasive Tactics
Hollow Process Injection - Reversing and Investigating Malware Evasive Tactics
 
Buffer Overflow Attacks
Buffer Overflow AttacksBuffer Overflow Attacks
Buffer Overflow Attacks
 
Malicious Client Detection Using Machine Learning
Malicious Client Detection Using Machine LearningMalicious Client Detection Using Machine Learning
Malicious Client Detection Using Machine Learning
 
Understanding CryptoLocker (Ransomware) with a Case Study
Understanding CryptoLocker (Ransomware) with a Case StudyUnderstanding CryptoLocker (Ransomware) with a Case Study
Understanding CryptoLocker (Ransomware) with a Case Study
 
Linux Malware Analysis using Limon Sandbox
Linux Malware Analysis using Limon SandboxLinux Malware Analysis using Limon Sandbox
Linux Malware Analysis using Limon Sandbox
 
Introduction to SMPC
Introduction to SMPCIntroduction to SMPC
Introduction to SMPC
 
Breaking into hospitals
Breaking into hospitalsBreaking into hospitals
Breaking into hospitals
 
Bluetooth [in]security
Bluetooth [in]securityBluetooth [in]security
Bluetooth [in]security
 
Basic malware analysis
Basic malware analysisBasic malware analysis
Basic malware analysis
 
Automating Malware Analysis
Automating Malware AnalysisAutomating Malware Analysis
Automating Malware Analysis
 
Reverse Engineering Malware
Reverse Engineering MalwareReverse Engineering Malware
Reverse Engineering Malware
 
DLL Preloading Attack
DLL Preloading AttackDLL Preloading Attack
DLL Preloading Attack
 
Partial Homomorphic Encryption
Partial Homomorphic EncryptionPartial Homomorphic Encryption
Partial Homomorphic Encryption
 
Hunting Rootkit From the Dark Corners Of Memory
Hunting Rootkit From the Dark Corners Of MemoryHunting Rootkit From the Dark Corners Of Memory
Hunting Rootkit From the Dark Corners Of Memory
 
Return Address – The Silver Bullet
Return Address – The Silver BulletReturn Address – The Silver Bullet
Return Address – The Silver Bullet
 
Defeating public exploit protections (EMET v5.2 and more)
Defeating public exploit protections (EMET v5.2 and more)Defeating public exploit protections (EMET v5.2 and more)
Defeating public exploit protections (EMET v5.2 and more)
 
Hunting Ghost RAT Using Memory Forensics
Hunting Ghost RAT Using Memory ForensicsHunting Ghost RAT Using Memory Forensics
Hunting Ghost RAT Using Memory Forensics
 
Anatomy of Exploit Kits
Anatomy of Exploit KitsAnatomy of Exploit Kits
Anatomy of Exploit Kits
 
MalwareNet Project
MalwareNet ProjectMalwareNet Project
MalwareNet Project
 

Recently uploaded

How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
HackersList
 
What’s New in Teams Calling, Meetings and Devices May 2024
What’s New in Teams Calling, Meetings and Devices May 2024What’s New in Teams Calling, Meetings and Devices May 2024
What’s New in Teams Calling, Meetings and Devices May 2024
Stephanie Beckett
 
The Rise of Supernetwork Data Intensive Computing
The Rise of Supernetwork Data Intensive ComputingThe Rise of Supernetwork Data Intensive Computing
The Rise of Supernetwork Data Intensive Computing
Larry Smarr
 
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
Toru Tamaki
 
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Bert Blevins
 
Best Programming Language for Civil Engineers
Best Programming Language for Civil EngineersBest Programming Language for Civil Engineers
Best Programming Language for Civil Engineers
Awais Yaseen
 
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyyActive Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
RaminGhanbari2
 
Details of description part II: Describing images in practice - Tech Forum 2024
Details of description part II: Describing images in practice - Tech Forum 2024Details of description part II: Describing images in practice - Tech Forum 2024
Details of description part II: Describing images in practice - Tech Forum 2024
BookNet Canada
 
DealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 editionDealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 edition
Yevgen Sysoyev
 
Measuring the Impact of Network Latency at Twitter
Measuring the Impact of Network Latency at TwitterMeasuring the Impact of Network Latency at Twitter
Measuring the Impact of Network Latency at Twitter
ScyllaDB
 
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdfBT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
Neo4j
 
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Erasmo Purificato
 
WPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide DeckWPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide Deck
Lidia A.
 
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-InTrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc
 
Quality Patents: Patents That Stand the Test of Time
Quality Patents: Patents That Stand the Test of TimeQuality Patents: Patents That Stand the Test of Time
Quality Patents: Patents That Stand the Test of Time
Aurora Consulting
 
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - MydbopsScaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Mydbops
 
Manual | Product | Research Presentation
Manual | Product | Research PresentationManual | Product | Research Presentation
Manual | Product | Research Presentation
welrejdoall
 
Research Directions for Cross Reality Interfaces
Research Directions for Cross Reality InterfacesResearch Directions for Cross Reality Interfaces
Research Directions for Cross Reality Interfaces
Mark Billinghurst
 
Calgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptxCalgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptx
ishalveerrandhawa1
 
20240704 QFM023 Engineering Leadership Reading List June 2024
20240704 QFM023 Engineering Leadership Reading List June 202420240704 QFM023 Engineering Leadership Reading List June 2024
20240704 QFM023 Engineering Leadership Reading List June 2024
Matthew Sinclair
 

Recently uploaded (20)

How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
 
What’s New in Teams Calling, Meetings and Devices May 2024
What’s New in Teams Calling, Meetings and Devices May 2024What’s New in Teams Calling, Meetings and Devices May 2024
What’s New in Teams Calling, Meetings and Devices May 2024
 
The Rise of Supernetwork Data Intensive Computing
The Rise of Supernetwork Data Intensive ComputingThe Rise of Supernetwork Data Intensive Computing
The Rise of Supernetwork Data Intensive Computing
 
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...

Malicious URL Detection Using Machine Learning

  • 1. Beyond Blacklists: Malicious URL Detection Using Machine Learning
  • 2. Who am I? Information security investigator at Cisco. Completed an M.Tech. at IIT Jodhpur in 2014. Areas of interest include machine learning, computer vision and AI. Email: satyamiitj89@gmail.com
  • 3. Malicious websites. Phishing: which one is real?
  • 6. Problem in a Nutshell: Use URL features to identify malicious websites, with no context and no page content. URLs fall into different classes (benign, spam, phishing, exploits, scams...); for now, distinguish benign vs. malicious, e.g. facebook.com vs. fblight.com.
  • 8. State of the Practice: Current approaches are blacklists [SORBS, URIBL, SURBL, Spamhaus] and learning on hand-tuned features [Garera et al., 2007]. Limitations: they cannot predict unlisted sites, they cannot account for new features, and in an arms race a fast feedback cycle is critical. A more automated approach?
  • 10. Data Sets: Malicious URLs: 5,000 from PhishTank (phishing) and 15,000 from Spamscatter (spam, phishing, etc.). Benign URLs: 15,000 from the Yahoo Web directory and 15,000 from the DMOZ directory. Pairing each malicious source with each benign source gives 4 data sets, with 30,000–55,000 features per data set.
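The "Malicious x Benign → 4 data sets" pairing is a simple cross product of sources. A minimal sketch using the counts from the slide; the function name and the dictionary layout are illustrative choices, while the "Benign-Malicious" key format follows the "Yahoo-PhishTank" name used on the results slide.

```python
from itertools import product

# URL counts per source, as listed on the slide.
MALICIOUS = {"PhishTank": 5_000, "Spamscatter": 15_000}
BENIGN = {"Yahoo": 15_000, "DMOZ": 15_000}

def build_dataset_pairs(benign, malicious):
    """Pair every benign source with every malicious source.

    Keys use the "Benign-Malicious" naming (e.g. "Yahoo-PhishTank");
    values are (benign_count, malicious_count).
    """
    return {f"{b}-{m}": (benign[b], malicious[m]) for b, m in product(benign, malicious)}

for name, (n_benign, n_malicious) in build_dataset_pairs(BENIGN, MALICIOUS).items():
    print(f"{name}: {n_benign:,} benign + {n_malicious:,} malicious")
```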
  • 11. Algorithms: Logistic regression with L1-norm regularization, which performs implicit feature selection and is easier to interpret. Other models: Naive Bayes and support vector machines (linear and RBF kernels).
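The slide names the model but not how it is fitted. Below is a minimal sketch of L1-regularized logistic regression trained by proximal gradient descent (ISTA), a standard solver picked here for brevity and not necessarily the one used by Ma et al. It uses dense pure-Python rows, and the values of `lam`, `lr` and `epochs` are illustrative assumptions.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(w, t):
    """Proximal operator of t*|w|: shrink w toward 0, snapping small values to exactly 0."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0

def train_l1_logreg(X, y, lam=0.01, lr=0.5, epochs=500):
    """Minimize mean log-loss + lam * ||w||_1 by proximal gradient descent (ISTA).

    X: list of dense feature rows; y: labels in {0, 1} (1 = malicious).
    The bias b is not penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Gradient step on the smooth log-loss, then the L1 proximal (shrinkage) step.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that feature row x is malicious."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy data: feature 0 tracks the label, feature 1 is unrelated to it.
X = [[1, 0], [1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0], [0, 1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
w, b = train_l1_logreg(X, y)
print("weights:", w, "bias:", b)
```

The shrinkage step is what gives the "implicit feature selection" on the slide. Weights whose gradient stays small are snapped to exactly zero, so the uninformative feature here should end up at or near zero, and surviving nonzero weights point to the features the model actually relies on.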
  • 13. Features to consider? (1) Blacklists, (2) simple heuristics, (3) domain name registration, (4) host properties, (5) lexical.
  • 14. (1) Blacklist Queries: Lists of known malicious sites, from providers such as SORBS, URIBL, SURBL and Spamhaus. Example: is http://www.bfuduuioo1fp.mobi in a blacklist? Yes. Is http://fblight.com in a blacklist? No. The blacklist query results are used as features.
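Using blacklist answers as features means one binary input per provider instead of a single yes/no verdict. A minimal sketch follows. The in-memory `BLACKLISTS` sets and their membership are made up for illustration, and `strip_www` is a hypothetical helper. Real providers are queried over DNS or their own services, which this sketch does not attempt.

```python
from urllib.parse import urlparse

# Hypothetical local snapshots of each provider's list (membership made up for
# illustration). Real providers are queried over DNS or their own services.
BLACKLISTS = {
    "SORBS": {"bfuduuioo1fp.mobi"},
    "URIBL": {"bfuduuioo1fp.mobi"},
    "SURBL": set(),
    "Spamhaus": {"bfuduuioo1fp.mobi"},
}

def strip_www(hostname):
    """Hypothetical helper: drop a leading 'www.' so both spellings match one entry."""
    return hostname[4:] if hostname.startswith("www.") else hostname

def blacklist_features(url, blacklists=BLACKLISTS):
    """One binary feature per provider: 1 if the URL's host is on that list, else 0."""
    host = strip_www(urlparse(url).hostname or "")
    return {f"blacklist:{name}": int(host in listed) for name, listed in blacklists.items()}

print(blacklist_features("http://www.bfuduuioo1fp.mobi"))
print(blacklist_features("http://fblight.com"))
```

Keeping providers separate lets the classifier learn how much to trust each list, rather than treating any single listing as conclusive.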
  • 15. (2) Manually Selected Features: Features considered by previous studies: IP address in the hostname? Number of dots in the URL; WHOIS (domain name) registration date. Examples: http://72.23.5.122/www.bankofamerica.com/ (IP address as the hostname) and http://www.bankofamerica.com.qytrpbcw.stopgap.cn/ (stopgap.cn registered 28 June 2009).
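The three hand-picked features can be computed directly from the URL plus a WHOIS date. A minimal sketch, assuming the registration date has already been looked up and is expressed as domain age in days. Both the age encoding and the reference date 2009-07-01 in the example are illustrative assumptions.

```python
import ipaddress
from datetime import date
from urllib.parse import urlparse

def manual_features(url, registration_date=None, today=None):
    """Hand-picked features: IP-as-hostname flag, dot count, optional domain age.

    registration_date: WHOIS creation date (datetime.date) if already known;
    fetching it is outside this sketch. today: reference date for the age.
    """
    host = urlparse(url).hostname or ""
    try:
        ipaddress.ip_address(host)  # raises ValueError for ordinary domain names
        ip_in_hostname = 1
    except ValueError:
        ip_in_hostname = 0
    feats = {"ip_in_hostname": ip_in_hostname, "num_dots": url.count(".")}
    if registration_date is not None:
        today = today or date.today()
        feats["domain_age_days"] = (today - registration_date).days
    return feats

print(manual_features("http://72.23.5.122/www.bankofamerica.com/"))
# {'ip_in_hostname': 1, 'num_dots': 5}
print(manual_features("http://www.bankofamerica.com.qytrpbcw.stopgap.cn/",
                      registration_date=date(2009, 6, 28), today=date(2009, 7, 1)))
# {'ip_in_hostname': 0, 'num_dots': 5, 'domain_age_days': 3}
```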
  • 16. (3) WHOIS Features: Domain name registration: date of registration, update and expiration; registrant (who registered the domain?); registrar (who manages the registration?). Example: http://sleazysalmon.com, http://angryalbacore.com, http://mangymackerel.com and http://yammeringyellowtail.com were all registered on 29 June 2009 by SpamMedia.
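WHOIS fields become features in two ways. Dates turn into day counts, and registrar and registrant names turn into binary indicators. A minimal sketch, assuming the record has already been parsed into a dict with hypothetical keys `created`, `updated`, `expires`, `registrar` and `registrant`. Treating SpamMedia as the registrant, and the reference date, are illustrative assumptions.

```python
from datetime import date

def whois_features(record, today):
    """Turn a parsed WHOIS record (hypothetical dict layout) into features.

    Dates become day counts; registrar/registrant names become indicator
    features, so each distinct name seen in training gets its own column.
    """
    feats = {}
    for key in ("created", "updated"):
        if record.get(key):
            feats[f"whois:days_since_{key}"] = (today - record[key]).days
    if record.get("expires"):
        feats["whois:days_until_expires"] = (record["expires"] - today).days
    for key in ("registrar", "registrant"):
        feats[f"whois:{key}={record.get(key) or 'unknown'}"] = 1
    return feats

domains = ["sleazysalmon.com", "angryalbacore.com", "mangymackerel.com", "yammeringyellowtail.com"]
# Hypothetical records: date from the slide, SpamMedia treated as the registrant.
records = {d: {"created": date(2009, 6, 29), "registrant": "SpamMedia"} for d in domains}
feature_sets = [whois_features(records[d], today=date(2009, 7, 1)) for d in domains]
print(feature_sets[0])
```

All four domains produce identical WHOIS features. Once one of them is labeled malicious, the shared registrant and registration-date features carry that signal to the others.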
  • 17. (4) Host-Based Features: Blacklisted? (SORBS, URIBL, SURBL, Spamhaus); WHOIS: registrar, registrant, dates; IP address: which ASes/IP prefixes?; DNS: TTL? Does a PTR record exist and resolve?; geography-related: locale? connection speed? Example prefixes shown for facebook.com and fblight.com: 75.102.60.0/22 and 69.63.176.0/20.
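The IP-prefix and DNS questions above can be answered with a few features per resolved address. A minimal sketch, assuming the A record, TTL and PTR status have already been resolved. `KNOWN_PREFIXES` holds only the two prefixes from the slide, and the /16 token and the example address `69.63.180.1` are illustrative assumptions. A real system would map addresses to prefixes and ASes with a full routing table.

```python
import ipaddress

# Example prefixes copied from the slide; a real system would use a full routing
# table to map addresses to prefixes and ASes, which this sketch does not have.
KNOWN_PREFIXES = [
    ipaddress.ip_network("75.102.60.0/22"),
    ipaddress.ip_network("69.63.176.0/20"),
]

def host_features(ip, ttl, has_ptr, prefixes=KNOWN_PREFIXES):
    """Host-based features for one resolved address (inputs assumed pre-resolved)."""
    addr = ipaddress.ip_address(ip)
    # One binary feature per known prefix that contains the address.
    feats = {f"prefix:{net}": 1 for net in prefixes if addr in net}
    # A coarse /16 token lets unseen addresses share a feature with their neighbours.
    feats[f"slash16:{ipaddress.ip_network(f'{ip}/16', strict=False)}"] = 1
    feats["dns:ttl"] = ttl
    feats["dns:ptr_exists"] = int(has_ptr)
    return feats

print(host_features("69.63.180.1", ttl=300, has_ptr=True))
```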
  • 18. (5) Lexical Features: Tokens in the URL hostname and path; length of the URL; entropy of the domain name. Example: http://www.bfuduuioo1fp.mobi/ws/ebayisapi.dll
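The lexical features need only the URL string. A minimal sketch follows, with a few assumptions. Tokens are split on non-alphanumeric characters. Host and path tokens are kept in separate namespaces. "Entropy of the domain name" is taken as Shannon entropy in bits per character of the full hostname. The exact tokenization rules in the original work may differ.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse

def shannon_entropy(s):
    """Shannon entropy of string s in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())

def lexical_features(url):
    """URL length, hostname entropy, and binary host/path token features."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    feats = {"url_length": len(url), "host_entropy": shannon_entropy(host)}
    # Split on non-alphanumeric delimiters; each token becomes its own feature.
    for token in re.split(r"[^A-Za-z0-9]+", host):
        if token:
            feats[f"host_token:{token}"] = 1
    for token in re.split(r"[^A-Za-z0-9]+", parsed.path):
        if token:
            feats[f"path_token:{token}"] = 1
    return feats

print(lexical_features("http://www.bfuduuioo1fp.mobi/ws/ebayisapi.dll"))
```

A token such as `ebayisapi` in the path of a non-eBay host is exactly the kind of signal a bag-of-tokens representation lets the classifier pick up without a hand-written rule.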
  • 19. Which feature sets? Number of features per set: Blacklist 4; Manual 3; WHOIS 4,000; Host-based 13,000; Lexical 17,000; Full 30,000; Full w/o WHOIS/Blacklist 26,000.
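Feature counts in the tens of thousands come from giving every distinct token, registrant, prefix and so on its own column. A minimal sketch of that bookkeeping, assuming each feature name carries a family prefix such as `whois:` or `blacklist:`. This naming convention is introduced here and does not come from the slides. Excluding prefixes mimics a configuration like "Full w/o WHOIS/Blacklist".

```python
def build_vocabulary(feature_dicts, exclude_prefixes=()):
    """Assign a column index to every distinct feature name seen in training.

    exclude_prefixes drops whole feature families, e.g. ("whois:", "blacklist:").
    """
    vocab = {}
    for feats in feature_dicts:
        for name in feats:
            if name.startswith(exclude_prefixes):
                continue
            vocab.setdefault(name, len(vocab))
    return vocab

def vectorize(feats, vocab):
    """Sparse vector {column_index: value}; unseen names and zero values are dropped."""
    return {vocab[name]: value for name, value in feats.items() if name in vocab and value != 0}

train = [
    {"host_token:fblight": 1, "whois:registrant=unknown": 1, "blacklist:SORBS": 0},
    {"host_token:facebook": 1, "whois:registrant=unknown": 1, "url_length": 19},
]
full = build_vocabulary(train)
reduced = build_vocabulary(train, exclude_prefixes=("whois:", "blacklist:"))
print(len(full), len(reduced))  # 5 3
print(vectorize(train[0], full))  # {0: 1, 1: 1}
```

Storing only nonzero entries keeps each URL's vector small even when the vocabulary is large, since any single URL activates only a handful of tokens.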
  • 20. Beyond Blacklists: On Yahoo-PhishTank, the full feature set achieves a higher detection rate than blacklists alone for a given false positive rate.
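"Higher detection rate for a given false positive rate" means fixing a false positive budget and comparing true positive rates. A minimal sketch, assuming scores where higher means more malicious. The labels and scores in the example are made up. A blacklist gives a binary score, so it offers only one operating point, while a classifier's continuous score can be thresholded anywhere.

```python
def detection_rate_at_fpr(scores, labels, max_fpr):
    """Highest true positive rate among thresholds whose false positive rate is <= max_fpr.

    scores: higher means more likely malicious; a URL is flagged when score >= threshold.
    labels: 1 for malicious, 0 for benign; both classes must be present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both malicious and benign examples")
    best = 0.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        if fp / neg <= max_fpr:
            best = max(best, tp / pos)
    return best

labels = [1, 1, 0, 1, 1, 0, 0, 0]
classifier_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]  # made-up scores
blacklist_hits = [1, 0, 0, 1, 0, 0, 0, 0]  # made-up: listed (1) or not (0)
print(detection_rate_at_fpr(classifier_scores, labels, 0.25))  # 1.0
print(detection_rate_at_fpr(blacklist_hits, labels, 0.25))     # 0.5
```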
  • 21. Limitations: False positives: sites hosted at a disreputable ISP (guilt by association). False negatives: compromised sites, free hosting sites, and sites hosted at a reputable ISP. Future work: use web page content.
  • 22. Conclusion: Malicious URLs can be detected with high accuracy using only the URL. A diverse feature set helps: 86.5% with 18,000+ features. A proof of concept is working in the lab. Future work: scaling up for deployment.
  • 23. References  Ma, Justin, et al. "Beyond blacklists: learning to detect malicious web sites from suspicious URLs." Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2009.
  • 24. Q & A