Cleaning up website traffic from
bots & spammers
TLearn webinar
24 February 2015
Your hosts
Kevin May
Editor & Moderator
Tnooz
Nick Vivion
Reporter & Global Events Lead
Tnooz
Speakers
Rami Essaid
Distil Networks
CEO & Co-founder
Rob Gennaro
Red Label Vacations
Digital Marketing Officer
Poll no. 1
Has your website been a victim of web scraping
and/or bad bot attacks in the last 12 months?
Poll no. 2
Do you currently have a bot defense
solution in place?
Agenda
The growing bot problem
Web scraping bots and your data
Is web scraping legal?
Red Label Vacations journey to clean traffic
Selection criteria for a bot detection solution
Distil Networks overview
How Big is the Problem?
Up to 38% of traffic on travel websites is bad bots
4.2 million IP addresses impacted by “Pushdo” botnet alone
15% bot traffic can equate to hitting each of your pricing pages
30 times per month
Why the Massive Increase in Bot Traffic?
Online data has increased in value
Pricing, incentive packages, flight routes, reviews,
images, star ratings, availability, hotel
names/descriptions, and editorial are changing
daily
Anyone can get in the game
Cheap or free virtual servers, bandwidth, easy-to-use
tools, and scrapers for hire
Bots no longer tied to IP addresses
Bots cycle through random IP addresses
Bots hide behind anonymous proxies
Consumer IPs now infected with bot traffic too
What Is Web Scraping?
Web Scraping
Also known as screen scraping, web scraping is the act of
copying large amounts of data from a website – either
manually or with an automated program (Bot)
Legitimate Scraping
Scraping can sometimes be benevolent and totally
acceptable. For example, the search engine bots that index
your website
Malicious Scraping
A systematic theft of intellectual property accessible on a
website, including pricing, content, images, and proprietary
data
What Are Scrapers Doing with Your Travel Site?
Posting your content on competitor
sites
Scrapers steal your traffic and advertising
dollars. Duplicative content and high bounce
rates diminish your SEO
Undermining your prices
Bots monitor your prices, ensuring competitors
can undercut with lower price listings
Executing searches on your site
The resulting API calls to third parties can cost
you
Bots Impact Your Website and Bottom Line
○ Cause brownouts
○ Undermine pricing strategies
○ Damage the human visitor’s experience
○ Negatively impact revenues
○ Waste advertising spend
○ Lower site quality score and hurt SEO
○ Hijack accounts
Bots Amplify Website Security Breaches
Is Web Scraping Legal?
Is the Legal Route Effective?
Hard to prosecute scrapers
No easy way to detect or identify stolen data in derivatives
Legal route is too expensive
Travel website’s legal bill for one “scraper” > $10M
Copyrights and terms of use don’t have teeth
Easy for thieves to assert plausible deniability
Big Brands Already at Risk
About Red Label Vacations
○ Largest independent travel company
in Canada
○ 19 brands
○ Deals on vacations, flights, cruises,
hotels and car rentals
Canada’s premier online travel agency service
offering cheap flights, airline tickets, last
minute vacation packages and discounted
cruises
Red Label Vacations Complex IT Infrastructure
Complex IT Infrastructure
○ Total of 19 different web properties
○ 5 servers for RedTag.ca
○ 5 servers for ITA (for flight technology)
○ API calls into ITA, Sabre, Softvoyage, Hotels.com, Cartrawler
○ Mixed web infrastructure environments (outsourced hosting,
owned data centers)
○ Mix of web application stacks (e.g., .NET, PHP)
○ Akamai CDN
Red Label Vacations Bot Challenges
Bot Challenges
○ Homegrown IP blocking system wasn’t working
○ Bots came in through proxies; IP addresses were spoofed
○ Bots caused brownouts
○ Brownouts caused immediate loss of revenue ($1000s)
○ Bots can hurt Google quality score and SEO
○ Akamai CDN was difficult to manage
Red Label Vacations Selection Criteria
Bot Detection and Mitigation Solution Requirements
○ Block web scrapers without impacting human visitors
○ Accurately identify good bots vs. bad bots
○ Increase website availability and speed
○ Detect automated browsing tools
○ Simple setup
○ Little or no maintenance, “self-optimizing” solution
○ Reduce costs and complexity of Akamai
Red Label Vacations Results with Distil
Improved Website Performance with Distil
○ Uptime went from 99.6% to 99.9%
○ Faster load times; no errors
○ User time on site increased; bounce rate
decreased
○ Detailed reporting distinguishes human visitors
from malicious bots
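To put the uptime change in concrete terms, the sketch below converts each availability percentage into expected downtime. The 30-day month is an assumption for illustration; the slides do not state the measurement window.

```python
# Convert an availability percentage into expected downtime.
# Assumption (not from the slides): a 30-day month of 720 hours.
HOURS_PER_MONTH = 30 * 24


def downtime_hours(uptime_percent: float, hours: float = HOURS_PER_MONTH) -> float:
    """Return the hours of downtime implied by an uptime percentage."""
    return (100.0 - uptime_percent) / 100.0 * hours


before = downtime_hours(99.6)
after = downtime_hours(99.9)
print(f"99.6% uptime: about {before:.2f} hours down per month")
print(f"99.9% uptime: about {after:.2f} hours down per month")
```

Under that assumption, the change is from roughly 2.9 hours to roughly 0.7 hours of downtime per month.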
Red Label Vacations Results with Distil
Monthly Cost Savings with Distil
○ 65% less expensive than Akamai
○ Reduced costs for third party API calls
○ Cost savings due to improved uptime
○ Eliminated tax on internal teams
Red Label Vacations Traffic Overview
Turing Tested, No False Positives
Visibility into Bot-Laden Advertising Networks
Selection Criteria: Purpose-Built Solution
Bot Detection is a New Category, NOT a Feature
○ NOT a Content Delivery Service (CDN)
○ NOT a Distributed Denial of Service (DDoS) protection solution
○ NOT a Web Application Firewall (WAF)
○ NOT a simple IP list or set of scripts
A purpose-built bot detection solution is
always updating and evolving
Selection Criteria: Complete Protection
Internal Teams Catch 20%
[Diagram labels: IP block, user agent testing, IP analysis, JavaScript test, cookie, Selenium test, browser rate limiting, automated browser, PhantomJS, machine learning, IP cycling]
A Purpose-Built Solution Should Catch 99.9%
Selection Criteria: No Impact on Human Visitors
[Diagram panels: IP Based/WAF; Purpose Built]
Selection Criteria: Accuracy
Inline Fingerprinting
Fingerprints stick to the bot even if it attempts to
reconnect from random IP addresses or hide behind
an anonymous proxy
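As a rough illustration of the idea, the sketch below derives a fingerprint from request attributes other than the source IP, so the same client yields the same identifier when it switches addresses. The header set and the `fingerprint` helper are assumptions for illustration; the slides do not describe how Distil computes its fingerprints, and a real system would likely combine many more signals.

```python
import hashlib
from typing import Mapping

# Hypothetical attribute set chosen for illustration only.
FINGERPRINT_HEADERS = ("user-agent", "accept", "accept-language", "accept-encoding")


def fingerprint(headers: Mapping[str, str]) -> str:
    """Hash selected request headers, ignoring the client IP entirely."""
    normalized = {key.lower(): value.strip() for key, value in headers.items()}
    parts = [f"{name}={normalized.get(name, '')}" for name in FINGERPRINT_HEADERS]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


# The same client seen from two different IPs (reserved documentation addresses).
request_a = {"ip": "203.0.113.7",
             "headers": {"User-Agent": "ExampleScraper/1.0", "Accept": "*/*"}}
request_b = {"ip": "198.51.100.42",
             "headers": {"User-Agent": "ExampleScraper/1.0", "Accept": "*/*"}}

same_client = fingerprint(request_a["headers"]) == fingerprint(request_b["headers"])
print(same_client)  # True: the fingerprint does not depend on the IP
```

Headers alone are easy for a bot to vary, which is why this should be read as a sketch of the principle rather than a workable defense.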
Known Violators Database
Real-time updates from a Known Violators Database,
which is based on the collective intelligence of all
protected sites
Behavioral Modeling and Machine
Learning
Machine-learning algorithms pinpoint behavioral
anomalies specific to your site’s unique traffic
patterns
Selection Criteria: Accuracy
Browser Automation Tool Detection
JavaScript Validation on the connection stream
identifies browser automation tools
Advanced Rate Limiting
Set rate limits such as pages per minute, pages per
session, and session length
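The three limits named above can be sketched as a small per-session limiter. This is a minimal illustration, not Distil's implementation; the `SessionLimits` and `RateLimiter` names and all threshold values are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SessionLimits:
    # Hypothetical thresholds; tune them to your own traffic.
    pages_per_minute: int = 30
    pages_per_session: int = 500
    max_session_seconds: float = 3600.0


@dataclass
class SessionState:
    started_at: float
    total_pages: int = 0
    recent: deque = field(default_factory=deque)  # accepted timestamps, last 60 s


class RateLimiter:
    def __init__(self, limits: SessionLimits) -> None:
        self.limits = limits
        self.sessions: dict = {}

    def allow(self, session_id: str, now: float) -> bool:
        """Record a page request at time `now` (seconds); False means over a limit."""
        state = self.sessions.setdefault(session_id, SessionState(started_at=now))
        # Drop timestamps that have left the 60-second sliding window.
        while state.recent and now - state.recent[0] >= 60.0:
            state.recent.popleft()
        if now - state.started_at > self.limits.max_session_seconds:
            return False  # session length exceeded
        if state.total_pages >= self.limits.pages_per_session:
            return False  # pages per session exceeded
        if len(state.recent) >= self.limits.pages_per_minute:
            return False  # pages per minute exceeded
        state.recent.append(now)
        state.total_pages += 1
        return True


limiter = RateLimiter(SessionLimits(pages_per_minute=3, pages_per_session=5,
                                    max_session_seconds=600))
results = [limiter.allow("session-1", now=float(t)) for t in (0, 1, 2, 3, 70, 71, 72)]
print(results)
```

With these toy limits, the fourth request is refused by the per-minute limit and the last one by the per-session limit.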
“Good Bot” Authentication
Validate that good bot requests (Google, Bing, etc.)
map to the correct user agent and IP range
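A minimal sketch of that check, assuming you maintain a table of each crawler's published IP ranges: a request is trusted as a good bot only when its user agent claims a known crawler and its IP falls inside that crawler's ranges. The `GOOD_BOT_RANGES` table, the `examplebot` token, and the CIDR blocks are hypothetical placeholders (reserved documentation networks), not real crawler ranges.

```python
import ipaddress

# Hypothetical allowlist: user-agent token -> CIDR ranges.
# These are reserved documentation networks, NOT real crawler ranges;
# in practice, load the ranges each crawler operator publishes.
GOOD_BOT_RANGES = {
    "examplebot": ["192.0.2.0/24", "2001:db8::/32"],
}


def is_verified_good_bot(user_agent: str, client_ip: str) -> bool:
    """Return True only if the claimed bot's IP falls inside its known ranges."""
    address = ipaddress.ip_address(client_ip)
    ua = user_agent.lower()
    for token, cidrs in GOOD_BOT_RANGES.items():
        if token in ua:
            return any(address in ipaddress.ip_network(cidr) for cidr in cidrs)
    return False  # the user agent does not claim to be a known good bot


claimed_ua = "Mozilla/5.0 (compatible; ExampleBot/2.1)"
print(is_verified_good_bot(claimed_ua, "192.0.2.15"))   # IP inside the listed range
print(is_verified_good_bot(claimed_ua, "203.0.113.9"))  # same user agent, IP outside it
```

The second call shows the case the slide is guarding against: a scraper that copies a search engine's user agent but connects from an unrelated address.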
How Travel Companies Benefit from Distil
Increase insight & control
over human, good bot &
bad bot traffic
Block 99.9% of malicious
bots without impacting
legitimate users
Slash the high tax bots
place on internal teams
& web infrastructure
Protect data from web
scrapers, unauthorized
aggregators & hackers
www.distilnetworks.com/trial/
Promo Code: TLearn15
Offer Ends March 10th
One Month of Free Service + Traffic Analysis
How to clean up travel website traffic from bots and spammers?
Thank you!
Send your questions and comments to
kevin@tnooz.com
Replay and presentation of webinar will be available on
www.tnooz.com

Editor's Notes

  1. Tor
  2. Travel package margins are tight; all of their prices should be on par; competitors will drop their prices on packages based on time of day; late at night; reviews; diminishing data quality
  3. Waste advertising spend: bots hit Google AdWords, click fraud report; hurts SEO: 1,000 visits and time on site is low, high bounce rate, Google’s algorithm lowers your quality score; had a form on their site and it became a spam haven…
  4. Ryanair lost their suit in Europe; in the US it’s illegal; in emerging markets it depends; Canada
  5. Akamai is good at being a CDN, DDoS; Distil’s main focus was a CDN; looking for something to block bots visiting the site; everyone says they offer the next greatest thing; notices right away that
  6. University or Company that shares the same NAT as the person launching the Bot