Rebuilding Web Tracking Infrastructure for Scale
Stephen Oakley
Principal Engineer
What is Marketo?
What is Web Tracking at Marketo?
• Ingest web page visits and clicks on customers’ websites
• Trigger campaigns in response to web activity
• Trigger real-time personalization of web experience
• Provide lead level analytics for known leads
• Provide aggregate analytics for all lead activity
• Known leads are typically < 10% of all traffic
Legacy Web Tracking Infrastructure

Legacy Problems
• Throughput limitations – 2 million activities per day
• Processing delays can be on the order of hours
• Large customers cause web server brownouts
• Web reporting does not scale
• Fixed-sized clusters prohibit horizontal scaling
• Brittle infrastructure prevents feature development
The Vision
Orion Initiative
• Increase scale to support IoT for Marketers
• Support billions of marketing activities each day
• Trigger on activities in near real time (< 2 minutes at the 99th percentile)
• Reduce operational costs
• Improve multitenancy and QoS
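The latency targets above are tail-percentile SLAs. A minimal sketch of checking one, assuming latencies are collected as millisecond samples and using the nearest-rank definition of a percentile (interpolating definitions can give slightly different values). LatencySla, percentile, and meetsSla are hypothetical names, not part of Marketo's system:

```scala
// Hypothetical helper for checking a tail-latency SLA such as
// "< 2 minutes at the 99th percentile".
object LatencySla {
  /** Nearest-rank percentile: the smallest sample with at least p% of samples <= it. */
  def percentile(samplesMs: Seq[Long], p: Double): Long = {
    require(samplesMs.nonEmpty, "need at least one sample")
    require(p > 0.0 && p <= 100.0, "p must be in (0, 100]")
    val sorted = samplesMs.sorted
    // 1-based rank = ceil(p / 100 * n); convert to a 0-based index.
    val rank = math.ceil(p / 100.0 * sorted.size).toInt
    sorted(rank - 1)
  }

  /** True when the p-th percentile latency is strictly below the limit. */
  def meetsSla(samplesMs: Seq[Long], p: Double, limitMs: Long): Boolean =
    percentile(samplesMs, p) < limitMs
}
```

For example, with 100 samples of 1 s, 2 s, ... 100 s, the 99th percentile is 99 s, which meets a 2-minute SLA but not a 1-minute one.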

Business Requirements
• 200 MM activities per customer per day
• Near real-time web activity processing (SLA of < 1 minute lag)
• Improve cost efficiency
• Improve flexibility for feature enhancements
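To put the daily volumes in perspective, they can be converted to average per-second rates. This is plain arithmetic on figures from this deck (the legacy ceiling of 2 million activities per day and the 200 MM per customer per day target). Real web traffic is bursty, so peak rates will be higher than these averages. ThroughputMath is a hypothetical name:

```scala
// Converts daily activity volumes into average per-second rates.
object ThroughputMath {
  val SecondsPerDay: Long = 24L * 60 * 60 // 86,400

  /** Average events per second implied by a daily volume (peaks will be higher). */
  def avgPerSecond(eventsPerDay: Long): Double = eventsPerDay.toDouble / SecondsPerDay

  val legacyPerDay: Long = 2000000L   // legacy limit: 2 million activities per day
  val targetPerDay: Long = 200000000L // requirement: 200 MM activities per customer per day
}
```

200,000,000 / 86,400 ≈ 2,315 events per second on average, versus ≈ 23 per second at 2 million per day: a 100x increase over the legacy ceiling for a single customer.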
Technical Requirements
• Multitenancy support with brownout protections
• Infrastructure must scale horizontally
• Decouple web processing from downstream processing
• Anonymous leads should cost next to nothing to track
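One common way to provide per-tenant brownout protection is a token bucket for each customer, so that one tenant's burst is throttled instead of degrading every tenant. The talk does not say which mechanism was used. The following is a hypothetical TenantRateLimiter that takes an explicit clock argument so it is easy to test:

```scala
import scala.collection.mutable

/** Hypothetical per-tenant token bucket: refills at ratePerSec, holds at most `burst` tokens. */
final class TenantRateLimiter(ratePerSec: Double, burst: Double) {
  private final class Bucket(var tokens: Double, var lastNanos: Long)
  private val buckets = mutable.Map.empty[String, Bucket]

  /** Returns true if `tenant` may process `cost` events at time `nowNanos`. */
  def tryAcquire(tenant: String, cost: Double, nowNanos: Long): Boolean = synchronized {
    // A new tenant starts with a full bucket.
    val b = buckets.getOrElseUpdate(tenant, new Bucket(burst, nowNanos))
    // Refill based on elapsed time, ignoring clocks that move backwards.
    val elapsedSec = math.max(0L, nowNanos - b.lastNanos) / 1e9
    b.tokens = math.min(burst, b.tokens + elapsedSec * ratePerSec)
    b.lastNanos = nowNanos
    if (b.tokens >= cost) { b.tokens -= cost; true } else false
  }
}
```

Because each tenant has its own bucket, exhausting one customer's budget has no effect on another customer's ability to proceed.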
Architecture & Design

Why HBase + Phoenix?
• Horizontally scalable
• Leverages the Hadoop cluster for storage and scaling
• Provides secondary indices for query patterns through
• Natural integration with JDBC and Spark JDBC RDDs
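HBase stores rows sorted by their byte-encoded row key, and Phoenix maps a table's SQL primary-key columns onto that key. Key design therefore determines which query patterns are cheap scans. The talk does not describe Marketo's schema. The following is a hypothetical activity key laid out as customer id, then lead id, then a reversed timestamp, so that each lead's activities are contiguous and the newest come first:

```scala
import java.nio.ByteBuffer

object ActivityRowKey {
  /** Hypothetical layout: [customerId:4][leadId:8][Long.MaxValue - epochMillis:8]. */
  def encode(customerId: Int, leadId: Long, epochMillis: Long): Array[Byte] = {
    // Non-negative values keep big-endian byte order consistent with numeric order.
    require(customerId >= 0 && leadId >= 0 && epochMillis >= 0, "values must be non-negative")
    ByteBuffer.allocate(20)
      .putInt(customerId)
      .putLong(leadId)
      .putLong(Long.MaxValue - epochMillis) // newer activities get smaller keys, so they scan first
      .array()
  }

  /** Unsigned lexicographic byte comparison, the order HBase uses for row keys. */
  def compareKeys(a: Array[Byte], b: Array[Byte]): Int = {
    val n = math.min(a.length, b.length)
    var i = 0
    while (i < n) {
      val c = (a(i) & 0xff) - (b(i) & 0xff)
      if (c != 0) return c
      i += 1
    }
    a.length - b.length
  }
}
```

Putting the customer id first keeps each tenant's data in its own key range. The trade-off is that a very large tenant can create hot regions, which salting or pre-splitting is typically used to address.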
Marketo Lambda Architecture
[Architecture diagram. Components shown: Web Activity, RTP Activity, Mobile Activity, CRM Sync, Partner APIs, Other Marketing, Ingestion Processor, Web Activity Processor, Spark Streaming, Spark Streaming Indexer, Campaign Triggers, Solr Indexing, Marketo UI, Campaign Detail, Lead Detail, Revenue Cycle Analytics, Email Report Loader, Other Clients]
Why Spark Streaming?
• Micro-batching provides sink-side efficiencies
• This is especially important with MySQL touchpoints
• Great integration with Kafka
• No strict real-time processing requirements
• Great community and industry adoption
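The sink-side efficiency of micro-batching comes from turning many per-event writes into a few bulk writes. In Spark Streaming, this kind of logic would typically run inside DStream.foreachRDD and RDD.foreachPartition. The following minimal sketch omits Spark and assumes a hypothetical Sink trait that could wrap something like a JDBC batch insert into MySQL:

```scala
/** Hypothetical sink abstraction; a real one might wrap a JDBC batch insert. */
trait Sink[A] { def writeBatch(rows: Seq[A]): Unit }

/** Test double that counts round trips instead of talking to a database. */
final class CountingSink[A] extends Sink[A] {
  var roundTrips = 0
  var rowsWritten = 0
  def writeBatch(rows: Seq[A]): Unit = { roundTrips += 1; rowsWritten += rows.size }
}

object MicroBatchWriter {
  /** Writes one micro-batch in chunks of at most `chunkSize`: one round trip per chunk, not per row. */
  def writeMicroBatch[A](batch: Seq[A], sink: Sink[A], chunkSize: Int): Unit = {
    require(chunkSize > 0, "chunkSize must be positive")
    batch.grouped(chunkSize).foreach(sink.writeBatch)
  }
}
```

With 2,500 events and a chunk size of 1,000, the sink sees 3 round trips instead of 2,500.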
• One topic per customer (sized by volume)
• Traffic storms are isolated to a single customer
• Fairness/throttling is easy to control
• Spark Streaming job consumes from many topics
• Allows us to turn a customer off under error conditions
• See “Elastic Streaming” by Neelesh Shastry, Spark Summit
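With one topic per customer, fairness can be enforced by capping how much each topic contributes to a micro-batch and by skipping topics for customers who have been turned off. Spark's Kafka integration offers a related per-partition cap (spark.streaming.kafka.maxRatePerPartition). The following planner is a hypothetical sketch of the idea, not Marketo's implementation, and the topic names are made up:

```scala
object FairBatchPlanner {
  /**
   * Hypothetical planner: given each customer topic's backlog, decide how many
   * records to consume in this micro-batch. Each topic is capped so one customer's
   * traffic storm cannot crowd out the others, and disabled customers are skipped.
   */
  def plan(backlog: Map[String, Long], maxPerTopic: Long, disabled: Set[String]): Map[String, Long] =
    backlog.collect {
      case (topic, lag) if !disabled.contains(topic) && lag > 0 =>
        topic -> math.min(lag, maxPerTopic)
    }
}
```

For example, a topic with a 5 million record backlog is capped at the per-topic limit while a small customer's 200 records are consumed in full, and a disabled customer's topic is left untouched until it is re-enabled.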
Making Spark Streaming Performant
• Coalesce small partitions for the same customer
• Aggressive caching of metadata (mostly from MySQL)
• Heavily leverage Scala future composition for parallelism
• Persist RDDs that are used for multiple outputs
• e.g., writing to both Kafka and the Activity Service
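Two of the techniques above are aggressive metadata caching and Scala future composition. The following sketch shows both under stated assumptions. MetadataCache, Enrichment, and the lookup functions are hypothetical stand-ins for MySQL-backed metadata reads. The key detail is that independent futures are started before they are combined, so they run concurrently:

```scala
import scala.collection.concurrent.TrieMap
import scala.concurrent.{ExecutionContext, Future}

/** Hypothetical in-memory cache for slowly changing metadata (e.g. rows read from MySQL). */
final class MetadataCache[K, V](load: K => V) {
  private val cache = TrieMap.empty[K, V]

  // Thread-safe; under contention `load` may run more than once for a key,
  // but only one value is kept and returned.
  def get(key: K): V = cache.getOrElseUpdate(key, load(key))
}

object Enrichment {
  final case class Enriched(leadId: Long, leadName: String, activeCampaigns: Int)

  /** Runs two independent lookups concurrently and combines their results. */
  def enrich(leadId: Long,
             lookupLead: Long => Future[String],
             countCampaigns: Long => Future[Int])(implicit ec: ExecutionContext): Future[Enriched] = {
    // Both futures are created before the for-comprehension, so they run in parallel;
    // creating them inside the for-comprehension would run them one after another.
    val leadF = lookupLead(leadId)
    val campaignsF = countCampaigns(leadId)
    for {
      name <- leadF
      n <- campaignsF
    } yield Enriched(leadId, name, n)
  }

  /** Enriches many leads concurrently; Future.traverse keeps results in input order. */
  def enrichAll(ids: Seq[Long],
                lookupLead: Long => Future[String],
                countCampaigns: Long => Future[Int])(implicit ec: ExecutionContext): Future[Seq[Enriched]] =
    Future.traverse(ids)(id => enrich(id, lookupLead, countCampaigns))
}
```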

Making Anonymous Traffic Cheap
• High costs of web traffic in legacy system
• MySQL storage for all traffic
• Downstream processing of all events (even anonymous ones)
• V2 only processes and stores known traffic in MySQL
• Defer triggering on anonymous data until the lead is promoted to a known lead
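The routing rule above can be sketched as follows: known events go straight to the expensive store-and-trigger path, while anonymous events are parked cheaply per cookie and replayed as known events only when the visitor is promoted. The talk does not say where anonymous events are kept. In this hypothetical AnonymousDeferral, an in-memory map stands in for that cheap store, and WebEvent is an assumed event shape:

```scala
import scala.collection.mutable

/** Hypothetical web event: leadId is None while the visitor is anonymous (cookie only). */
final case class WebEvent(cookieId: String, leadId: Option[Long], url: String, epochMillis: Long)

/** Hypothetical router that defers all work on anonymous traffic until promotion. */
final class AnonymousDeferral {
  // Stand-in for a cheap store of anonymous activity, keyed by cookie.
  private val parked = mutable.Map.empty[String, Vector[WebEvent]]

  /** Returns the events that should be stored (e.g. in MySQL) and triggered now. */
  def route(e: WebEvent): Seq[WebEvent] = e.leadId match {
    case Some(_) => Seq(e)
    case None =>
      parked.update(e.cookieId, parked.getOrElse(e.cookieId, Vector.empty) :+ e)
      Seq.empty
  }

  /** Called when a cookie becomes a known lead: replays its history as known events. */
  def promote(cookieId: String, leadId: Long): Seq[WebEvent] =
    parked.remove(cookieId).getOrElse(Vector.empty).map(_.copy(leadId = Some(leadId)))

  def parkedCount: Int = parked.valuesIterator.map(_.size).sum
}
```

Since known leads are typically under 10% of traffic, most events in this scheme never reach the expensive path unless their visitor is later identified.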
Impact and Results
• Rolled out to our highest volume customers
• Processing latencies < 30s (at the 99.9th percentile)
• Allowed key customers to scale from ~2MM/day to > 20
Future Work
• Mitigating the effect of stragglers on processing delays
• Adding sessionization for web reporting
• Scaling Kafka topics as customers increase volume
• Globally distributed ingestion for a single customer
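Sessionization for web reporting is commonly done by splitting a visitor's activity wherever the gap between consecutive events exceeds an inactivity timeout. The talk does not specify an approach. The following sketch is a generic gap-based version, and the 30-minute timeout in the example is a common web-analytics convention, not a figure from this deck:

```scala
object Sessionizer {
  /**
   * Gap-based sessionization: timestamps (epoch millis) are sorted, and a new
   * session starts whenever the gap since the previous event exceeds timeoutMillis.
   */
  def sessionize(timestamps: Seq[Long], timeoutMillis: Long): Vector[Vector[Long]] =
    timestamps.sorted.foldLeft(Vector.empty[Vector[Long]]) { (sessions, ts) =>
      sessions.lastOption match {
        // Close enough to the previous event: extend the current session.
        case Some(current) if ts - current.last <= timeoutMillis =>
          sessions.updated(sessions.size - 1, current :+ ts)
        // First event, or the gap is too large: start a new session.
        case _ =>
          sessions :+ Vector(ts)
      }
    }
}
```

With a 30-minute timeout (1,800,000 ms), events at 0 ms, 60,000 ms, 4,000,000 ms, 4,100,000 ms and 10,000,000 ms form three sessions.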
We’re Hiring!
Q & A

