November 13, 2014 | Las Vegas, NV 
Eddie Fagin, VP Engineering, MediaMath 
Ian Hummel, Sr. Director Engineering, MediaMath 
Adi Krishnan, Sr. PM Amazon Kinesis
(ADV303) MediaMath’s Data Revolution with Amazon Kinesis and Amazon EMR | AWS re:Invent 2014
Canonical Data Flow With Amazon Kinesis
Data Warehousing
•Query engine approach
•Pre-computations such as indices and dimensional views improve performance
•Historical, structured data
•Amazon Redshift

Hadoop Style Processing
•Hive/SQL-on-Hadoop/MapReduce/Spark
•Batch programs, or other abstractions breaking down into MapReduce-style computations
•Historical, semi-structured data
•Amazon EMR

Stream Processing
•Custom computations of relatively simple complexity
•Continuous processing (filters, sliding windows, aggregates) on infinite data streams
•Semi-structured or structured data, generated continuously in real time
•Amazon Kinesis
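The stream-processing slide mentions filters, sliding windows, and aggregates over unbounded streams. The following is a minimal sketch of those three ideas in plain Python, not the Amazon Kinesis API. The event shape `(timestamp, event_type, value)`, the function name `sliding_window_sum`, and the assumption that events arrive in timestamp order are all illustrative and do not come from the deck.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

# Hypothetical event shape: (timestamp_seconds, event_type, value)
Event = Tuple[float, str, float]


def sliding_window_sum(
    events: Iterable[Event],
    window_seconds: float,
    wanted_type: str,
) -> Iterator[Tuple[float, float]]:
    """For each matching event, yield (timestamp, sum of values in the trailing window).

    Assumes events arrive in non-decreasing timestamp order.
    """
    window: Deque[Tuple[float, float]] = deque()  # (timestamp, value) pairs still in the window
    total = 0.0
    for ts, event_type, value in events:
        if event_type != wanted_type:  # filter: skip events of other types
            continue
        window.append((ts, value))
        total += value
        # Evict entries that have fallen out of the trailing window.
        while window and window[0][0] <= ts - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        yield ts, total  # aggregate: running sum over the window


if __name__ == "__main__":
    sample = [(0.0, "win", 1.0), (1.0, "click", 5.0), (2.0, "win", 2.0), (11.0, "win", 4.0)]
    for ts, total in sliding_window_sum(sample, window_seconds=10.0, wanted_type="win"):
        print(ts, total)
```

Because the function is a generator, it processes one event at a time and keeps only the current window in memory, which is what makes the approach workable on a stream with no end.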
Amazon Kinesis
•Real-time processing
•High throughput; elastic
•Easy to use
•S3, Redshift, DynamoDB integrations
Amazon Kinesis (architecture diagram)
•Amazon Web Services: three availability zones (AZ, AZ, AZ)
•Millions of sources producing 100s of terabytes per hour
•Front end: authentication, authorization
•Durable, highly consistent storage replicates data across three data centers (availability zones)
•Ordered stream of events supports multiple readers
•Real-time dashboards and alarms
•Machine learning algorithms or sliding window analytics
•Aggregate and archive to S3
•Aggregate analysis in Hadoop or a data warehouse
•Inexpensive: $0.028 per million puts
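To put the quoted price of $0.028 per million puts in perspective, the arithmetic can be sketched as below. This covers only the per-put figure shown on the slide (a 2014 number) and ignores any other charges that may apply. The function name `put_cost_usd` and the example volume are illustrative assumptions.

```python
# Per-put price as quoted on the slide (2014); other charges are not modelled here.
PRICE_PER_MILLION_PUTS_USD = 0.028


def put_cost_usd(num_puts: int) -> float:
    """Return the put charge in USD for `num_puts` put operations."""
    if num_puts < 0:
        raise ValueError("num_puts must be non-negative")
    return num_puts / 1_000_000 * PRICE_PER_MILLION_PUTS_USD


if __name__ == "__main__":
    # Hypothetical volume: one billion puts, which works out to about 28 dollars.
    print(f"{put_cost_usd(1_000_000_000):.2f} USD")
```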
Amazon EMR
•Hadoop/HDFS clusters
•Hive, Impala, MapReduce
•Easy to use; fully managed
•On-demand and spot pricing
Warehouse (analytics, decisioning, optimization, archive)
•Bidder Data (wins)
•Site Events
•3rd Party Segments
Firehose (Kinesis)
•Bidder Data (wins)
•Site Events
•3rd Party Segments
•Decisioning & Optimization
•Real-time Analytics
•Archive
•S3
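The Firehose slide shows one stream feeding several consumers, and an earlier slide notes that an ordered stream of events supports multiple readers. The toy model below illustrates that idea only: an in-memory append-only log in which each named reader keeps its own position. It is not the Amazon Kinesis API; the class `OrderedStream`, its methods, and the record strings are hypothetical names introduced for this sketch.

```python
from typing import Dict, List


class OrderedStream:
    """Toy append-only log; every named reader tracks its own read position."""

    def __init__(self) -> None:
        self._records: List[str] = []
        self._positions: Dict[str, int] = {}

    def put(self, record: str) -> int:
        """Append a record and return its sequence number (its index in the log)."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, reader: str, limit: int = 10) -> List[str]:
        """Return up to `limit` unread records for `reader`, in put order."""
        start = self._positions.get(reader, 0)
        batch = self._records[start:start + limit]
        self._positions[reader] = start + len(batch)  # advance only this reader
        return batch


if __name__ == "__main__":
    stream = OrderedStream()
    for record in ("bidder-win", "site-event", "segment-update"):
        stream.put(record)
    # Two independent consumers see the same records in the same order.
    print(stream.read("realtime-analytics", limit=2))
    print(stream.read("archive"))
    print(stream.read("realtime-analytics"))
```

Because each reader advances only its own position, a slow consumer such as an archiver does not hold back a faster one such as a real-time dashboard.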
Architecture diagram (components shown)
•App (metadata)
•Data mart (Oracle/Postgres)
•Qubole
•Redshift
•Hadoop
•Netezza
•Scripts
•Attribution
•Bidders
•Pixels
•Realtime Firehose
•S3
•EMR
•Recurring partition jobs/process jobs
•Partners/clients/tools/internal services
http://bit.ly/awsevals
