Collecting and processing terabytes of data per day is a challenge for any technology company. As marketers and brands become more sophisticated consumers of data, enabling granular access to targeted subsets of data from outside your firewalls presents new challenges. This session discusses how to build scalable, complex, and cost-effective data processing pipelines using Amazon Kinesis, Amazon EC2 Spot Instances, Amazon EMR, and Amazon Simple Storage Service (S3). Learn how MediaMath revolutionized its data delivery platform with the help of these services to empower product teams, partners, and clients. As a result, a number of innovative products and services are delivered on top of terabytes of online user behavior data. MediaMath covers its journey from legacy batch processing and vendor lock-in to a new world where the raw materials to build advanced lookalike models, optimization algorithms, or marketing attribution models are readily available to any engineering team in real time, substantially reducing both the time and cost of innovation.
(ADV303) MediaMath’s Data Revolution with Amazon Kinesis and Amazon EMR | AWS re:Invent 2014
1. November 13, 2014 | Las Vegas, NV
Eddie Fagin, VP Engineering, MediaMath
Ian Hummel, Sr. Director Engineering, MediaMath
Adi Krishnan, Sr. PM Amazon Kinesis
6. Data Warehousing
•Query engine approach
•Pre-computations such as indices and dimensional views improve performance
•Historical, structured data
•Amazon Redshift

Hadoop-Style Processing
•Hive, SQL-on-Hadoop, MapReduce, Spark
•Batch programs, or other abstractions that break down into MapReduce-style computations
•Historical, semi-structured data
•Amazon EMR

Stream Processing
•Custom computations of relatively simple complexity
•Continuous processing (filters, sliding windows, aggregates) on infinite data streams
•Semi-structured/structured data, generated continuously in real time
•Amazon Kinesis
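The stream-processing bullets above (filters, sliding windows, and aggregates over an infinite data stream) can be sketched in plain Python. This is a minimal illustration, not MediaMath's implementation; the event shape and the 60-second window size are assumptions for the example:

```python
from collections import deque

def sliding_window_counts(events, window_seconds=60):
    """Yield (timestamp, count of events in the last `window_seconds`)
    for each incoming (timestamp, payload) event, in arrival order."""
    window = deque()  # timestamps currently inside the window
    for ts, _payload in events:
        window.append(ts)
        # Evict timestamps that have fallen out of the window.
        while window and window[0] <= ts - window_seconds:
            window.popleft()
        yield ts, len(window)

# Example: the events at t=0 and t=10 fall out of a 60-second
# window by the time the event at t=70 arrives.
stream = [(0, "imp"), (10, "imp"), (30, "click"), (70, "imp")]
print(list(sliding_window_counts(stream)))
# → [(0, 1), (10, 2), (30, 3), (70, 2)]
```

In a real pipeline the same eviction-and-aggregate loop would run over records consumed from a Kinesis shard rather than an in-memory list.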
8. Amazon Kinesis
•Millions of sources producing 100s of terabytes per hour
•Front end: authentication, authorization
•Durable, highly consistent storage replicates data across three data centers (availability zones)
•Ordered stream of events supports multiple readers:
 –Real-time dashboards and alarms
 –Machine learning algorithms or sliding-window analytics
 –Aggregate analysis in Hadoop or a data warehouse
 –Aggregate and archive to S3
•Inexpensive: $0.028 per million puts
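The $0.028-per-million-puts figure quoted on the slide makes back-of-envelope cost sizing straightforward. A small sketch; the 100,000 records/second throughput is an illustrative assumption, not a number from the deck:

```python
PRICE_PER_MILLION_PUTS = 0.028  # USD, figure quoted on the slide

def daily_put_cost(records_per_second):
    """Estimated daily Kinesis put cost at a steady record rate."""
    puts_per_day = records_per_second * 86_400  # seconds per day
    return puts_per_day / 1_000_000 * PRICE_PER_MILLION_PUTS

# Example: a steady 100,000 records/second costs about $242/day in puts.
print(f"${daily_put_cost(100_000):.2f}")  # → $241.92
```

Note this covers only the per-put charge; actual Kinesis pricing also includes per-shard-hour costs.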