SlideShare a Scribd company logo
Real time stream processing using
Apache Kafka
Agenda
● What is Apache Kafka?
● Why do we need stream processing?
● Stream processing using Apache Kafka
● Kafka @ Hotstar
Feel free to stop me for questions
2
$ whoami
● Personalisation lead at Hotstar
● Led Data Infrastructure team at Grofers and TinyOwl
● Kafka fanboy
● Usually rant on twitter @jayeshsidhwani
3
What is Kafka?
4
● Kafka is a scalable,
fault-tolerant, distributed queue
● Producers and Consumers
● Uses
○ Asynchronous communication in
event-driven architectures
○ Message broadcast for database
replication
Diagram credits: http://kafka.apache.org
● Brokers
○ Heart of Kafka
○ Stores data
○ Data stored into topics
● Zookeeper
○ Manages cluster state information
○ Leader election
Inside Kafka
5
BROKER
ZOOKEEPER
BROKER BROKER
ZOOKEEPER
TOPIC
TOPIC
TOPIC
P P P
C C C
● Topics are partitioned
○ A partition is a append-only
commit-log file
○ Achieves horizontal scalability
● Messages written in a
partitions are ordered
● Each message gets an
auto-incrementing offset #
○ {“user_id”: 1, “term”: “GoT”} is
a message in the topic searched
Inside a topic
6Diagram credits: http://kafka.apache.org
How do consumers read?
● Consumer subscribes to a topic
● Consumers read from the head
of the queue
● Multiple consumers can read
from a single topic
7Diagram credits: http://kafka.apache.org
Kafka consumer scales horizontally
● Consumers can be grouped
● Consumer Groups
○ Horizontally scalable
○ Fault tolerant
○ Delivery guaranteed
8Diagram credits: http://kafka.apache.org
Stream processing and its use-cases
9
Discrete data processing models
10
APP APP APP
● Request / Response
processing mode
○ Processing time <1
second
○ Clients can use this
data
Discrete data processing models
11
APP APP APP
● Request / Response
processing mode
○ Processing time <1
second
○ Clients can use this
data
DWH HADOOP
● Batch processing
mode
○ Processing time few
hours to a day
○ Analysts can use this
data
Discrete data processing models
12
● As the system grows, such
synchronous processing
model leads to a spaghetti
and unmaintainable design
APP APP APPAPP
SEARCH
MONIT
CACHE
Promise of stream processing
13
● Untangle movement of data
○ Single source of truth
○ No duplicate writes
○ Anyone can consume anything
○ Decouples data generation
from data computation
APPAPP APP APP
SEARCH
MONIT
CACHE
STREAM PROCESSING FRAMEWORK
Promise of stream processing
14
● Untangle movement of data
○ Single source of truth
○ No duplicate writes
○ Anyone can consume anything
● Process, transform and react
on the data as it happens
○ Sub-second latencies
○ Anomaly detection on bad stream
quality
○ Timely notification to users who
dropped off in a live match Intelligence
APPAPP APP APP
STREAM PROCESSING FRAMEWORK
Filter
Window
Join
Anomaly Action
Stream processing using Kafka
15
Stream processing frameworks
● Write your own?
○ Windowing
○ State management
○ Fault tolerance
○ Scalability
● Use frameworks such as Apache Spark, Samza, Storm
○ Batteries attached
○ Cluster manager to coordinate resources
○ High memory / cpu footprint
16
Kafka Streams
● Kafka Streams is a simple, low-latency, framework
independent stream processing framework
● Simple DSL
● Same principles as Kafka consumer (minus operations
overhead)
● No cluster manager! yay!
17
Writing Kafka Streams
● Define a processing topology
○ Source nodes
○ Processor nodes
■ One or more
■ Filtering, windowing, joins etc
○ Sink nodes
● Compile it and run like any other java
application
18
Demo
Simple Kafka Stream
19
Kafka Streams architecture and operations
● Kafka manages
○ Parallelism
○ Fault tolerance
○ Ordering
○ State Management
20Diagram credits: http://confluent.io
Streaming joins and state-stores
● Beyond filtering and windowing
● Streaming joins are hard to scale
○ Kafka scales at 800k writes/sec*
○ How about your database?
● Solution: Cache a static stream
in-memory
○ Join with running stream
○ Stream<>table duality
● Kafka supports in-memory cache OOB
○ RocksDB
○ In-memory hash
○ Persistent / Transient
21Diagram credits: http://confluent.io
*
achieved using librdkafka c++ library
Demo
● Inputs:
○ Incoming stream of benchmark stream
quality from CDN provider
○ Incoming stream quality reported by
Hotstar clients
● Output:
○ Calculate the locations reporting bad
QoS in real-time
22Diagram credits: http://confluent.io
*
achieved using librdkafka c++ library
Demo
● Inputs:
○ Incoming stream of benchmark stream
quality from CDN provider
○ Incoming stream quality reported by
Hotstar clients
● Output:
○ Calculate the locations reporting bad
QoS in real-time
23Diagram credits: http://confluent.io
*
achieved using librdkafka c++ library
CDN
benchmarks
Client
reports
Alerts
KSQL - Kafka Streams ++
24
Kafka @ Hotstar
25
26
Stream <> Table duality
● Heart of Kafka Stream
● A stream is a changelog of
events
27Diagram credits: http://confluent.io
Stream <> Table duality
● Heart of Kafka Stream
● A stream is a changelog of
events
● A table is a compacted stream
28Diagram credits: http://confluent.io

More Related Content

What's hot

Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per Day
Hadoop summit - Scaling Uber’s Real-Time Infra for  Trillion Events per DayHadoop summit - Scaling Uber’s Real-Time Infra for  Trillion Events per Day
Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per Day
Ankur Bansal
 
How to build an event driven architecture with kafka and kafka connect
How to build an event driven architecture with kafka and kafka connectHow to build an event driven architecture with kafka and kafka connect
How to build an event driven architecture with kafka and kafka connect
Loi Nguyen
 
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at UberDisaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
confluent
 
Data pipeline with kafka
Data pipeline with kafkaData pipeline with kafka
Data pipeline with kafka
Mole Wong
 
SAIS2018 - Fact Store At Netflix Scale
SAIS2018 - Fact Store At Netflix ScaleSAIS2018 - Fact Store At Netflix Scale
SAIS2018 - Fact Store At Netflix Scale
Nitin S
 
Building end to end streaming application on Spark
Building end to end streaming application on SparkBuilding end to end streaming application on Spark
Building end to end streaming application on Spark
datamantra
 
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Monal Daxini
 
PHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
PHP At 5000 Requests Per Second: Hootsuite’s Scaling StoryPHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
PHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
vanphp
 
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin KumarSiphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
confluent
 
Low latency stream processing with jet
Low latency stream processing with jetLow latency stream processing with jet
Low latency stream processing with jet
StreamNative
 
How Much Can You Connect? | Bhavesh Raheja, Disney + Hotstar
How Much Can You Connect? | Bhavesh Raheja, Disney + HotstarHow Much Can You Connect? | Bhavesh Raheja, Disney + Hotstar
How Much Can You Connect? | Bhavesh Raheja, Disney + Hotstar
HostedbyConfluent
 
Streaming datasets for personalization
Streaming datasets for personalizationStreaming datasets for personalization
Streaming datasets for personalization
Shriya Arora
 
Interactive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark StreamingInteractive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark Streaming
datamantra
 
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/SecNetflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Peter Bakas
 
Storing State Forever: Why It Can Be Good For Your Analytics
Storing State Forever: Why It Can Be Good For Your AnalyticsStoring State Forever: Why It Can Be Good For Your Analytics
Storing State Forever: Why It Can Be Good For Your Analytics
Yaroslav Tkachenko
 
Scaling a Core Banking Engine Using Apache Kafka | Peter Dudbridge, Thought M...
Scaling a Core Banking Engine Using Apache Kafka | Peter Dudbridge, Thought M...Scaling a Core Banking Engine Using Apache Kafka | Peter Dudbridge, Thought M...
Scaling a Core Banking Engine Using Apache Kafka | Peter Dudbridge, Thought M...
HostedbyConfluent
 
Introduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQLIntroduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQL
datamantra
 
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...
confluent
 
Stream Processing in Uber
Stream Processing in UberStream Processing in Uber
Stream Processing in Uber
C4Media
 
Hw09 Next Steps For Hadoop
Hw09   Next Steps For HadoopHw09   Next Steps For Hadoop
Hw09 Next Steps For Hadoop
Cloudera, Inc.
 

What's hot (20)

Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per Day
Hadoop summit - Scaling Uber’s Real-Time Infra for  Trillion Events per DayHadoop summit - Scaling Uber’s Real-Time Infra for  Trillion Events per Day
Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per Day
 
How to build an event driven architecture with kafka and kafka connect
How to build an event driven architecture with kafka and kafka connectHow to build an event driven architecture with kafka and kafka connect
How to build an event driven architecture with kafka and kafka connect
 
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at UberDisaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
 
Data pipeline with kafka
Data pipeline with kafkaData pipeline with kafka
Data pipeline with kafka
 
SAIS2018 - Fact Store At Netflix Scale
SAIS2018 - Fact Store At Netflix ScaleSAIS2018 - Fact Store At Netflix Scale
SAIS2018 - Fact Store At Netflix Scale
 
Building end to end streaming application on Spark
Building end to end streaming application on SparkBuilding end to end streaming application on Spark
Building end to end streaming application on Spark
 
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
 
PHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
PHP At 5000 Requests Per Second: Hootsuite’s Scaling StoryPHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
PHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
 
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin KumarSiphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
 
Low latency stream processing with jet
Low latency stream processing with jetLow latency stream processing with jet
Low latency stream processing with jet
 
How Much Can You Connect? | Bhavesh Raheja, Disney + Hotstar
How Much Can You Connect? | Bhavesh Raheja, Disney + HotstarHow Much Can You Connect? | Bhavesh Raheja, Disney + Hotstar
How Much Can You Connect? | Bhavesh Raheja, Disney + Hotstar
 
Streaming datasets for personalization
Streaming datasets for personalizationStreaming datasets for personalization
Streaming datasets for personalization
 
Interactive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark StreamingInteractive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark Streaming
 
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/SecNetflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
 
Storing State Forever: Why It Can Be Good For Your Analytics
Storing State Forever: Why It Can Be Good For Your AnalyticsStoring State Forever: Why It Can Be Good For Your Analytics
Storing State Forever: Why It Can Be Good For Your Analytics
 
Scaling a Core Banking Engine Using Apache Kafka | Peter Dudbridge, Thought M...
Scaling a Core Banking Engine Using Apache Kafka | Peter Dudbridge, Thought M...Scaling a Core Banking Engine Using Apache Kafka | Peter Dudbridge, Thought M...
Scaling a Core Banking Engine Using Apache Kafka | Peter Dudbridge, Thought M...
 
Introduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQLIntroduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQL
 
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...
 
Stream Processing in Uber
Stream Processing in UberStream Processing in Uber
Stream Processing in Uber
 
Hw09 Next Steps For Hadoop
Hw09   Next Steps For HadoopHw09   Next Steps For Hadoop
Hw09 Next Steps For Hadoop
 

Similar to Build real time stream processing applications using Apache Kafka

Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin Podval
Martin Podval
 
How Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per dayHow Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per day
DataWorks Summit
 
Streamsets and spark in Retail
Streamsets and spark in RetailStreamsets and spark in Retail
Streamsets and spark in Retail
Hari Shreedharan
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
aspyker
 
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUpStrimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
José Román Martín Gil
 
Structured Streaming with Kafka
Structured Streaming with KafkaStructured Streaming with Kafka
Structured Streaming with Kafka
datamantra
 
Kafka Practices @ Uber - Seattle Apache Kafka meetup
Kafka Practices @ Uber - Seattle Apache Kafka meetupKafka Practices @ Uber - Seattle Apache Kafka meetup
Kafka Practices @ Uber - Seattle Apache Kafka meetup
Mingmin Chen
 
Tips & Tricks for Apache Kafka®
Tips & Tricks for Apache Kafka®Tips & Tricks for Apache Kafka®
Tips & Tricks for Apache Kafka®
confluent
 
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
confluent
 
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
StampedeCon
 
Clusternaut: Orchestrating  Percona XtraDB Cluster with Kubernetes
Clusternaut:  Orchestrating  Percona XtraDB Cluster with KubernetesClusternaut:  Orchestrating  Percona XtraDB Cluster with Kubernetes
Clusternaut: Orchestrating  Percona XtraDB Cluster with Kubernetes
Raghavendra Prabhu
 
Friends don't let friends do dual writes: Outbox pattern with OpenShift Strea...
Friends don't let friends do dual writes: Outbox pattern with OpenShift Strea...Friends don't let friends do dual writes: Outbox pattern with OpenShift Strea...
Friends don't let friends do dual writes: Outbox pattern with OpenShift Strea...
Red Hat Developers
 
Blackray @ SAPO CodeBits 2009
Blackray @ SAPO CodeBits 2009Blackray @ SAPO CodeBits 2009
Blackray @ SAPO CodeBits 2009
fschupp
 
Insta clustr seattle kafka meetup presentation bb
Insta clustr seattle kafka meetup presentation   bbInsta clustr seattle kafka meetup presentation   bb
Insta clustr seattle kafka meetup presentation bb
Nitin Kumar
 
Spark Meetup at Uber
Spark Meetup at UberSpark Meetup at Uber
Spark Meetup at Uber
Databricks
 
Netty training
Netty trainingNetty training
Activity feeds (and more) at mate1
Activity feeds (and more) at mate1Activity feeds (and more) at mate1
Activity feeds (and more) at mate1
Hisham Mardam-Bey
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned
Omid Vahdaty
 
Upleveling Analytics with Kafka with Amy Chen
Upleveling Analytics with Kafka with Amy ChenUpleveling Analytics with Kafka with Amy Chen
Upleveling Analytics with Kafka with Amy Chen
HostedbyConfluent
 
Architectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark StreamingArchitectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark Streaming
Apache Apex
 

Similar to Build real time stream processing applications using Apache Kafka (20)

Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin Podval
 
How Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per dayHow Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per day
 
Streamsets and spark in Retail
Streamsets and spark in RetailStreamsets and spark in Retail
Streamsets and spark in Retail
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
 
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUpStrimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
 
Structured Streaming with Kafka
Structured Streaming with KafkaStructured Streaming with Kafka
Structured Streaming with Kafka
 
Kafka Practices @ Uber - Seattle Apache Kafka meetup
Kafka Practices @ Uber - Seattle Apache Kafka meetupKafka Practices @ Uber - Seattle Apache Kafka meetup
Kafka Practices @ Uber - Seattle Apache Kafka meetup
 
Tips & Tricks for Apache Kafka®
Tips & Tricks for Apache Kafka®Tips & Tricks for Apache Kafka®
Tips & Tricks for Apache Kafka®
 
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
 
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
 
Clusternaut: Orchestrating  Percona XtraDB Cluster with Kubernetes
Clusternaut:  Orchestrating  Percona XtraDB Cluster with KubernetesClusternaut:  Orchestrating  Percona XtraDB Cluster with Kubernetes
Clusternaut: Orchestrating  Percona XtraDB Cluster with Kubernetes
 
Friends don't let friends do dual writes: Outbox pattern with OpenShift Strea...
Friends don't let friends do dual writes: Outbox pattern with OpenShift Strea...Friends don't let friends do dual writes: Outbox pattern with OpenShift Strea...
Friends don't let friends do dual writes: Outbox pattern with OpenShift Strea...
 
Blackray @ SAPO CodeBits 2009
Blackray @ SAPO CodeBits 2009Blackray @ SAPO CodeBits 2009
Blackray @ SAPO CodeBits 2009
 
Insta clustr seattle kafka meetup presentation bb
Insta clustr seattle kafka meetup presentation   bbInsta clustr seattle kafka meetup presentation   bb
Insta clustr seattle kafka meetup presentation bb
 
Spark Meetup at Uber
Spark Meetup at UberSpark Meetup at Uber
Spark Meetup at Uber
 
Netty training
Netty trainingNetty training
Netty training
 
Activity feeds (and more) at mate1
Activity feeds (and more) at mate1Activity feeds (and more) at mate1
Activity feeds (and more) at mate1
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned
 
Upleveling Analytics with Kafka with Amy Chen
Upleveling Analytics with Kafka with Amy ChenUpleveling Analytics with Kafka with Amy Chen
Upleveling Analytics with Kafka with Amy Chen
 
Architectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark StreamingArchitectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark Streaming
 

More from Hotstar

Airflow based Video Encoding Platform
Airflow based Video Encoding PlatformAirflow based Video Encoding Platform
Airflow based Video Encoding Platform
Hotstar
 
How Chaos Engineering is practiced at Hotstar
How Chaos Engineering is practiced at HotstarHow Chaos Engineering is practiced at Hotstar
How Chaos Engineering is practiced at Hotstar
Hotstar
 
WebSDK - Switching between service providers
WebSDK - Switching between service providersWebSDK - Switching between service providers
WebSDK - Switching between service providers
Hotstar
 
Remote Config and Beyond
Remote Config and BeyondRemote Config and Beyond
Remote Config and Beyond
Hotstar
 
Analysing high throughput data in real time
Analysing high throughput data in real timeAnalysing high throughput data in real time
Analysing high throughput data in real time
Hotstar
 
Scaling Hotstar.com for 10Mn concurrency
Scaling Hotstar.com for 10Mn concurrencyScaling Hotstar.com for 10Mn concurrency
Scaling Hotstar.com for 10Mn concurrency
Hotstar
 
Amazon AI Conclave, Bangalore 2017
Amazon AI Conclave, Bangalore 2017 Amazon AI Conclave, Bangalore 2017
Amazon AI Conclave, Bangalore 2017
Hotstar
 

More from Hotstar (7)

Airflow based Video Encoding Platform
Airflow based Video Encoding PlatformAirflow based Video Encoding Platform
Airflow based Video Encoding Platform
 
How Chaos Engineering is practiced at Hotstar
How Chaos Engineering is practiced at HotstarHow Chaos Engineering is practiced at Hotstar
How Chaos Engineering is practiced at Hotstar
 
WebSDK - Switching between service providers
WebSDK - Switching between service providersWebSDK - Switching between service providers
WebSDK - Switching between service providers
 
Remote Config and Beyond
Remote Config and BeyondRemote Config and Beyond
Remote Config and Beyond
 
Analysing high throughput data in real time
Analysing high throughput data in real timeAnalysing high throughput data in real time
Analysing high throughput data in real time
 
Scaling Hotstar.com for 10Mn concurrency
Scaling Hotstar.com for 10Mn concurrencyScaling Hotstar.com for 10Mn concurrency
Scaling Hotstar.com for 10Mn concurrency
 
Amazon AI Conclave, Bangalore 2017
Amazon AI Conclave, Bangalore 2017 Amazon AI Conclave, Bangalore 2017
Amazon AI Conclave, Bangalore 2017
 

Recently uploaded

Details of description part II: Describing images in practice - Tech Forum 2024
Details of description part II: Describing images in practice - Tech Forum 2024Details of description part II: Describing images in practice - Tech Forum 2024
Details of description part II: Describing images in practice - Tech Forum 2024
BookNet Canada
 
Research Directions for Cross Reality Interfaces
Research Directions for Cross Reality InterfacesResearch Directions for Cross Reality Interfaces
Research Directions for Cross Reality Interfaces
Mark Billinghurst
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
HackersList
 
Pigging Solutions Sustainability brochure.pdf
Pigging Solutions Sustainability brochure.pdfPigging Solutions Sustainability brochure.pdf
Pigging Solutions Sustainability brochure.pdf
Pigging Solutions
 
How to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptxHow to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptx
Adam Dunkels
 
Coordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar SlidesCoordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar Slides
Safe Software
 
Transcript: Details of description part II: Describing images in practice - T...
Transcript: Details of description part II: Describing images in practice - T...Transcript: Details of description part II: Describing images in practice - T...
Transcript: Details of description part II: Describing images in practice - T...
BookNet Canada
 
find out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challengesfind out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challenges
huseindihon
 
Choose our Linux Web Hosting for a seamless and successful online presence
Choose our Linux Web Hosting for a seamless and successful online presenceChoose our Linux Web Hosting for a seamless and successful online presence
Choose our Linux Web Hosting for a seamless and successful online presence
rajancomputerfbd
 
Measuring the Impact of Network Latency at Twitter
Measuring the Impact of Network Latency at TwitterMeasuring the Impact of Network Latency at Twitter
Measuring the Impact of Network Latency at Twitter
ScyllaDB
 
Manual | Product | Research Presentation
Manual | Product | Research PresentationManual | Product | Research Presentation
Manual | Product | Research Presentation
welrejdoall
 
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Chris Swan
 
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - MydbopsScaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Mydbops
 
7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf
Enterprise Wired
 
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdfBT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
Neo4j
 
DealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 editionDealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 edition
Yevgen Sysoyev
 
Calgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptxCalgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptx
ishalveerrandhawa1
 
Implementations of Fused Deposition Modeling in real world
Implementations of Fused Deposition Modeling  in real worldImplementations of Fused Deposition Modeling  in real world
Implementations of Fused Deposition Modeling in real world
Emerging Tech
 
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Erasmo Purificato
 
Best Programming Language for Civil Engineers
Best Programming Language for Civil EngineersBest Programming Language for Civil Engineers
Best Programming Language for Civil Engineers
Awais Yaseen
 

Recently uploaded (20)

Details of description part II: Describing images in practice - Tech Forum 2024
Details of description part II: Describing images in practice - Tech Forum 2024Details of description part II: Describing images in practice - Tech Forum 2024
Details of description part II: Describing images in practice - Tech Forum 2024
 
Research Directions for Cross Reality Interfaces
Research Directions for Cross Reality InterfacesResearch Directions for Cross Reality Interfaces
Research Directions for Cross Reality Interfaces
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
 
Pigging Solutions Sustainability brochure.pdf
Pigging Solutions Sustainability brochure.pdfPigging Solutions Sustainability brochure.pdf
Pigging Solutions Sustainability brochure.pdf
 
How to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptxHow to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptx
 
Coordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar SlidesCoordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar Slides
 
Transcript: Details of description part II: Describing images in practice - T...
Transcript: Details of description part II: Describing images in practice - T...Transcript: Details of description part II: Describing images in practice - T...
Transcript: Details of description part II: Describing images in practice - T...
 
find out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challengesfind out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challenges
 
Choose our Linux Web Hosting for a seamless and successful online presence
Choose our Linux Web Hosting for a seamless and successful online presenceChoose our Linux Web Hosting for a seamless and successful online presence
Choose our Linux Web Hosting for a seamless and successful online presence
 
Measuring the Impact of Network Latency at Twitter
Measuring the Impact of Network Latency at TwitterMeasuring the Impact of Network Latency at Twitter
Measuring the Impact of Network Latency at Twitter
 
Manual | Product | Research Presentation
Manual | Product | Research PresentationManual | Product | Research Presentation
Manual | Product | Research Presentation
 
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
 
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - MydbopsScaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
 
7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf
 
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdfBT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
 
DealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 editionDealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 edition
 
Calgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptxCalgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptx
 
Implementations of Fused Deposition Modeling in real world
Implementations of Fused Deposition Modeling  in real worldImplementations of Fused Deposition Modeling  in real world
Implementations of Fused Deposition Modeling in real world
 
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
 
Best Programming Language for Civil Engineers
Best Programming Language for Civil EngineersBest Programming Language for Civil Engineers
Best Programming Language for Civil Engineers
 

Build real time stream processing applications using Apache Kafka

  • 1. Real time stream processing using Apache Kafka
  • 2. Agenda ● What is Apache Kafka? ● Why do we need stream processing? ● Stream processing using Apache Kafka ● Kafka @ Hotstar Feel free to stop me for questions 2
  • 3. $ whoami ● Personalisation lead at Hotstar ● Led Data Infrastructure team at Grofers and TinyOwl ● Kafka fanboy ● Usually rant on twitter @jayeshsidhwani 3
  • 4. What is Kafka? 4 ● Kafka is a scalable, fault-tolerant, distributed queue ● Producers and Consumers ● Uses ○ Asynchronous communication in event-driven architectures ○ Message broadcast for database replication Diagram credits: http://kafka.apache.org
  • 5. ● Brokers ○ Heart of Kafka ○ Stores data ○ Data stored into topics ● Zookeeper ○ Manages cluster state information ○ Leader election Inside Kafka 5 BROKER ZOOKEEPER BROKER BROKER ZOOKEEPER TOPIC TOPIC TOPIC P P P C C C
  • 6. ● Topics are partitioned ○ A partition is a append-only commit-log file ○ Achieves horizontal scalability ● Messages written in a partitions are ordered ● Each message gets an auto-incrementing offset # ○ {“user_id”: 1, “term”: “GoT”} is a message in the topic searched Inside a topic 6Diagram credits: http://kafka.apache.org
  • 7. How do consumers read? ● Consumer subscribes to a topic ● Consumers read from the head of the queue ● Multiple consumers can read from a single topic 7Diagram credits: http://kafka.apache.org
  • 8. Kafka consumer scales horizontally ● Consumers can be grouped ● Consumer Groups ○ Horizontally scalable ○ Fault tolerant ○ Delivery guaranteed 8Diagram credits: http://kafka.apache.org
  • 9. Stream processing and its use-cases 9
  • 10. Discrete data processing models 10 APP APP APP ● Request / Response processing mode ○ Processing time <1 second ○ Clients can use this data
  • 11. Discrete data processing models 11 APP APP APP ● Request / Response processing mode ○ Processing time <1 second ○ Clients can use this data DWH HADOOP ● Batch processing mode ○ Processing time few hours to a day ○ Analysts can use this data
  • 12. Discrete data processing models 12 ● As the system grows, such synchronous processing model leads to a spaghetti and unmaintainable design APP APP APPAPP SEARCH MONIT CACHE
  • 13. Promise of stream processing 13 ● Untangle movement of data ○ Single source of truth ○ No duplicate writes ○ Anyone can consume anything ○ Decouples data generation from data computation APPAPP APP APP SEARCH MONIT CACHE STREAM PROCESSING FRAMEWORK
  • 14. Promise of stream processing 14 ● Untangle movement of data ○ Single source of truth ○ No duplicate writes ○ Anyone can consume anything ● Process, transform and react on the data as it happens ○ Sub-second latencies ○ Anomaly detection on bad stream quality ○ Timely notification to users who dropped off in a live match Intelligence APPAPP APP APP STREAM PROCESSING FRAMEWORK Filter Window Join Anomaly Action
  • 16. Stream processing frameworks ● Write your own? ○ Windowing ○ State management ○ Fault tolerance ○ Scalability ● Use frameworks such as Apache Spark, Samza, Storm ○ Batteries attached ○ Cluster manager to coordinate resources ○ High memory / cpu footprint 16
  • 17. Kafka Streams ● Kafka Streams is a simple, low-latency, framework independent stream processing framework ● Simple DSL ● Same principles as Kafka consumer (minus operations overhead) ● No cluster manager! yay! 17
  • 18. Writing Kafka Streams ● Define a processing topology ○ Source nodes ○ Processor nodes ■ One or more ■ Filtering, windowing, joins etc ○ Sink nodes ● Compile it and run like any other java application 18
  • 20. Kafka Streams architecture and operations ● Kafka manages ○ Parallelism ○ Fault tolerance ○ Ordering ○ State Management 20Diagram credits: http://confluent.io
  • 21. Streaming joins and state-stores ● Beyond filtering and windowing ● Streaming joins are hard to scale ○ Kafka scales at 800k writes/sec* ○ How about your database? ● Solution: Cache a static stream in-memory ○ Join with running stream ○ Stream<>table duality ● Kafka supports in-memory cache OOB ○ RocksDB ○ In-memory hash ○ Persistent / Transient 21Diagram credits: http://confluent.io * achieved using librdkafka c++ library
  • 22. Demo ● Inputs: ○ Incoming stream of benchmark stream quality from CDN provider ○ Incoming stream quality reported by Hotstar clients ● Output: ○ Calculate the locations reporting bad QoS in real-time 22Diagram credits: http://confluent.io * achieved using librdkafka c++ library
  • 23. Demo ● Inputs: ○ Incoming stream of benchmark stream quality from CDN provider ○ Incoming stream quality reported by Hotstar clients ● Output: ○ Calculate the locations reporting bad QoS in real-time 23Diagram credits: http://confluent.io * achieved using librdkafka c++ library CDN benchmarks Client reports Alerts
  • 24. KSQL - Kafka Streams ++ 24
  • 26. 26
  • 27. Stream <> Table duality ● Heart of Kafka Stream ● A stream is a changelog of events 27Diagram credits: http://confluent.io
  • 28. Stream <> Table duality ● Heart of Kafka Stream ● A stream is a changelog of events ● A table is a compacted stream 28Diagram credits: http://confluent.io