Kappa vs. Lambda Architecture
Use Cases, Trade-offs, Technologies, Comparison
Kai Waehner
Field CTO
kai.waehner@confluent.io
linkedin.com/in/kaiwaehner
@KaiWaehner
confluent.io
kai-waehner.de
An Event Streaming Platform
The Underpinning of Data in Motion
Diagram labels: Microservices; DBs; SaaS apps; Mobile; Customer 360; Real-time fraud detection; Data warehouse; Producers; Consumers; Streams of real-time events (database change, microservices events, SaaS data, customer experiences); Connectors; Stream processing apps
kai-waehner.de | @KaiWaehner | Kappa vs. Lambda Architecture
Example Architecture for Data in Motion
Real-time decision making for claim processing and fraud detection
Diagram labels: Stream processing (ksqlDB, KStreams); Connectors (Oracle CDC connector, Salesforce CDC connector, source / sink connector); Oracle DB; Salesforce; Dashboard; Fraud Detection App
Domain-Driven Design for your Integration Layer
Diagram labels: Event Streaming Platform; Kafka Cluster; Kafka Connect; Schema Registry; CRM Integration; Legacy Integration; Custom Application; ESB Connector; Java / Python / ksqlDB / etc.; CRM Domain; Legacy Domain; Payment Domain
→ Independent and loosely coupled, but scalable, highly available and reliable!
Lambda Architecture
Option 1: Unified serving layer
Diagram labels: Data Source; Real-Time Layer (Data Processing in Motion); Batch Layer (Data Processing at Rest); Serving Layer; Real-Time App (Data Processing in Motion); Batch App (Data Processing at Rest); latency labels: ms, min/hr
Lambda Architecture
Option 2: Separate serving layers
Diagram labels: Data Source; Real-Time Layer (Data Processing in Motion); Batch Layer (Data Processing at Rest); Speed View; Batch View; Real-time Query; Mixed Query; Batch Query; latency labels: ms, min/hr
Concerns with the Lambda Architecture
Kappa Architecture
One pipeline for real-time and batch consumers
Diagram labels: Data Source; Real-Time Layer (Data Processing in Motion); Real-Time App (Data Processing in Motion); Batch App (Data Processing at Rest); Storage (shown three times); latency labels: ms, min/hr
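The "one pipeline" idea can be illustrated with a minimal in-memory sketch. It is not Kafka client code: the `Log` and `Consumer` classes are hypothetical stand-ins for a topic partition and for two consumers that each track their own read position, one polling continuously and one catching up in a single batch.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    """Append-only event log, a stand-in for a single topic partition."""
    events: list = field(default_factory=list)

    def append(self, event):
        self.events.append(event)
        return len(self.events) - 1  # offset of the event just written


@dataclass
class Consumer:
    """Reads from the shared log and remembers its own offset."""
    log: Log
    offset: int = 0

    def poll(self):
        # Return everything written since the last poll and advance the offset.
        batch = self.log.events[self.offset:]
        self.offset += len(batch)
        return batch


log = Log()
realtime = Consumer(log)  # polls after every event
batch = Consumer(log)     # polls once, later

seen_realtime = []
for reading in [3, 5, 8]:
    log.append(reading)
    seen_realtime.extend(realtime.poll())

seen_batch = batch.poll()  # reads the full history in one go
```

Both consumers end up with the same events from the same log; only the time at which they read differs.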
Kappa is NOT a free lunch
Kappa Concerns Solved
• Data availability / retention
→ Compacted Topics, Tiered Storage
• Data consistency and fault-tolerance
→ Exactly-once semantics, Multi-Region Clusters, Cluster Linking
• Handling late-arriving data
→ State management in the streaming application, proper data sinks, replay with guaranteed ordering and timestamps
• Data reprocessing and backfill
→ Dynamic clusters, stateful applications (Kafka Streams, ksqlDB, external stream processing framework like Apache Flink)
• Data integration
→ Kafka Connect for sources and sinks, clients for any language, REST Proxy (real-time but also batch and RPC)
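For the late-arriving data point, the following is a rough sketch of how state kept in the streaming application plus event timestamps can absorb out-of-order events. It is plain Python, not a Kafka Streams or Flink API; the window size, the grace period, and the function name `window_counts` are assumptions chosen for illustration.

```python
from collections import defaultdict

WINDOW = 10  # tumbling window size in seconds (illustrative value)
GRACE = 5    # seconds after window end during which late events still count


def window_counts(events, window=WINDOW, grace=GRACE):
    """Count events per tumbling event-time window with bounded lateness.

    events: event timestamps in arrival order (may be out of order).
    Returns (counts keyed by window start, timestamps that arrived too late).
    """
    counts = defaultdict(int)
    dropped = []
    stream_time = 0  # highest event timestamp observed so far
    for ts in events:
        stream_time = max(stream_time, ts)
        start = ts - ts % window
        if stream_time >= start + window + grace:
            dropped.append(ts)  # the window for this event is already closed
        else:
            counts[start] += 1  # on time, or late but within the grace period
    return dict(counts), dropped


# 9 arrives after 12 but is still accepted; 8 arrives after 27 and is dropped.
counts, dropped = window_counts([1, 4, 12, 9, 27, 8])
```

Dropped events are returned rather than discarded silently, so a caller could route them to a separate sink for later correction.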
Kappa @ Uber
Kappa @ Shopify
Kappa Building Blocks
The Log (Kafka)
• Durability with Topic Compaction and Tiered Storage
• Consistency via Exactly-Once Semantics (EOS)
• Data Integration via Kafka Connect
• Elasticity via dynamic Kafka clusters
Streaming Framework (Kafka Streams / Flink)
• Reliability and scalability
• Fault tolerance
• State management
Sinks
• Update/Upsert for simplified design: RDBMS, NoSQL, Compacted Kafka Topics
• Append-only: Regular Kafka Topics, Time Series
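One possible reading of "Update/Upsert for simplified design" is that replaying the same events into an upsert sink is idempotent, while an append-only sink records every write. The sketch below illustrates that difference with a dict and a list as hypothetical stand-ins for the two sink types; the event data is made up.

```python
def write_upsert(sink, key, value):
    """Upsert sink: the latest value per key replaces the previous one."""
    sink[key] = value


def write_append(sink, key, value):
    """Append-only sink: every write becomes a new record."""
    sink.append((key, value))


events = [("order-1", "CREATED"), ("order-2", "CREATED"), ("order-1", "PAID")]

upsert_sink = {}
append_sink = []

# Process the events, then replay them once (for example after a bug fix).
for _ in range(2):
    for key, value in events:
        write_upsert(upsert_sink, key, value)
        write_append(append_sink, key, value)
```

After the replay the upsert sink holds the same two rows as after the first pass, whereas the append-only sink holds six records and leaves deduplication to the reader.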
Kappa @ Disney
Kappa @ Twitter
https://blog.twitter.com/engineering/en_us/topics/infrastructure/2021/processing-billions-of-events-in-real-time-at-twitter-
Migration from Hadoop and Kafka to a hybrid architecture spanning both the Twitter data
center and Google Cloud Platform. With Kafka and GCP, Twitter is able to process
billions of events in real time and achieve low latency, high accuracy, stability,
architecture simplicity, and reduced operation cost.
Benefits of the Kappa Architecture
The Kappa architecture leverages a single source of truth with a focus on simplicity in
the enterprise architecture
• Improve streaming to handle all the cases
• One codebase that is always in sync
• One set of infrastructure and technology
• The heart of the infrastructure is real-time, scalable, and reliable
• Improved data quality with guaranteed ordering and no mismatches
• No need to re-architect for new use cases, just connect new consumers (real-time, near real-time, batch, RPC)
Store Data Long-Term in Kafka?
Diagram labels: Kafka; Processing; App; Storage; Transactions, auth, quota enforcement, compaction, ...
Use Cases for Reprocessing Historical Events
Give me all events from time A to time B
• New consumer application
• Error-handling
• Compliance / regulatory processing
• Query and analyze existing events
• Schema changes in analytics platform
• Model training
Diagram labels: Real-time Producer; Time; Real-time Consumer; Consumer of Historical Data
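The request "give me all events from time A to time B" amounts to a range lookup on a time-ordered log. The following sketch assumes an in-memory list of `(timestamp, payload)` pairs that is already sorted by timestamp; the function name `events_between` and the sample data are hypothetical, and a real consumer would first translate timestamps into offsets through its client library.

```python
import bisect


def events_between(log, start_ts, end_ts):
    """Return events with start_ts <= timestamp < end_ts.

    log: list of (timestamp, payload) tuples sorted by timestamp (assumption).
    """
    timestamps = [ts for ts, _ in log]
    first = bisect.bisect_left(timestamps, start_ts)  # first event at or after A
    last = bisect.bisect_left(timestamps, end_ts)     # first event at or after B
    return log[first:last]


log = [(100, "a"), (105, "b"), (110, "c"), (120, "d")]
replayed = events_between(log, 105, 120)
```

The historical consumer reads only the slice between the two positions, while a real-time consumer keeps reading from the end of the same log.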
Tiered Storage @ Uber
Confluent Tiered Storage for Kafka
honeycomb - Observability
• Kafka is the “beating heart” of Honeycomb, powering the 99.99% ingest availability SLO
• Ingest telemetry data
• Buffer big data before processing in “retriever” columnar storage database
• True decoupling to innovate more quickly by shipping to each service
• Guard against the risk of a bug in retriever corrupting customer data
• Confluent Tiered Storage frees engineering from being storage-bound
• Has grown 10x in two years while TCO for Kafka has only gone up 20%
• Replayability from Tiered Storage after outage for error handling
https://www.honeycomb.io/blog/scaling-kafka-observability-pipelines/
Kappa Architecture
for Streaming Analytics with Kafka and TensorFlow
Diagram labels: Car Sensors; MQTT Proxy; Kafka Cluster; Kafka Connect; Tiered Storage; Kafka Streams Application; ksqlDB; TensorFlow; Analytic Model; MongoDB; Storage; Dashboards; Search; Analytics; Mobile App; BI Tool
Flow labels: Ingest Data; Preprocess Data; Consume Data; All Data; Critical Data; Potential Detect; Train Analytic Model; Deploy Analytic Model
Legend: Kafka Ecosystem; TensorFlow; Other Components
Streaming Ingestion and Model Training with TensorFlow IO
Direct streaming ingestion for model training with TensorFlow I/O + Kafka Plugin (no additional data storage like S3 or HDFS required!)
https://github.com/tensorflow/io
Diagram labels: Producer; Distributed Commit Log; Time; Model A; Model B; Model X (at a later time)
Model Deployment with Apache Kafka, ksqlDB and TensorFlow
User Defined Function (UDF):
CREATE STREAM AnomalyDetection AS
  SELECT sensor_id, detectAnomaly(sensor_values)
  FROM car_engine;
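To make the semantics of the statement concrete, here is a small Python sketch of what a continuous projection with a user-defined function does: one output row per input event, with the function applied to each. The `detect_anomaly` body is a hypothetical threshold check standing in for the embedded TensorFlow model, and the threshold and sample events are invented for illustration.

```python
from statistics import mean

THRESHOLD = 100.0  # hypothetical limit; the real UDF would call a trained model


def detect_anomaly(sensor_values):
    """Placeholder for the model: flag events whose mean reading is too high."""
    return mean(sensor_values) > THRESHOLD


def anomaly_detection(car_engine):
    """Mirror of the SELECT: emit (sensor_id, detectAnomaly(sensor_values)) per event."""
    for event in car_engine:
        yield event["sensor_id"], detect_anomaly(event["sensor_values"])


car_engine = [
    {"sensor_id": "car-1", "sensor_values": [90.0, 95.0]},
    {"sensor_id": "car-2", "sensor_values": [120.0, 130.0]},
]
results = list(anomaly_detection(car_engine))
```

Because the generator handles one event at a time, the same function works on an unbounded stream as well as on the finite list used here.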
Alternatives for Data in Motion
Image labels: Car Engine; Car; Self-driving Car
The Event Streaming Landscape – Cloud-native? Complete? Everywhere?
Apache Kafka Products and Cloud Services, “Compatible” Offerings, and other Streaming Technologies
Diagram labels: Native Kafka; Kafka Protocol (not fully compliant); Non Kafka; Self Managed (Everywhere); Partially Managed; Fully Managed (Cloud only); (Cloud only); (Everywhere); (Kafka mapper not part of cloud offering); Platforms; Tools
Questions? Feedback?
Let’s connect!
