SlideShare a Scribd company logo
www.clairvoyantsoft.com
Building Zero Data loss pipelines
with Apache Kafka
By: Avinash Ramineni
Quick Poll
| 2
| 3
About
Background Awards & Recognition
Boutique consulting firm centered on building data solutions and
products
All things Web and Data Engineering, Analytics, ML and User
Experience to bring it all together
Support core Hadoop platform, data engineering pipelines and provide
administrative and devops expertise focused on Hadoop
| 4
● Introduction and terminologies
● How can we lose data ?
● Zero Data Loss Pipelines
○ Producer
○ Kafka Cluster
○ Consumer
● Monitoring for Data Loss
● Summary
Agenda

Recommended for you

Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...

Flink Forward San Francisco 2022. Probably everyone who has written stateful Apache Flink applications has used one of the fault-tolerant keyed state primitives ValueState, ListState, and MapState. With RocksDB, however, retrieving and updating items comes at an increased cost that you should be aware of. Sometimes, these may not be avoidable with the current API, e.g., for efficient event-time stream-sorting or streaming joins where you need to iterate one or two buffered streams in the right order. With FLIP-220, we are introducing a new state primitive: BinarySortedMultiMapState. This new form of state offers you to (a) efficiently store lists of values for a user-provided key, and (b) iterate keyed state in a well-defined sort order. Both features can be backed efficiently by RocksDB with a 2x performance improvement over the current workarounds. This talk will go into the details of the new API and its implementation, present how to use it in your application, and talk about the process of getting it into Flink. by Nico Kruber

apache flinkstream processingbig data
Common issues with Apache Kafka® Producer
Common issues with Apache Kafka® ProducerCommon issues with Apache Kafka® Producer
Common issues with Apache Kafka® Producer

Badai Aqrandista, Confluent, Senior Technical Support Engineer This session will be about a common issue in the Kafka Producer: producer batch expiry. We will be discussing the Kafka Producer internals, its common causes, such as a slow network or small batching, and how to overcome them. We will also be sharing some examples along the way! https://www.meetup.com/apache-kafka-sydney/events/279651982/

apache kafka
Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
 Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra... Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...

Active-Active, Active-Passive, and stretch clusters are hallmark patterns that have been the gold standard in Apache Kafka® disaster recovery architectures for years. Moving to Kubernetes requires unpacking these patterns and choosing a configuration that allows you to meet the same RTO and RPO requirements. In this talk, we will cover how Active-Active/Active-Passive modes for disaster recovery have worked in the past and how the architecture evolves with deploying Apache Kafka on Kubernetes. We'll also look at how stretch clusters sitting on this architecture give a disaster recovery solution that's built-in! Armed with this information, you will be able to architect your new Apache Kafka Kubernetes deployment (or retool your existing one) to achieve the resilience you require.

apache kafkakafka summitconfluent
| 5
● An open source distributed stream processing platform
○ High Throughput
○ Scalable
○ Low Latency
○ Real-time
○ Fault Tolerant
● Messages are organized as topics
● Producers push messages
● Consumers pull messages
● Kafka doesn't have message acknowledgments, it assumes the consumer keeps tracks of what's been consumed
so far.
Kafka - Basics
| 6
● Each topic has multiple partitions
● Unit of parallelism
● IDs are unique for a partition for a topic
● Makes sure partitions within the same topic are roughly the same size
● Too many partitions ?.
Kafka - Partitions
| 7
Consuming from a Kafka Topic
Kafka Reads - https://fizalihsan.github.io/technology/kafka-partition-consumer.png
| 8
● Replication factor
○ Unit of replication is partition
○ Producer ACk
● A partition is owned by a single broker in the cluster, and that broker is called the leader for the partition.
● Producers and Consumers are only served by Leaders
● A Zookeeper cluster is called an “ensemble”.
● Spreads leader partitions evenly on brokers throughout the cluster
Kafka - Cluster

Recommended for you

A visual introduction to Apache Kafka
A visual introduction to Apache KafkaA visual introduction to Apache Kafka
A visual introduction to Apache Kafka

Introducing Apache Kafka - a visual overview. Presented at the Canberra Big Data Meetup 7 February 2019. We build a Kafka "postal service" to explain the main Kafka concepts, and explain how consumers receive different messages depending on whether there's a key or not.

apache kafkapub-submessaging
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache Kafka

Kafka is well known for high throughput ingestion. However, to get the best latency characteristics without compromising on throughput and durability, we need to tune Kafka. In this talk, we share our experiences to achieve the optimal combination of latency, throughput and durability for different scenarios.

producerperformancekafka
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive



3 Things to Learn About:
 -How Kudu is able to fill the analytic gap between HDFS and Apache HBase -The trade-offs between real-time transactional access and fast analytic performance -How Kudu provides an option to achieve fast scans and random access from a single API

apache kuduossapache hbase
| 9
Kafka - Cluster
| 10
● Any Kafka client (a producer or consumer) communicates only with the leader partition for data
○ All other partitions exist for redundancy and failover.
○ Follower partitions are responsible for copying new records from their leader partitions.
○ Follower partitions have an exact copy of the contents of the leader. Such partitions are called in-sync replicas (ISR).
Kafka - ISR (In Sync Replica)
| 11
● Failures Happen
● Systems need to be designed to tolerate failure
● Expect failures and design systems to handle them
Distributed Systems
| 12
● Producer
○ Acks =0, 1, all
○ Batch size, linger time
○ Broker is down
○ Block.on.buffer.full = false
● Kafka Cluster
○ Disk writes/flush are asynchronous
■ Hardware crashes
○ Metadata corruption
○ Kafka Bugs
● Consumer
○ Offset management
■ Offsets are committed before processing the messages completely
How can we lose data ?

Recommended for you

Kafka 101
Kafka 101Kafka 101
Kafka 101

Kafka is a distributed messaging system that allows for publishing and subscribing to streams of records, known as topics. Producers write data to topics and consumers read from topics. The data is partitioned and replicated across clusters of machines called brokers for reliability and scalability. A common data format like Avro can be used to serialize the data.

kafkajobteaserbeginner
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator

Flink Forward San Francisco 2022. The Apache Flink Kubernetes Operator provides a consistent approach to manage Flink applications automatically, without any human interaction, by extending the Kubernetes API. Given the increasing adoption of Kubernetes based Flink deployments the community has been working on a Kubernetes native solution as part of Flink that can benefit from the rich experience of community members and ultimately make Flink easier to adopt. In this talk we give a technical introduction to the Flink Kubernetes Operator and demonstrate the core features and use-cases through in-depth examples." by Thomas Weise

stream processingbig dataapache flink
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka

Apache Kafka is an open-source distributed event streaming platform used for building real-time data pipelines and streaming apps. It was developed by LinkedIn in 2011 to solve problems with data integration and processing. Kafka uses a publish-subscribe messaging model and is designed to be fast, scalable, and durable. It allows both streaming and storage of data and acts as a central data backbone for large organizations.

apache kafkascalajava
| 13
● At-least once delivery semantics
● Out-of-order delivery semantics
● Kafka Cluster
● Each component in the pipeline makes sure that they are not loosing the messages
○ Producer
■ Takes the responsibility of making sure that messages are successfully delivered to Kafka Brokers
○ Kafka Cluster
■ Reliable and Resilient
○ Consumers
■ Takes the responsibility for consuming all messages from Kafka Cluster with the understanding
● Messages can be delivered more than once and have idempotency logic
● Operates with understanding of retention policy of the brokers, number of partitions for the
topics, etc.
● Manages check pointing and committing offsets
Zero Data loss pipelines
| 14
● Handle cases where broker is unavailable / down / network
issue
○ Local Spooling
○ Low Latency store
○ Backup cluster
● Ack = all ?
○ Consistency Vs Availability
■ Min ISR < = R
● (ex: replication factor = 3 min isr =2)
● Latency Vs Throughput
○ Batch size and linger time
● Standardardized on Message schema
○ Sequence Id, Timestamp, originating service
Producers
| 15
● Configure broker to make sure that no messages are lost at the broker end and can survive hardware failures using below
aspects
○ Partitions
■ Spread leader partitions evenly on brokers throughout the cluster
● Balances the load
■ Partition replication
● Replication Factor (R < = size of the cluster) (network congestion?)
● Sufficient replication sets to preserve data
● Rack awareness
○ Data Retention
■ Too much retention can be a problem
■ Leader election scenarios (unclean leader election)
○ Zookeeper availability
● DR / Backup
○ Mirror Maker to replicate the messages to another cluster
Kafka Cluster
| 16
● Failure Detection
○ Reassign replications on the dead brokers
○ Monitor for under replicated partitions
● Kafka Cluster sizing
○ Disk
○ Network
Kafka Cluster

Recommended for you

Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams

Kafka Streams is a new stream processing library natively integrated with Kafka. It has a very low barrier to entry, easy operationalization, and a natural DSL for writing stream processing applications. As such it is the most convenient yet scalable option to analyze, transform, or otherwise process data that is backed by Kafka. We will provide the audience with an overview of Kafka Streams including its design and API, typical use cases, code examples, and an outlook of its upcoming roadmap. We will also compare Kafka Streams' light-weight library approach with heavier, framework-based tools such as Spark Streaming or Storm, which require you to understand and operate a whole different infrastructure for processing real-time data in Kafka.

stream processingkafka
Stream processing with Apache Flink (Timo Walther - Ververica)
Stream processing with Apache Flink (Timo Walther - Ververica)Stream processing with Apache Flink (Timo Walther - Ververica)
Stream processing with Apache Flink (Timo Walther - Ververica)

Speaker: Timo Walther (https://www.linkedin.com/in/twalthr/) Video: https://www.youtube.com/watch?v=Ych5bbmDIoA Talk presented during Bangalore Kafka group's stream processing meetup at Hotstar https://www.meetup.com/Bangalore-Apache-Kafka-Group/events/265285812/

streamprocessingapache
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controller

Presentation at Strata Data Conference 2018, New York The controller is the brain of Apache Kafka. A big part of what the controller does is to maintain the consistency of the replicas and determine which replica can be used to serve the clients, especially during individual broker failure. Jun Rao outlines the main data flow in the controller—in particular, when a broker fails, how the controller automatically promotes another replica as the leader to serve the clients, and when a broker is started, how the controller resumes the replication pipeline in the restarted broker. Jun then describes recent improvements to the controller that allow it to handle certain edge cases correctly and increase its performance, which allows for more partitions in a Kafka cluster.

apache kafkajun raostrata
| 17
Kafka Sizing
Recommended approach: Simulate load using load generation tools that ship with Kafka
Simplest: size based on disk-space (data ingest rate * data retention)
A little better: disk and network throughput requirements
Example: 1 TB/day —> 277MB/s. Replication 3 , Consumers 10
● Disk throughput = ingest * replication *2 (for lagging readers) = 277 * 3 * 2 = 1662 MB/s
● Network read throughput = ingest * (replication -1+ consumers)=277 *(3-1+10) = 3323 MB/s
● Network write throughput = ingest rate * replication = 277 * 3 = 831 MB /s
Network (e.g. 10 GBE): (3323+831)/1250 = 3.3 nodes
Disk (e.g. 6 drives & 70MB/s): 1662 /(6*70) = 3.95 nodes
Plan for double, so 8 kafka nodes
| 18
● Understand the data retention period and design accordingly (need to keep up with the producers)
○ Parallelism = number of partitions
■ Max consumers = partitions
○ Proper checkpointing and offset management
■ Set Autocommit disabled and commit offset after processing the message
■ Commit less often and Asynchronously
■ Checkpoint frequency
○ Expect at least once delivery of a message and have idempotency logic
Consumers
| 19
● QC Check Topic
○ Message sequence Id
○ Message counts per time bucket (10 mins? )
○ Reconcile on the number of messages produced and number of messages consumed numbers
● Kafka Audit System
○ Chaperone
■ https://github.com/uber/chaperone
○ Kafka Monitor
● Continuously monitor for issues
○ Monitor for producer errors -
■ number of retries and counts in retry database
○ Monitor Consumer Lag, fetch rate, fetch latency, records per request
○ Monitor for workload skews
● Capture Metrics
Monitor for Data Loss
| 20
● Guaranteeing zero data loss is not just Kafka’s problem
● Zero data loss pipelines require operations (Kafka cluster) and development teams (Producer /
consumer) working together
● Anticipate failures and design code to handle
● Gracefully Shutdown your application
Summary

Recommended for you

Apache kafka
Apache kafkaApache kafka
Apache kafka

This document provides an introduction to Apache Kafka, an open-source distributed event streaming platform. It discusses Kafka's history as a project originally developed by LinkedIn, its use cases like messaging, activity tracking and stream processing. It describes key Kafka concepts like topics, partitions, offsets, replicas, brokers and producers/consumers. It also gives examples of how companies like Netflix, Uber and LinkedIn use Kafka in their applications and provides a comparison to Apache Spark.

apache kafkastreamingmessage broker
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode

Flink Forward San Francisco 2022. Resource Elasticity is a frequently requested feature in Apache Flink: Users want to be able to easily adjust their clusters to changing workloads for resource efficiency and cost saving reasons. In Flink 1.13, the initial implementation of Reactive Mode was introduced, later releases added more improvements to make the feature production ready. In this talk, we’ll explain scenarios to deploy Reactive Mode to various environments to achieve autoscaling and resource elasticity. We’ll discuss the constraints to consider when planning to use this feature, and also potential improvements from the Flink roadmap. For those interested in the internals of Flink, we’ll also briefly explain how the feature is implemented, and if time permits, conclude with a short demo. by Robert Metzger

stream processingbig dataapache flink
Kafka: Internals
Kafka: InternalsKafka: Internals
Kafka: Internals

This session goes through the understanding of Apache Kafka, its components and working with best practices to achieve fault tolerant system with high availability and consistency by tuning Kafka brokers and producer to achieve the best result.

apache kafkacluster computingdistributed systems
Thank You!
| 21
Questions?
avinash@clairvoyantsoft.com

More Related Content

What's hot

Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
Saroj Panyasrivanit
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Flink Forward
 
Capacity Planning Your Kafka Cluster | Jason Bell, Digitalis
Capacity Planning Your Kafka Cluster | Jason Bell, DigitalisCapacity Planning Your Kafka Cluster | Jason Bell, Digitalis
Capacity Planning Your Kafka Cluster | Jason Bell, Digitalis
HostedbyConfluent
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Flink Forward
 
Common issues with Apache Kafka® Producer
Common issues with Apache Kafka® ProducerCommon issues with Apache Kafka® Producer
Common issues with Apache Kafka® Producer
confluent
 
Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
 Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra... Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
HostedbyConfluent
 
A visual introduction to Apache Kafka
A visual introduction to Apache KafkaA visual introduction to Apache Kafka
A visual introduction to Apache Kafka
Paul Brebner
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache Kafka
Jiangjie Qin
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive


Cloudera, Inc.
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
Clement Demonchy
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
Flink Forward
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
AIMDek Technologies
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
Guozhang Wang
 
Stream processing with Apache Flink (Timo Walther - Ververica)
Stream processing with Apache Flink (Timo Walther - Ververica)Stream processing with Apache Flink (Timo Walther - Ververica)
Stream processing with Apache Flink (Timo Walther - Ververica)
KafkaZone
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controller
confluent
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
Kumar Shivam
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
Flink Forward
 
Kafka: Internals
Kafka: InternalsKafka: Internals
Kafka: Internals
Knoldus Inc.
 
Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)
DataWorks Summit
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
hadooparchbook
 

What's hot (20)

Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 
Capacity Planning Your Kafka Cluster | Jason Bell, Digitalis
Capacity Planning Your Kafka Cluster | Jason Bell, DigitalisCapacity Planning Your Kafka Cluster | Jason Bell, Digitalis
Capacity Planning Your Kafka Cluster | Jason Bell, Digitalis
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Common issues with Apache Kafka® Producer
Common issues with Apache Kafka® ProducerCommon issues with Apache Kafka® Producer
Common issues with Apache Kafka® Producer
 
Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
 Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra... Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
 
A visual introduction to Apache Kafka
A visual introduction to Apache KafkaA visual introduction to Apache Kafka
A visual introduction to Apache Kafka
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache Kafka
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive


 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
 
Stream processing with Apache Flink (Timo Walther - Ververica)
Stream processing with Apache Flink (Timo Walther - Ververica)Stream processing with Apache Flink (Timo Walther - Ververica)
Stream processing with Apache Flink (Timo Walther - Ververica)
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controller
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Kafka: Internals
Kafka: InternalsKafka: Internals
Kafka: Internals
 
Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
 

Similar to Building zero data loss pipelines with apache kafka

Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin Podval
Martin Podval
 
A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...
 A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ... A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...
A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...
HostedbyConfluent
 
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Monal Daxini
 
kafka
kafkakafka
Apache Kafka - Free Friday
Apache Kafka - Free FridayApache Kafka - Free Friday
Apache Kafka - Free Friday
Otávio Carvalho
 
Multi-Tenancy Kafka cluster for LINE services with 250 billion daily messages
Multi-Tenancy Kafka cluster for LINE services with 250 billion daily messagesMulti-Tenancy Kafka cluster for LINE services with 250 billion daily messages
Multi-Tenancy Kafka cluster for LINE services with 250 billion daily messages
LINE Corporation
 
Non-Kafkaesque Apache Kafka - Yottabyte 2018
Non-Kafkaesque Apache Kafka - Yottabyte 2018Non-Kafkaesque Apache Kafka - Yottabyte 2018
Non-Kafkaesque Apache Kafka - Yottabyte 2018
Otávio Carvalho
 
Event driven architectures with Kinesis
Event driven architectures with KinesisEvent driven architectures with Kinesis
Event driven architectures with Kinesis
Mark Harrison
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
aspyker
 
From Three Nines to Five Nines - A Kafka Journey
From Three Nines to Five Nines - A Kafka JourneyFrom Three Nines to Five Nines - A Kafka Journey
From Three Nines to Five Nines - A Kafka Journey
Allen (Xiaozhong) Wang
 
Performance challenges in software networking
Performance challenges in software networkingPerformance challenges in software networking
Performance challenges in software networking
Stephen Hemminger
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
Amir Sedighi
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
Samuel Kerrien
 
Uber: Kafka Consumer Proxy
Uber: Kafka Consumer ProxyUber: Kafka Consumer Proxy
Uber: Kafka Consumer Proxy
confluent
 
Challenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan LambrightChallenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan Lambright
Gluster.org
 
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/SecNetflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Peter Bakas
 
Kafka in action - Tech Talk - Paytm
Kafka in action - Tech Talk - PaytmKafka in action - Tech Talk - Paytm
Kafka in action - Tech Talk - Paytm
Sumit Jain
 
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, VectorizedData Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
HostedbyConfluent
 
Build real time stream processing applications using Apache Kafka
Build real time stream processing applications using Apache KafkaBuild real time stream processing applications using Apache Kafka
Build real time stream processing applications using Apache Kafka
Hotstar
 
High performance messaging with Apache Pulsar
High performance messaging with Apache PulsarHigh performance messaging with Apache Pulsar
High performance messaging with Apache Pulsar
Matteo Merli
 

Similar to Building zero data loss pipelines with apache kafka (20)

Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin Podval
 
A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...
 A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ... A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...
A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...
 
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
 
kafka
kafkakafka
kafka
 
Apache Kafka - Free Friday
Apache Kafka - Free FridayApache Kafka - Free Friday
Apache Kafka - Free Friday
 
Multi-Tenancy Kafka cluster for LINE services with 250 billion daily messages
Multi-Tenancy Kafka cluster for LINE services with 250 billion daily messagesMulti-Tenancy Kafka cluster for LINE services with 250 billion daily messages
Multi-Tenancy Kafka cluster for LINE services with 250 billion daily messages
 
Non-Kafkaesque Apache Kafka - Yottabyte 2018
Non-Kafkaesque Apache Kafka - Yottabyte 2018Non-Kafkaesque Apache Kafka - Yottabyte 2018
Non-Kafkaesque Apache Kafka - Yottabyte 2018
 
Event driven architectures with Kinesis
Event driven architectures with KinesisEvent driven architectures with Kinesis
Event driven architectures with Kinesis
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
 
From Three Nines to Five Nines - A Kafka Journey
From Three Nines to Five Nines - A Kafka JourneyFrom Three Nines to Five Nines - A Kafka Journey
From Three Nines to Five Nines - A Kafka Journey
 
Performance challenges in software networking
Performance challenges in software networkingPerformance challenges in software networking
Performance challenges in software networking
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
 
Uber: Kafka Consumer Proxy
Uber: Kafka Consumer ProxyUber: Kafka Consumer Proxy
Uber: Kafka Consumer Proxy
 
Challenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan LambrightChallenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan Lambright
 
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/SecNetflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
 
Kafka in action - Tech Talk - Paytm
Kafka in action - Tech Talk - PaytmKafka in action - Tech Talk - Paytm
Kafka in action - Tech Talk - Paytm
 
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, VectorizedData Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
 
Build real time stream processing applications using Apache Kafka
Build real time stream processing applications using Apache KafkaBuild real time stream processing applications using Apache Kafka
Build real time stream processing applications using Apache Kafka
 
High performance messaging with Apache Pulsar
High performance messaging with Apache PulsarHigh performance messaging with Apache Pulsar
High performance messaging with Apache Pulsar
 

More from Avinash Ramineni

Simplifying the data privacy governance quagmire building automated privacy ...
Simplifying the data privacy governance quagmire  building automated privacy ...Simplifying the data privacy governance quagmire  building automated privacy ...
Simplifying the data privacy governance quagmire building automated privacy ...
Avinash Ramineni
 
Winning the war on data breaches in a changing data landscape
Winning the war on data breaches in a changing data landscapeWinning the war on data breaches in a changing data landscape
Winning the war on data breaches in a changing data landscape
Avinash Ramineni
 
Autonomous Security: Using Big Data, Machine Learning and AI to Fix Today's S...
Autonomous Security: Using Big Data, Machine Learning and AI to Fix Today's S...Autonomous Security: Using Big Data, Machine Learning and AI to Fix Today's S...
Autonomous Security: Using Big Data, Machine Learning and AI to Fix Today's S...
Avinash Ramineni
 
Effectively deploying hadoop to the cloud
Effectively  deploying hadoop to the cloudEffectively  deploying hadoop to the cloud
Effectively deploying hadoop to the cloud
Avinash Ramineni
 
Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data...
Practical guide to architecting data lakes -  Avinash Ramineni - Phoenix Data...Practical guide to architecting data lakes -  Avinash Ramineni - Phoenix Data...
Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data...
Avinash Ramineni
 
MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014
Avinash Ramineni
 
HBase from the Trenches - Phoenix Data Conference 2015
HBase from the Trenches - Phoenix Data Conference 2015HBase from the Trenches - Phoenix Data Conference 2015
HBase from the Trenches - Phoenix Data Conference 2015
Avinash Ramineni
 
Strata+Hadoop World NY 2016 - Avinash Ramineni
Strata+Hadoop World NY 2016 - Avinash RamineniStrata+Hadoop World NY 2016 - Avinash Ramineni
Strata+Hadoop World NY 2016 - Avinash Ramineni
Avinash Ramineni
 
Log analysis using Logstash,ElasticSearch and Kibana
Log analysis using Logstash,ElasticSearch and KibanaLog analysis using Logstash,ElasticSearch and Kibana
Log analysis using Logstash,ElasticSearch and Kibana
Avinash Ramineni
 
Event Driven Architectures
Event Driven ArchitecturesEvent Driven Architectures
Event Driven Architectures
Avinash Ramineni
 

More from Avinash Ramineni (10)

Simplifying the data privacy governance quagmire building automated privacy ...
Simplifying the data privacy governance quagmire  building automated privacy ...Simplifying the data privacy governance quagmire  building automated privacy ...
Simplifying the data privacy governance quagmire building automated privacy ...
 
Winning the war on data breaches in a changing data landscape
Winning the war on data breaches in a changing data landscapeWinning the war on data breaches in a changing data landscape
Winning the war on data breaches in a changing data landscape
 
Autonomous Security: Using Big Data, Machine Learning and AI to Fix Today's S...
Autonomous Security: Using Big Data, Machine Learning and AI to Fix Today's S...Autonomous Security: Using Big Data, Machine Learning and AI to Fix Today's S...
Autonomous Security: Using Big Data, Machine Learning and AI to Fix Today's S...
 
Effectively deploying hadoop to the cloud
Effectively  deploying hadoop to the cloudEffectively  deploying hadoop to the cloud
Effectively deploying hadoop to the cloud
 
Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data...
Practical guide to architecting data lakes -  Avinash Ramineni - Phoenix Data...Practical guide to architecting data lakes -  Avinash Ramineni - Phoenix Data...
Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data...
 
MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014
 
HBase from the Trenches - Phoenix Data Conference 2015
HBase from the Trenches - Phoenix Data Conference 2015HBase from the Trenches - Phoenix Data Conference 2015
HBase from the Trenches - Phoenix Data Conference 2015
 
Strata+Hadoop World NY 2016 - Avinash Ramineni
Strata+Hadoop World NY 2016 - Avinash RamineniStrata+Hadoop World NY 2016 - Avinash Ramineni
Strata+Hadoop World NY 2016 - Avinash Ramineni
 
Log analysis using Logstash,ElasticSearch and Kibana
Log analysis using Logstash,ElasticSearch and KibanaLog analysis using Logstash,ElasticSearch and Kibana
Log analysis using Logstash,ElasticSearch and Kibana
 
Event Driven Architectures
Event Driven ArchitecturesEvent Driven Architectures
Event Driven Architectures
 

Recently uploaded

Saket @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Neha Singla Top Model Safe
Saket @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Neha Singla Top Model SafeSaket @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Neha Singla Top Model Safe
Saket @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Neha Singla Top Model Safe
shruti singh$A17
 
Lajpat Nagar @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Arti Singh Top Model Safe
Lajpat Nagar @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Arti Singh Top Model SafeLajpat Nagar @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Arti Singh Top Model Safe
Lajpat Nagar @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Arti Singh Top Model Safe
aarusi sexy model
 
University of the Sunshine Coast degree offer diploma Transcript
University of the Sunshine Coast  degree offer diploma TranscriptUniversity of the Sunshine Coast  degree offer diploma Transcript
University of the Sunshine Coast degree offer diploma Transcript
taqyea
 
Niagara College degree offer diploma Transcript
Niagara College  degree offer diploma TranscriptNiagara College  degree offer diploma Transcript
Niagara College degree offer diploma Transcript
taqyea
 
Rohini @ℂall @Girls ꧁❤ 9873940964 ❤꧂VIP Megha Singla Top Model Safe
Rohini @ℂall @Girls ꧁❤ 9873940964 ❤꧂VIP Megha Singla Top Model SafeRohini @ℂall @Girls ꧁❤ 9873940964 ❤꧂VIP Megha Singla Top Model Safe
Rohini @ℂall @Girls ꧁❤ 9873940964 ❤꧂VIP Megha Singla Top Model Safe
depikasharma
 
Cloud Analytics Use Cases - Telco Products
Cloud Analytics Use Cases - Telco ProductsCloud Analytics Use Cases - Telco Products
Cloud Analytics Use Cases - Telco Products
luqmansyauqi2
 
Nehru Place @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Jya Khan Top Model Safe
Nehru Place @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Jya Khan Top Model SafeNehru Place @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Jya Khan Top Model Safe
Nehru Place @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Jya Khan Top Model Safe
bookmybebe1
 
Sin Involves More Than You Might Think (We'll Explain)
Sin Involves More Than You Might Think (We'll Explain)Sin Involves More Than You Might Think (We'll Explain)
Sin Involves More Than You Might Think (We'll Explain)
sapna sharmap11
 
[D3T1S04] Aurora PostgreSQL performance monitoring and troubleshooting by use...
[D3T1S04] Aurora PostgreSQL performance monitoring and troubleshooting by use...[D3T1S04] Aurora PostgreSQL performance monitoring and troubleshooting by use...
[D3T1S04] Aurora PostgreSQL performance monitoring and troubleshooting by use...
Amazon Web Services Korea
 
iot paper presentation FINAL EDIT by kiran.pptx
iot paper presentation FINAL EDIT by kiran.pptxiot paper presentation FINAL EDIT by kiran.pptx
iot paper presentation FINAL EDIT by kiran.pptx
KiranKumar139571
 
Victoria University degree offer diploma Transcript
Victoria University  degree offer diploma TranscriptVictoria University  degree offer diploma Transcript
Victoria University degree offer diploma Transcript
taqyea
 
BIGPPTTTTTTTTtttttttttttttttttttttt.pptx
BIGPPTTTTTTTTtttttttttttttttttttttt.pptxBIGPPTTTTTTTTtttttttttttttttttttttt.pptx
BIGPPTTTTTTTTtttttttttttttttttttttt.pptx
RajdeepPaul47
 
Maruti Wagon R on road price in Faridabad - CarDekho
Maruti Wagon R on road price in Faridabad - CarDekhoMaruti Wagon R on road price in Faridabad - CarDekho
Maruti Wagon R on road price in Faridabad - CarDekho
kamli sharma#S10
 
Karol Bagh @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Jya Khan Top Model Safe
Karol Bagh @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Jya Khan Top Model SafeKarol Bagh @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Jya Khan Top Model Safe
Karol Bagh @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Jya Khan Top Model Safe
bookmybebe1
 
Supervised Learning (Data Science).pptx
Supervised Learning  (Data Science).pptxSupervised Learning  (Data Science).pptx
Supervised Learning (Data Science).pptx
TARIKU ENDALE
 
From Clues to Connections: How Social Media Investigators Expose Hidden Networks
From Clues to Connections: How Social Media Investigators Expose Hidden NetworksFrom Clues to Connections: How Social Media Investigators Expose Hidden Networks
From Clues to Connections: How Social Media Investigators Expose Hidden Networks
Milind Agarwal
 
Nehru Place @ℂall @Girls ꧁❤ 9873940964 ❤꧂VIP Jina Singh Top Model Safe
Nehru Place @ℂall @Girls ꧁❤ 9873940964 ❤꧂VIP Jina Singh Top Model SafeNehru Place @ℂall @Girls ꧁❤ 9873940964 ❤꧂VIP Jina Singh Top Model Safe
Nehru Place @ℂall @Girls ꧁❤ 9873940964 ❤꧂VIP Jina Singh Top Model Safe
butwhat24
 
South Ex @ℂall @Girls ꧁❤ 9711199012 ❤꧂Glamorous sonam Mehra Top Model Safe
South Ex @ℂall @Girls ꧁❤ 9711199012 ❤꧂Glamorous sonam Mehra Top Model SafeSouth Ex @ℂall @Girls ꧁❤ 9711199012 ❤꧂Glamorous sonam Mehra Top Model Safe
South Ex @ℂall @Girls ꧁❤ 9711199012 ❤꧂Glamorous sonam Mehra Top Model Safe
simmi singh$A17
 
Rohini @ℂall @Girls ꧁❤ 9873940964 ❤꧂VIP Vishakha Singla Top Model Safe
Rohini @ℂall @Girls ꧁❤ 9873940964 ❤꧂VIP Vishakha Singla Top Model SafeRohini @ℂall @Girls ꧁❤ 9873940964 ❤꧂VIP Vishakha Singla Top Model Safe
Rohini @ℂall @Girls ꧁❤ 9873940964 ❤꧂VIP Vishakha Singla Top Model Safe
kumkum tuteja$A17
 
Lajpat Nagar @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Ruhi Singla Top Model Safe
Lajpat Nagar @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Ruhi Singla Top Model SafeLajpat Nagar @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Ruhi Singla Top Model Safe
Lajpat Nagar @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Ruhi Singla Top Model Safe
jiya khan$A17
 

Recently uploaded (20)

Saket @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Neha Singla Top Model Safe
Saket @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Neha Singla Top Model SafeSaket @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Neha Singla Top Model Safe
Saket @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Neha Singla Top Model Safe
 
Lajpat Nagar @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Arti Singh Top Model Safe
Lajpat Nagar @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Arti Singh Top Model SafeLajpat Nagar @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Arti Singh Top Model Safe
Lajpat Nagar @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Arti Singh Top Model Safe
 
University of the Sunshine Coast degree offer diploma Transcript
University of the Sunshine Coast  degree offer diploma TranscriptUniversity of the Sunshine Coast  degree offer diploma Transcript
University of the Sunshine Coast degree offer diploma Transcript
 
Niagara College degree offer diploma Transcript
Niagara College  degree offer diploma TranscriptNiagara College  degree offer diploma Transcript
Niagara College degree offer diploma Transcript
 
Rohini @ℂall @Girls ꧁❤ 9873940964 ❤꧂VIP Megha Singla Top Model Safe
Rohini @ℂall @Girls ꧁❤ 9873940964 ❤꧂VIP Megha Singla Top Model SafeRohini @ℂall @Girls ꧁❤ 9873940964 ❤꧂VIP Megha Singla Top Model Safe
Rohini @ℂall @Girls ꧁❤ 9873940964 ❤꧂VIP Megha Singla Top Model Safe
 
Cloud Analytics Use Cases - Telco Products
Cloud Analytics Use Cases - Telco ProductsCloud Analytics Use Cases - Telco Products
Cloud Analytics Use Cases - Telco Products
 
Nehru Place @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Jya Khan Top Model Safe
Nehru Place @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Jya Khan Top Model SafeNehru Place @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Jya Khan Top Model Safe
Nehru Place @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Jya Khan Top Model Safe
 
Sin Involves More Than You Might Think (We'll Explain)
Sin Involves More Than You Might Think (We'll Explain)Sin Involves More Than You Might Think (We'll Explain)
Sin Involves More Than You Might Think (We'll Explain)
 
[D3T1S04] Aurora PostgreSQL performance monitoring and troubleshooting by use...
[D3T1S04] Aurora PostgreSQL performance monitoring and troubleshooting by use...[D3T1S04] Aurora PostgreSQL performance monitoring and troubleshooting by use...
[D3T1S04] Aurora PostgreSQL performance monitoring and troubleshooting by use...
 
iot paper presentation FINAL EDIT by kiran.pptx
iot paper presentation FINAL EDIT by kiran.pptxiot paper presentation FINAL EDIT by kiran.pptx
iot paper presentation FINAL EDIT by kiran.pptx
 
Victoria University degree offer diploma Transcript
Victoria University  degree offer diploma TranscriptVictoria University  degree offer diploma Transcript
Victoria University degree offer diploma Transcript
 
BIGPPTTTTTTTTtttttttttttttttttttttt.pptx
BIGPPTTTTTTTTtttttttttttttttttttttt.pptxBIGPPTTTTTTTTtttttttttttttttttttttt.pptx
BIGPPTTTTTTTTtttttttttttttttttttttt.pptx
 
Maruti Wagon R on road price in Faridabad - CarDekho
Maruti Wagon R on road price in Faridabad - CarDekhoMaruti Wagon R on road price in Faridabad - CarDekho
Maruti Wagon R on road price in Faridabad - CarDekho
 
Karol Bagh @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Jya Khan Top Model Safe
Karol Bagh @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Jya Khan Top Model SafeKarol Bagh @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Jya Khan Top Model Safe
Karol Bagh @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Jya Khan Top Model Safe
 
Supervised Learning (Data Science).pptx
Supervised Learning  (Data Science).pptxSupervised Learning  (Data Science).pptx
Supervised Learning (Data Science).pptx
 
From Clues to Connections: How Social Media Investigators Expose Hidden Networks
From Clues to Connections: How Social Media Investigators Expose Hidden NetworksFrom Clues to Connections: How Social Media Investigators Expose Hidden Networks
From Clues to Connections: How Social Media Investigators Expose Hidden Networks
 
Nehru Place @ℂall @Girls ꧁❤ 9873940964 ❤꧂VIP Jina Singh Top Model Safe
Nehru Place @ℂall @Girls ꧁❤ 9873940964 ❤꧂VIP Jina Singh Top Model SafeNehru Place @ℂall @Girls ꧁❤ 9873940964 ❤꧂VIP Jina Singh Top Model Safe
Nehru Place @ℂall @Girls ꧁❤ 9873940964 ❤꧂VIP Jina Singh Top Model Safe
 
South Ex @ℂall @Girls ꧁❤ 9711199012 ❤꧂Glamorous sonam Mehra Top Model Safe
South Ex @ℂall @Girls ꧁❤ 9711199012 ❤꧂Glamorous sonam Mehra Top Model SafeSouth Ex @ℂall @Girls ꧁❤ 9711199012 ❤꧂Glamorous sonam Mehra Top Model Safe
South Ex @ℂall @Girls ꧁❤ 9711199012 ❤꧂Glamorous sonam Mehra Top Model Safe
 
Rohini @ℂall @Girls ꧁❤ 9873940964 ❤꧂VIP Vishakha Singla Top Model Safe
Rohini @ℂall @Girls ꧁❤ 9873940964 ❤꧂VIP Vishakha Singla Top Model SafeRohini @ℂall @Girls ꧁❤ 9873940964 ❤꧂VIP Vishakha Singla Top Model Safe
Rohini @ℂall @Girls ꧁❤ 9873940964 ❤꧂VIP Vishakha Singla Top Model Safe
 
Lajpat Nagar @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Ruhi Singla Top Model Safe
Lajpat Nagar @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Ruhi Singla Top Model SafeLajpat Nagar @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Ruhi Singla Top Model Safe
Lajpat Nagar @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Ruhi Singla Top Model Safe
 

Building zero data loss pipelines with apache kafka

  • 1. www.clairvoyantsoft.com Building Zero Data loss pipelines with Apache Kafka By: Avinash Ramineni
  • 3. | 3 About Background Awards & Recognition Boutique consulting firm centered on building data solutions and products All things Web and Data Engineering, Analytics, ML and User Experience to bring it all together Support core Hadoop platform, data engineering pipelines and provide administrative and devops expertise focused on Hadoop
  • 4. | 4 ● Introduction and terminologies ● How can we lose data ? ● Zero Data Loss Pipelines ○ Producer ○ Kafka Cluster ○ Consumer ● Monitoring for Data Loss ● Summary Agenda
  • 5. | 5 ● An open source distributed stream processing platform ○ High Throughput ○ Scalable ○ Low Latency ○ Real-time ○ Fault Tolerant ● Messages are organized as topics ● Producers push messages ● Consumers pull messages ● Kafka doesn't have message acknowledgments, it assumes the consumer keeps tracks of what's been consumed so far. Kafka - Basics
  • 6. | 6 ● Each topic has multiple partitions ● Unit of parallelism ● IDs are unique for a partition for a topic ● Makes sure partitions within the same topic are roughly the same size ● Too many partitions ?. Kafka - Partitions
  • 7. | 7 Consuming from a Kafka Topic Kafka Reads - https://fizalihsan.github.io/technology/kafka-partition-consumer.png
  • 8. | 8 ● Replication factor ○ Unit of replication is partition ○ Producer ACk ● A partition is owned by a single broker in the cluster, and that broker is called the leader for the partition. ● Producers and Consumers are only served by Leaders ● A Zookeeper cluster is called an “ensemble”. ● Spreads leader partitions evenly on brokers throughout the cluster Kafka - Cluster
  • 9. | 9 Kafka - Cluster
  • 10. | 10 ● Any Kafka client (a producer or consumer) communicates only with the leader partition for data ○ All other partitions exist for redundancy and failover. ○ Follower partitions are responsible for copying new records from their leader partitions. ○ Follower partitions have an exact copy of the contents of the leader. Such partitions are called in-sync replicas (ISR). Kafka - ISR (In Sync Replica)
  • 11. | 11 ● Failures Happen ● Systems need to be designed to tolerate failure ● Expect failures and design systems to handle them Distributed Systems
  • 12. | 12 ● Producer ○ Acks =0, 1, all ○ Batch size, linger time ○ Broker is down ○ Block.on.buffer.full = false ● Kafka Cluster ○ Disk writes/flush are asynchronous ■ Hardware crashes ○ Metadata corruption ○ Kafka Bugs ● Consumer ○ Offset management ■ Offsets are committed before processing the messages completely How can we lose data ?
  • 13. | 13 ● At-least once delivery semantics ● Out-of-order delivery semantics ● Kafka Cluster ● Each component in the pipeline makes sure that they are not loosing the messages ○ Producer ■ Takes the responsibility of making sure that messages are successfully delivered to Kafka Brokers ○ Kafka Cluster ■ Reliable and Resilient ○ Consumers ■ Takes the responsibility for consuming all messages from Kafka Cluster with the understanding ● Messages can be delivered more than once and have idempotency logic ● Operates with understanding of retention policy of the brokers, number of partitions for the topics, etc. ● Manages check pointing and committing offsets Zero Data loss pipelines
  • 14. | 14 ● Handle cases where broker is unavailable / down / network issue ○ Local Spooling ○ Low Latency store ○ Backup cluster ● Ack = all ? ○ Consistency Vs Availability ■ Min ISR < = R ● (ex: replication factor = 3 min isr =2) ● Latency Vs Throughput ○ Batch size and linger time ● Standardardized on Message schema ○ Sequence Id, Timestamp, originating service Producers
  • 15. | 15 ● Configure broker to make sure that no messages are lost at the broker end and can survive hardware failures using below aspects ○ Partitions ■ Spread leader partitions evenly on brokers throughout the cluster ● Balances the load ■ Partition replication ● Replication Factor (R < = size of the cluster) (network congestion?) ● Sufficient replication sets to preserve data ● Rack awareness ○ Data Retention ■ Too much retention can be a problem ■ Leader election scenarios (unclean leader election) ○ Zookeeper availability ● DR / Backup ○ Mirror Maker to replicate the messages to another cluster Kafka Cluster
  • 16. | 16 ● Failure Detection ○ Reassign replications on the dead brokers ○ Monitor for under replicated partitions ● Kafka Cluster sizing ○ Disk ○ Network Kafka Cluster
  • 17. | 17 Kafka Sizing Recommended approach: Simulate load using load generation tools that ship with Kafka Simplest: size based on disk-space (data ingest rate * data retention) A little better: disk and network throughput requirements Example: 1 TB/day —> 277MB/s. Replication 3 , Consumers 10 ● Disk throughput = ingest * replication *2 (for lagging readers) = 277 * 3 * 2 = 1662 MB/s ● Network read throughput = ingest * (replication -1+ consumers)=277 *(3-1+10) = 3323 MB/s ● Network write throughput = ingest rate * replication = 277 * 3 = 831 MB /s Network (e.g. 10 GBE): (3323+831)/1250 = 3.3 nodes Disk (e.g. 6 drives & 70MB/s): 1662 /(6*70) = 3.95 nodes Plan for double, so 8 kafka nodes
  • 18. | 18 ● Understand the data retention period and design accordingly (need to keep up with the producers) ○ Parallelism = number of partitions ■ Max consumers = partitions ○ Proper checkpointing and offset management ■ Set Autocommit disabled and commit offset after processing the message ■ Commit less often and Asynchronously ■ Checkpoint frequency ○ Expect at least once delivery of a message and have idempotency logic Consumers
  • 19. | 19 ● QC Check Topic ○ Message sequence Id ○ Message counts per time bucket (10 mins? ) ○ Reconcile on the number of messages produced and number of messages consumed numbers ● Kafka Audit System ○ Chaperone ■ https://github.com/uber/chaperone ○ Kafka Monitor ● Continuously monitor for issues ○ Monitor for producer errors - ■ number of retries and counts in retry database ○ Monitor Consumer Lag, fetch rate, fetch latency, records per request ○ Monitor for workload skews ● Capture Metrics Monitor for Data Loss
  • 20. | 20 ● Guaranteeing zero data loss is not just Kafka’s problem ● Zero data loss pipelines require operations (Kafka cluster) and development teams (Producer / consumer) working together ● Anticipate failures and design code to handle ● Gracefully Shutdown your application Summary