A Short Presentation on Kafka
Rayapati Praveen
&
Mostafa Jubayer Khan
Contents
● Definitions
● History
● Kafka Architecture
● Capabilities & Core API
● Advantages
● Limitations
● Usage
● References
● Future challenges
Apache Kafka® is a distributed streaming platform. What exactly does that mean? A stream is a pipeline through which your applications receive data continuously.
Kafka is an open-source distributed streaming platform that simplifies data integration between systems.
Created and open sourced by LinkedIn in 2011, it is written in Scala and Java.
Kafka has quickly evolved from a messaging queue into a full-fledged streaming platform.
A streaming platform has three key capabilities:
● Publish & subscribe to streams of records, similar to a message queue or enterprise messaging system.
● Store streams of records in a fault-tolerant durable way.
● Process streams of records as they occur.
Kafka is generally used for two broad classes of applications:
● Data Integration: Building real-time streaming data pipelines that reliably get data between systems or applications
● Stream Processing: Building real-time streaming applications that transform or react to the streams of data
Architecture, Capabilities & Core API
Kafka system has three main components:
A Producer: the service that emits the source data.
A Broker: acts as an intermediary between producers and consumers,
using APIs to receive and broadcast data.
A Consumer: the service that uses the data the broker broadcasts.
Kafka, in general:
● Runs as a cluster on one or more servers that can span multiple datacenters.
● Stores streams of records in categories called topics.
● Each record consists of a key, a value, and a timestamp.
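The points above can be sketched as a toy in-memory model (illustrative only, not the real Kafka client; all names here are invented for the example): a broker holds named topics, each an ordered log of key/value/timestamp records.

```python
import time
from dataclasses import dataclass, field

# A Kafka-style record: key, value, and timestamp (toy model).
@dataclass
class Record:
    key: str
    value: str
    timestamp: float = field(default_factory=time.time)

# A toy broker: topics are named, append-only lists of records.
class ToyBroker:
    def __init__(self):
        self.topics = {}

    def publish(self, topic, record):
        # Records for a topic are appended in order, like a log.
        self.topics.setdefault(topic, []).append(record)

    def read(self, topic):
        # Consumers read the stored records for a topic.
        return list(self.topics.get(topic, []))

broker = ToyBroker()
broker.publish("page-views", Record(key="user-1", value="/home"))
broker.publish("page-views", Record(key="user-2", value="/cart"))

records = broker.read("page-views")
print([(r.key, r.value) for r in records])  # [('user-1', '/home'), ('user-2', '/cart')]
```

The real Kafka partitions each topic and replicates partitions across brokers; this sketch only shows the record and topic shape.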
Kafka has four core Application Programming Interfaces (APIs):
● The Producer API publishes a stream of records to one or more Kafka topics.
● The Consumer API subscribes to one or more topics and processes the stream of records.
● The Streams API acts as a stream processor, transforming input streams into output streams.
● The Connector API builds and runs reusable producers or consumers that import and export large volumes of data between Kafka and other systems, such as databases.
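As a rough illustration of the Streams idea (a hand-rolled sketch, not the actual Streams API), a stream processor consumes records from an input topic, transforms them, and produces the results to an output topic:

```python
# Toy topics: plain lists of (key, value) pairs standing in for Kafka topics.
input_topic = [("user-1", "hello world"), ("user-2", "kafka streams")]
output_topic = []

def process(stream):
    # Transform each input record into an output record
    # (here: upper-case the value), mimicking a map step.
    for key, value in stream:
        yield key, value.upper()

# "Produce" the transformed stream to the output topic.
output_topic.extend(process(input_topic))
print(output_topic)  # [('user-1', 'HELLO WORLD'), ('user-2', 'KAFKA STREAMS')]
```

The real Streams API additionally handles partitioning, state stores, and fault tolerance; only the transform step is modelled here.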
Advantages
● Handles complex and heavy data-pipeline loads for data integration better than alternatives such as Redis, RabbitMQ, AMQP-based brokers, or Microsoft Azure Service Bus.
● Can apply a series of validations and transformations to the data
● Keeps a record of the information, called a commit log, for later consumption
● Fault-tolerant, replayable, real-time & reliable to use
● Works with external stream processing systems, e.g. Apache Apex, Apache Flink, Apache Spark, and Apache Storm.
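The commit-log and replay idea above can be illustrated with a minimal sketch (toy code, not Kafka's implementation): each consumer keeps an offset into an append-only log and can rewind it to re-consume earlier records.

```python
# An append-only log standing in for a topic's commit log.
log = ["event-0", "event-1", "event-2", "event-3"]

class ToyConsumer:
    def __init__(self, log):
        self.log = log
        self.offset = 0  # index of the next record to read

    def poll(self, max_records=2):
        # Read up to max_records starting at the current offset.
        batch = self.log[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        # Replay: rewind to an earlier offset.
        self.offset = offset

consumer = ToyConsumer(log)
print(consumer.poll())   # ['event-0', 'event-1']
print(consumer.poll())   # ['event-2', 'event-3']
consumer.seek(0)         # rewind and replay from the beginning
print(consumer.poll(4))  # ['event-0', 'event-1', 'event-2', 'event-3']
```

Because the log is retained rather than deleted on read, multiple consumers can read at their own pace, which is what makes Kafka "replayable".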
Limitations
● It is NOT plug & play.
● You need to write a fair amount of application code.
● Experts usually do not prefer it for streaming small volumes of data.
● You need to understand its configuration parameters to customize or tune Kafka's behaviour to your requirements.
● Streaming data between older and newer versions of Kafka can be problematic.
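To give a flavour of that tuning surface, a few commonly adjusted producer settings are shown below as a plain dictionary (parameter names come from Kafka's producer configuration; the values are arbitrary examples, and no client is actually created here):

```python
# Commonly tuned Kafka producer settings (names per Kafka's
# producer configuration; values are illustrative only).
producer_config = {
    "bootstrap.servers": "localhost:9092",  # brokers to contact first
    "acks": "all",              # wait for all in-sync replicas to acknowledge
    "retries": 3,               # retry transient send failures
    "batch.size": 16384,        # max bytes to batch per partition
    "linger.ms": 5,             # wait up to 5 ms to fill a batch
    "compression.type": "lz4",  # compress batches on the wire
}

print(sorted(producer_config))
```

Brokers and consumers each have their own comparable set of knobs (retention, partitions, fetch sizes), which is why tuning Kafka requires reading its configuration documentation.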
Users:
Apple Inc., Netflix, Walmart, Cisco Systems,
eBay, PayPal, The New York Times, etc.
References
1. http://kafka.apache.org/intro
2. https://www.youtube.com/watch?v=udnX21__SuU&t=57s
3. https://www.youtube.com/watch?v=dq-ZACSt_gA
4. https://en.wikipedia.org/wiki/Apache_Kafka
5. https://scotch.io/tutorials/build-a-distributed-streaming-system-with-apache-kafka-and-pythons
Any Questions?

Editor's Notes

  1. Hi everyone. We, Mostafa and Praveen, welcome you all to our presentation on Apache Kafka, the topic we chose for our project. Briefly, it is a distributed streaming platform for data integration.
  2. Here are the contents of our discussion throughout the paper: a brief overview of Apache Kafka.
  3. Kafka started its journey about 8 years ago, in 2011, at LinkedIn. Since then it has steadily evolved into a large-scale enterprise queuing and messaging system.
  4. Streams API: consumes an input stream from one or more topics and produces an output stream to one or more output topics. Connector API: connects Kafka topics to existing applications or data systems; for example, a connector to a relational database might capture every change to a table.
  5. Collecting data from mobile devices, sensors, and machine-learning pipelines in real time; the data are immutable. In the paper we cover all topics in detail to give a clear picture of Apache Kafka and how it works.
  6. How do you make data available to applications across a wide area network? How do you serve data efficiently from closer geographies? How do you implement data sovereignty rules?