Kafka - Quick Intro
STRAKIN Copyright © 2019, strakin and/or its affiliates. All rights reserved
Problem
• An organisation can have multiple servers at the front end, such as a web or application server hosting the website or application, a chat server providing chat facilities to customers, a separate server for payments, etc.
• An organisation can also have multiple servers at the back end, each receiving messages from different servers based on its requirements: security systems for user authentication and authorization, a real-time monitoring system, a data warehouse, etc.
• As you can see, the data pipelines grow more complex as the number of systems increases. Adding a new system or server requires more data pipelines, which again makes the data flow more complicated.
• Managing these data pipelines also becomes very difficult, since each data pipeline has its own set of requirements, and adding or removing pipelines in such a complex system is harder still.
• LinkedIn (a social network designed for career and business professionals to connect) ran into exactly this problem.
LinkedIn - Use Case
• Some of the LinkedIn activity data captured includes:
• Page visits and clicks
• User activities
• Events corresponding to logins
• Social networking activities such as likes, shares, and comments
• Application-specific metrics (e.g. logs, page load time, performance, etc.)
• This data can be used to run analytics in real time, serving various purposes such as:
• Delivering advertisements
• Tracking abnormal user behaviour
• Ranking search results by relevance
• Showing recommendations based on previous activities
Problem: Collecting all of this data is not easy, as it is generated from various sources in different formats.
Solution: One way to solve this problem is to use a messaging system. Messaging systems provide seamless integration between distributed applications with the help of messages.
Solution - Apache Kafka
• Apache Kafka is an open-source stream-processing software platform originally developed at LinkedIn, which later became an Apache project.
• The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.
• Its storage layer is essentially a "massively scalable pub/sub message queue designed as a distributed transaction log", making it highly valuable for enterprise infrastructures that process streaming data.
• Additionally, Kafka connects to external systems (for data import/export) via Kafka Connect and provides Kafka Streams, a Java stream-processing library (a brief sketch follows).
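To give a feel for what Kafka Streams looks like, here is a minimal, hedged sketch (not from the original deck): it assumes an additional org.apache.kafka:kafka-streams dependency on the classpath and two pre-created, hypothetical topics, input-topic and output-topic. It copies each record, upper-casing the value.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class StreamsDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-demo");      // app id, also used as the consumer group
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumes a local broker on the default port

        StreamsBuilder builder = new StreamsBuilder();
        // Read from "input-topic", transform each value, and write to "output-topic"
        // (both topic names are hypothetical and must be created beforehand).
        builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()))
               .mapValues(value -> value.toUpperCase())
               .to("output-topic", Produced.with(Serdes.String(), Serdes.String()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close)); // close cleanly on Ctrl-C
    }
}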
Kafka - Architecture
[Architecture diagram: multiple producers publish to the Kafka cluster and multiple consumers read from it; inside the cluster, a primary broker is accompanied by replica brokers, with Apache ZooKeeper coordinating the cluster.]
Topic: A stream of messages belonging to a particular category is called a topic
Producer: A producer is any application that can publish messages to a topic
Consumer: A consumer is any application that subscribes to topics and consumes their messages
Broker: A Kafka cluster is a set of servers, each of which is called a broker
Kafka - Inside Broker
Partition: Kafka topics are divided into a number of partitions. Partitions allow you to parallelize a topic by splitting its data across multiple brokers; each partition can be placed on a separate machine, allowing multiple consumers to read from a topic in parallel
Leader: A single partition can be replicated across many brokers, but only one broker at a time is considered the owner of the partition, known as its leader
Follower: The other brokers holding the same data (messages) as the leader are called in-sync replicas, or followers
Example: a topic with 4 partitions and a replication factor of 3, spread across 3 brokers (each partition has exactly one leader):

              Broker 1    Broker 2    Broker 3
Partition 0   Follower    Leader      Follower
Partition 1   Leader      Follower    Follower
Partition 2   Follower    Follower    Leader
Partition 3   Follower    Leader      Follower
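Once a topic exists (see the setup slides that follow), you can inspect exactly this leader/follower layout yourself. With the Kafka 2.1.x tooling used in this deck, the --describe option prints each partition's leader, replicas, and in-sync replicas (ISR):

./bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test-topic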
Kafka - Cluster Types
Kafka is scalable and allows creation of multiple types of clusters:
• Single Node, Single Broker: one Kafka broker runs on a single machine, coordinated by Apache ZooKeeper; all producers and consumers talk to that one broker
• Single Node, Multiple Brokers: several brokers (Broker 1, Broker 2, Broker 3) run on the same machine, still coordinated by a single ZooKeeper (see the sketch after this list)
• Multiple Nodes, Multiple Brokers: the brokers are spread across machines (in the diagram, two brokers on each of Node 1 and Node 2), all coordinated by ZooKeeper

What's the role of ZooKeeper? Each Kafka broker coordinates with the other Kafka brokers using ZooKeeper. Producers and consumers are notified by the ZooKeeper service about the presence of new brokers, or the failure of a broker, in the Kafka system
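As a concrete sketch of the Single Node, Multiple Brokers setup (this follows the pattern in the official Kafka quickstart; the ports and paths below are illustrative, not from the original deck): give each broker its own copy of the config with a unique id, listener port, and log directory, then start each broker in its own terminal.

cd $KAFKA_HOME
cp config/server.properties config/server-1.properties
cp config/server.properties config/server-2.properties

# In config/server-1.properties set:
#   broker.id=1
#   listeners=PLAINTEXT://:9093
#   log.dirs=/tmp/kafka-logs-1
# In config/server-2.properties set:
#   broker.id=2
#   listeners=PLAINTEXT://:9094
#   log.dirs=/tmp/kafka-logs-2

./bin/kafka-server-start.sh config/server-1.properties
./bin/kafka-server-start.sh config/server-2.properties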
Getting started with Kafka
Prerequisite components and download links:
• Java - https://www.oracle.com/technetwork/java/javase/downloads/index.html
• Apache Zookeeper - https://zookeeper.apache.org/releases.html
• Apache Kafka - https://kafka.apache.org/downloads
Kafka - Installation and setup
After a successful download, add Kafka to your $PATH variable as follows:
1) Open .bash_profile (on Mac) by typing vi ~/.bash_profile
2) Add the following lines to that file (replace the path with your own folder location):
export KAFKA_HOME=/Users/jothibasu/Downloads/kafka_2.12-2.1.1
export PATH=$PATH:$KAFKA_HOME/bin
3) Restart your terminal after saving the file
Start the Apache ZooKeeper server with the following commands:
1) Type cd $KAFKA_HOME in your terminal (navigates to the Kafka home directory)
2) Start the server:
./bin/zookeeper-server-start.sh config/zookeeper.properties
Open a new terminal and start the Apache Kafka server:
1) Type cd $KAFKA_HOME in your terminal (navigates to the Kafka home directory)
2) Start the server:
./bin/kafka-server-start.sh config/server.properties
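As an optional sanity check (using the zookeeper-shell tool that ships with the Kafka distribution), you can list the broker ids that have registered with ZooKeeper; a single-broker setup should print something like [0]:

./bin/zookeeper-shell.sh localhost:2181 ls /brokers/ids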
Kafka - Create Topic & Publish Messages
Next, we’ll create a topic to send messages to:
1) Type cd $KAFKA_HOME in your terminal (navigates to the Kafka home directory)
2) ./bin/kafka-topics.sh --create --zookeeper <<ipaddress:port>> --replication-factor <<n>> --partitions <<n>> --topic <<topic-name>>
Eg: ./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test-topic
3) To check that the topic was created, list the topics:
./bin/kafka-topics.sh --list --zookeeper localhost:2181
Next, start a consumer (note that the broker listens on port 9092 by default):
1) Type cd $KAFKA_HOME in your terminal (navigates to the Kafka home directory)
2) ./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test-topic --from-beginning
Next, start a producer in a new tab:
1) Type cd $KAFKA_HOME in your terminal (navigates to the Kafka home directory)
2) ./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test-topic
Note
1) ’N’ number of tabs - ’N’ number of consumers/producers
2) All of the above is only for testing purposes
Producer / Consumer - Java API
Get started by creating a Maven project:
1) Add the following dependencies to your pom.xml
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>2.1.0</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-simple</artifactId>
<version>1.7.12</version>
</dependency>
2) Create separate classes for producer and consumer
1) Producer.java
2) Consumer.java
Producer API
package com.strakin.kafkalearning;

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.errors.AuthorizationException;
import org.apache.kafka.common.errors.OutOfOrderSequenceException;
import org.apache.kafka.common.errors.ProducerFencedException;
import org.apache.kafka.common.serialization.StringSerializer;

public class Producer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("transactional.id", "my-transactional-id"); // enables the transactional API below

        KafkaProducer<String, String> producer = new KafkaProducer<String, String>(props, new StringSerializer(),
                new StringSerializer());
        producer.initTransactions();
        try {
            producer.beginTransaction();
            // Send five records to "test-topic" atomically, using the loop index as both key and value
            for (int i = 0; i < 5; i++)
                producer.send(new ProducerRecord<String, String>("test-topic", Integer.toString(i), Integer.toString(i)));
            producer.commitTransaction();
        } catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
            // We can't recover from these exceptions, so our only option is to close the
            // producer and exit.
            producer.close();
        } catch (KafkaException e) {
            // For all other exceptions, just abort the transaction and try again.
            producer.abortTransaction();
        }
        producer.close();
    }
}
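One caveat worth checking against your broker config (an assumption, not from the original deck): the transactional API relies on an internal transaction-state topic whose broker-side replication defaults are 3 and 2. The config/server.properties shipped with recent Kafka versions already lowers these for local testing, but if yours does not, a single-broker setup needs the following in config/server.properties before starting the broker:

# Required for transactions on a single-broker test cluster
# (the shipped default config may already contain these lines)
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1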
Consumer API
package com.strakin.kafkalearning;

import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class Consumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "test");
        props.put("enable.auto.commit", "true");
        props.put("auto.commit.interval.ms", "1000");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("test-topic", "abc")); // subscribe to one or more topics
        while (true) {
            // poll(long) is deprecated since kafka-clients 2.0; use the Duration overload
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
            for (ConsumerRecord<String, String> record : records)
                System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
        }
    }
}
Note
1) All of the above is only for testing purposes
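To try the pair out (assuming the standard Maven project layout; the exec-maven-plugin is resolved by prefix, so no extra pom configuration is needed), run the consumer and the producer in separate terminals:

mvn compile exec:java -Dexec.mainClass=com.strakin.kafkalearning.Consumer
mvn compile exec:java -Dexec.mainClass=com.strakin.kafkalearning.Producer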
“Thank You”