
Replication-factor is the total number of copies of the data stored in an Apache Kafka cluster.

min.insync.replicas is the minimum number of copies of the data that you are willing to have online at any time to continue running and accepting new incoming messages.

Suppose I start a 5-node cluster and create a topic with a replication factor of 3 and acks=all.

  1. When I publish a message, will I get the ack only after the data is replicated to all 3 brokers every time? What if 3 out of 5 nodes are down? Will Kafka wait for the nodes to come back online, replicate the message, and only then send the ack? I believe min.insync.replicas here is 1 by default?
  2. If min.insync.replicas is set to 2 and the replication factor is 3, does this mean the ack is sent back after the data is replicated to 3 nodes, and the cluster then ensures the data is present on at least 2 nodes at all times? Is this understanding correct?
  3. If min.insync.replicas is 2 and the replication factor is 3, will I get the ack after the data is replicated to 2 nodes (with the leader adding the third replica later), or only after the data is replicated to all 3 nodes?

Basically I am interested in the ack latency and in the durability of the data, which is my highest priority, so I am getting confused by some of these concepts.

1 Answer


I believe min.insync.replica here is 1 by default ?

Yes, that is the default.

When I publish a message, will I get the ack only after the data is replicated to all 3 brokers every time?

To the leader and the "other 2", with acks=all, yes.

what if 3 out of 5 nodes are down

If those 3 nodes are all the replicas of the topic, then you'll get an error. You can optionally set/increase retries on the producer.

If the 2 remaining nodes include the leader or an in-sync replica, then with min ISR set to 1 the producer can continue. You'll just have one or more replicas that have dropped out of the ISR list.
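The accept/reject behavior above can be sketched as a tiny model. This is not Kafka source code, just a hypothetical illustration of the broker-side check for an acks=all produce request: the write is accepted only while the partition's current ISR is at least min.insync.replicas large.

```java
// Hypothetical model (not Kafka's actual implementation) of the broker-side
// check applied to an acks=all produce request.
public class MinIsrCheck {

    /**
     * Returns true if an acks=all write is accepted: the partition must
     * currently have at least min.insync.replicas replicas (leader included)
     * in its ISR. Otherwise the producer sees NotEnoughReplicasException.
     */
    static boolean acceptWrite(int inSyncReplicas, int minInsyncReplicas) {
        return inSyncReplicas >= minInsyncReplicas;
    }

    public static void main(String[] args) {
        // replication.factor=3, min.insync.replicas=1 (the default):
        // even with 2 followers down, the leader alone keeps accepting writes.
        System.out.println(acceptWrite(1, 1)); // true

        // replication.factor=3, min.insync.replicas=2:
        // with only the leader in sync, the write is rejected and the
        // producer may retry until enough replicas rejoin the ISR.
        System.out.println(acceptWrite(1, 2)); // false
    }
}
```

The durability trade-off is visible here: min.insync.replicas=1 favors availability (writes survive with a lone leader), while 2 favors durability (a write is never acknowledged with only one copy).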


For the remainder of your questions, acks=all applies. It has strong durability guarantees: the leader waits until every replica currently in the ISR has the message before acknowledging, and min.insync.replicas sets the floor below which the ISR may not shrink before writes are rejected. So with min.insync.replicas=2 and a replication factor of 3, the ack can come once 2 replicas have the data if the third has fallen out of the ISR; that third replica catches up later.

From Cloudera Documentation

How can I configure Kafka to ensure that events are stored reliably?

The following recommendations for Kafka configuration settings make it extremely difficult for data loss to occur.

Producer

  • block.on.buffer.full=true
  • retries=Long.MAX_VALUE
  • acks=all
  • max.in.flight.requests.per.connection=1

Remember to close the producer when it is finished or when there is a long pause.
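As a minimal sketch, the quoted producer settings can be assembled as plain client properties. String keys are used so this compiles without the kafka-clients jar; constructing the actual KafkaProducer (shown only in comments) needs that dependency and a running broker. Note two assumptions flagged in the comments: retries is an int config in the Java client, and block.on.buffer.full was removed in newer clients in favor of max.block.ms.

```java
import java.util.Properties;

// Sketch of the durability-focused producer settings from the Cloudera quote.
public class DurableProducerProps {
    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("acks", "all"); // wait for all current in-sync replicas
        // retries is an int config in the Java client, so Integer.MAX_VALUE
        // is the effective equivalent of the quoted Long.MAX_VALUE:
        props.put("retries", Integer.toString(Integer.MAX_VALUE));
        // One in-flight request preserves ordering across retries:
        props.put("max.in.flight.requests.per.connection", "1");
        // block.on.buffer.full was removed in newer clients; max.block.ms
        // bounds how long send() blocks on a full buffer instead:
        props.put("max.block.ms", "60000");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("localhost:9092"); // hypothetical broker address
        System.out.println(props.getProperty("acks")); // all
        // Requires kafka-clients and a reachable broker:
        // try (KafkaProducer<String, String> p = new KafkaProducer<>(props,
        //         new StringSerializer(), new StringSerializer())) {
        //     p.send(new ProducerRecord<>("my-topic", "key", "value")).get();
        // } // closing the producer flushes any buffered records
    }
}
```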

Broker

  • Topic replication.factor >= 3
  • Min.insync.replicas = 2
  • Disable unclean leader election

Consumer

  • Disable enable.auto.commit
  • Commit offsets after messages are processed by your consumer client(s).
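The consumer side of those recommendations can be sketched the same way: disable auto-commit so offsets are committed only after your code has processed the records. Again, string keys keep this compilable without kafka-clients, and the poll/commit loop (which needs a broker) is shown only as comments; topic and group names are hypothetical.

```java
import java.util.Properties;

// Sketch of the quoted consumer settings: commit offsets manually,
// and only after the messages have actually been processed.
public class ManualCommitConsumerProps {
    static Properties build(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", groupId);
        props.put("enable.auto.commit", "false"); // no background offset commits
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("localhost:9092", "my-group"); // hypothetical values
        System.out.println(props.getProperty("enable.auto.commit")); // false
        // Requires kafka-clients and a reachable broker:
        // try (KafkaConsumer<String, String> c = new KafkaConsumer<>(props,
        //         new StringDeserializer(), new StringDeserializer())) {
        //     c.subscribe(List.of("my-topic"));
        //     while (true) {
        //         ConsumerRecords<String, String> records = c.poll(Duration.ofSeconds(1));
        //         for (ConsumerRecord<String, String> r : records) {
        //             process(r);   // process first...
        //         }
        //         c.commitSync();   // ...then commit, so a crash replays
        //     }                     // unprocessed messages instead of losing them
        // }
    }
}
```

With this ordering, a consumer crash between processing and commit causes redelivery (at-least-once) rather than message loss.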
  • I'm not sure that setting retries=Long.MAX_VALUE makes data loss "extremely difficult": depending on your producer's characteristics and the incoming connection, you may just be backing up an ever-increasing amount of incoming messages. Designing and operating reliable systems is hard; such simplistic advice isn't best practice. Just a word of advice for anyone reading this and wanting to take that setting as gospel.
    – scot
    Commented Jun 2, 2023 at 13:07
  • I didn't write that; it comes from Cloudera, hence the quote... I believe the Kafka: The Definitive Guide book says something similar, but enable.idempotence=true can also be set (now the default), and that has its own semantics. Commented Jun 2, 2023 at 13:48
