Kafka min.insync.replicas < replication.factor

Question

Suppose I have a cluster with 3 kafka brokers. I set:

min.insync.replicas=2
default.replication.factor=3

All brokers are up, ISR is fine, I get a message where ack=all. Since ISR=2, two copies of the message are for sure stored. 1) Will one more copy (because replication=3) be made in the background? 2) If it fails - it does not matter, correct? Cluster health is just fine.
One broker is down, ISR=2 can be maintained and the message is saved to two brokers. After some time that broker that was down comes up again. 3) Since replication=3, will it try to catch up with the others in the back-ground?

I am trying to figure out of a practical example where setting replication factor to be bigger than ISR would make sense. A real example I could "touch" and understand. If this is a duplicate, please refer me to it. Thank you.

OneCricketeer · Accepted Answer · 2022-12-28 07:35:45Z

2

Yes, one replica is made in the background.

Yes, the broker will catch up all out of sync replicas upon restarts.

If you ever have in-sync replicas <= replication factor, then you cannot lose any brokers more than the difference between the values (due to maintenance or failure). Therefore, replication factor should always be greater

edited Dec 28, 2022 at 7:35

answered Dec 26, 2022 at 17:38

OneCricketeer

188k20 gold badges139 silver badges258 bronze badges

shouldn't in-sync replicas <= replication factor be in reverse?
– Eugene
Commented Dec 26, 2022 at 18:08
No. I said, if it's setup that way, you cannot lose brokers. Also replication factor can never be less than replicas
– OneCricketeer
Commented Dec 27, 2022 at 20:31
The verbiage here is miss-leading, imo. "If you ever have in-sync replicas <= replication factor, then you cannot lose any brokers". You are using <=, which is "less or equal" in pure math, which means you are saying "If ISR < RF, you cannot lose any brokers", only to later say : "Therefore, replication factor should always be greater", which is already the exact same condition outlined in the previous statement: "ISR < RF". So maybe you meant = instead of <=?
– Eugene
Commented Dec 28, 2022 at 6:12
No. I mean less than or equal to. You don't need to explain what I typed. I edited to clarify my statement
– OneCricketeer
Commented Dec 28, 2022 at 7:34

Add a comment |

Eugene · Accepted Answer · 2022-12-28 11:35:11Z

The other answer is absolutely correct, but it took me quite a while to figure out. imho, this is somehow subtle and though my understanding might be a little incorrect here and there, it helped to build a mental model of what is going on.

Suppose I have a cluster of 3 brokers :

[a, b, c]  ->  brokers
[a, b]     -> ISR
[a, b, c]  -> RF

How many brokers can I tolerate to be down? The answer is 1.

If lose broker "c", ISR can still be satisfied and the cluster will work just fine.
If I lose broker "a" (the explanation is the same if I lost "b"), a rebalance has to happen. zookeeper will ask what brokers were in-sync (who satisfied RF) before I lost one from the ISR. Well, there were 3 of them part of RF = a, b, c. Since I lost "a", there are two left now that are in sync: "b" and "c". A leader election has to happen and the ISR will be satisfied with "b" and "c".
This means that I can lose any one broker from the cluster and still work fine. It might be trivial here, but the next example is not so much, imho.

Suppose I have a (artificial example) cluster with 5 brokers:

[a, b, c, d, e]  -> brokers
[a, b]           -> ISR
[a, b, c]        -> RF

How many brokers can I tolerate as being down now? Initially I thought 2, but that can't be correct.

If I lose "d" and "e", it's simple, the cluster will continue to work just fine.
If lose "a" and "b", in theory a rebalance has to happen. But what brokers were part of RF before I lost "a" and "b" or which brokers were in-sync? [a, b, c]. There is no way to satisfy ISR if two of those brokers are down.
This means that I can't tolerate any two brokers being down, which means this set-up is not really fault tolerant with any 2 brokers down.
It can only be tolerant with two brokers down if my set-up is different:
```
5 -> brokers
3 -> ISR
5 -> RF
```

And this is where the other answer is correct and makes total sense:

If you ever have in-sync replicas <= replication factor, then you cannot lose any brokers more than the difference between the values

Collectives™ on Stack Overflow

Kafka min.insync.replicas < replication.factor

2 Answers 2

Not the answer you're looking for? Browse other questions tagged
java
apache-kafka
or ask your own question.

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Not the answer you're looking for? Browse other questions tagged javaapache-kafka or ask your own question.

Related

Not the answer you're looking for? Browse other questions tagged
java
apache-kafka
or ask your own question.