
Suppose I have a cluster with 3 Kafka brokers. I set:

min.insync.replicas=2
default.replication.factor=3
  • All brokers are up and the ISR is healthy. A producer sends a message with acks=all. Since min.insync.replicas=2, at least two copies of the message are guaranteed to be stored. 1) Will one more copy (because the replication factor is 3) be made in the background? 2) If that copy fails, it does not matter, correct? The cluster is still healthy.

  • One broker is down, so only two replicas remain in the ISR; min.insync.replicas=2 is still met and the message is saved to two brokers. Later, the broker that was down comes back up. 3) Since the replication factor is 3, will it catch up with the others in the background?

I am trying to come up with a practical example where setting the replication factor higher than min.insync.replicas makes sense: a real example I could "touch" and understand. If this is a duplicate, please point me to it. Thank you.

2 Answers


Yes, one replica is made in the background.

Yes, upon restart the broker will catch up all of its out-of-sync replicas.
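To make the three cases from the question concrete, here is a toy model of a single partition, assuming RF=3 and min.insync.replicas=2. `Partition`, `produce_acks_all`, `crash` and `restart` are hypothetical names, not Kafka APIs, and leader election and ISR tracking are heavily simplified. The sketch follows Kafka's documented rule for acks=all: the leader waits for every replica currently in the ISR, and rejects the write if the ISR is smaller than min.insync.replicas. A replica that was down is not in the ISR, so it misses the write and catches up when it returns.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Toy model of one partition; hypothetical, not Kafka's implementation."""
    replicas: list[str]   # brokers holding a copy; len(replicas) == replication factor
    min_insync: int       # plays the role of min.insync.replicas
    logs: dict[str, list[str]] = field(init=False)
    alive: set[str] = field(init=False)

    def __post_init__(self) -> None:
        self.logs = {b: [] for b in self.replicas}
        self.alive = set(self.replicas)

    def leader(self) -> str:
        # Simplification: the alive replica with the longest log leads.
        return max(self.alive, key=lambda b: len(self.logs[b]))

    def isr(self) -> set[str]:
        # "In sync" here means: alive and holding exactly the leader's log.
        lead_log = self.logs[self.leader()]
        return {b for b in self.alive if self.logs[b] == lead_log}

    def produce_acks_all(self, msg: str) -> bool:
        isr = self.isr()
        if len(isr) < self.min_insync:
            return False  # a real broker rejects this with NotEnoughReplicas
        for b in isr:      # acks=all: every current ISR member stores the record
            self.logs[b].append(msg)
        return True

    def crash(self, broker: str) -> None:
        self.alive.discard(broker)

    def restart(self, broker: str) -> None:
        # Case 3: the returning replica copies what it missed from the leader.
        self.alive.add(broker)
        self.logs[broker] = list(self.logs[self.leader()])


p = Partition(replicas=["a", "b", "c"], min_insync=2)
print(p.produce_acks_all("m1"))  # True: stored on a, b and c
p.crash("c")
print(p.produce_acks_all("m2"))  # True: ISR is {a, b}, which meets the minimum of 2
p.restart("c")
print(p.logs["c"])               # ['m1', 'm2']
```

Losing a second broker after the restart would leave a single in-sync replica, and `produce_acks_all` would return False.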

If you ever have in-sync replicas <= replication factor, then you cannot lose any brokers more than the difference between the values (due to maintenance or failure). Therefore, the replication factor should always be greater than in-sync replicas.
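The rule above is plain arithmetic. A minimal sketch, assuming the failures hit brokers that host replicas of the partition and that surviving replicas are caught up; `max_broker_failures` is a hypothetical helper name:

```python
def max_broker_failures(replication_factor: int, min_insync_replicas: int) -> int:
    """Replica-hosting brokers that may be down while acks=all writes still succeed."""
    # Modeling choice: reject settings where min.insync.replicas exceeds the RF,
    # since acks=all writes could then never succeed.
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("expected 1 <= min.insync.replicas <= replication factor")
    return replication_factor - min_insync_replicas


for rf, min_isr in [(3, 2), (3, 3), (5, 3)]:
    print(f"RF={rf}, min.insync.replicas={min_isr}: "
          f"tolerates {max_broker_failures(rf, min_isr)} failure(s)")
```

In the equality case (RF=3, min.insync.replicas=3) the result is 0: a single broker outage already blocks acks=all writes.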

  • Shouldn't in-sync replicas <= replication factor be the other way around?
    – Eugene
    Commented Dec 26, 2022 at 18:08
  • No. I said that if it's set up that way, you cannot lose brokers. Also, the replication factor can never be less than the replicas. Commented Dec 27, 2022 at 20:31
  • The wording here is misleading, IMO. "If you ever have in-sync replicas <= replication factor, then you cannot lose any brokers". You are using <=, which means "less than or equal to" in pure math, so you are saying "if ISR < RF, you cannot lose any brokers", only to later say: "Therefore, replication factor should always be greater", which is exactly the condition from the previous statement: "ISR < RF". So maybe you meant = instead of <=?
    – Eugene
    Commented Dec 28, 2022 at 6:12
  • No. I mean less than or equal to. You don't need to explain what I typed. I edited to clarify my statement Commented Dec 28, 2022 at 7:34

The other answer is absolutely correct, but it took me quite a while to understand. IMHO, this is somewhat subtle, and although my understanding might be a little off here and there, working through it helped me build a mental model of what is going on.

Suppose I have a cluster of 3 brokers:

[a, b, c]  ->  brokers
[a, b]     -> ISR
[a, b, c]  -> RF

How many brokers can I tolerate being down? The answer is 1.

  • If I lose broker "c", the ISR requirement can still be satisfied and the cluster works just fine.

  • If I lose broker "a" (the reasoning is the same if I lose "b"), a new leader has to be elected. The controller, using the ISR information kept in ZooKeeper, checks which replicas were in sync before "a" was lost. There were three replicas in the RF: a, b and c. With "a" gone, two in-sync replicas remain: "b" and "c". A leader election happens, and the ISR requirement is satisfied by "b" and "c".

  • This means that I can lose any one broker from the cluster and it still works fine. This might seem trivial here, but the next example is less so.
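The 3-broker reasoning can be checked by brute force: try every possible set of failed brokers and count how many replicas are left. `writes_survive` is a hypothetical helper, and it assumes that surviving replicas are caught up and therefore in the ISR.

```python
from itertools import combinations

BROKERS = ["a", "b", "c"]
REPLICAS = {"a", "b", "c"}  # replication factor 3: every broker holds a copy
MIN_INSYNC = 2              # min.insync.replicas


def writes_survive(down: set[str]) -> bool:
    # acks=all writes keep working while enough replicas remain to form the ISR.
    return len(REPLICAS - down) >= MIN_INSYNC


for k in (1, 2):
    results = [writes_survive(set(down)) for down in combinations(BROKERS, k)]
    print(f"{k} broker(s) down: acks=all writes survive in {sum(results)}/{len(results)} cases")
```

Every single failure is survivable (3/3), while every pair of failures blocks acks=all writes (0/3).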


Suppose I have an (artificial) cluster with 5 brokers:

[a, b, c, d, e]  -> brokers
[a, b]           -> ISR
[a, b, c]        -> RF

How many brokers can I tolerate being down now? Initially I thought 2, but that is not correct.

  • If I lose "d" and "e", it's simple: they hold no replicas of this partition, so the cluster continues to work just fine.

  • If I lose "a" and "b", a new leader has to be elected. But which brokers held replicas, and so could have been in sync, before I lost "a" and "b"? Only [a, b, c]. With two of them down, only "c" remains, so the ISR requirement of 2 cannot be satisfied.

  • So this setup cannot tolerate an arbitrary pair of brokers being down: it survives two failures only when they hit brokers that hold no replica, such as "d" and "e".

  • It can tolerate any two brokers being down only with a different setup:

    5 -> brokers
    3 -> ISR
    5 -> RF
    

And this is where the other answer is correct and makes total sense:

If you ever have in-sync replicas <= replication factor, then you cannot lose any brokers more than the difference between the values
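Both 5-broker setups can be compared the same way, by checking every possible set of k failed brokers. A minimal sketch, assuming surviving replicas are caught up and therefore in the ISR; `survives_any` is a hypothetical helper name:

```python
from itertools import combinations

BROKERS = ["a", "b", "c", "d", "e"]


def survives_any(replicas: set[str], min_insync: int, k: int) -> bool:
    """True if acks=all writes keep working for EVERY choice of k failed brokers."""
    # Simplification: surviving replicas are assumed caught up and in the ISR.
    return all(len(replicas - set(down)) >= min_insync
               for down in combinations(BROKERS, k))


setups = {
    "RF=3 on a,b,c, min.insync.replicas=2": ({"a", "b", "c"}, 2),
    "RF=5, min.insync.replicas=3": (set(BROKERS), 3),
}
for name, (replicas, min_isr) in setups.items():
    # k=0 always succeeds, so max() never sees an empty sequence.
    tolerated = max(k for k in range(len(BROKERS) + 1) if survives_any(replicas, min_isr, k))
    print(f"{name}: tolerates any {tolerated} broker(s) down")
```

The first setup tolerates any 1 broker down and the second any 2, matching RF minus min.insync.replicas in both cases.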
