
Kafka has introduced rack awareness (broker.rack) to provide redundancy in case a whole rack fails. There is a min in-sync replicas setting to specify the minimum number of replicas that need to be in sync before a producer receives an ack (with the acks = -1 / all config). There is an unclean leader election setting to specify whether a leader can be elected when it is not in sync.

So, given the following scenario:

  • Two racks: rack 1 and rack 2.
  • Replication factor = 4.
  • Min in-sync replicas = 2.
  • Producer acks = -1 (all).
  • Unclean leader election = false.

The aim is at-least-once message delivery, redundancy of nodes, and tolerance of a rack failure.
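
For concreteness, here is a sketch of this setup using the Java clients (the broker address broker1:9092 and topic name "events" are placeholders; the rack placement itself comes from broker.rack on each broker):

    import java.util.Collections;
    import java.util.Map;
    import java.util.Properties;

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;
    import org.apache.kafka.clients.producer.ProducerConfig;

    public class ScenarioSetup {
        public static void main(String[] args) throws Exception {
            Properties adminProps = new Properties();
            adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder

            try (AdminClient admin = AdminClient.create(adminProps)) {
                // One partition, replication factor 4: with broker.rack set on the
                // brokers, rack-aware assignment spreads the 4 replicas over the 2 racks.
                NewTopic topic = new NewTopic("events", 1, (short) 4) // "events" is a placeholder
                        .configs(Map.of(
                                "min.insync.replicas", "2",
                                "unclean.leader.election.enable", "false"));
                admin.createTopics(Collections.singleton(topic)).all().get();
            }

            // Producer side: acks=-1/all, i.e. wait for the in-sync replicas to ack.
            Properties producerProps = new Properties();
            producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
            producerProps.put(ProducerConfig.ACKS_CONFIG, "all"); // same as -1
        }
    }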

Is it possible that there is a moment where the two in-sync replicas both come from rack 1, so the producer receives an ack, and at that point rack 1 crashes (before any replicas from rack 2 are in sync)? This would mean that rack 2 contains only unclean replicas, so no producer could add messages to the partition, essentially grinding it to a halt. Since the replicas would be unclean, no new leader could be elected in any case.

Is my analysis correct, or is there something under the hood to ensure that the replicas forming the min in-sync set have to be from different racks?
Since replicas on the same rack would have lower latency, the above scenario seems reasonably likely.

The scenario is shown in the image below:

[image: scenario diagram]

2 Answers


To be technically correct you should fix some of the question's wording: it is not possible to have out-of-sync replicas "available". Also, the min in-sync replicas setting specifies the minimum number of replicas that need to be in sync for the partition to remain available for writes. When a producer specifies acks = -1 (all), it still waits for acks from all replicas that are in sync at that moment, independent of the setting for min in-sync replicas. So if you publish while 4 replicas are in sync, you will not get an ack unless all 4 replicas commit the message (even if min in-sync replicas is configured as 2).

It is still possible to construct a scenario similar to your question that highlights the same trade-off, by having the 2 replicas in rack 2 fall out of sync first, then publishing while the only 2 ISRs are in rack 1, and then taking rack 1 down. In that case the partition would be unavailable for reads or writes.

So the easiest fix to this problem would be to increase min in-sync replicas to 3. Another, less fault-tolerant fix would be to reduce the replication factor to 3.
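
For example, raising min.insync.replicas on an existing topic could be done with the Java AdminClient along these lines (a sketch; the broker address and topic name are placeholders):

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class RaiseMinIsr {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder

            try (AdminClient admin = AdminClient.create(props)) {
                ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "events"); // placeholder
                AlterConfigOp setMinIsr = new AlterConfigOp(
                        new ConfigEntry("min.insync.replicas", "3"), AlterConfigOp.OpType.SET);
                // With replication factor 4 over 2 racks, requiring 3 ISRs means the
                // in-sync set can never fit entirely inside one rack, so an ack implies
                // at least one replica in the other rack.
                admin.incrementalAlterConfigs(Map.of(topic, List.of(setMinIsr))).all().get();
            }
        }
    }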

  • Thanks, I will have a look at the wording in the question. However, I don't think your suggestions address the problem. As I understand it, setting min in-sync replicas to 3 wouldn't work because when rack 1 fails the maximum ISR that could be achieved is 2 while min ISR is 3, so producers would never receive an ack. Also, replication factor = 2 wouldn't work in this case, since it wouldn't tolerate either node failing with min ISR = 2.
    – acarlon
    Commented Aug 20, 2017 at 19:58
  • Sorry, I meant to say reduce the replication factor to 3 (I will update my answer). Commented Aug 20, 2017 at 20:22
  • The point I am trying to make is that there is a trade-off between write availability and message loss, and you have to set the parameters to favor one or the other. If you favor write availability, then configure to allow writes to go entirely into one rack (or AZ). If you favor no message loss, then configure in a way that generates an error when there are not enough available ISRs across more than one rack (or AZ). Commented Aug 20, 2017 at 20:25
  • Can you have a look at stackoverflow.com/questions/50689177/kafka-ack-all-and-min-isr, as it seems the code contradicts the ack-after-in-sync statement, at least in one particular function.
    – acarlon
    Commented Jun 4, 2018 at 21:44
  • Just reconfirmed my answer with Jun (co-author of Apache Kafka). It is a common misconception that min.insync.replicas allows an ack when only a minimal subset of the ISR gets the published message. However, the "minimum" part applies to something else: the minimal value defines how small the list of ISRs can get and still allow writes. Acks are always returned when all ISRs in the list get the message. Otherwise leader election would be much more complicated, because not all replicas would actually be in sync. This may change in the future to reduce latency, but the answer is accurate today. Commented Nov 27, 2018 at 21:57

Yes, I think it is possible, because Kafka can only maintain the ISR according to what actually happens at runtime, not according to the spirit of the configuration.

Words from https://engineering.linkedin.com/kafka/intra-cluster-replication-apache-kafka:

for each partition of a topic, we maintain an in-sync replica set (ISR). This is the set of replicas that are alive and have fully caught up with the leader (note that the leader is always in ISR). When a partition is created initially, every replica is in the ISR. When a new message is published, the leader waits until it reaches all replicas in the ISR before committing the message. If a follower replica fails, it will be dropped out of the ISR and the leader then continues to commit new messages with fewer replicas in the ISR. Notice that now, the system is running in an under replicated mode.

Words from https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Replication:

After a configured timeout period, the leader will drop the failed follower from its ISR and writes will continue on the remaining replicas in ISR. If the failed follower comes back, it first truncates its log to the last checkpointed HW. It then starts to catch up all messages after its HW from the leader. When the follower fully catches up, the leader will add it back to the current ISR.

The min in-sync replicas setting you mentioned is just a lower bound; the ISR size does not depend on it. This setting means that if the producer's acks is "all" and the current ISR size is smaller than the configured minimum, then Kafka will refuse to write the message.
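
A minimal sketch of what the producer sees in that case (broker address and topic name are placeholders):

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.errors.NotEnoughReplicasException;

    public class MinIsrRefusal {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
            props.put(ProducerConfig.ACKS_CONFIG, "all");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("events", "key", "value"), // placeholder topic
                        (metadata, exception) -> {
                            if (exception instanceof NotEnoughReplicasException) {
                                // ISR size < min.insync.replicas: the broker refuses the
                                // write rather than acking it.
                                System.err.println("Write refused: " + exception.getMessage());
                            }
                        });
                producer.flush();
            }
        }
    }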

So at first the ISR is {1, 2, 3, 4}, and if broker 3 or 4 goes down, it will be kicked out of the ISR, and the case you mentioned can happen. When rack 1's brokers then fail, electing a new leader would be an unclean leader election.
