Why Kafka is not P in CAP theorem

Ask Time: 2018-07-17T15:06:04    Author: Jack


The main developer of Kafka said Kafka is CA but not P in the CAP theorem. But I'm confused: is Kafka not partition tolerant? I think it is; when one replica goes down, another becomes the leader and keeps serving!

Also, I would like to know what happens if Kafka chooses P. Would P hurt C or A?

Author: Jack. Reproduced under the CC 4.0 BY-SA license, with a link to the original source and this disclaimer.
Link to original article: https://stackoverflow.com/questions/51375187/why-kafka-is-not-p-in-cap-theorem
Alexander Abramov:

If you read how CAP defines C, A and P, "CA but not P" just means that when an arbitrary network partition happens, each Kafka topic-partition will either stop serving requests (lose A), or lose some data (lose C), or both, depending on its settings and the partition's specifics.

If a network partition splits all ISRs from ZooKeeper, then with the default configuration unclean.leader.election.enable = false, no replica can be elected as leader (lose A).

If at least one ISR can still connect, it will be elected, so the partition can still serve requests (preserve A). But with the default min.insync.replicas = 1, an ISR can lag behind the leader by approximately replica.lag.time.max.ms = 10000. So by electing it, Kafka potentially throws away writes that the ex-leader had already confirmed to producers (lose C).

Kafka can preserve both A and C for some limited partitions. E.g. you have min.insync.replicas = 2 and replication.factor = 3, all 3 replicas are in sync when a network partition happens, and the partition splits off at most 1 ISR (either a single-node failure, a single-DC failure, or a single cross-DC link failure).

To preserve C for arbitrary partitions, you have to set min.insync.replicas = replication.factor. This way, no matter which ISR is elected, it is guaranteed to have the latest data. But at the same time it won't be able to serve write requests until the partition heals (lose A).
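To make that trade-off concrete, here is a minimal sketch using the Java AdminClient from kafka-clients. The topic name "orders" and the localhost:9092 broker address are assumptions for illustration; the settings themselves (replication.factor = 3, min.insync.replicas = 2, unclean leader election disabled) are the configuration described above, which tolerates losing one in-sync replica without discarding acknowledged writes:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateDurableTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumed broker address; adjust for your cluster.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 1 partition, replication factor 3.
            NewTopic topic = new NewTopic("orders", 1, (short) 3)
                    .configs(Map.of(
                            // Writes with acks=all need 2 in-sync replicas,
                            // so one replica can be cut off without losing
                            // acknowledged data or availability.
                            "min.insync.replicas", "2",
                            // Never elect an out-of-sync replica as leader
                            // (this is already the default).
                            "unclean.leader.election.enable", "false"));

            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```

Setting min.insync.replicas equal to replication.factor instead would, as the answer notes, favour consistency for arbitrary network partitions at the cost of rejecting writes as soon as any replica is cut off.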
2019-03-24T10:09:42
Giorgos Myrianthous:

The CAP theorem states that any distributed system can provide at most two out of the three guarantees: Consistency, Availability and Partition tolerance.

According to the engineers at LinkedIn (where Kafka was originally created), Kafka is a CA system:

    All distributed systems must make trade-offs between guaranteeing consistency, availability, and partition tolerance (CAP Theorem). Our goal was to support replication in a Kafka cluster within a single datacenter, where network partitioning is rare, so our design focuses on maintaining highly available and strongly consistent replicas. Strong consistency means that all replicas are byte-to-byte identical, which simplifies the job of an application developer.

However, I would say that it depends on your configuration, and more precisely on the variables acks, min.insync.replicas and replication.factor. According to the docs:

    If a topic is configured with only two replicas and one fails (i.e., only one in-sync replica remains), then writes that specify acks=all will succeed. However, these writes could be lost if the remaining replica also fails. Although this ensures maximum availability of the partition, this behavior may be undesirable to some users who prefer durability over availability. Therefore, we provide two topic-level configurations that can be used to prefer message durability over availability:

    Disable unclean leader election - if all replicas become unavailable, then the partition will remain unavailable until the most recent leader becomes available again. This effectively prefers unavailability over the risk of message loss. See the previous section on Unclean Leader Election for clarification.

    Specify a minimum ISR size - the partition will only accept writes if the size of the ISR is above a certain minimum, in order to prevent the loss of messages that were written to just a single replica, which subsequently becomes unavailable. This setting only takes effect if the producer uses acks=all and guarantees that the message will be acknowledged by at least this many in-sync replicas. This setting offers a trade-off between consistency and availability. A higher setting for minimum ISR size guarantees better consistency, since the message is guaranteed to be written to more replicas, which reduces the probability that it will be lost. However, it reduces availability, since the partition will be unavailable for writes if the number of in-sync replicas drops below the minimum threshold.
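As an illustration of the producer side of that trade-off, the sketch below sets acks=all with the Java client from kafka-clients. It is only a sketch: the broker address, the "orders" topic and the key/value strings are assumptions, and the topic is expected to carry a min.insync.replicas setting such as the one shown earlier:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class DurableProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumed broker address; adjust for your cluster.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // acks=all: the leader waits for the full set of in-sync replicas to
        // acknowledge the write. Combined with the topic-level
        // min.insync.replicas setting, the send fails rather than silently
        // losing data when too few in-sync replicas are available.
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            RecordMetadata meta = producer
                    .send(new ProducerRecord<>("orders", "key", "value"))
                    .get(); // blocks; throws if the ISR shrinks below min.insync.replicas
            System.out.printf("Written to partition %d at offset %d%n",
                    meta.partition(), meta.offset());
        }
    }
}
```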
2018-07-17T10:26:41