Member-only story
Quorum — Distributed Design Pattern

One of the constants with any Distributed System is Failure. We build systems such that they are resilient to failures. Let’s continue from our previous discussion on WAL, where to ensure durability, our operations were stored on the WAL, from where they could be recovered. Let’s assume we want to replicate our WAL to different nodes in the cluster for High Availability and Fault Tolerance. The next question we need to ask is -
How many nodes in our cluster need to acknowledge that they got the replicated copy from the original server before we can say that the update to the WAL was successful?
Quorum is the answer to the above question. Quorum is the minimum number of servers that must acknowledge a distributed operation to be successful before it's marked a success.
But why do we need quorum? What if we chose not to use Quorum?
Scenario 1: Replicate Changes To All Nodes In The Cluster
Instead of replicating the changes to a quorum of nodes, we can replicate the changes to all the nodes in the cluster. Then the origin server waits for acknowledgement from all the servers. This could lead to slow acknowledgement from the origin server(performance impact) & also could…