Centralized Coordination— Distributed Design Patterns

The beauty of distributed systems is that you can scale horizontally by adding more compute/resources to your system. But as the cluster grows larger, if we have tasks that need Strong Consistency/Linearizability guarantees like Leader Election etc, these guarantees become more expensive to achieve, directly impacting the throughput of your cluster.

Why is it more expensive to achieve Linearizability guarantees for a large cluster?

To achieve Linearizability guarantees, we need to have some Consensus within our cluster for the operations performed. If we assume a Quorum Based Consensus, for a 100-node cluster, we need acknowledgement from at-least 51 nodes, for any operation to be considered successful. This would have a direct impact on the throughput of our operation since it needs to wait for acknowledgement from 51 nodes before it can be considered successful.

How can we scale out our cluster without impacting Linearizability guarantees?

The solution is to outsource the operations which require Linearizability guarantees to a small cluster of 3–5 nodes. This cluster will have a leader and followers and the client can connect to this cluster for getting some consistent state/performing Linearizable operations. We can then use Quorum-based consensus on this cluster, and…

--

--

Pratik Pandey - https://pratikpandey.substack.com

Senior Engineer with experience in designing and architecting large scale distributed systems.