Consistent Hashing: Architecture Pattern

Imagine you are building a shared distributed system that needs to store and retrieve data across multiple servers. One of the main challenges is deciding which server should be responsible for storing a given piece of data. Traditional approaches like modulo-based hashing often lead to inefficient data distribution and expensive rebalancing when the number of servers changes.

Problem with Modulo-Based Hashing

Let’s illustrate the problems with modulo-based hashing using a simple example of a distributed system with four servers (S1, S2, S3, and S4) and a dataset of five items (A, B, C, D, and E). We’ll use a modulo-based hashing technique to assign data items to servers based on their hash values.

Step 1: Calculate Hash Values Assuming our hash function assigns the following hash values to the data items:

A -> hash(A) = 11 B -> hash(B) = 20 C -> hash(C) = 32 D -> hash(D) = 45 E -> hash(E) = 50

Step 2: Assign Data Items to Servers Now, we distribute the data items to servers using the modulo-based hashing technique. Let’s say we use the modulo operator (%) to determine the server for each data item:

A -> S1 (hash(A) % 4 = 11 % 4 = 3) 
B -> S2 (hash(B) % 4 = 20 % 4 = 0)
C -> S2 (hash(C) % 4 = 32 % 4 = 0)
D -> S3 (hash(D) % 4 = 45 % 4 = 1)
E -> S4 (hash(E) % 4 = 50 % 4 = 2)

Data Distribution:

S1: A S2: B,C S3: D S4: E

--

--

Pratik Pandey - https://pratikpandey.substack.com

Senior Engineer with experience in designing and architecting large scale distributed systems.