High Watermark — Replication Reconciliation
In my last article, we talked about the High Watermark. With it, we were able to maintain a consistent view despite failures and crashes in our highly available system. But there was an edge case where the system could still become inconsistent.
The edge case arises when the current leader receives a few records, writes them to its WAL, and then crashes before those records are replicated to the followers. A new leader is then elected that doesn't have those records. The WALs of the old leader and the new leader are no longer in sync, and this is where the consistency issue shows up.
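To make the divergence concrete, here is a minimal sketch. The record values and offsets are purely illustrative, not from the article: the old leader appends two records to its local WAL that never reach the followers, so when a follower becomes the new leader, the two logs disagree about everything past the last replicated offset.

```java
import java.util.ArrayList;
import java.util.List;

public class DivergenceExample {
    public static void main(String[] args) {
        // Records that were replicated to every node before the crash.
        List<String> replicated = List.of("r0", "r1", "r2", "r3");

        // Old leader: has the replicated records plus two records it wrote
        // to its own WAL just before crashing (never sent to the followers).
        List<String> oldLeaderWal = new ArrayList<>(replicated);
        oldLeaderWal.add("r4");
        oldLeaderWal.add("r5");

        // New leader: elected from the followers, so it only has the replicated prefix.
        List<String> newLeaderWal = new ArrayList<>(replicated);

        System.out.println("Old leader WAL: " + oldLeaderWal); // [r0, r1, r2, r3, r4, r5]
        System.out.println("New leader WAL: " + newLeaderWal); // [r0, r1, r2, r3]
        // The logs now diverge after offset 3; this is the inconsistency to reconcile.
    }
}
```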
Before going into the solution, let me introduce a concept that it relies on.
Generation/Epoch
Each time a leader election happens, a new, auto-incremented generation id is attached to the new leader. So let's assume our cluster has just started, and the leader of each partition is allocated generation 1. Every record written to the WAL can also carry metadata recording the generation of the leader that wrote it.
If that leader goes down and a new leader is elected, the new leader's generation will be 2, and all records written through the new leader will carry metadata with the new generation.
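Here is a minimal sketch of how WAL records might carry the leader's generation. The class and method names (WalEntry, GenerationAwareWal, append) are my own illustrative assumptions, not an API from the article:

```java
import java.util.ArrayList;
import java.util.List;

class WalEntry {
    final long offset;      // position of the record in the log
    final int generation;   // generation/epoch of the leader that wrote the record
    final byte[] payload;   // the record data itself

    WalEntry(long offset, int generation, byte[] payload) {
        this.offset = offset;
        this.generation = generation;
        this.payload = payload;
    }
}

class GenerationAwareWal {
    private final List<WalEntry> entries = new ArrayList<>();

    // Every append is stamped with the generation of the leader doing the write,
    // so the log itself records which leader produced each entry.
    void append(int leaderGeneration, byte[] payload) {
        entries.add(new WalEntry(entries.size(), leaderGeneration, payload));
    }

    // The last entry tells a rejoining node the latest (generation, offset) pair it holds.
    WalEntry lastEntry() {
        return entries.isEmpty() ? null : entries.get(entries.size() - 1);
    }
}
```

With generation 1 stamped on the old leader's un-replicated entries and generation 2 on everything the new leader writes, the two logs can later be compared entry by entry to find exactly where they diverged.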
Back to the problem
Below is the current state once the old leader comes back online. It rejoins the cluster as a follower, since there is already a new leader.