High Watermark — Distributed Design Patterns

High Watermark allows you to see a consistent view of the data in a distributed system. Kafka leverages high watermark to ensure consumers do not see any data that doesn’t exist on a Quorum of nodes
High Watermark Ensures a Consistent View in a Distributed System

Prerequisites:

  1. WAL — Distributed Design Patterns
  2. Quorum — Distributed Design Patterns

Problem:

Before we go into what high watermarks are, and where they help, let’s look into the problem statement and why they were needed.

We talked about how WAL can help us recover our state from server crashes and restarts. But if you have a single server & it goes down, then WAL can’t help you with availability. This is where we resort to a distributed setup, where we keep a cluster of servers and under the leader-follower pattern, the leader replicates the log to a Quorum of its followers.

Stateful Distributed Systems generally ensure that they keep more than one copy of the data for fault tolerance & high availability. But maintaining multiple copies of the data, across multiple nodes, leads to challenges in maintaining strong consistency as -

  • The leader can fail before the entry has been synced to all followers.
  • The leader can fail after the entry has synced to some but not all followers.

--

--

Pratik Pandey - https://pratikpandey.substack.com

Senior Engineer with experience in designing and architecting large scale distributed systems.