Version Vectors

In my last article, we talked about Vector Clocks. We learned that Vector Clocks can help establish the ordering of operations, including whether operations happened concurrently or were causally related. Version Vectors use a very similar mechanism, but for a slightly different purpose.

Version Vector

Version Vectors are generally used in distributed, data-driven applications, where each data record is tied to a Version Vector. Since it’s a distributed system, a data record can be updated concurrently by multiple nodes. We can therefore use Version Vectors to identify whether an update to a data record can be reconciled immediately or requires conflict resolution (remember that we can identify concurrent updates, and if an update to a data record is concurrent, it needs conflict resolution).

Just like Vector Clocks, Version Vectors also maintain a vector per node, but the entries no longer represent logical times.
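
To make the idea of a record tied to a vector concrete, here is a minimal Python sketch. It assumes each entry counts the updates a given node has applied to the record, which is a common convention rather than something stated above. `VersionedRecord`, `update`, and the node ids are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class VersionedRecord:
    """A data record paired with its version vector (hypothetical type)."""
    value: str
    # Assumption: maps node id -> number of updates that node applied to this record.
    versions: Dict[str, int] = field(default_factory=dict)

    def update(self, node_id: str, new_value: str) -> None:
        """Apply a write at `node_id` and bump only that node's entry."""
        self.value = new_value
        self.versions[node_id] = self.versions.get(node_id, 0) + 1


record = VersionedRecord(value="v0")
record.update("node-a", "v1")
record.update("node-a", "v2")
record.update("node-b", "v3")
print(record.versions)  # {'node-a': 2, 'node-b': 1}
```

Only the entry of the node that performed the write changes, so the vector records which nodes have touched the record and how many times.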

Before going further, let’s specify the criteria for identifying a concurrent update. As with Vector Clocks, we compare two Version Vectors to understand the relationship between them:

  • A version vector is considered higher than another if both version vectors have version numbers for the same nodes, every version number in it is at least as high as the corresponding one in the other vector, and at least one is strictly higher. The other vector is then considered lower.
  • If the version vectors cover different nodes, or if neither vector has all of its version numbers at least as high as the other’s, they are considered…
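
The comparison rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: it assumes the second case is the concurrent case (as the paragraph introducing these criteria suggests), and it treats a vector as higher when none of its entries is lower and at least one is strictly higher. `Ordering` and `compare` are hypothetical names.

```python
from enum import Enum
from typing import Dict


class Ordering(Enum):
    EQUAL = "equal"
    HIGHER = "higher"
    LOWER = "lower"
    CONCURRENT = "concurrent"  # assumed label for the non-comparable case


def compare(a: Dict[str, int], b: Dict[str, int]) -> Ordering:
    """Compare version vector `a` against version vector `b`."""
    # Vectors that cover different sets of nodes are not ordered here.
    if a.keys() != b.keys():
        return Ordering.CONCURRENT
    a_ahead = any(a[node] > b[node] for node in a)
    b_ahead = any(b[node] > a[node] for node in a)
    if a_ahead and b_ahead:
        # Each side has seen an update the other has not.
        return Ordering.CONCURRENT
    if a_ahead:
        return Ordering.HIGHER
    if b_ahead:
        return Ordering.LOWER
    return Ordering.EQUAL


print(compare({"a": 2, "b": 1}, {"a": 1, "b": 1}))  # Ordering.HIGHER
print(compare({"a": 2, "b": 1}, {"a": 1, "b": 2}))  # Ordering.CONCURRENT
print(compare({"a": 1}, {"a": 1, "b": 1}))          # Ordering.CONCURRENT
```

A `HIGHER` or `LOWER` result means one copy of the record can simply replace the other, while `CONCURRENT` signals that the two copies need conflict resolution.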

Pratik Pandey - https://pratikpandey.substack.com

Senior Engineer with experience in designing and architecting large scale distributed systems.