Version Vectors

In my last article, we talked about Vector Clocks. We learned that Vector Clocks could help establish the ordering of operations, including identifying whether operations happened concurrently or were causally related. A very similar mechanism is used by Version Vector, but it’s used for a slightly different purpose.

Version Vector

Version Vectors are generally leveraged in distributed data-driven applications, where each data record is tied to a Version Vector. Since it’s a distributed system, a data record can be updated concurrently by multiple nodes. Thus, we can leverage version vectors, to identify if a data record can be reconciled immediately or if it requires a conflict resolution(remember that we can identify concurrent updates, & if an update to a data record is concurrent, it needs a conflict resolution).

Just like Vector Clocks, Version Vectors also maintain a vector per node, but the entries don’t represent the logical times anymore.

Before going further, let’s specify the criteria for identifying a concurrent update. Similar to Vector clocks, we compare two Version Vectors to understand the relationship between them -

  • A version vector is considered higher than the other if both of the version vectors have version number for the same nodes and each version number is higher than the one in the other vector and vice versa.
  • If the version vectors cover different nodes or if not all the version numbers are higher, they are considered concurrent.

Examples(A-D are actors, while the number represents event count just like in Vector Clocks)-

  • {A: 1, B: 1} is greater than {A: 1, B: 0}
  • {A: 2, B: 1} is concurrent with {A: 1, B: 2}
  • {A: 2, B: 1, C: 1} is greater than {A: 2, B: 1}
  • {A: 2, B: 1, C: 1} is concurrent with {A: 2, B: 1, D:1}

Now, it’s extremely critical to identify where things are different from Vector Clocks. Always remember that Version Vectors exist for each data item, & hence the comparison of vectors in above example is comparison of vectors on different Actors for the same data record/item.

Following are the steps followed by the algorithm -

  1. Each actor reads the initial state of version vector for each key & the value associated with the key. Let’s start with just 1 key for simplicity & the algo can easily be…
Pratik Pandey -

Senior Engineer with experience in designing and architecting large scale distributed systems.