Version Vector (II)

In my last article, we saw how distributed data stores use version vectors to identify concurrent updates to data records. We looked at one technique for identifying concurrent updates/conflicts, using the ClientId as an actor, along with the advantages and disadvantages of doing so. In this article, we’ll look at another approach for identifying concurrent updates/conflicts.

Server As An Actor

The problem with using the client as an actor is actor explosion: the number of clients can grow very large, and every client adds its own entry to the version vector. To solve that, we can use servers as actors instead.

But, you may ask, clusters can be very large too, spanning multiple regions. Wouldn’t they face the same problem of actor explosion?

Yes, you’re right! Hence, we limit the actors to the nodes that hold a replica of the record, and the number of those nodes is set by the replication factor. If you remember, each data record carries its own version vector, so for each data record the maximum size of the version vector is the replication factor for that data in the cluster.
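To make the size bound concrete, here is a minimal sketch in Python. It assumes a replication factor of 3 and three replicas named S1, S2 and S3. The class and method names (VersionVector, increment, descends, concurrent_with) are hypothetical and chosen only for illustration; they are not taken from any particular data store.

```python
from dataclasses import dataclass, field
from typing import Dict

REPLICATION_FACTOR = 3  # assumed value, for illustration only


@dataclass
class VersionVector:
    # Maps a server (replica) ID to the number of writes it has coordinated.
    counters: Dict[str, int] = field(default_factory=dict)

    def increment(self, server_id: str) -> "VersionVector":
        # The replica that handles the write bumps its own counter.
        updated = dict(self.counters)
        updated[server_id] = updated.get(server_id, 0) + 1
        return VersionVector(updated)

    def descends(self, other: "VersionVector") -> bool:
        # True if this vector has seen every update that `other` has seen.
        return all(self.counters.get(s, 0) >= n for s, n in other.counters.items())

    def concurrent_with(self, other: "VersionVector") -> bool:
        # Neither vector descends from the other, so the updates conflict.
        return not self.descends(other) and not other.descends(self)


# Hypothetical replica set for one key; its size equals the replication factor.
replicas = ["S1", "S2", "S3"]

vv = VersionVector()
for write_number in range(1000):
    # Any number of clients may issue these writes; only the replica that
    # handles each write appears in the vector.
    coordinator = replicas[write_number % len(replicas)]
    vv = vv.increment(coordinator)
```

After a thousand writes the vector still holds only three entries, one per replica, no matter how many clients issued them. Two vectors written through different replicas from the same starting state, for example `VersionVector().increment("S1")` and `VersionVector().increment("S2")`, are reported as concurrent by `concurrent_with`.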

Using Servers as Actors

Let’s try to understand what’s happening in the above diagram:

  1. Let’s assume we have a key K with value U, and that we start with an empty version vector. Clients C2 and C3 sync the same state from the replica (assuming all clients are interacting with the same…


Pratik Pandey - https://pratikpandey.substack.com

Written by Pratik Pandey - https://pratikpandey.substack.com

Senior Engineer with experience in designing and architecting large scale distributed systems.
