Member-only story
Bloom Filters: Probabilistic Data Structures
A Bloom Filter is a data structure designed to quickly and efficiently check whether an element is a member of a set. However, as the title suggests, it’s a probabilistic data structure, meaning it can tell you with certainty if an element is not in a set, but only with a certain probability if it is in the set. This characteristic leads to a possibility of false positives but guarantees no false negatives.
How it works
Bloom filters make use of an array of bit values, and they try to map the presence of a particular entry in that array by checking if the value of the bit is set or not.
How do we set the bit?
We pass the value through a hash function and set the bit corresponding to the hash value returned.
How is it different from a HashMap?
The major difference from a hashmap is that instead of using one hash function to map the input to a bit in the array, we pass the input through multiple hash functions and set each bit based on the output of the independent hash functions.
In HashMaps, collisions can lead to us being able to infer if an element actually exists, unless we compare the values. However, in Bloom filters, we just store bits as values, and hence we have no way of knowing if the bit being set…