Part 1 Hash-based sketches

In the next few chapters, we will explore probabilistic succinct data structures. We will see how bread-and-butter problems from the world of classical algorithms, such as frequency estimation, membership queries, and the count-distinct problem, become harder to tackle as the amount of data grows and classical data structures start to spill out of RAM. We turn our attention to a collection of data structures that help solve the same problems, only with much less space. What’s the catch? These data structures will not always give you 100% accuracy. The good news is that the error rates are often low and are more than compensated for by major savings in storage. The data structures presented in part 1 include Bloom filters, quotient filters, count-min sketch, HyperLogLog, and some compact variants of hash tables. These data structures can be configured to a desired error rate and are, in that sense, highly versatile. The next few chapters will be all about squeezing the most functionality into the least amount of RAM, and every bit will count. But first we begin with a review of hash tables and hashing, which serve as the building blocks of many of the data structures to come.
