Solving hot-spotting

Recall from the discussion of the row key in the four-dimensional data model that data is stored in lexicographically sorted order of the row key. This is similar to Cloud Spanner. Data is sharded by ranges of key values, so rows with neighboring keys are stored together. This implies that performance will be poor if reads and writes end up concentrated in a few shards, that is, in a few ranges of key values. The classic example is a sequential key, such as a timestamp or an auto-incrementing ID, where every new write lands at the end of the same key range. There are some fairly typical techniques to solve hot-spotting, one of which is field promotion.
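To make the problem concrete, here is a minimal sketch of how sequential keys pile up in a single key range. It does not talk to Bigtable; the split points, the `shard_for` helper, and the timestamp-style keys are all hypothetical stand-ins for a store that shards by sorted key range.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical split points dividing the sorted key space into four ranges.
# A real system chooses and moves these boundaries itself.
SPLIT_POINTS = ["4", "8", "c"]


def shard_for(row_key: str) -> int:
    """Return the index of the key range that would hold row_key."""
    return bisect_right(SPLIT_POINTS, row_key)


def write_distribution(keys):
    """Count how many writes land in each key range."""
    return Counter(shard_for(key) for key in keys)


# Sequential keys: zero-padded timestamps that all share the same leading digits.
sequential_keys = [f"{1700000000 + i:010d}" for i in range(1000)]
sequential_load = write_distribution(sequential_keys)

for shard, count in sorted(sequential_load.items()):
    print(f"range {shard}: {count} writes")
```

Because every key begins with the same digits, all 1,000 writes fall into one range while the other three sit idle, which is the hot-spotting pattern described above.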

Here, the idea is that you promote fields that would otherwise live in column data into the row key, producing a structured key. A common arrangement is reverse URL order, like a Java package name (com.example.www rather than www.example.com). Keys for related rows then share a prefix and differ only in their endings, so a sequential scan over a key prefix picks up all of the related values in one go. Reverse URL order is a pretty standard way of arranging keys in HBase.
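The reverse URL ordering can be sketched in a few lines. This is only an illustration: the `reverse_domain_key` helper and the host names are made up for the example, and the prefix scan is simulated over a sorted in-memory list rather than a real table.

```python
def reverse_domain_key(hostname: str, path: str = "") -> str:
    """Build a row key with the host name's labels in reverse order.

    "www.example.com" becomes "com.example.www", like a Java package name.
    """
    labels = hostname.lower().split(".")
    return ".".join(reversed(labels)) + path


# Hypothetical hosts, used only for illustration.
hosts = ["www.example.com", "mail.example.com", "www.other.org", "api.example.com"]

# Rows are stored in lexicographic key order, so sort to mimic the storage layout.
keys = sorted(reverse_domain_key(host) for host in hosts)

# A prefix scan over "com.example." picks up every related row in one contiguous run.
example_rows = [key for key in keys if key.startswith("com.example.")]

print(keys)
print(example_rows)
```

With the labels reversed, all rows for one domain sort next to each other, so a single scan over the shared prefix retrieves them together.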

The other common way of avoiding hotspots is salting, which is the descriptive term for the practice of hashing the key value, typically by prepending a hash-derived prefix to the key so that keys that would otherwise be adjacent are spread across the key space.

A surprising feature of Bigtable, colloquially known as warming the cache, is that its performance tends to improve over time. The reason is that Bigtable observes the read and write patterns in your data and then redistributes the data so that those reads and writes are spread evenly across the shards. In other words, Bigtable proactively moves data around in order to eliminate hotspots. An important implication is that if you are testing the performance of your Bigtable system, the test needs to run for several hours to give a true sense of the performance. A very short test, of maybe half an hour or less, doesn't give Bigtable enough time to carry out these data movements, and you will get a misleadingly poor indication of performance.
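As a rough illustration of the salting technique mentioned above, the sketch below prepends a small hash-derived bucket number to each key. The bucket count, the `#` separator, the choice of MD5, and the `salted_key` helper are all assumptions made for the example, not a prescribed scheme.

```python
import hashlib
from collections import Counter

NUM_SALT_BUCKETS = 4  # assumed bucket count, chosen only for this example


def salted_key(row_key: str, buckets: int = NUM_SALT_BUCKETS) -> str:
    """Prefix row_key with a bucket number derived from a hash of the key.

    The prefix is deterministic, so a reader that knows the original key
    can recompute it and look the row up directly.
    """
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    salt = int.from_bytes(digest[:4], "big") % buckets
    return f"{salt}#{row_key}"


# Sequential keys that would otherwise sort next to each other.
sequential_keys = [f"{1700000000 + i:010d}" for i in range(1000)]
salted_keys = [salted_key(key) for key in sequential_keys]

# Count how many keys fall under each salt prefix.
bucket_counts = Counter(key.split("#", 1)[0] for key in salted_keys)

for bucket, count in sorted(bucket_counts.items()):
    print(f"prefix {bucket}: {count} keys")
```

The trade-off in a scheme like this is that originally adjacent keys are no longer contiguous, so a range scan over the original key order has to be issued once per salt prefix and the results merged.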
