Denormalization

We have already discussed how storage in columnar datastores doesn't fit the traditional definitions of normalization. In a traditional RDBMS, minimizing redundancy is an important objective, and it gave rise to the different normal forms. Normalization in traditional database design was driven largely by the need to save space, which in turn was driven by the monolithic nature of database servers. As distributed databases came along, network bandwidth became the bottleneck. Normalized data could end up with related items stored on distant nodes. Even if you saved a few bytes, having to access the network three times instead of once would give terrible performance. In the distributed world, then, storage is not the expensive resource, because we have a large number of generic machines, each with a lot of attached storage. What is really costly in a distributed filesystem is making lots of seeks to servers or to data that resides on different machines. This is why columnar datastores do away with the idea of normalization: data is stored such that all the data for one entity resides together.
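
To make the cost argument concrete, here is a minimal sketch in Python. It is not the API of any real datastore: the Cluster class, the key names, and the placement of rows on nodes are all hypothetical, chosen only to count how many network accesses each layout needs to answer the same question.

```python
class Cluster:
    """Toy cluster (hypothetical): each node is a dict, each read is one network access."""

    def __init__(self, node_count):
        self.nodes = [{} for _ in range(node_count)]
        self.network_reads = 0

    def put(self, node_id, key, value):
        self.nodes[node_id][key] = value

    def get(self, node_id, key):
        self.network_reads += 1  # every lookup is assumed to cross the network
        return self.nodes[node_id][key]


def read_normalized():
    """Order, customer, and product live on three different nodes."""
    cluster = Cluster(3)
    cluster.put(0, "order:1", {"customer_id": "customer:7",
                               "product_id": "product:42",
                               "quantity": 2})
    cluster.put(1, "customer:7", {"name": "Alice"})
    cluster.put(2, "product:42", {"title": "Keyboard"})

    order = cluster.get(0, "order:1")
    customer = cluster.get(1, order["customer_id"])
    product = cluster.get(2, order["product_id"])
    row = {"customer_name": customer["name"],
           "product_title": product["title"],
           "quantity": order["quantity"]}
    return row, cluster.network_reads


def read_denormalized():
    """Everything about the order is stored together, redundantly, on one node."""
    cluster = Cluster(3)
    cluster.put(0, "order:1", {"customer_name": "Alice",
                               "product_title": "Keyboard",
                               "quantity": 2})
    row = cluster.get(0, "order:1")
    return row, cluster.network_reads


normalized_row, normalized_reads = read_normalized()
denormalized_row, denormalized_reads = read_denormalized()
```

Both functions return the same row, but the normalized layout reaches it in three network accesses and the denormalized layout in one. The denormalized row repeats the customer name and product title in every order that mentions them, which is exactly the redundancy that normalization was meant to remove.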
