Chapter 8. Denormalization

What’s normal, anyway?

Anon.: Where Bugs Go

I want to say a few words about denormalization. Now, I haven’t considered, so far in this book, any level of normalization higher than BCNF (at least, not in detail). But denormalization, if it means anything at all, can’t apply just to BCNF specifically; I mean, it can’t refer just to dropping back to some level of normalization that’s lower than BCNF specifically. Rather, it has to mean dropping back from any given level of normalization to some lower one.

That said, however, I need to say too that relvars that are in BCNF and not in some higher normal form are comparatively unusual (though not completely unknown, I hasten to add). In practice, therefore, denormalization does usually refer quite specifically to dropping back to some level of normalization below BCNF; hence the inclusion of this chapter in this part of the book.

“DENORMALIZE FOR PERFORMANCE”?

Ever since SQL products first came on the market, the claim that it’s necessary to “denormalize for performance” has been widely promulgated. The (specious!) supporting argument goes something like this:

  1. Normalization means lots of relvars.

  2. Lots of relvars means lots of stored files.

  3. Lots of stored files means lots of I/O.

In the case of suppliers and parts, for example, a request to get details for suppliers who supply red parts involves two dyadic joins—suppliers to shipments first, perhaps, and then the result of that join to parts. And if the three relvars correspond to three physically separate stored files, then those two joins will require lots of I/O and will therefore perform badly.

As already noted, this argument is specious, at least in principle. The reason is that the relational model nowhere stipulates that relvars must map one for one to stored files. In the case of suppliers and parts, for example, there’s no logical reason why we couldn’t physically store the join of the three relvars—possibly even redundantly—as one single stored file on the disk,[71] which could reduce the amount of I/O significantly for the query under consideration. The point is irrelevant for present purposes, however, because:

  • First, this area is one in which most DBMS vendors have seriously let us down; most SQL products do indeed map relvars one for one to stored files, pretty much.[72] Even the exceptions fail to provide us with as much data independence as we might like, or as much as relational systems are theoretically capable of. As a practical matter, therefore, that “specious” argument is, sadly, valid for most SQL products today.

  • Second, even if relvars didn’t map one for one to stored files, denormalization might still be desirable at the stored file level. Indeed, a major reason why mappings that aren’t one for one would be desirable is precisely that they would permit denormalization to be done at the physical level, where it belongs, without it having to show through to—and thereby corrupt—the logical level.

So I’ll assume for the sake of discussion that denormalization does sometimes have to be done, at some level or other. But what is denormalization?



[71] I’m speaking pretty loosely here, of course. In particular, I’m ignoring the possibility that there might be some suppliers or some parts with no corresponding shipments.

[72] I realize the mapping from relvars to stored files isn’t always exactly one to one as I’m suggesting here—for example, some products allow several relvars to share the same stored file, and some allow a single relvar to span several stored files. But these facts don’t significantly affect the bigger picture, and I ignore them here for simplicity.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset