WHAT DENORMALIZATION ISN’T (I)

I’ve said that denormalization means increasing redundancy. But it doesn’t follow that increasing redundancy means denormalization! This is one of the traps the denormalization paper falls into; the design tactics it describes do increase redundancy (usually), but they’re not—with, as noted earlier, one sole exception—applications of denormalization per se. (In logic, if p implies q is true, it doesn’t follow that q implies p is true, and to argue otherwise is a well known example of faulty reasoning: so well known, in fact, that it enjoys a special name, The Fallacy of False Conversion.)

Let’s examine a few of the examples from the denormalization paper. In one, we’re given relvars ITEM and SALES that look like this:

     ITEM  { INO , INAME }
           KEY { INO }

     SALES { SNO , INO , QTY }
           KEY { SNO , INO }
           FOREIGN KEY { INO } REFERENCES ITEM

The predicates are Item INO has name INAME and Quantity QTY of item INO were sold in store SNO, respectively. For performance reasons, the paper suggests adding a TOTAL_QTY attribute to the ITEM relvar, whose value for any given item is the total sales of that item taken over all stores. But although it’s true that the resulting design involves some redundancy, the fact remains that both relvars are still in BCNF (note in particular that the FD {INO} → {TOTAL_QTY} holds in the revised version of relvar ITEM). In other words, there’s no denormalization, as such, in this example.
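To make the point concrete, here's a minimal sketch in Python, using hypothetical data and plain dicts and lists in place of relvars. It shows the revised ITEM design with the derived TOTAL_QTY attribute, and why the FD {INO} → {TOTAL_QTY} still holds (INO is the key, so each INO maps to exactly one TOTAL_QTY); the redundancy is simply that TOTAL_QTY duplicates information derivable from SALES.

```python
# SALES tuples: (SNO, INO, QTY) -- hypothetical data
sales = [
    ("S1", "I1", 10),
    ("S2", "I1", 5),
    ("S1", "I2", 7),
]

# Revised ITEM relvar: TOTAL_QTY for a given item is the total sales
# of that item taken over all stores.
item = {"I1": {"INAME": "Widget"}, "I2": {"INAME": "Gadget"}}
for ino, attrs in item.items():
    attrs["TOTAL_QTY"] = sum(qty for (_, i, qty) in sales if i == ino)

# The FD {INO} -> {TOTAL_QTY} holds: ITEM is keyed by INO, so each INO
# maps to exactly one TOTAL_QTY value. ITEM remains in BCNF; the
# redundancy is that TOTAL_QTY restates what SALES already records.
assert item["I1"]["TOTAL_QTY"] == 15
assert item["I2"]["TOTAL_QTY"] == 7
```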

A second example involves what the paper calls “an internal array”:

     EMP { ENO , JAN_PAY , FEB_PAY , ..., DEC_PAY }
         KEY { ENO }

The predicate is Employee ENO was paid an amount JAN_PAY in January, ..., and an amount DEC_PAY in December. And presumably, though the paper doesn’t say as much explicitly, this “tuple wise” design is meant to be contrasted with—and for performance reasons, possibly preferred to—the following “attribute wise” analog:

     EMP { ENO , MONTH , PAY }
         KEY { ENO , MONTH }

But both designs are in BCNF. Again, there’s no denormalization here; in fact, to get ahead of myself for a moment (see Chapter 15), I would say there’s no increase in redundancy, either. (On the other hand, the original “tuple wise” design is probably bad, as you’ll see if you consider the query “Get employees whose salary was less than 5K in at least one month, together with the months in question.”)
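The awkwardness of that query under the "tuple wise" design can be sketched as follows, again in Python with hypothetical data (and only three months shown, for brevity). Under the "attribute wise" design the query is a simple restriction; under the "tuple wise" design it has to mention every month attribute by name.

```python
# Attribute-wise design: EMP { ENO, MONTH, PAY } -- hypothetical data
emp_attr = [
    ("E1", "JAN", 4000), ("E1", "FEB", 6000), ("E1", "MAR", 5500),
    ("E2", "JAN", 7000), ("E2", "FEB", 5200), ("E2", "MAR", 4500),
]
# "Employees paid less than 5K in at least one month, with the months":
# a single restriction over the PAY attribute.
low_attr = [(eno, m) for (eno, m, pay) in emp_attr if pay < 5000]

# Tuple-wise design: EMP { ENO, JAN_PAY, ..., DEC_PAY } -- the same
# query must enumerate every month attribute explicitly.
emp_tuple = [
    {"ENO": "E1", "JAN_PAY": 4000, "FEB_PAY": 6000, "MAR_PAY": 5500},
    {"ENO": "E2", "JAN_PAY": 7000, "FEB_PAY": 5200, "MAR_PAY": 4500},
]
months = ["JAN", "FEB", "MAR"]  # in full, all twelve would be listed
low_tuple = [(row["ENO"], m) for row in emp_tuple
             for m in months if row[m + "_PAY"] < 5000]

assert sorted(low_attr) == sorted(low_tuple) == [("E1", "JAN"), ("E2", "MAR")]
```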

Yet another example involves splitting a RESELLERS relvar “horizontally” into two separate relvars, ACTIVE_RESELLERS and INACTIVE_RESELLERS. In other words, the original relvar is decomposed via restriction (not projection), and is reconstructed from the two restrictions via union (not join). So we’re clearly not talking about normalization in the classical sense here at all; a fortiori, therefore, we’re not talking about classical denormalization either.[77]
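The decomposition-and-reconstruction just described can be sketched like this, using hypothetical data and an assumed STATUS attribute to drive the restriction. Note that the operators involved are restriction and union, not the projection and join of classical normalization.

```python
# RESELLERS tuples: (RNO, RNAME, STATUS) -- hypothetical data
resellers = [
    ("R1", "Acme", "ACTIVE"),
    ("R2", "Best", "INACTIVE"),
    ("R3", "Cogs", "ACTIVE"),
]

# Decompose "horizontally" via restriction:
active_resellers   = [t for t in resellers if t[2] == "ACTIVE"]
inactive_resellers = [t for t in resellers if t[2] == "INACTIVE"]

# Reconstruct via union (sets, since relations contain no duplicates):
assert set(active_resellers) | set(inactive_resellers) == set(resellers)
```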

I’ll give one more example from the denormalization paper. This one starts with STORE and EMP relvars as follows:

     STORE { SNO , REGION , STATE , ... }
           KEY { SNO , REGION , STATE }

     EMP   { ENO , SNO , REGION , STATE , ... }
           KEY { ENO }
           FOREIGN KEY { SNO , REGION , STATE } REFERENCES STORE

The predicates are Store SNO is located in region REGION within state STATE and Employee ENO is employed at store SNO within region REGION within state STATE. The redundancies are obvious, and so the suggestion is to introduce a surrogate identifier for stores, SID say, and thereby modify the design as follows:

     STORE { SID , SNO , REGION , STATE , ... }
           KEY { SID }
           KEY { SNO , REGION , STATE }

     EMP   { ENO , SID , ... }
           KEY { ENO }
           FOREIGN KEY { SID } REFERENCES STORE

But this revised design not only involves no denormalization, it actually decreases redundancy![78]—because the association of a given SNO with a given REGION and STATE now appears just once, instead of once for every employee of the store in question. (To spell the point out, it’s obviously not denormalization because—among other things—the one thing surely everybody agrees on is that denormalization is supposed to increase redundancy.)
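One rough way to see the decrease, sketched in Python with hypothetical data: count how many times a given (SNO, REGION, STATE) association is physically recorded under each design.

```python
# Original design: EMP carries SNO, REGION, STATE directly, so the
# association is recorded once per employee of the store.
emp_old = [
    ("E1", "S1", "WEST", "CA"), ("E2", "S1", "WEST", "CA"),
    ("E3", "S1", "WEST", "CA"), ("E4", "S2", "EAST", "NY"),
]
old_count = sum(1 for (_, sno, r, st) in emp_old
                if (sno, r, st) == ("S1", "WEST", "CA"))

# Revised design: the association lives just once in STORE, keyed by
# the surrogate SID; EMP carries only the surrogate.
store = {"D1": ("S1", "WEST", "CA"), "D2": ("S2", "EAST", "NY")}
emp_new = [("E1", "D1"), ("E2", "D1"), ("E3", "D1"), ("E4", "D2")]
new_count = sum(1 for t in store.values() if t == ("S1", "WEST", "CA"))

# Three recordings of the association shrink to one.
assert old_count == 3 and new_count == 1
```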

By the way, I’m aware this last example might give the impression that I think surrogates are a good idea. Well, sometimes they are; but they aren’t always. The fact is that surrogates, while they might solve some problems, can also introduce problems of their own. See Exercise 8.3, and also Chapter 15, for further discussion.

In closing this section, I’d like to make it very clear that the foregoing discussions are in no way intended as an attack on the denormalization paper or its author. Indeed, the following quote from that paper should make it clear that the author and I are really on the same side on the bigger issues:

[We should] stop criticizing the relational model and make a clear distinction between what’s SQL and what’s relational ... The two are totally different.

I couldn’t agree more with this position, nor with the implication that the only reason we have to worry about such matters as denormalizing at the logical level is because of failures on the part of today’s SQL products. As I’ve written elsewhere, in fact:[79] In an ideal system, we would never have to denormalize at all, at the logical level. Even in today’s systems, which are typically much less than ideal, I believe we should denormalize only as a last resort. That is, we should back off from a fully normalized design only if all other strategies for improving performance have failed, somehow, to meet requirements. (Of course, I’m going along here with the usual assumption that normalization has performance implications—as indeed it does, typically, in current SQL products.)



[77] It’s true that it might be possible to define a new kind of normalization, based on restriction and union instead of projection and join (I’ll have more to say about this possibility in Part IV of this book). And if we did, well, I suppose we’d have a new kind of denormalization on our hands also. But I’m pretty sure that such considerations aren’t what the denormalization paper was referring to with its RESELLERS example.

[78] Or does it? Again, see Chapter 15, which includes further discussion of the use of surrogates in particular.

[79] E.g., in my book An Introduction to Database Systems (8th edition, Addison-Wesley, 2004). For further discussion, see my book Go Faster! The TransRelational™ Approach to DBMS Implementation (Ventus Publishing, 2002, 2011).