HEATH’S THEOREM

Consider relvar S once again, with its FD {CITY} → {STATUS}. Suppose we decompose that relvar, not as in Chapter 3 into relvars SNC and CT, but instead into relvars SNT and CT—where CT is the same as before, but SNT has heading {SNO,SNAME,STATUS} instead of {SNO,SNAME,CITY}. Sample values for SNT and CT corresponding to the value shown for S in Figure 1-1 are shown in Figure 5-1 below. From that figure, I hope you can see that:

  • Relvars SNT and CT are both in BCNF (the keys are {SNO} and {CITY}, respectively, and the only nontrivial FDs that hold in those relvars are “arrows out of superkeys”).

  • Unlike the decomposition in Chapter 3, however, this decomposition is not nonloss but lossy. For example, we can’t tell from Figure 5-1 whether supplier S2 is in Paris or Athens—note what happens if we join the two projections together[48]—and so we’ve lost information.

Relvars SNT and CT—sample values

Figure 5-1. Relvars SNT and CT—sample values

Let’s take a slightly closer look at this example. First of all, here are the predicates for relvars SNT and CT:

  • SNC: Supplier SNO is named SNAME and has status STATUS.

  • CT: City CITY has status STATUS.

So the predicate for the join of those two relvars is:

Supplier SNO is named SNAME and has status STATUS and city CITY has status STATUS.

Now recall the predicate for relvar S (see the answer to Exercise 2.6 in Appendix D):

Supplier SNO is named SNAME and is located in city CITY, which has status STATUS.

This latter predicate is clearly not the same as the predicate for the join. To be more precise, if some given tuple t satisfies it, then that tuple t also satisfies the predicate for the join, but the converse isn’t true. That’s why the join “loses information” or “is lossy”—just because some tuple appears in the join, we can’t assume it also appears in the original relvar S.

So what exactly is it that makes some decompositions nonloss and others lossy? This is the question that lies at the heart of normalization theory. It can be stated formally thus:

Let r be a relation and let r1, ..., rn be projections of r. What conditions must be satisfied in order for r to be equal to the join of those projections? (By the way, note the tacit assumption here that—as noted earlier—join is an n-adic operator.)

An important, albeit partial, answer to this question was provided by Ian Heath in 1971 when he proved the following theorem:

  • Heath’s Theorem (for relations): Let relation r have heading H and let X, Y, and Z be subsets of H such that the union of X, Y, and Z is equal to H. Let XY denote the union of X and Y, and similarly for XZ. If r satisfies the FD XY, then r is equal to the join of its projections on XY and XZ.

By way of example, consider the suppliers relation once again (i.e., the current value of relvar S as shown in Figure 1-1). That relation satisfies the FD {CITY} → {STATUS}. Thus, taking X as {CITY}, Y as {STATUS}, and Z as {SNO,SNAME}, Heath’s Theorem tells us that the decomposition of that relation into its projections on {CITY,STATUS} and {CITY,SNO,SNAME}[49] is nonloss—as indeed we already know.

Now, it’s important to understand that (to repeat) Heath’s answer to the original question was only partial. I’ll explain what this means in terms of the foregoing example. Basically, the theorem does tell us the decomposition into projections SNC and CT (see Figure 3-2 in Chapter 3) is nonloss; however, it doesn’t tell us the one into SNT and CT (see Figure 5-1) is lossy. In other words, if we decompose on the basis of an FD, as we did in the example of Figure 3-2, then Heath’s Theorem says the decomposition will be nonloss; but if we decompose on some other basis, as we did in the example of Figure 5-1, then the theorem has nothing to say on the matter. Thus, the theorem gives a sufficient condition, but not a necessary one, for a given (binary) decomposition to be nonloss. It follows that it might be possible to decompose relation r in a nonloss way into its projections on XY and XZ even if it doesn’t satisfy the FD XY. Note: I’ll be describing a stronger form of Heath’s Theorem, giving both necessary and sufficient conditions, later in this book (see Chapter 12).

As an aside, I remark that in the paper in which he proved his theorem, Heath also gave a definition of what he called “third” normal form that was in fact a definition of BCNF. Since that definition preceded Boyce and Codd’s own definition by some three years, it seems to me that BCNF ought by rights to be called Heath normal form. But it isn’t.

Now, in Chapter 3, in the section NORMALIZATION SERVES TWO PURPOSES, I said something like the following:

If you’ve been paying careful attention, you might reasonably accuse me of practicing a tiny deception in the foregoing discussion. To be specific, I’ve considered what it means for a decomposition of relations to be nonloss; but normalization, which is what we’re supposed to be talking about, isn’t a matter of decomposing relations, it’s a matter of decomposing relvars.

These remarks apply here too! So let’s get back to relvars ... Consider relvar S once again. Suppose we do decide to perform the recommended decomposition into “projection” relvars SNC and CT; moreover, suppose we want that decomposition to be nonloss, as indeed we surely do. In other words, what we want is for the decomposition to be such that, at all times, the current value of relvar S is equal to the join of the current values of SNC and CT.[50] That is, we want S to be subject to the following integrity constraint (I’ll call it YCT):

     CONSTRAINT YCT
        S = JOIN { S { SNO , SNAME , CITY } , S { CITY , STATUS } } ;

Now, recall from Chapter 2 that S is certainly subject to the following constraint (XCT):

     CONSTRAINT XCT
        COUNT ( S { CITY } ) = COUNT ( S { CITY , STATUS } ) ;

Just to remind you, this constraint merely says the FD {CITY} → {STATUS} holds in S. Appealing to Heath’s Theorem, therefore, we see that every possible value of relvar S, since it necessarily satisfies constraint XCT, certainly satisfies constraint YCT as well. And it follows that constraint XCT implies constraint YCT (meaning, to spell the point out, that if relvar S is subject to XCT—which it is—then it’s certainly subject to YCT as well). So constraint YCT does hold, and the decomposition of relvar S into relvars SNC and CT is indeed nonloss, as required. It follows that we can take Heath’s Theorem as applying to relvars after all, not just to relations. So let’s restate it accordingly:

  • Heath’s Theorem (for relvars): Let relvar R have heading H and let X, Y, and Z be subsets of H such that the union of X, Y, and Z is equal to H. Let XY denote the union of X and Y, and similarly for XZ. If R is subject to the FD XY, then R can be nonloss decomposed into its projections on XY and XZ.

There’s one further point I want to make on the general topic of nonloss decomposition (to BCNF or otherwise). Once again consider relvar S, with its FD {CITY} → {STATUS}. By Heath’s Theorem, that relvar can be nonloss decomposed into its projections on {SNO,SNAME,CITY} and {CITY,STATUS}. However, it can clearly also be nonloss decomposed into those two projections together with (say) the projection on {SNAME,STATUS}; that is, if we join all three of those projections together, we get back to where we started. (Check this claim for yourself, using our usual sample value for relvar S, if it isn’t immediately obvious.) However, that third projection clearly isn’t needed in the process of reconstructing the original relvar. Now, when we’re doing database design, for obvious reasons we usually consider only decompositions for which every projection is needed in the reconstruction process—but in this book I’m discussing decompositions in general, and I won’t limit myself to those for which every projection is needed (barring explicit statements to the contrary, of course).



[48] See the remarks on lossy joins in a footnote in the section NORMALIZATION SERVES TWO PURPOSES in Chapter 3.

[49] Or, as we would “more naturally” tend to write them, interchanging the two sets of attributes and specifying the individual attributes in a “more natural” order, on {SNO,SNAME,CITY} and {CITY,STATUS}.

[50] Here I’m adopting the convenient fiction again that relvars S, SNC, and CT all coexist (living alongside one another, as it were).

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset