Chapter 6. Preserving FDs

Nature does require
Her times of preservation

William Shakespeare: Henry VIII

Once again consider our usual suppliers relvar S. Since {SNO} is a key, that relvar is certainly subject to the FD {SNO} → {STATUS}. Thus, taking X as {SNO}, Y as {STATUS}, and Z as {SNAME,CITY}, Heath’s Theorem tells us we can decompose that relvar into relvars SNC and ST, where SNC has heading {SNO,SNAME,CITY} and ST has heading {SNO,STATUS}. Sample values for SNC and ST corresponding to the value shown for S in Figure 1-1 are shown in Figure 6-1.

Relvars SNC and ST—sample values

Figure 6-1. Relvars SNC and ST—sample values

In this decomposition:

  • Relvars SNC and ST are both in BCNF—{SNO} is the key for both, and the only nontrivial FDs that hold in those relvars are “arrows out of superkeys.”

  • What’s more, the decomposition is certainly nonloss (as is in fact guaranteed by Heath’s Theorem)—if we join SNC and ST together, we get back to S.

  • However, the FD {CITY} → {STATUS} has been lost—by which I mean, of course, that it’s been replaced by a certain multirelvar constraint, as explained in the previous chapter.[51] The constraint in question can be stated as follows:

         CONSTRAINT ...
            COUNT ( ( JOIN { SNC , ST } ) { CITY } ) =
            COUNT ( ( JOIN { SNC , ST } ) { CITY , STATUS } ) ;

Explanation: What this constraint says is, if we join SNC and ST, we get a result—call it S—in which the number of distinct cities is equal to the number of distinct city / status pairs. And the fact that this latter property holds is equivalent to saying the FD {CITY} → {STATUS} holds in the original relvar S.

So we’ve “lost” an FD. What are the implications? Well, certainly the multirelvar constraint that replaces it is harder to state, as we’ve just seen. More to the point, perhaps, it’s harder to enforce (harder, that is, than it would have been with the preferred decomposition into projections SNC and CT, as illustrated in Figure 3-2 in Chapter 3).[52] For example, suppose we update relvar SNC to change the city for supplier S1 from London to Athens; then we must also update relvar ST to change the status for supplier S1 from 20 to 30—because if we don’t, then joining SNC and ST back together will produce a result that isn’t a legitimate value for relvar S. (By contrast, if we update relvar SNC to change the city for supplier S2 from Paris to Athens, then we don’t have to update relvar ST as well—but we still have to inspect relvar ST in order to determine that fact.)

Aside: It might be possible, given a well architected DBMS, to get the system to do that necessary inspection of relvar ST “automatically,” instead of the user having to do it. It might even be possible to get the system to perform any necessary additional updates “automatically,” too. Even given such a system, however, it’s still the case that the constraint is harder to enforce (i.e., more work still has to be done, even if it’s done by the system and not the user). In any case, such possibilities are just a pipedream at the time of writing—today’s commercial products typically don’t allow multirelvar constraints even to be stated, in general; the foregoing possibilities are out of reach today, and dealing with (and in particular enforcing) such constraints is thus the user’s responsibility. End of aside.

So the message is: Try to choose a decomposition that preserves FDs instead of losing them. (In the case at hand, replacing the projection on {SNO,STATUS} by that on {CITY,STATUS} solves the problem.) Loosely speaking, in other words, if the FD XY holds in the original relvar, try not to choose a decomposition in which X winds up in one relvar and Y in another. Note: Of course, I’m assuming here that the decomposition isn’t being done on the basis of that FD XY itself—because if it is, we’ll effectively wind up with two X’s, one of which will be in the same relvar as Y (necessarily so) and the other won’t. I’m also assuming, tacitly, that XY is part of what’s called an irreducible cover for the total set of FDs that hold in the original relvar. I’ll be discussing irreducible covers later in this chapter.

AN UNFORTUNATE CONFLICT

The basic idea of FD preservation is straightforward; unfortunately, however, there’s quite a bit more that needs to be said on the subject. First of all, I want to present what some people might regard as a pathological example. We’re given a relvar SJT with attributes S (student), J (subject), and T (teacher), and predicate Student S is taught subject J by teacher T. The following business rules apply:[53]

  • For each subject, each student of that subject is taught by only one teacher.

  • Each teacher teaches only one subject.

  • Each student studies several subjects, and hence is taught by several teachers (in general).

  • Each subject is studied by several students (in general).

  • Each subject is taught by several teachers (in general).

  • Distinct students of the same subject might or might not be taught that subject by the same teacher.

A sample value for this relvar that conforms to these rules is shown in Figure 6-2.

Sample value for relvar SJT

Figure 6-2. Sample value for relvar SJT

What are the FDs for relvar SJT? From the first business rule, we have {S,J} → {T}. From the second, we have {T} → {J}. A careful analysis of the remaining rules will show that no other FDs hold other than ones that are either trivial or reducible (or both). Thus, the only nontrivial, irreducible FDs that hold are these two:

     { S , J } → { T }
     { T }     → { J }

So what are the keys? Well, {S,J} is a key, since the entire heading is clearly functionally dependent on {S,J} and not on any proper subset of {S,J}. Also, {S,T} is a key, because:

  1. It’s certainly the case, given that the FD {T} → {J} holds, that the entire heading is functionally dependent on {S,T}.

  2. It’s also the case, given that the FDs {S} → {J} and {T} → {S} do not hold, that the entire heading isn’t functionally dependent on any proper subset of {S,T}.

So there are two keys, {S,J} and {S,T}.[54] Perhaps more to the point, {T} is not a key, and so relvar SJT is subject to an FD that’s not “an arrow out of a key” (i.e., it’s not implied by keys, to state the matter a trifle more formally). As a consequence, the relvar isn’t in BCNF, though it is in 3NF. (Exercise: Check this claim.) And it suffers from redundancy; for example, given the sample value shown in Figure 6-2, the fact that Professor White teaches Math appears twice. As you would expect, it also suffers from update anomalies; for example, with respect to Figure 6-2 again, we can’t delete the fact that Jones is studying Physics without losing the information that Professor Brown teaches Physics.

Now, we can get over these problems by decomposing the relvar appropriately. Applying Heath’s Theorem to the FD {T} → {J} (take X, Y, and Z to be {T}, {J}, and {S}, respectively), we obtain the following nonloss decomposition:

     TJ { T , J }
        KEY { T }

     TS { T , S }
        KEY { T , S }

I’ll leave it as an exercise to show the values of these two relvars corresponding to the value of SJT shown in Figure 6-2, to show they’re in BCNF, and to check that the decomposition does in fact avoid the redundancy and update anomalies mentioned above. Observe in particular that the FD {T} → {J} becomes a key constraint in this decomposition; in the original design, by contrast, it had to be stated and enforced separately.

There’s another problem, though. The fact is, although the decomposition into TJ and TS does avoid certain anomalies, it unfortunately introduces others. To be specific, the FD

     { S , J } → { T }

is lost (certainly it isn’t implied by the FD {T} → {J}, which is the only nontrivial FD to hold in the result of the decomposition). As a consequence, relvars TJ and TS can’t be independently updated. For example, an attempt to insert the tuple

     ( Smith , Prof. Brown )

into TS must be rejected, because Professor Brown teaches Physics and Smith is already being taught Physics by Professor Green; yet this fact can’t be detected without inspecting TJ.

To sum up, what the foregoing example illustrates is as follows: There are two objectives we typically aim for in nonloss decomposition, BCNF projections and FD preservation, and, sadly, these objectives can be in conflict with one another (i.e., it isn’t always possible to achieve both).

Now, at this point in an earlier draft of this chapter, I wrote the following:

So which objective do we give up on? Well, I’d tell you if I could, but I can’t. What the SJT example demonstrates is that the theory of normalization, important though it is, isn’t enough as it stands; I mean, there are questions it doesn’t answer. So the message is: We need more science! Normalization theory is certainly scientific, but it doesn’t solve all design problems.

At the prompting of one of my reviewers, however, I’ve come to the conclusion that this paragraph is probably overstated. It’s not so much more science we need here, it’s better implementations! That is, the main argument for tolerating a less than properly normalized design, in cases like the one at hand, is the fact that today’s DBMSs make it quite awkward to deal with multirelvar constraints like the “lost” FD in the example. So let me set a stake in the ground and state categorically that in my opinion, FD preservation is the objective to give up on, in those cases in which there’s a conflict.[55]



[51] To say an FD is “lost” in such circumstances is usual but a trifle inappropriate—all that’s happened is (to repeat) that the FD in question has been replaced by another constraint. But the point is, that other constraint isn’t an FD as such.

[52] There might be performance penalties, too. Now, I shouldn’t really mention this fact; as I indicated in Chapter 1, I never want performance considerations to be the driving force behind my logical design. But in the case at hand, performance is just an additional point that happens to reinforce my main argument.

[53] A business rule is a statement, usually in natural language, that’s supposed to capture some aspect of what the data in the database means or how it’s constrained. There’s no consensus on any more precise definition of the term, though most writers would at least agree that relvar predicates are an important special case.

[54] Which overlap, as you can see. By the way, as with relvar SNP in Chapter 4, I’ve chosen not to make either of those keys primary, which is why there’s no double underlining in Figure 6-2.

[55] In the case at hand, of course, we must decompose the relvar as indicated if we want to be able to record the fact that (say) Professor Black teaches physics even though Professor Black has no students at the moment.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset