REDUNDANCY FREE NORMAL FORM

In Chapter 9 I discussed what I there referred to as a surprising fact: namely, the fact that there exist relvars that can’t be nonloss decomposed into two projections but can be nonloss decomposed into more than two. Well, now I have another surprise for you: namely, that 5NF, though sufficient, isn’t actually necessary in order to eliminate the kind of redundancy we’ve been talking about throughout this book so far (i.e., the kind that can be eliminated by taking projections). Now, I hope you find this state of affairs as surprising as I did when I first encountered it, in 2010! 5NF was defined by Fagin in 1979, and for the next 30 years or so it was widely believed that a relvar had to be in 5NF in order to be redundancy free (meaning 5NF was certainly necessary, though not in general sufficient, in order to eliminate the kind of redundancy we’ve been talking about). It turns out, however, that 5NF isn’t absolutely necessary to achieve that goal after all; more specifically, it turns out that a new normal form, one that’s strictly weaker than 5NF and yet stronger than fourth normal form (4NF), is precisely as effective as 5NF at eliminating redundancy. In fact, that new normal form—which we call, for obvious reasons, redundancy free normal form (RFNF)—turns out to be both necessary and sufficient for the purpose. If the goal of normalization is to reduce redundancy, therefore, RFNF, not 5NF, is the target to be aimed for.

Aside: The name RFNF as used here is taken from a preliminary draft of a recent paper by Hugh Darwen, Ron Fagin, and myself (see Appendix G). As this book was going to press, however, we discovered that an earlier paper (“Redundancy Elimination and a New Normal Form for Relational Database Design,” by Millist W. Vincent, 1998) had already used that name to refer to something else (more specifically, something different from “our” normal form). For obvious reasons, therefore, we intend to choose a new name for our RFNF;[127] for present purposes, however, I’ll continue to use the name RFNF to refer to our normal form, barring explicit statements to the contrary. Apologies if this causes any confusion. I’ll have more to say about Vincent’s RFNF, as well as ours, in Appendix B. End of aside.

To illustrate these ideas, I’ll begin with an example: a modified form—I’ll call it SPJ′—of relvar SPJ from Chapter 9. As before, the relvar has attributes SNO (supplier number), PNO (part number), and JNO (project number), and the predicate too is the same as before: Supplier SNO supplies part PNO to project JNO. Also, the following business rule is in effect (again as before):

  1. If supplier s supplies part p and part p is supplied to project j and project j is supplied by supplier s, then supplier s supplies part p to project j.

However, suppose now that the following business rule is in effect as well:

  1. Any given supplier s supplies a given part p to at most one project j.

As we saw earlier, then, the following JD captures the essence of the first of these rules and therefore holds in SPJ′:

      { { SNO , PNO } , { PNO , JNO } , { JNO , SNO } }

Likewise, the following FD captures the essence of the second rule and therefore also holds in SPJ′:

     { SNO , PNO } → { JNO }

From this FD, it follows that {SNO,PNO} is a key for SPJ′. Furthermore, it can be shown that no other FDs or JDs hold, apart from trivial ones. Thus, since the foregoing JD certainly isn’t implied by the sole key—the membership algorithm fails—SPJ′ isn’t in 5NF, though it is in BCNF (just as SPJ wasn’t in 5NF but was in BCNF, and for essentially the same reasons).[128] Note: Just as an aside, I remark that the example thus gives the lie to two popular misconceptions (see Exercise 10.1): first, that a relvar with just one key and just one nonkey attribute must be in 5NF; second, that a BCNF relvar that’s not in 5NF must be all key.

Now suppose the relvar contains the following three tuples:

     t1  =  s1  p1  j2
     t2  =  s1  p2  j1
     t3  =  s2  p1  j1

(The symbols s1 and s2 denote supplier numbers; p1 and p2 denote part numbers; j1 and j2 denote project numbers; and t1, t2, and t3 are reference labels for the three tuples.) Thanks to the JD, then, the following tuple must appear as well:

     t4  =  s1  p1  j1

But {SNO,PNO} is a key; it follows that tuples t1 and t4, since they have the same key value, are in fact one and the same (and hence that j1 = j2). Hence, the kind of redundancy we observed in Chapter 9 with SPJ doesn’t occur with SPJ′. (To be more specific, tuple t4 in this case isn’t an “additional” tuple. See the discussion of tuple forcing JDs in the paragraph immediately following.) In other words, SPJ′, even though it’s not in 5NF, doesn’t and in fact can’t suffer from the kind of redundancy that 5NF is intended to address. Thus, it looks as if 5NF might be, in a certain sense, too strong for the purpose.

Let me now remind you of the notion of tuple forcing JDs. Basically, a JD is tuple forcing if it’s such that, if certain tuples appear, certain additional tuples are forced to appear as well. I’ve appealed to this notion several times in previous chapters, but I’ve never really defined it properly—so here now is such a definition:

  • Definition: Let J be a JD with respect to heading H, and let J hold in relvar R. Then J might or might not have the consequence that if certain tuples t1, ..., tn appear in R, a certain additional tuple t is forced to appear in R as well (where additional means t is distinct from each of t1, ..., tn). If it does have that consequence, then J is tuple forcing with respect to R. Note: It’s easy to see that any such J must be (a) nontrivial, (b) not implied by any FD of R, and (c) not implied by the keys of R.[129]

Relvar SPJ in Chapter 9 was subject to a tuple forcing JD (indeed, that was precisely what was wrong with the design of that relvar). But the same criticism doesn’t apply to the SPJ′ example: The JD that holds in SPJ′ does not force any additional tuples to appear, thanks to the FD that also holds in that relvar. That’s why SPJ′ doesn’t suffer from the same kind of redundancy as SPJ does, even though—like SPJ—it’s not in 5NF.

With the foregoing example by way of motivation, then, I can define another new normal form. However, I need to do it a step at a time. First we need a precise definition of the kind of redundancy we’re talking about:

  • Definition: Relvar R is FD redundant if and only if it’s not in BCNF.

  • Definition: Relvar R is JD redundant if and only if some tuple forcing JD holds in R.

Note that neither kind of redundancy implies the other; that is, a relvar can be FD redundant without being JD redundant, or JD redundant without being FD redundant.[130] For example:

  • Relvar SPJ from Chapter 9 (with its attributes SNO, PNO, and JNO; key the combination of all three attributes; and JD {{SNO,PNO},{PNO,JNO},{JNO,SNO}}) is in BCNF and hence not FD redundant, but it’s clearly JD redundant.

  • The suppliers relvar S (with its attributes SNO, SNAME, STATUS, and CITY; key SNO; and FD {CITY} → {STATUS}) isn’t in BCNF and is therefore FD redundant. But no tuple forcing JDs hold in that relvar, and so it isn’t JD redundant.

Now to continue with the definitions:

  • Definition: Relvar R is redundancy free if and only if it’s neither FD redundant nor JD redundant.[131]

Note that a 5NF relvar is certainly redundancy free by the foregoing definition. So is an SKNF relvar, come to that; but a relvar doesn’t have to be in 5NF, or even SKNF, in order to be redundancy free (see below).

  • Definition: Relvar R is in redundancy free normal form (RFNF) if and only if it’s redundancy free.

In other words, relvar R is in RFNF if and only if it’s neither FD redundant nor JD redundant: equivalently, if and only if it’s in BCNF and no tuple forcing JD holds.

Of course, while the foregoing definition is both precise and accurate, it’s of little practical use, because it doesn’t help much with the question of determining whether a given relvar is indeed in RFNF. But there’s a theorem that does help in this regard:

  • Theorem: Relvar R is in RFNF if and only if it’s in BCNF and, for every explicit JD J that holds in R, some component of J is a superkey for R.

This theorem provides both necessary and sufficient conditions for a relvar to be in RFNF. We can therefore take the theorem as a useful, usable test for RFNF: in effect, as a valid definition of RFNF. Note: The theorem refers to explicit JDs of R, but in fact we could drop that qualifier and what would be left would still be true (i.e., R is in RFNF if and only if every JD that holds in R has a superkey component). However, including that qualifier makes the theorem “tighter,” in a sense. In particular, it means there’s no need to check a relvar’s implicit JDs in order to test whether the relvar in question is in RFNF. (As a matter of fact, we could make the theorem tighter still by replacing “every explicit JD” by “every explicit irreducible JD.” In practice, however, it would be quite unlikely for some explicit JD not to be irreducible, so the point is perhaps not very important.)

The next theorem shows that RFNF does indeed fall strictly between 4NF and 5NF; in fact, it falls strictly between 4NF and SKNF.[132]

  • Theorem: 5NF implies SKNF; SKNF implies RFNF; and RFNF implies 4NF. The reverse implications do not hold.

To recap, then: RFNF is strictly weaker than 5NF, though it does just as much as 5NF to eliminate redundancy.

Here now are two more theorems that provide simple, useful, and practical tests:

  • Theorem: Let R be a 3NF relvar and let R have no composite key; then R is in RFNF. (Recall that a composite key is one consisting of two or more attributes.)

  • Theorem: Let R be a BCNF relvar and let R have a noncomposite key; then R is in RFNF.

Each of these theorems provides a sufficient condition, though not a necessary one, for a relvar to be in RFNF. Observe that the conditions in question have the attractive property that they refer to FDs only, not JDs. Note: As a matter of fact the first of these theorems should come as no surprise, because we already know from Chapter 10 (section A USEFUL THEOREM) that a 3NF relvar with no composite keys is in 5NF. A fortiori, therefore, such a relvar is also in RFNF. As for the second theorem, it should be clear that if R is in BCNF and has a noncomposite key K, then K must necessarily be included in at least one component of every JD that holds in R, whence the stated result follows immediately.

This brings us to the end of what might be called the formal part of the RFNF discussion. However, I want to take a closer look at the motivating example (relvar SPJ′), because there’s more that can usefully be said about that example. Recall that the FD {SNO,PNO} → {JNO} and the JD {{SNO,PNO},{PNO,JNO},{JNO,SNO}} both hold in that relvar. But what do these facts mean from an intuitive point of view? Well, suppose the relvar contains these three tuples:

     t1  =  s1  p1  j2
     t2  =  s1  p2  j1
     t3  =  s2  p1  j1

Suppose also that s1s2, p1p2, and j1j2. Because of the JD, then, the following tuple must also appear:

     t4  =  s1  p1  j1

But {SNO,PNO} is a key; so tuples t1 and t4 must be one and the same and j1 must be equal to j2, contradicting our original assumption. Thus, if the relvar were to contain just tuples t1 and t2, an attempt to insert tuple t3 must fail, precisely because it leads to that contradiction. Thus we see the following (somewhat bizarre) business rule must be in effect:

  • If (a) supplier s1 supplies part p1 to project j2 and (b) supplier s1 also supplies part p2 to project j1 (p1p2, j1j2), then (c) no supplier, not even s1, can supply part p1 to project j1.[133]

What’s more, it should be clear that the following equally bizarre rules must be in effect as well (note the symmetry):

  • If (a) supplier s1 supplies part p1 to project j2 and (b) supplier s2 supplies part p1 to project j1 (s1s2, j1j2), then (c) no part, not even p1, can be supplied by supplier s1 to project j1.

  • If (a) supplier s1 supplies part p2 to project j1 and (b) supplier s2 supplies part p1 to project j1 (s1s2, p1p2), then (c) no project, not even j1, can be supplied by supplier s1 with part p1.

In fact, these three business rules can all be combined into one, as follows. Let’s agree, just for the moment, to say each tuple of relvar SPJ′ represents a shipment (by some supplier of some part to some project). Then there cannot exist three distinct shipments x, y, and z such that x and y involve the same supplier, y and z involve the same part, and z and x involve the same project.

There’s still another point to be made in connection with the SPJ′ example. Refer again to the analysis that led to the first of the foregoing three business rules. That analysis showed that tuple t3 can’t appear together with tuples t1 and t2. It follows, therefore, that SPJ′ suffers from an insertion anomaly, despite the fact that it’s in RFNF (and the fact that no tuple forcing JD holds, therefore). By contrast, it doesn’t suffer from a deletion anomaly—assuming, that is, that the only constraints to which it’s subject are the stated FD and JD (and logical consequences thereof). So one difference between 5NF and RFNF is this: Even though both are redundancy free, 5NF guarantees “no insertion anomalies” while RFNF doesn’t—assuming, again, that FDs and JDs are the only constraints under consideration.

Of course, it’s tempting to conclude from the SPJ′ example that relvars that are in RFNF and not 5NF are likely to be rare in practice. Nevertheless, there’s a clear logical difference between the two normal forms, and thus, from the point of view of reducing redundancy at least, it’s really RFNF and not 5NF that ought to be the target to be aimed for. (As a bonus, I note that RFNF is a little easier to test for, too, than 5NF is.)

Note: As a matter of fact, the SPJ′ example reinforces the foregoing observation in another way also. Since the relvar is subject to the JD {{SNO,PNO},{PNO,JNO},{JNO,SNO}}, it can be nonloss decomposed into its projections on {SNO,PNO}, {PNO,JNO}, and {JNO,SNO}, respectively. Those projections are each “all key,” and in fact in 5NF. However, that decomposition “loses” the FD {SNO,PNO} → {JNO}! As we saw in Chapter 6, losing dependencies is generally not recommended. Hence relvar SPJ′ illustrates the point that not only is 5NF sometimes too strong, but sometimes it might be positively contraindicated.



[127] “Our” RFNF has since been renamed ETNF (“essential tuple normal form”). For further discussion, see the “Stop Press” in Appendix C.

[128] In fact it’s not just in BCNF, it’s actually in 4NF. Equally, it’s not just not in 5NF, it’s not even in SKNF (please excuse the deliberately clumsy wording here).

[129] You might be thinking the third of these conditions is a logical consequence of the other two, but such is not the case. For example, consider a relvar R with attributes A, B, C, and D and keys A and B; let no FDs hold in R except ones implied by those keys. Then it’s easy to see the nontrivial JD {AB,AD,BC} holds in R, because that JD is implied by those keys taken together. However, it isn’t implied by either key taken individually; in other words, it isn’t implied by any individual FD, as such, that holds in R.

[130] This state of affairs lends weight to something I said in Chapter 9: viz., that it’s better to regard FDs and JDs as different phenomena, instead of thinking of FDs as a special case of JDs.

[131] The term redundancy free here is perhaps not well chosen, giving as it does a very specialized meaning to an otherwise very general term.

[132] Relvar SPJ′ is a concrete example of a relvar that’s in RFNF but (as noted in an earlier footnote) not in SKNF.

[133] By the same token, no supplier, not even s1, can supply part p2 to project j2, either. Note: A similar remark applies also to the “equally bizarre” rules to be discussed in just a moment, of course.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset