BOYCE/CODD NORMAL FORM

As I said earlier, Boyce/Codd normal form (BCNF) is the normal form with respect to FDs—but now I can define it precisely:

  • Definition: Relvar R is in Boyce/Codd normal form (BCNF) if and only if, for every nontrivial FD X → Y that holds in R, X is a superkey.

Points arising:

  • It follows from the definition that the only FDs that hold in a BCNF relvar are either trivial ones (we can’t get rid of those, obviously) or arrows out of superkeys (we can’t get rid of those, either). Or as some people like to say: Every fact is a fact about the key, the whole key, and nothing but the key—though I must immediately add that this informal characterization, intuitively attractive though it is, isn’t really accurate, because it assumes among other things that there’s just one key.

  • The definition makes no reference to 2NF or 3NF. Note, however, that the definition can be derived from the 3NF definition by dropping condition (b) (“Y is a subkey”). It follows that BCNF implies 3NF—that is, if a relvar is in BCNF, then it’s certainly in 3NF.
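The definition can be turned into a mechanical check. What follows is only a minimal sketch, assuming the FDs that hold in the relvar are supplied as an explicit list that mentions only attributes of the heading. The names `fd`, `closure`, and `bcnf_violations`, and the heading {A,B,C}, are hypothetical and chosen purely for illustration. The sketch relies on the standard result that X is a superkey exactly when the closure of X under the given FDs is the entire heading, and that testing the listed FDs is enough to decide whether the relvar itself is in BCNF.

```python
from typing import FrozenSet, Iterable, List, Tuple

# An FD X -> Y is represented as a pair (X, Y) of attribute-name sets.
FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: Iterable[str], rhs: Iterable[str]) -> FD:
    """Hypothetical helper: build an FD from two collections of attribute names."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the determinant is already covered, the dependant is too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def bcnf_violations(heading: Iterable[str], fds: List[FD]) -> List[FD]:
    """Return the listed nontrivial FDs whose determinant is not a superkey."""
    heading = frozenset(heading)
    return [
        (x, y)
        for x, y in fds
        if not y <= x                        # skip trivial FDs
        and not heading <= closure(x, fds)   # X is not a superkey
    ]


# Hypothetical relvar with heading {A,B,C} and FDs {A,B} -> {C}, {C} -> {B}.
example_fds = [fd({"A", "B"}, {"C"}), fd({"C"}, {"B"})]
violations = bcnf_violations({"A", "B", "C"}, example_fds)
```

In this hypothetical example `violations` contains only {C} → {B}: the determinant {A,B} is a superkey, whereas {C} determines only {B} and {C}. An empty result would mean the relvar is in BCNF with respect to the FDs supplied.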

By way of an example of a relvar that’s in 3NF but not BCNF, consider a revised version of the shipments relvar—let’s call it SNP—that has an additional attribute SNAME, representing the name of the applicable supplier. Suppose also that supplier names are necessarily unique (i.e., no two suppliers ever have the same name at the same time). Here then are some sample tuples:

[Figure: sample tuples for relvar SNP, with attributes SNO, SNAME, PNO, and QTY]

Once again we observe some redundancy: Every tuple for supplier S1 tells us S1 is named Smith, every tuple for supplier S2 tells us S2 is named Jones, and so on; likewise, every tuple for Smith tells us Smith’s supplier number is S1, every tuple for Jones tells us Jones’s supplier number is S2, and so on. And the relvar isn’t in BCNF. First of all, it has two keys, {SNO,PNO} and {SNAME,PNO}.[41] Second, every subset of the heading—{QTY} in particular—is (of course) functionally dependent on both of those keys. Third, however, the FDs {SNO} → {SNAME} and {SNAME} → {SNO} also hold; these FDs are certainly not trivial, nor are they arrows out of superkeys, and so the relvar isn’t in BCNF (though it is in 3NF).
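The reasoning about SNP can be checked mechanically. The following is a rough sketch, not a definitive implementation. It assumes that the three FDs listed in `FDS` generate all the FDs that hold in SNP; the text names the two FDs between {SNO} and {SNAME}, and {SNO,PNO} → {QTY} is my stand-in for the dependence of {QTY} on the keys. The function names are hypothetical. The sketch finds the keys by brute force and then classifies each listed FD against the 3NF and BCNF conditions.

```python
from itertools import combinations

HEADING = frozenset({"SNO", "SNAME", "PNO", "QTY"})

# Assumed generating set of FDs for SNP, each written as (determinant, dependant).
FDS = [
    (frozenset({"SNO", "PNO"}), frozenset({"QTY"})),
    (frozenset({"SNO"}), frozenset({"SNAME"})),
    (frozenset({"SNAME"}), frozenset({"SNO"})),
]


def closure(attrs, fds=FDS):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def is_superkey(attrs):
    return HEADING <= closure(attrs)


def candidate_keys():
    """Brute force: superkeys that contain no smaller superkey."""
    keys = []
    # Smaller subsets are visited first, so any superset of a key is skipped.
    for size in range(1, len(HEADING) + 1):
        for combo in combinations(sorted(HEADING), size):
            attrs = frozenset(combo)
            if is_superkey(attrs) and not any(k <= attrs for k in keys):
                keys.append(attrs)
    return keys


def classify(x, y, keys):
    """Classify the FD x -> y against the BCNF and 3NF conditions."""
    if y <= x:
        return "trivial"
    if is_superkey(x):
        return "arrow out of a superkey"
    if any(y <= k for k in keys):
        return "allowed by 3NF only (dependant is a subkey)"
    return "violates 3NF"


keys = candidate_keys()
report = {(tuple(sorted(x)), tuple(sorted(y))): classify(x, y, keys) for x, y in FDS}
```

Under these assumptions `keys` holds exactly {SNO,PNO} and {SNAME,PNO}, and the two FDs between {SNO} and {SNAME} are classified as allowed by 3NF only, which matches the conclusion that SNP is in 3NF but not in BCNF.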

Finally, as I’m sure you know, the normalization discipline says: If relvar R isn’t in BCNF, then decompose it into projections that are. In the case of relvar SNP, either of the following decompositions will meet this objective:

  • Projecting on {SNO,SNAME} and {SNO,PNO,QTY}

  • Projecting on {SNO,SNAME} and {SNAME,PNO,QTY}
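Both decompositions can be checked with a short sketch, again assuming that the three FDs in `FDS` generate all the FDs that hold in SNP (the list is my assumption, not something the text spells out). The function `in_bcnf` tests a projection by examining every nonempty subset of its attributes. The function `nonloss` applies a sufficient condition usually credited to Heath: a binary decomposition loses no information if the attributes common to the two projections functionally determine all the attributes of one of them. That condition is a technique I'm bringing in here, and the function names are hypothetical.

```python
from itertools import combinations

# Assumed generating set of FDs for SNP, each written as (determinant, dependant).
FDS = [
    (frozenset({"SNO", "PNO"}), frozenset({"QTY"})),
    (frozenset({"SNO"}), frozenset({"SNAME"})),
    (frozenset({"SNAME"}), frozenset({"SNO"})),
]


def closure(attrs, fds=FDS):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def in_bcnf(attrs, fds=FDS):
    """True if the projection on attrs is in BCNF under the FDs it inherits."""
    attrs = frozenset(attrs)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            x = frozenset(combo)
            # Attributes of this projection that x determines.
            determined = closure(x, fds) & attrs
            # x has a nontrivial dependant here but is not a superkey here.
            if determined != x and determined != attrs:
                return False
    return True


def nonloss(a, b, fds=FDS):
    """Sufficient test: the common attributes determine all of a or all of b."""
    a, b = frozenset(a), frozenset(b)
    determined = closure(a & b, fds)
    return a <= determined or b <= determined


SNP = {"SNO", "SNAME", "PNO", "QTY"}
first = ({"SNO", "SNAME"}, {"SNO", "PNO", "QTY"})
second = ({"SNO", "SNAME"}, {"SNAME", "PNO", "QTY"})

results = {
    "SNP in BCNF": in_bcnf(SNP),
    "first in BCNF": all(in_bcnf(p) for p in first),
    "first nonloss": nonloss(*first),
    "second in BCNF": all(in_bcnf(p) for p in second),
    "second nonloss": nonloss(*second),
}
```

With the assumed FDs, `results` reports False for SNP itself and True for every other entry. The subset enumeration in `in_bcnf` is exponential in the number of attributes, which is acceptable for headings of this size but not in general.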

By the way, I can now explain why BCNF is the odd man out, as it were, in not having a name of the form “nth normal form.” I quote from the paper in which Codd first described this new normal form:[42]

More recently, Boyce and Codd developed the following definition: A [relvar] R is in third normal form if it is in first normal form and, for every attribute collection C of R, if any attribute not in C is functionally dependent on C, then all attributes in R are functionally dependent on C [i.e., C is a superkey].

So Codd was giving here what he regarded as a “new and improved” definition of third normal form. The trouble was, the new definition was, and is, strictly stronger than the old one; that is, any relvar that’s in 3NF by the new definition is certainly in 3NF by the old one, but the converse isn’t true—a relvar can be in 3NF by the old definition and not in 3NF by the new one (relvar SNP, discussed above, is a case in point). So what that “new and improved” definition defined was really a new and stronger normal form, which therefore needed a distinctive name of its own. However, by the time this point was sufficiently recognized, Fagin had already defined what he called fourth normal form, so that name wasn’t available.[43] Hence the anomalous name Boyce/Codd normal form.



[41] That’s why I didn’t show any double underlining when I showed the sample tuples—there are two candidate keys, and there doesn’t seem to be any good reason to make either of them “more equal than the other.”

[42] E. F. Codd: “Recent Investigations into Relational Data Base Systems,” Proc. IFIP Congress, Stockholm, Sweden (1974).

[43] Actually, when Raymond Boyce first came up with what became that “new and improved” normal form, he did call it fourth normal form! (The paper in which he first described the concept—IBM Technical Disclosure Bulletin 16, No. 1 (June 1973)—had the title “Fourth Normal Form and its Associated Decomposition Algorithm.”) I don’t know why that name was subsequently rejected (though I have my suspicions).
