CONCLUDING REMARKS

I’d like to close this chapter by addressing a question I haven’t discussed in this book at all so far. It’s a matter of terminology. To be specific: Why are 1NF, 2NF, and the rest called normal forms, anyway? Come to that, why is normalization called normalization?

The answers to these questions derive from mathematics (though the ideas spill over into several related disciplines, including the computing discipline in particular—think of floating point numbers, for example). In mathematics, we often find ourselves having to deal with some large, possibly even infinite, set of objects of some kind: for example, the set of all matrices, or the set of all rational numbers, or—coming a little closer to home—the set of all relations. In such a situation, it’s desirable to find a set of canonical forms for the objects in question. Here’s a definition:

  • Definition: Given a set s1, together with a defined notion of equivalence among elements of that set, subset s2 of s1 is a set of canonical forms for s1 if and only if every element x1 of s1 is equivalent to just one element x2 of s2 under that notion of equivalence (and that element x2 is the canonical form for the element x1).[32] Various “interesting” properties that apply to x1 also apply to x2; thus, we can study just the small set s2, not the large set s1, in order to prove a variety of “interesting” theorems or results.

As a trivial illustration of this notion, let s1 be the set of nonnegative integers {0,1,2,...}, and let two such integers be equivalent if and only if they leave the same remainder on division by five. Then we can define s2 to be the set {0,1,2,3,4}. As for an “interesting” theorem that applies in this example, let x1, y1, and z1 be any three elements of s1 (i.e., any three nonnegative integers), and let their canonical forms in s2 be x2, y2, and z2, respectively; then the product y1 * z1 is equivalent to x1 if and only if the product y2 * z2 is equivalent to x2.
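The illustration above can be checked mechanically. What follows is only a minimal sketch in Python, not anything from the text itself: the function names canonical, equivalent, and theorem_holds are hypothetical, and the check covers a small finite range of integers, so it's a sanity test and not a proof.

```python
MODULUS = 5  # the example in the text divides by five


def canonical(n: int) -> int:
    """Map a nonnegative integer in s1 to its canonical form in s2 = {0,1,2,3,4}."""
    if n < 0:
        raise ValueError("s1 contains nonnegative integers only")
    return n % MODULUS


def equivalent(a: int, b: int) -> bool:
    """Two integers are equivalent iff they leave the same remainder on division by five."""
    return canonical(a) == canonical(b)


def theorem_holds(x1: int, y1: int, z1: int) -> bool:
    """y1 * z1 is equivalent to x1 iff y2 * z2 is equivalent to x2."""
    x2, y2, z2 = canonical(x1), canonical(y1), canonical(z1)
    return equivalent(y1 * z1, x1) == equivalent(y2 * z2, x2)


# Brute-force check over a small sample of s1 (an arbitrary bound of 30).
all_ok = all(
    theorem_holds(x, y, z)
    for x in range(30)
    for y in range(30)
    for z in range(30)
)
```

The point of the sketch is that every question about products in the infinite set s1 is answered by looking only at the five elements of s2.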

Now, normal form is just another term for canonical form. So when we talk about normal forms in the database context, we’re talking about a canonical representation for data. To spell the point out: Any given collection of data can be represented relationally in many different ways, as we know. Of course, all of those ways are—in fact, must be—information equivalent; that is, information equivalence is the kind of equivalence we appeal to in this particular context. However, some of those ways (of representing the given information) are preferred over others for various reasons. And those preferred ways are, of course, the relational normal forms that are the subject of much of this book.

As for the term normalization, it simply refers to the general process of mapping some given object into its canonical equivalent. In the database context in particular, therefore, it’s used (as we know) to refer to the process of mapping some given relvar into a collection of relvars that (a) when considered together, are information equivalent to the original relvar, but (b) are each individually in some preferred normal form.
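Conditions (a) and (b) can be pictured with a small sketch in Python, in which relations are modeled as sets of tuples. Everything in it is an assumption made for illustration: the attribute names SNO, CITY, and STATUS, the sample values, and the premise that each city has just one status. It shows only the information equivalence part, namely that joining the projections gives back the original.

```python
# Hypothetical relation with attributes (SNO, CITY, STATUS).
# Assumed for illustration: each CITY has exactly one STATUS.
original = {
    ("S1", "London", 20),
    ("S2", "Paris", 30),
    ("S3", "Paris", 30),
}

# Decompose by projection: one relation on (SNO, CITY), one on (CITY, STATUS).
sno_city = {(sno, city) for sno, city, _ in original}
city_status = {(city, status) for _, city, status in original}

# Recombine by joining the two projections on CITY.
rejoined = {
    (sno, city, status)
    for sno, city in sno_city
    for other_city, status in city_status
    if other_city == city
}

# The projections, taken together, carry the same information as the original.
lossless = rejoined == original
```

Note that the second projection records the Paris status once, not twice, which is the kind of redundancy the preferred forms are meant to remove.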

To the foregoing I should perhaps add the following. As far as I know, Codd himself never mentioned, in his early writings on the subject, his reasons for introducing the terminology of normal forms or normalization. But many years afterward, he did go on record with his own explanation:[33]

Interviewer: Where did “normalization” come from?

Codd: It seemed to me essential that some discipline be introduced into database design. I called it normalization because then President Nixon was talking a lot about normalizing relations with China. I figured that if he could normalize relations, so could I.



[32] It’s reasonable to require also that every element x2 of s2 be equivalent to at least one element x1 of s1. Let me also draw your attention to the following remarks, paraphrased from the answer to Exercise 2.3 in Appendix D: Throughout this book, expressions of the form “B is a subset of A” must be understood to include the possibility that B and A might be equal. For example, the set {x,y,z} is a subset of itself. When I want to exclude such a possibility, I’ll talk explicitly in terms of proper subsets; for example, the set {x,z} is a proper subset of the set {x,y,z}. Of course, no set is a proper subset of itself.

[33] In “A Fireside Chat: Interview with Dr. Edgar F. Codd” (DBMS Magazine 6, No. 13, December 1993).
