RELVARS WITH MORE THAN ONE KEY

Now let’s consider some reasonably realistic examples of relvars with more than one key. The first concerns a relvar EXAM, with attributes S (student), J (subject), and P (position), and predicate Student S was examined in subject J and achieved position P in the class list. For the sake of the example, let’s assume there are no ties (that is, no two students obtained the same position in the same subject). Then, clearly, given a student and subject, there’s exactly one corresponding position; equally, given a subject and position, there’s exactly one corresponding student. So the FDs {S,J} → {P} and {J,P} → {S} both hold, and {S,J} and {J,P} are both keys (or candidate keys, if you prefer):

     EXAM { S , J , P }
          KEY { S , J }
          KEY { J , P }

Here’s another example (it’s basically Exercise 13.6 from Chapter 13): We’re given a relvar representing marriages, with attributes A, B, and C and predicate Person A married person B on date C. Assuming no polygamy, and assuming also that no two persons marry each other more than once, every pair of attributes here is a key:

     MARRIAGE { A , B , C }
              KEY { A , B }
              KEY { B , C }
              KEY { C , A }

And here’s yet another example, based on a simple airline application (the predicate is Pilot PILOT takes a flight out from gate GATE at hour HOUR on day DAY):

     ROSTER { DAY , HOUR , GATE , PILOT }
            KEY { DAY , HOUR , GATE }
            KEY { DAY , HOUR , PILOT }

How do we choose the primary key in cases such as these? What grounds are there for choosing one key over another? Codd’s criterion of “simplicity” doesn’t seem to help. Note too that whichever we choose, we wind up with an unpleasant asymmetry; e.g., in the marriage example, we might find ourselves treating one spouse as “more equal than the other” (and thereby certainly offending someone). Why should we be forced to introduce such asymmetry? Asymmetry is usually not a good idea. Here again, repeated from Chapter 15, is that quote from Polya: “Try to treat symmetrically what is symmetrical, and do not destroy wantonly any natural symmetry.”

Now, in all of the foregoing examples the keys were not only composite, they overlapped (i.e., they had an attribute in common). Lest it be thought that it’s only when keys overlap that there might be difficulty in choosing the primary key, therefore, let me give a counterexample. Suppose we have a relvar ELEMENT representing the periodic table (i.e., the table of of chemical elements).[174] Then every element has a unique name (e.g., lead), a unique symbol (e.g., the symbol for lead is Pb), and a unique atomic number (e.g., the atomic number for lead is 82) . The relvar thus clearly has three distinct keys, all of which are simple (i.e., involve just one attribute), and there’s obviously no overlap at all. On what grounds do we choose one of these three keys as the primary key? It seems to me a good case could be made for any of them, depending on circumstances.

Here’s another familiar (perhaps all too familiar) example of a relvar with several keys, all of which are simple:

     TAX_BRACKET { LOW , HIGH , PERCENTAGE }
                 KEY { LOW }
                 KEY { HIGH }
                 KEY { PERCENTAGE }

Of course, I’m assuming here that no two taxable income ranges (LOW to HIGH) are subject to the same tax rate.

I could give many more examples, but by now my point is presumably clear: Not only are there no formal criteria for choosing one key over another (in those cases where there’s a choice), but sometimes there don’t appear to be any informal criteria either. Thus, it really doesn’t seem appropriate to insist that such a choice must always be made, even if it’s appropriate in some cases (perhaps even most cases).

There’s another important point that needs to be mentioned, a more formal one than most of those I’ve been making so far. Over the past 40 years or so, a great deal of research has been carried out on dependency theory and further normalization, view updating, optimization (including semantic optimization in particular), usability, and many other matters. And in all of this research it’s candidate keys, not primary keys, that play the crucial role. (Indeed, it must be, precisely because the research in question is formal. The candidate key concept is formally defined. The primary key concept isn’t.) Since this is so, it really doesn’t seem appropriate to insist formally on the PK:AK distinction—though, to repeat, it might be appropriate to recommend it informally.

Yet another point I want to make is that the PK:AK distinction leads to an undesirable and unnecessary differentiation between base relvars and other relvars. That’s because, according to Codd, the relational model:

  • Requires primary keys for base relvars;

  • Permits but does not require them for views and snapshots; and

  • Considers it “completely unnecessary for primary keys to be declared or deduced” for any other relvars (italics in the original).

These statements are paraphrased (but only slightly) from the paper in which Codd said there was no formal basis for choosing the primary key (see earlier in this appendix). As a matter of fact, that paper goes so far as to suggest that relvars other than base ones might not even possess a primary key, a suggestion that if true surely raises serious questions about the concept in the first place—remember The Principle of Interchangeability (of base relvars and views). Be that as it may, my position on these matters is rather different. To be specific, I would say:

  • First, every relvar, base or derived, does have at least one key (because, of course, no relation, and a fortiori no relvar, ever permits duplicate tuples).

  • Second, every base relvar must have at least one key explicitly declared. Preferably, of course, all such keys should be explicitly declared.

  • Often a base relvar will have an explicitly declared primary key in particular, but I don’t insist on this state of affairs as a hard requirement.

  • For reasons explained in detail in SQL and Relational Theory, I believe the system should be able to deduce keys for derived relvars.

  • The previous point notwithstanding, I believe it should also be possible to declare keys for derived relvars (for views and snapshots in particular). Again, see SQL and Relational Theory for further discussion.



[174] Like the PLUS example (q.v.) in Chapter 13, ELEMENT might actually be a relation constant rather than a relation variable, but it still has keys.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset