THE INVOICES AND SHIPMENTS EXAMPLE

I now turn my attention to a more elaborate example. The example—which is based on a real world application, incidentally—concerns invoices and shipments, and there’s a one to one relationship between these two entity types: Each shipment has exactly one invoice, each invoice has exactly one shipment. Here then is the “obvious” database design (for the sake of the example, I use a hypothetical syntax that explicitly distinguishes between primary and alternate keys):[175]

     INVOICE  { INVNO , SHIPNO , INV_DETAILS }
              PRIMARY KEY { INVNO }
              ALTERNATE KEY { SHIPNO }
              FOREIGN KEY { SHIPNO } REFERENCES SHIPMENT

     SHIPMENT { SHIPNO , INVNO , SHIP_DETAILS }
              PRIMARY KEY { SHIPNO }
              ALTERNATE KEY { INVNO }
              FOREIGN KEY { INVNO } REFERENCES INVOICE

So the database structure is as shown in Figure A-1 (note that the arrows in that figure, in contrast to arrows in figures elsewhere in this book, represent foreign key references, not functional dependencies):

The invoices-and-shipments database

Figure A-1. The invoices-and-shipments database

Now, each relvar in this example actually has two keys, {INVNO} and {SHIPNO}. However, I assume we can agree for the sake of argument that the “natural” primary key for INVOICE is {INVNO} and the “natural” primary key for SHIPMENT is {SHIPNO}; then {SHIPNO} in INVOICE and {INVNO} in SHIPMENT are alternate keys. Furthermore, of course, each of those alternate keys is also a foreign key (as Figure A-1 indicates), referring to the primary key of the other relvar.

One problem with the foregoing design is as follows. Clearly, the database is subject to the constraint—actually it’s an equality dependency, and I’ll call it C—that if the INVOICE relvar shows invoice i as corresponding to shipment s, then the SHIPMENT relvar must show shipment s as corresponding to invoice i (and vice versa):[176]

     CONSTRAINT C
        INVOICE { INVNO , SHIPNO } = SHIPMENT { INVNO , SHIPNO } ;

In other words, the tuple (i,s,...) appears in INVOICE if and only if the tuple (s,i,...) appears in SHIPMENT. But the design of Figure A-1 doesn’t capture or enforce this constraint (for example, the configuration of values shown in Figure A-2 below is permitted by that design and yet violates the constraint). The constraint thus needs to be separately stated (as above) and separately enforced.

“Legal” INVOICE and SHIPMENT values that violate constraint C

Figure A-2. “Legal” INVOICE and SHIPMENT values that violate constraint C

Aside: It might be thought that if we pretended the primary key for each relvar was the combination {INVNO,SHIPNO}, and if we further defined each of those fake “primary keys” to be a foreign key referencing the other, then constraint C would be taken care of automatically.[177] But the relational model requires primary keys—in fact, keys in general—to be irreducible, meaning they mustn’t contain any attributes that are irrelevant for unique identification purposes (and there are good reasons for that requirement, too, as we know from Chapter 4). In other words, {INVNO,SHIPNO} just isn’t a key (and so it certainly can’t be the primary key) for either relvar, and we’d be lying if we told the system otherwise. Indeed, if {INVNO,SHIPNO} were truly a key, then the relationship between invoices and shipments would be many to many, which it isn’t. End of aside.

Precisely because constraint C holds, the design of Figure A-1 clearly involves some redundancy: Every pair of {INVNO,SHIPNO} values appearing in either relvar also necessarily appears in the other. Now, we could avoid that redundancy by combining the two relvars into one:

     INV_SHIP { INVNO , SHIPNO , INV_DETAILS , SHIP_DETAILS }
              PRIMARY KEY { INVNO }
              ALTERNATE KEY { SHIPNO }

By eliminating the redundancy in this way, we’ve also eliminated the need to state and enforce constraint C. Furthermore, we could now define the original INVOICE and SHIPMENT relvars as views—specifically, projection views—of INV_SHIP, thus allowing the user still to regard invoices and shipments as distinct entities.[178] This revised design thus does enjoy certain advantages over the “obvious” version.

On the other hand, there are some disadvantages, too. Observe first that we’ve had to make an asymmetric decision once again, choosing {INVNO} over {SHIPNO}—arbitrarily—as the primary key for relvar INV_SHIP. Second, suppose further that shipments have certain subsidiary information that invoices don’t; e.g., suppose shipments are containerized, each shipment involving several containers. Then a new CONTAINER relvar is needed:

     CONTAINER { CONTNO , SHIPNO , ... }
               PRIMARY KEY { CONTNO }
               FOREIGN KEY { SHIPNO } REFERENCES INV_SHIP { SHIPNO }

And now we have a foreign key referencing an alternate key!—which is prohibited by the relational model as originally defined, as we know.

Now, can we avoid this apparent violation of the prescriptions of the original model? Well, of course, the answer is yes. There are various ways in which this might be done:

  1. We could go back to the two-relvar design (thereby reintroducing the data redundancy and the need for the additional constraint, however).

  2. We could replace SHIPNO by INVNO in the CONTAINER relvar. However, this approach seems very artificial (containers have nothing to do with invoices per se), and moreover introduces an unpleasant level of indirection into the design (the shipment for a given container would be accessible only via the corresponding invoice).

  3. We could leave the CONTAINER relvar as it is, but replace the foreign key specification by an explicit declaration to the effect that every SHIPNO value in CONTAINER must also appear in INV_SHIP (using a language like SQL or Tutorial D that permits the definition of arbitrarily complex constraints). But it does seem a pity to have to deal with a constraint that’s so similar to a “true” foreign key constraint in such a roundabout manner; indeed, it could be argued that the effect is again to introduce an undesirable asymmetry, foreign keys that reference primary keys being treated in one manner and “foreign keys” that reference alternate keys being treated in quite another.

  4. We could introduce a surrogate primary key ({ISNO}, say) for INV_SHIP, and use that as the foreign key in the CONTAINER table—which would still involve a level of indirection, as in paragraph 2 above, but would at least reintroduce the symmetry that was lost when we arbitrarily chose {INVNO} as the primary key for INV_SHIP.

To summarize: None of these four “workaround” approaches seems totally satisfactory. The example thus seems to show that—if we wish to avoid redundancy and arbitrariness and artificiality and asymmetry and indirectness—then we need to be able to treat primary and alternate keys as equals, and we need to be able to have foreign keys that reference alternate keys. In other words, we need to ignore the differences between primary and alternate keys, and simply consider them all as just keys. Please note carefully, however, that I’m not saying the apparent need in this example to violate certain precepts of the original relational model can’t be avoided; what I’m saying is I don’t see a good way to avoid it, nor a good reason for adopting a bad way. I would therefore like to suggest that the precepts in question be treated as strong (?) guidelines but not as inviolable rules.



[175] One reviewer asked why a design consisting of three relvars (one each for invoices and shipments and one for the association between them) wasn’t the “obvious” design. Well, it’s probably a better design, and it might be the obvious one. But that association relvar still has two keys (INVNO and SHIPNO), and the major conclusion of the argument that follows—viz., that those two keys need to be treated as equals—still stands.

[176] Observe, therefore, that the design violates orthogonality (see Chapter 14).

[177] I’ve actually seen such a subterfuge explicitly recommended, by people who really ought to know better.

[178] There might be some difficulty over updating those views, of course, given the state of today’s commercial products. But this is a separate issue, beyond the scope of this appendix (and this book).

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset