TUPLES vs. PROPOSITIONS

As we know, every tuple appearing in some given relvar R at some given time represents a certain proposition, the proposition in question being an instantiation of the relvar predicate for that relvar R that (by convention) is understood to be true at the time in question. For example, here again is the predicate for relvar HP from Figure 14-2 and Figure 14-3:

Part PNO is named PNAME, has color COLOR and weight WEIGHT (which is greater than or equal to 17.0), and is stored in city CITY.

This relvar contains (among other things) a tuple for part P6, and that tuple represents the following instantiation of the foregoing predicate:

Part P6 is named Cog, has color Red and weight 19.0 (which is greater than or equal to 17.0), and is stored in city London.

Loosely speaking, then, we can say the database “contains propositions.” Now, I’ve said several times at earlier points in the book that the database involves some redundancy if and only if it says the same thing twice. Now I can make this statement a little more precise:

  • Definition: The database involves redundancy if and only if it contains two distinct representations of the same proposition.

Now, given that tuples represent propositions, it’s tempting to translate the foregoing definition into the following one: The database involves some redundancy if and only if it contains two distinct appearances of the very same tuple.[144] Unfortunately, this latter “definition” is, at best, considerably oversimplified. Let’s examine it more carefully.

First of all, it’s at least true that we don’t want the same tuple to appear more than once in the same relvar (at the same time, that is), because such a state of affairs would certainly constitute “saying the same thing twice.” (As I once heard Codd remark: If something is true, saying it twice doesn’t make it any more true.) Of course, the relational model itself takes care of this requirement—by definition, relations never contain duplicate tuples, and the same is therefore true for relvars, and so we can ignore this possibility. Note: In other words, it might be argued that a desire to avoid redundancy is one of the motivations (perhaps a minor one) for choosing sets—which don’t contain duplicate elements, by definition—instead of “bags,” which do, as the right mathematical abstraction on which to found a solid database theory. SQL apologists please note!

Aside: I observe in passing that now we have a precise characterization of the notion of “duplicate tuples.” (People use this phrase all the time, and yet I very much doubt whether many of them would be able to define it precisely if pressed.) Strictly speaking, of course, two tuples are duplicates if and only if they’re the very same tuple, just as two integers are duplicates if and only if they’re the very same integer. The phrase “duplicate tuples” thus doesn’t really make much sense from a logical point of view (to say two distinct tuples are duplicates is a contradiction in terms). What people are really talking about when they use that phrase is duplicate appearances of the same tuple. For that reason, the phrase “duplicate elimination,” which as we all know is often encountered in database contexts, would much better be duplication elimination. But I digress ... Let’s get back to the main discussion. End of aside.

Second, we also don’t want the same subtuple to appear more than once in the same relvar (again, at the same time).[145] But classical normalization takes care of this one; e.g., it’s precisely because the FD {CITY} → {STATUS} holds in relvar S, causing the same {CITY,STATUS} pair to occur repeatedly (with the same meaning every time it appears), that we’re recommended to replace that relvar by its projections on {CITY,STATUS} and {SNO,SNAME,CITY}.

My next point is that the very same tuple can represent any number of distinct propositions, as can easily be seen. As a trivial example, let SC and PC be the projection of relvar S on CITY and the projection of relvar P on CITY, respectively. Given our usual sample values, then, a tuple containing just the CITY value “London” appears in both SC and PC—but those two appearances represent distinct propositions. To be specific: The appearance in SC represents the proposition There’s at least one supplier in London, and the appearance in PC represents the proposition There’s at least one part in London (simplifying slightly in both cases for the sake of the example).

What’s more—and here I have to get a little more formal on you for a moment—the same proposition can be represented by any number of distinct tuples, too. That’s because, formally, the pertinent attribute names are part of the tuple (check the definition of tuple in Chapter 5 if you need confirmation of this point). Thus, for example, we might have our usual shipments relvar SP with its attributes SNO, PNO, and QTY, and predicate:

Supplier SNO supplies part PNO in quantity QTY.

We might additionally have a relvar PS with attributes SNR, PNR, and AMT, with predicate:

Supplier SNR supplies part PNR in quantity AMT.

And then (to use Tutorial D syntax) the following tuples might appear in relvars SP and PS, respectively:

     TUPLE { SNO 'S1' , PNO 'P1' , QTY 300 }
     TUPLE { SNR 'S1' , PNR 'P1' , AMT 300 }

These are clearly different tuples, but they both represent the same proposition:

Supplier S1 supplies part P1 in quantity 300.

In fact, either of the two relvars SP and PS can be defined in terms of the other, as the following constraints (actually EQDs once again) both show:

     CONSTRAINT ...
        PS = SP RENAME { SNO AS SNR , PNO AS PNR , QTY AS AMT } ;

     CONSTRAINT ...
        SP = PS RENAME { SNR AS SNO , PNR AS PNO , AMT AS QTY } ;

A database that contained both relvars would thus clearly involve some redundancy.[146]

The net of the foregoing discussion is this: There’s a many to many relationship between tuples and propositions—any number of tuples can represent the same proposition, any number of propositions can be represented by the same tuple. Given this state of affairs, then, here’s an attempt at stating the orthogonality priniple a little more precisely:

  • Definition (second attempt): Let relvars R and R2 be distinct, and let them have headings {A1,...,An} and {B1,...,Bn}, respectively. Let relvar R1 be defined as follows:

         R1 = R RENAME { A1 AS B1′ , ... , An AS Bn′ }

    where B1′, ..., Bn′ is some permutation of B1, ..., Bn. (Observe that R1 and R2 thus have the same heading.) Then The Principle of Orthogonal Design says there must not exist restriction conditions c1 and c2, neither of which is identically false, such that the equality dependency (R1 WHERE c1) = (R2 WHERE c2) holds.

Points arising from this second attempt:

  • This version of the principle certainly solves the problem with the design of Figure 14-3: First, take R and R2 to be LP and HP, respectively, and define R1 thus:

         R1 = LP RENAME { PNO AS PNO , ... , CITY AS CITY }

    (In other words, take R1 to be identically equal to R.) Second, take both c1 and c2 to be the restriction condition WEIGHT = 17.0. Then the equality dependency (R1 WHERE c1) = (R2 WHERE c2) holds, and the design thus violates The Principle of Orthogonal Design. Note: As this example demonstrates, so long as c1 and c2 aren’t identically false, then certain tuples exist that, if and when they represent “true facts,” must appear in both R1 and R2—and, in essence, that’s the situation we want to outlaw. (By contrast, if c1 and c2 were identically false, the restrictions R1 WHERE c1 and R2 WHERE c2 would both be empty, and there wouldn’t be any orthogonality violation.)

  • In fact, this version of the principle subsumes the previous version, because (a) we can make R1 identical to R (by effectively making the renaming a “no op,” as in the previous bullet item) and (b) we can take each of c1 and c2 to be simply TRUE. (As I pointed out earlier, the previous version of the principle did assume the relvars in question had the same heading. As the discussions of the present section have shown, however, we can’t limit our attention to that simple case alone.)

  • Recall from Chapter 6 that, in logic, something that’s identically false (e.g., the boolean expression WEIGHT ≥ 17.0 AND WEIGHT < 17.0) is called a contradiction. Thus, the requirement that c1 and c2 not be identically false can be stated thus: Neither c1 nor c2 is a contradiction in the logical sense.



[144] One reviewer argued strongly that this “temptation” wasn’t tempting at all! Maybe not, but I still think it’s worth discussing.

[145] This statement too is hugely oversimplified. A slightly better one is: We don’t want the same subtuple to appear more than once if distinct appearances represent the same proposition—but this statement isn’t perfect, either. To try to make it more precise still would take us further afield than I wish to go at this point, however. See Chapter 15 for further explanation.

[146] The example thus suggests an obvious rule of thumb: When you start the design process—which as far as I’m concerned means when you write down the predicates and other business rules—always use the same name for the same property; don’t “play games” by using, e.g., both SNO and SNR to refer to supplier numbers, both QTY and AMT to refer to quantities, and so on. Following this rule will (among other things) make it less likely that two distinct tuples will represent the same proposition.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset