REFINING THE DEFINITION

I’ve deliberately left this section to the very end of the chapter (almost). Consider the shipments relvar SP, with its predicate Supplier SNO supplies part PNO in quantity QTY. Consider also the relation shown as the value of that relvar in Figure 1-1 in Chapter 1. Observe that:

  1. Two of the tuples in that relation are (S1,P5,100) and (S1,P6,100).

  2. Both of those tuples include (S1,100) as a subtuple.

What do those two appearances of that subtuple mean? Well, the appearance in (S1,P5,100) means:

  1. Supplier S1 supplies some part in quantity 100.

(I’ve numbered this proposition—note that it is indeed a proposition—for purposes of future reference.) And the appearance in (S1,P6,100) means exactly the same thing! So don’t we have here a situation in which the database contains two distinct appearances of some tuple that represent the very same proposition? In other words, in accordance with the definition I gave in the introduction to this chapter, doesn’t the database contain some redundancy?

Before I try to answer this question, I want to offer a simpler illustration of the same point. With reference again to Figure 1-1, consider the six shipment (SP) tuples shown in that figure for supplier S1. Clearly, those tuples all contain the subtuple (S1), of degree one. And those six appearances of that subtuple all mean the same thing:

  1. Supplier S1 supplies some part in some quantity.

We could even take the argument one step further and consider the fact that every SP tuple—in fact, every possible tuple, no matter what attributes it has—always has the 0-tuple as a subtuple. Thus, the twelve “appearances” (if you see what I mean!) of that subtuple in the shipments relation of Figure 1-1 all represent the following proposition:

  1. Some supplier supplies some part in some quantity.

So do we have redundancy on our hands here, or don’t we? Well, let me observe now that the foregoing propositions 1- 3 all involve some existential quantification. Here are slightly more formal versions of those propositions:

  1. There exists some part PNO such that supplier S1 supplies part PNO in quantity 100.

  2. There exists some part PNO such that there exists some quantity QTY such that supplier S1 supplies part PNO in quantity QTY.

  3. There exists some supplier SNO such that there exists some part PNO such that there exists some quantity QTY such that supplier SNO supplies part PNO in quantity QTY.

In these propositions, SNO in the third, QTY in the second and third, and PNO in all three aren’t parameters but what the logicians call bound variables, owing to the fact that they’re all “existentially quantified” by the phrase There exists some ... such that. Note: If you’re unfamiliar with these notions (viz., bound variables and existential quantification), you can find a tutorial treatment in SQL and Relational Theory. However, I think the present discussion should be easy enough to follow even if you don’t have any prior knowledge of such matters.

In contrast to the foregoing, the propositions represented by tuples in the underlying relvar SP don’t involve any existential quantifiers. That’s because they’re all just instantiations of the relvar predicate (i.e., they’re all obtained just by substituting arguments for the parameters of that predicate), and that predicate in turn, which is as follows, involves no such quantifiers either:

Supplier SNO supplies part PNO in quantity QTY.

To summarize to this point, then, it looks as if the following observations apply:

  1. What might be called the given propositions—the ones represented by tuples in the given relvars—are quantifier free.

    Aside: It might not be quite true to say the given propositions are always quantifier free; consider, e.g., a relvar with attributes WEIGHT and HEIGHT and predicate Some person has weight WEIGHT and height HEIGHT. However, we can effectively eliminate the quantifiers in such a situation by a process known as skolemization (after the logician T. A. Skolem). In the example, that process involves replacing the original predicate by a predicate of the form Person p has weight WEIGHT and height HEIGHT (where p denotes some person or persons unknown). This latter predicate is quantifier free. End of aside.

  2. By contrast, derived propositions—at least, derived propositions that correspond to tuples obtained by taking projections of tuples in the given relvars—do involve at least one existential quantifier.

Now, we surely don’t want to say our usual shipments relvar SP is intrinsically redundant (do we?). So it looks as if what we might want to say is something along the lines of:

If the same proposition is represented twice, but that proposition is existentially quantified, then that repetition doesn’t count as redundancy.

But wait a minute—what about the suppliers relvar S, with its FD {CITY} → {STATUS}? An argument exactly analogous to the foregoing would seem to suggest that, e.g., the tuple (London,20), which appears as a subtuple in two of the supplier tuples depicted in Figure 1-1, represents the proposition:

There exists some supplier SNO such that there exists some name SNAME such that supplier SNO is named SNAME, has status 20, and is located in city London.

Clearly this proposition is existentially quantified; yet it’s represented twice, and we do want that repetition to count as redundancy. (As we know, relvar S isn’t in BCNF.) So what’s going on?

As is so often the case, I believe the answer to this question can be found by taking a closer look at the predicates. First, recall from Chapter 13 that a predicate is simple if it involves no connectives, and conjunctive if it’s a set of other predicates connected by AND (and I’ll assume for present purposes that those other predicates are all simple ones). Now, the predicate for relvar SP, Supplier SNO supplies part PNO in quantity QTY, is simple in the foregoing sense. By contrast, the predicate for relvar S is conjunctive—it can be decomposed into a set of simple predicates. I can make this latter fact more immediately obvious by stating the predicate in the following slightly stilted but actually more logical form:

Supplier SNO is named SNAME and

Supplier SNO is located in city CITY and

City CITY has status STATUS.

It should be clear, then, given this version of the predicate, that:

  1. Relvar S is subject to the nontrivial, irreducible JD {SN,SC,CT}, where the names SN, SC, and CT denote the headings {SNO,SNAME}, {SNO,CITY}, and {CITY,STATUS}, respectively. (By contrast, the only JDs that hold in relvar SP are trivial ones, and SP is in 6NF. Relvar S, of course, isn’t even in BCNF.)

  2. Relvar S can therefore be nonloss decomposed in accordance with that JD. The predicates for the corresponding projections are as follows:

    SN: Supplier SNO is named SNAME.

    SC: Supplier SNO is located in city CITY.

    CT: City CITY has status STATUS.

    These predicates aren’t existentially quantified, and so the corresponding propositions aren’t, either.[169]

  3. Relvar S certainly contains subtuples corresponding to SN, SC, and CT; however, those corresponding to SN and SC are never repeated because {SNO} is a key. By contrast, those corresponding to CT are repeated, at least potentially (as we know from Figure 1-1), and such repetition does constitute redundancy.

With all of the foregoing by way of motivation, then, I offer the following as a putative “final” definition of what it means for a database to exhibit redundancy:

  • Definition (“final” version): Let D be a database design; let DB be a database value (i.e., a set of values for the relvars mentioned in D) that conforms to D; and let p be a proposition not involving any existential quantification. If DB contains two or more distinct representations of p (either implicitly or implicitly), then DB contains, and D permits, redundancy.

Observe in particular that a database can still display redundancy by this definition, even if it fully conforms to The Principle of Orthogonal Design and all normalization principles. Note, however, that the definition still says if (not if and only if) a certain condition holds, then there’s redundancy; I’d like to replace that if by if and only if, but I don’t quite have the courage of my convictions here. Not yet.

Be that as it may, let’s consider Examples 1-12 from earlier sections of this chapter and see what the implications are for those examples specifically. Please note carefully: For simplicity, I use the unqualified term proposition throughout the following analysis to mean a proposition that’s not existentially quantified.

Examples 1-2

Both of these examples display redundancy because the proposition City SFO and state CA are the city and state for zip 94100 is represented twice.

Example 3

Suppose two distinct tuples both contain the DATE value “Tuesday, January 18th, 2011”; then the database clearly displays redundancy because the proposition January 18th, 2011 is a Tuesday is represented twice, explicitly. In fact, there’s redundancy even if that DATE value appears just once! The reason is that even in that case, the proposition January 18th, 2011 is a Tuesday is represented both explicitly and implicitly, as a result of the fact that one part of the value, the day of the week (Tuesday, in the example), can be determined algorithmically from the rest of the value (January 18th, 2011, in the example).

Example 4

Let employee E1 be represented in both relvar EMP and relvar PGMR. The corresponding propositions are E1 is an employee and E1 is a programmer. The former proposition is a logical consequence of the combination of the latter proposition together with the proposition All programmers are employees. (Note that this latter proposition is represented, in effect, by the fact that {ENO} in PGMR is a foreign key referencing {ENO} in EMP.) Thus, the proposition E1 is an employee is represented twice, once explicitly and once implicitly.

Example 5

Let employee E1 be represented in relvar EMP and let the JOB value for E1 be “Programmer” (so employee E1 is represented in relvar PGMR as well). Then the proposition E1 is a programmer is clearly represented explicitly in two different ways.

Example 6

For a supplier who does have at least one of the three properties (name, status, and city), this example is essentially the same as Example 4, mutatis mutandis. (For a supplier with none of those properties, there’s no redundancy.)

Example 7

The proposition Employee E5 is in department D3 is represented both explicitly by a tuple in EMPDEPT and implicitly by the combination of (a) the proposition Every employee is in exactly one department (which is effectively represented by the pertinent foreign key definition) and (b) the lack of tuples in EMPDEPT representing the propositions Employee E5 is in department D1 and Employee E5 is in department D2.

Example 8

The proposition Employee E4 is unsalaried is represented both explicitly by a tuple in UNSALARIED and implicitly by the combination of (a) the proposition Every employee has a known salary or an unknown salary or is unsalaried (which should again represented by a certain declared integrity constraint) and (b) the lack of a tuple for employee E4 in either EARNS or SALARY_UNK.

Example 9

Now this is an interesting one. Earlier, I said the following:

The redundancies ... are obvious: For example, the fact that student S1 is enrolled in course C1, the fact that course C1 is tutored by tutor T1, and the fact that tutor T1 tutors student S1 are all represented more than once in the sample value shown in [Figure 15-7].

I also said the predicate was Tutor TNO tutors student SNO on course CNO. But if the redundancies really are as stated, the predicate can’t be quite that simple. Instead, it has to be something like this:

Student SNO is enrolled in course CNO and

Course CNO is tutored by tutor TNO and

Tutor TNO tutors student SNO and

Tutor TNO tutors student SNO on course CNO.

A more complete design would thus involve relvars as follows:

  • S {SNO,...}, C {CNO,...}, and T {TNO,...}, representing students, courses, and teachers, respectively

  • SC {SNO,CNO,...}, CT {CNO,TNO,...}, and TS {TNO,SNO,...}, showing which students are enrolled in which courses, which courses are tutored by which tutors, and which tutors tutor which students, respectively

  • SCT {SNO,CNO,TNO}, as in the original version of the example

Observe now that:

  1. Relvar SC is equal to some subset of the join (actually the cartesian product) of S{SNO} and C{CNO} (and similarly for CT and TS).

  2. Relvar SC is also equal to the projection of SCT on {SNO,CNO} (and, again, similarly for CT and TS). Note: To be more precise about the matter, SC is equal to SCT projected on {SNO,CNO} if and only if no student can be enrolled in a course without being assigned a tutor for that course (and, once again, similarly for CT and TS). So we’re making some semantic assumptions about the situation here, assumptions that weren’t spelled out originally and might or might not be valid.

  3. Relvar SCT is also equal to some subset of the join of SC, CT, and TS, and that join in turn is some subset of the join (actually the cartesian product) of S{SNO}, C{CNO}, and T{TNO}.

Because of point 2 in particular, SC, CT, and TS can be dropped, as indeed they were in the original version of the example. Assume for the moment, however, that they aren’t dropped. Then there’s clearly redundancy. For example, given the sample values from Figure 15-7, the proposition Student S1 is enrolled in course C1 is represented (a) by an explicit tuple in relvar SC and also (b) as one of the conjuncts in the following proposition, which is represented by an explicit tuple in relvar SCT:

Student S1 is enrolled in course C1 and

Course C1 is tutored by tutor T1 and

Tutor T1 tutors student S1 and

Tutor T1 tutors student S1 on course C1.

Even if relvar SC is dropped, however, I claim there’s still redundancy! For example, that same proposition Student S1 is enrolled in course C1 is represented as one of the conjuncts in the foregoing compound proposition and as one of the conjuncts in the following (also compound) proposition:

Student S1 is enrolled in course C1 and

Course C1 is tutored by tutor T2 and

Tutor T2 tutors student S1 and

Tutor T2 tutors student S1 on course C1.

Both of these compound propositions are represented by explicit tuples in relvar SCT. Thus, although my earlier characterization of the redundancies in this example might perhaps have been slightly misleading, I claim that redundancy does nevertheless exist. What’s more, if you agree with this position, I think you also have to agree that the use of either surrogates (see the discussion of the current example, Example 9, earlier in the chapter) or relation valued attributes (see the discussion of Example 10, also earlier in the chapter) makes no difference! That is, it’s still the case, with both surrogates and RVAs, that certain propositions are represented more than once, in general. Now, I admit this latter claim on my part might be somewhat open to debate. However, if you don’t agree with it, then I think you need to justify your position rather carefully, and in particular I think you need to come up with a replacement for—in fact, an improvement on—my proposed “final” definition of redundancy.

As a kind of appendix to all of the above, let me add that I believe a similar analysis applies to certain other examples from earlier in the book. For example, consider relvar CTXD from Chapter 9 and Chapter 12, with its attributes CNO, TNO, XNO, and DAYS. When I first introduced that example, I said the predicate was Teacher TNO spends DAYS days with textbook XNO on course CNO. But it would be more accurate to say it’s as follows:

Course CNO can be taught by teacher TNO and

Course CNO uses textbook XNO and

Teacher TNO spends DAYS days with textbook XNO on course CNO.

Similarly, consider relvar SPJ from Chapter 9 and Chapter 10, with its attributes SNO, PNO, and JNO. When I first introduced that example, I said the predicate was Supplier SNO supplies part PNO to project JNO. However, I think it would be more accurate to say it’s as follows:

Supplier SNO supplies part PNO and

Part PNO is supplied to project JNO and

Project JNO is supplied by supplier SNO and

Supplier SNO supplies part PNO to project JNO.

To continue with this last example just a moment longer: As we know from Chapter 10, SPJ isn’t in 5NF, and the recommendation is therefore to decompose it into its “projection” relvars SP, PJ, and JS, with headings {SNO,PNO}, {PNO,JNO}, and {JNO,SNO}, respectively. But what are the predicates for these three relvars? The answer depends on what I referred to in Chapter 9 (in a footnote) as the full semantics of the situation. Consider relvar SP, for example. If it’s possible for a tuple (s,p) to appear in SP without a tuple for supplier s appearing in JS or without a tuple for part p appearing in PJ, then the predicate for SP is simply Supplier SNO supplies part PNO. But if the appearance of tuple (s,p) in SP means there must be both a tuple for supplier s in JS and a tuple for part p in PJ, then the predicate for SP is Supplier SNO supplies part PNO to some (unspecified) project JNO. Since the second of these possibilities is more constraining than the first, it seems to me it would be prudent to assume the first interpretation.

Example 10

See the discussion of Example 9 above.

Example 11

No further discussion provided.

Example 12

Let C1 be a customer and let the sum of payments for customer C1 be (say) $10,000. Then this very proposition—The sum of payments for customer C1 is $10,000—is represented explicitly by the appearance of a tuple for customer C1 in relvar TOTALS and implicitly by the appearance of the set of tuples for that same customer in relvar PAYMENTS.



[169] In connection with the lack of quantification in the predicate for CT in particular, you might want to take another look at the section NORMALIZATION SERVES TWO PURPOSES in Chapter 3.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset