SQL and Relational Theory

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter 10

SQL and Logic

Logic takes care of itself;
all we have to do is look and see how it does it.

—Ludwig Wittgenstein: Tractatus Logico-Philosophicus (1922)

As I mentioned in Chapter 1, there’s an alternative to the relational algebra called the relational calculus. What this means is that queries, constraints, view definitions, and so forth can all be formulated in calculus terms as well as algebraic ones; sometimes, in fact, it’s easier to come up with a calculus formulation than an algebraic one, though the opposite can also be true.

What is the relational calculus? Essentially, it’s an applied form of predicate calculus (also known as predicate logic), tailored to the needs of relational databases. So the aims of this chapter are to introduce the relevant features of predicate logic (hereinafter abbreviated to just logic); to show how those features are realized in concrete form in the relational calculus; and, of course, to consider the relevant features of SQL as we go.

Incidentally, it follows from the above that a relational language can be based on either the algebra or the calculus. For example, Tutorial D is based on the algebra (which is why there aren’t many references to Tutorial D in this chapter), and Query-By-Example and QUEL (see Appendix G) are both based on the calculus. So which is SQL based on? The answer, regrettably, is partly both and partly neither ... When it was first designed, SQL was specifically intended to be different from both the algebra and the calculus (the latter explicitly, the former perhaps a little less so); indeed, such a goal was the prime motivation for the introduction of the SQL “IN subquery” construct.¹ As time went on, however, it turned out that certain features of both the algebra and the calculus were needed after all, and the language grew to accommodate them. The consequence is that today some aspects of SQL are “algebra like,” some are “calculus like,” and some are neither—with the further consequence that, as I mentioned in passing in Chapter 6, most queries, constraints, and so on that can be expressed in SQL at all can in fact be expressed in numerous different ways.

Aside: The goal mentioned in the previous paragraph—the goal, that is, of making SQL different from both the algebra and the calculus—was based on what I regard as a fundamental misconception: namely, the idea that the algebra and calculus were both somewhat “user hostile.” But that perception, I believe, derived from a confusion over syntax vs. semantics. Certainly the syntax in Codd’s early papers was a little daunting, based as it was on formal mathematical notation. But semantics is another matter; the algebra and the calculus both have (I would argue) very simple semantics, and it’s fairly easy, as numerous writers and languages have demonstrated, to wrap that semantics up in syntax that’s very user friendly indeed. End of aside.

WHY DO WE NEED LOGIC?

Logic is useful because (among other things) it’s a great aid to clear and precise thinking. By contrast, everyone would surely agree that natural language is often vague and ambiguous. The following piece by Robert Graves and Alan Hodge illustrates the point delightfully:²

From the Minutes of a Borough Council Meeting:

Councillor Trafford took exception to the proposed notice at the entrance of South Park: “No dogs must be brought to this Park except on a lead.” He pointed out that this order would not prevent an owner from releasing his pets, or pet, from a lead when once safely inside the Park.

The Chairman (Colonel Vine): What alternative wording would you propose, Councillor?

Councillor Trafford: “Dogs are not allowed in this Park without leads.”

Councillor Hogg: Mr. Chairman, I object. The order should be addressed to the owners, not to the dogs.

Councillor Trafford: That is a nice point. Very well then: “Owners of dogs are not allowed in this Park unless they keep them on leads.”

Councillor Hogg: Mr. Chairman, I object. Strictly speaking, this would keep me as a dog-owner from leaving my dog in the back-garden at home and walking with Mrs. Hogg across the Park.

Councillor Trafford: Mr. Chairman, I suggest that our legalistic friend be asked to redraft the notice himself.

Councillor Hogg: Mr. Chairman, since Councillor Trafford finds it so difficult to improve on my original wording, I accept. “Nobody without his dog on a lead is allowed in this Park.”

Councillor Trafford: Mr. Chairman, I object. Strictly speaking, this notice would prevent me, as a citizen, who owns no dog, from walking in the Park without first acquiring one.

Councillor Hogg (with some warmth): Very simply, then: “Dogs must be led in this Park.”

Councillor Trafford: Mr. Chairman, I object: This reads as if it were a general injunction to the Borough to lead their dogs into the Park.

Councillor Hogg interposed a remark for which he was called to order; upon his withdrawing it, it was directed to be expunged from the Minutes.

The Chairman: Councillor Trafford, Councillor Hogg has had three tries; you have had only two ...

Councillor Trafford: “All dogs must be kept on leads in this Park.”

The Chairman: I see Councillor Hogg rising quite rightly to raise another objection. May I anticipate him with another amendment: “All dogs in this Park must be kept on the lead.”

This draft was put to the vote and carried unanimously, with two abstentions.

Note: I can’t resist pointing out that the final draft is still ambiguous—it could logically be interpreted to mean that all dogs in the park must be kept on the same (i.e., the one and only) lead. But enough of dogs ... Let’s move on.

SIMPLE AND COMPOUND PROPOSITIONS

Recall from Chapter 5 that, in logic, a proposition is something that evaluates unequivocally to either TRUE or FALSE. Here are some examples:

2 + 3 = 5
2 + 3 > 7
Jupiter is a star
Mars has two moons
Venus is between Earth and Mercury

Of these, Nos. 1, 4, and 5 are true and Nos. 2 and 3 are false—though in the case of No. 5 we do need to be rather careful over what exactly we mean by “between”! (To be a little more precise about the matter, what I mean by it is this: If we denote the average distances of Mercury, Venus, and Earth from the sun by m, v, and e, respectively, then m < v < e.) Be that as it may, a good informal test for whether something, p say, is a valid proposition is to ask whether “Is it true that p?” is a sensible question. For example, “Is it true that 2 + 3 > 7?” is certainly a sensible question, even though the answer is no. To check your understanding of this point, which of the following do you think are legal propositions? (You might want to check the answers at the end of the chapter before reading further.)

Bach is the greatest musician who ever lived.
What’s the time?
Supplier S2 is located in some city, x.
Some countries have a female president.
All politicians are corrupt.
Supplier S1 is located in Paris.
We both have the same favorite author, x.
Nothing is heavier than lead.
It will rain tomorrow.
Supplier S6’s city is unknown.

By the way, there’s an important point of detail here (which I’m mostly going to ignore—I mention it only to head off at the pass, as it were, certain criticisms that persons trained in formal logic might be tempted to level at this chapter). The truth is, a proposition isn’t really a declarative sentence as such; rather, it’s the assertion made by that sentence. For example, “It’s hot” and “Il fait chaud” are clearly distinct sentences, but they both assert the same proposition. That said, I’ll continue to assume from this point forward for simplicity that a proposition is indeed just a declarative sentence. Analogous remarks apply to predicates also (see later).

Connectives

Given some set of propositions, we can combine propositions from that set to form further propositions, using various connectives. The connectives most commonly encountered in practice are NOT, AND, OR, IF ... THEN ... (also known as IMPLIES, sometimes written “⇒”), and IF AND ONLY IF (also known as IFF, or BI-IMPLIES, or IS EQUIVALENT TO, and sometimes written “⇔” or “≡”). Here are a few examples of propositions that can be formed from Nos. 3, 4, and 5 from the foregoing list:

( Jupiter is a star ) OR ( Mars has two moons )
( Jupiter is a star ) AND ( Jupiter is a star )
( Venus is between Earth and Mercury AND NOT ( Jupiter is a star )
IF ( Mars has two moons ) THEN ( Venus is between Earth and Mercury )
IF ( Jupiter is a star ) THEN ( Mars has two moons )

I’ve introduced some parentheses to make the scope of the connectives clear in these examples; in practice, we adopt certain precedence rules that allow us to omit many of the parentheses that might otherwise be required. Of course, it’s never wrong to include them, even when they’re logically unnecessary, and sometimes they can improve clarity.

In general, the connectives can be regarded as logical operators—they take one or more propositions as input and return another proposition as output. NOT is a monadic operator, the other four are dyadic. A proposition that involves no connectives is called a simple proposition; a proposition that isn’t simple is called compound, or composite. And the truth value of a compound proposition can be determined from the truth values of its constituent simple propositions in accordance with the following truth tables (in which, for space reasons, I’ve abbreviated TRUE and FALSE to just T and F, respectively):

p  │ NOT p       p q │ p AND q │ p OR q │ IF p THEN q │ p IFF q
───┼───────     ─────┼─────────┼────────┼─────────────┼─────────
T  │   F         T T │    T    │   T    │       T     │    T
F  │   T         T F │    F    │   T    │       F     │    F
                 F T │    F    │   T    │       T     │    F
                 F F │    F    │   F    │       T     │    T

By the way, truth tables can also be drawn in the following slightly different style (and here I’ve abbreviated IF ... THEN ... to just IF, again for space reasons):

NOT│       AND│ T F     OR │ T F     IF │ T F     IFF │ T F
───┼───    ───┼─────    ───┼─────    ───┼─────    ────┼─────
T │ F      T │ T F      T │ T T      T │ T F      T  │ T F
F │ T      F │ F F      F │ T F      F │ T T      F  │ F T

Neither style is more correct than the other; it’s just that sometimes one is more convenient, sometimes the other is. Anyway, let’s take a closer look at one of the foregoing compound propositions (number 9, to be specific). Here it is again:

IF ( Mars has two moons ) THEN ( Venus is between Earth and Mercury )

This proposition is of the form IF p THEN q (equivalently, p IMPLIES q), where p is the antecedent and q is the consequent. Since the antecedent and the consequent both evaluate to TRUE, the overall proposition evaluates to TRUE also, as you can see from the truth table. But whether Venus is between Earth and Mercury obviously has nothing to do with whether Mars has two moons! So what exactly is going on here?

The foregoing example highlights a problem that people with no training in formal logic often experience: namely, that logical implication is notoriously difficult to come to grips with. So I’d like to offer the following argument, or rationale, in an attempt to clarify the matter.

First of all, observe that there are exactly 16 dyadic connectives altogether, corresponding to the 16 possible dyadic truth tables (just four of which are shown above). Note: Exercise 10.1 asks you to draw all of those truth tables, and it might be worth having a go at that exercise right now.
Of those 16 dyadic connectives, some but not all are given common names such as AND and OR. But those names are really nothing more than a mnemonic device; they don’t have any intrinsic meaning, they’re chosen simply because the connectives so named have behavior that’s similar (not necessarily identical) to that of their natural language counterparts. Indeed, it’s easy to see that even AND doesn’t mean quite the same thing as “and” in natural language. In logic, “p AND q” and “q AND p” are equivalent—but their natural language counterparts might not be. Here’s an illustration: The natural language statements

“I was seriously disappointed and I voted for a change in leadership”

and

“I voted for a change in leadership and I was seriously disappointed”

are most certainly not equivalent! In other words, AND is a kind of logical distillate of “and” in natural language; very importantly—and unlike “and” in natural language—its meaning is context independent. Similar remarks apply to all of the other connectives.

Aside: The foregoing example (concerning AND) is perhaps a little misleading, in that it could be argued that “I was seriously disappointed” means different things in the two statements quoted. In the first, it means “I was seriously disappointed in the status quo”; in the second, it means “I was seriously disappointed in the the outcome of the vote.” If this analysis is correct, it would be strictly incorrect to symbolize the two statements as p AND q and q AND p, respectively; although the two q’s are the same, the two p’s aren’t. But the example is valuable nevertheless, in that it does at least show—not for the first time, perhaps—that we have to be rather careful in mapping natural language utterances to their symbolic logic counterparts. End of aside.
Of the 16 available dyadic connectives, the one called IMPLIES has behavior that most closely resembles that of implication as understood in natural language. For example, “if Mars has two moons, then Mars has at least one moon” is a valid implication, both in logic and in natural language. But nobody would or should claim that logical implication and natural language implication are the same thing. In fact, logical implication, like all of the connectives, is (of necessity) formally defined—i.e., it’s defined by means of a truth table purely in terms of the truth values, not the meanings, of its operands—whereas the same obviously can’t be said of its natural language counterpart.
Let’s look at another example (number 10 from the foregoing list):

IF ( Jupiter is a star ) THEN ( Mars has two moons )

Perhaps even more counterintuitively, this one evaluates to TRUE also (check the truth table), because the antecedent is false; yet whether Mars has two moons clearly has nothing to do with whether Jupiter is a star. Again, part of the justification—for the fact that the implication evaluates to TRUE, that is—is just that IMPLIES is formally defined. In this case, however, there’s another argument (a database example, in fact) that you might find a little more satisfying. Suppose the suppliers-and-parts database is subject to the constraint that red parts must be stored in London (I deliberately state that constraint here in somewhat simplified form):

IF ( COLOR = 'Red' ) THEN ( CITY = 'London' )

Clearly we don’t want this constraint to be violated by a part that isn’t red. It follows, therefore, that we want the proposition overall (which is a logical implication) to evaluate to TRUE if the antecedent evaluates to FALSE.

It follows from all of the above that the proposition p IMPLIES q (equivalently, IF p THEN q) is logically equivalent to the proposition (NOT p) OR q—it evaluates to FALSE if and only if p evaluates to TRUE and q to FALSE, as you can see from the truth table. And, just incidentally, this equivalence serves to illustrate the point that the connectives NOT, AND, OR, IMPLIES, and IF AND ONLY IF aren’t all primitive, since some of them can be expressed in terms of others. As a matter of fact, all possible monadic and dyadic connectives can be expressed in terms of suitable combinations of NOT and either AND or OR.³ (Exercise: Check this claim.) Perhaps even more remarkably, all such connectives can in fact be expressed in terms of just one primitive. Can you find it?

A Remark on Commutativity

The connectives AND and OR are commutative; that is, the compound propositions p AND q and q AND p are logically equivalent, and so are the compound propositions p OR q and q OR p. As a consequence, you should never write code involving such propositions that assumes that p will be evaluated before q or the other way around. For example, let the function SQRT (“nonnegative square root”) be defined in such a way that an exception is raised if its argument is negative, and consider the following SQL expression:

SELECT ...
FROM ...
WHERE X >= 0 AND SQRT ( X ) <= 100 ...

This expression isn’t guaranteed to avoid raising the exception, because the SQRT function might be invoked before the test is done to ensure that X is nonnegative.

Contrapositives

Consider again the database constraint discussed above in connection with logical implication:

IF ( COLOR = 'Red' ) THEN ( CITY = 'London' )

Let me now point out that this expression is logically equivalent to the following one:

IF NOT ( CITY = 'London' ) THEN NOT ( COLOR = 'Red' )

This latter expression is the contrapositive of the previous one. In general, in fact, we have the following equivalence:

IF p THEN q ≡ IF NOT q THEN NOT p

As a matter of fact we’ve seen several examples of contrapositives in this book already. For example:

Chapter 6 stated, in connection with certain operators such as join, that “attributes with the same name must be of the same type; i.e., attributes of different types must have different names” (slightly paraphrased). Each half of this statement is the contrapositive of the other.
The discussion of constraint CX1 in Chapter 8 (effectively, though perhaps a little tortuously) relied on the fact that each of the following expressions—

IF STATUS < 1 OR STATUS > 100 THEN FALSE

and

IF TRUE THEN STATUS ≤ 1 AND STATUS ≥ 100

—is the contrapositive of the other.
Chapter 8 also stated that “correct implies consistent (but not the other way around), and inconsistent implies incorrect (but not the other way around).” Again, each half of this statement is the contrapositive of the other.

Note that it’s strictly expressions of the form IF p THEN q—that is, logical implications— to which the contrapositive notion applies. Be that as it may, here’s a question for you: How many of the following expressions are logically distinct?

IF ( WEIGHT > 17.0 ) THEN ( CITY ≠ 'Paris' )
IF ( CITY = 'Paris' ) THEN ( WEIGHT ≤ 17.0 )
( WEIGHT ≤ 17.0 ) OR ( CITY ≠ 'Paris' )
NOT ( ( CITY = 'Paris' ) AND ( WEIGHT > 17.0 ) )

Well, I hope you can see that all four of these expressions in fact say the same thing. However, I think you’ll agree also that this fact isn’t immediately obvious! Let’s take a closer look. Let’s use p and q to denote the subexpressions (WEIGHT > 17.0) and (CITY ≠ 'Paris'), respectively. The four expressions become:

IF p THEN q
IF NOT q THEN NOT p
NOT p OR q
NOT ( ( NOT q ) AND p )

Now I think it’s a little easier to see that the four are indeed all equivalent to one another.⁴ So the example demonstrates two things: First (to repeat), the equivalences aren’t always obvious; second, introducing symbols like p and q allows us to manipulate the expressions in a purely formal manner and makes it easier to see what’s really going on (easier to see the forest as well as the trees, one might say). I’ll have more to say about such matters in the next chapter.

SIMPLE AND COMPOUND PREDICATES

Consider the following statements:⁵

x is a star
x has two moons
x has m moons
x is between Earth and y
x is between y and z

Here x, y, z, and m are parameters or placeholders. As a consequence, the statements aren’t propositions (i.e., they aren’t unequivocally either true or false), precisely because they do involve such parameters. For example, the statement “x is a star” involves the parameter x, and we can’t say whether it’s true or false until we’re told what that x stands for—at which point we’re no longer dealing with the given statement anyway but a different one instead, as the paragraph immediately following makes clear.

Now, we can substitute arguments for those parameters and thereby obtain propositions from those parameterized statements. For example, if we substitute the argument the sun for the parameter x in “x is a star,” we obtain “the sun is a star.” And this statement is indeed a proposition, because it’s unequivocally either true or false (in fact, of course, it’s true). But the original statement as such (“x is a star”) is, to say it again, not itself a proposition. Rather, it’s a predicate, which—as you’ll recall from Chapter 5—is a truth valued function; that is to say, it’s a function that, when invoked, returns a truth value. Like all functions, a predicate has a set of parameters; when the predicate is invoked, arguments are substituted for the parameters; substituting arguments for the parameters effectively converts the predicate into a proposition; and we say the arguments satisfy the predicate if and only if that proposition is true. For example, the argument the sun satisfies the predicate “x is a star,” while the argument the moon doesn’t.

Note: Recall from Chapter 5 that logicians speak not of invoking a predicate but rather of instantiating it. (As a matter of fact, for reasons that needn’t concern us here, their concept of instantiation is slightly more general than that of the familiar notion of function invocation.) However, I’ll favor the terminology of invocation in this chapter. Also, Exercise 5.18 in Chapter 5 showed that a proposition can be regarded as a degenerate predicate; to be precise, it’s a predicate for which the set of parameters is empty (and the truth valued function that’s that predicate thus always returns the same result, either TRUE or FALSE, every time it’s invoked). In other words, all propositions are predicates, but “most” predicates aren’t propositions.

Now consider the predicate “x has m moons.” This example differs from the previous one (“x is a star”) in that it involves two parameters, x and m. (By way of example, substituting the arguments Mars for x and 2 for m yields a true proposition; substituting the arguments Earth for x and 2 for m yields a false one.) In fact, predicates can conveniently be classified according to the cardinality of their set of parameters. Thus we speak of an n-place predicate, meaning a predicate with exactly n parameters; for example, “x is between y and z” is a 3-place predicate, while “x has m moons” is a 2-place predicate. A proposition is a 0-place predicate. Note: An n-place predicate is also called an n-adic predicate. If n = 1, the predicate is monadic; if n = 2, it’s dyadic. And a proposition is a niladic predicate.

Next, given a set of predicates, we can combine predicates from that set to form further predicates using the logical connectives already discussed (NOT, AND, OR, and so forth); in other words, the connectives are logical operators that operate on predicates in general, not just on the special predicates that happen to be propositions. A predicate that involves no connectives is called simple; a predicate that isn’t simple is called compound, or composite. Here’s an example of a compound predicate:

( x is a star ) OR ( x is between Earth and y )

This predicate is dyadic—not because it involves two simple predicates, but because it involves two parameters, x and y (one of which is referenced twice and the other once only).

Rules of Inference

It’s a bit of a digression from my main purpose in this chapter, but as an aside I can now give a (somewhat loose) definition of predicate logic. Logic in general can be defined as the science, or scientific study, of the methods and principles used in valid reasoning. And predicate logic in particular can be defined as a formal system involving predicates and connectives and the inferences that can be made using such predicates and connectives. Observe, therefore, that predicate logic involves certain rules of inference—i.e., rules by which additional truths can be derived (or proved) from established truths. The additional truths are called theorems, and the established truths are either axioms or theorems that have previously been proved.

One important inference rule is called modus ponens: If we know that p is true, and if we also know that IF p THEN q is true, then we can infer that q is true. For example, given the truth of both “I have no money” and “If I have no money, then I will have to wash dishes,” we can infer the truth of “I will have to wash dishes.”

Another important inference rule is modus tollen: If we know that IF p THEN q is true, but we also know that q is false, then we can infer that p is false. This rule is relevant to the process of database integrity checking. Conceptually, what happens is this: When an update is requested, the proposed new database value is checked against known integrity constraints; if the proposition expressed by some constraint—see Chapter 8—now evaluates to FALSE, that proposed new value must also represent falsehood, and so the update must be rejected.

QUANTIFICATION

I showed in the previous section that one way to get a proposition from a predicate is to invoke it with an appropriate set of arguments (a process known to logicians as instantiation). But there’s another way, too, and that’s by means of quantification. Let p(x) be a monadic predicate (I show the single parameter x explicitly for clarity). Then:

The expression

EXISTS x ( p ( x ) )

is a proposition, and it means: “There exists at least one possible argument value a that can be substituted for the parameter x such that p(a) evaluates to TRUE” (in other words, the argument value a satisfies predicate p). For example, if p is the predicate “x is a logician,” then

EXISTS x ( x is a logician )

is a proposition—one that evaluates to TRUE, as it happens (for example, take a to be Bertrand Russell).
The expression

FORALL x ( p ( x ) )

is also a proposition, and it means: “All possible argument values a that can be substituted for the parameter x are such that p(a) evaluates to TRUE” (in other words, all such argument values a satisfy predicate p). For example, if again p is the predicate “x is a logician,” then

FORALL x ( x is a logician )

is a proposition—one that evaluates to FALSE, as it happens (for example, take a to be George W. Bush).

Observe that it’s sufficient to produce a single example to show the truth of the EXISTS proposition and a single counterexample to show the falsity of the FORALL proposition. Observe too in both cases that the parameter must be constrained to “range over” some set of permissible values (the set of all persons, in the examples). I’ll come back to this latter point in the section “Relational Calculus,” later.

The term used in logic for constructs like EXISTS x and FORALL x is quantifiers (the term derives from the verb to quantify, which simply means to express as a quantity—that is, to say how much of something there is or how many somethings there are). Quantifiers of the form EXISTS ... are said to be existential; quantifiers of the form FORALL ... are said to be universal. And in logic texts, EXISTS is usually represented by a backward E (“∃”) and FORALL by an upside down A (“∀”). I use the keywords EXISTS and FORALL here for readability.

Aside: At this point, one reviewer asked whether a quantifier is just another connective. No, it isn’t. Let p(x) and q(x) be predicates, each with a single parameter x. Then p(x) and q(x) can be combined in various ways by means of connectives (as in, e.g., p(x) AND q(x)), but the result is always just another predicate with that same single parameter x. By contrast, quantifying over x—that is, forming an expression such as EXISTS x (p(x)) or FORALL x (q(x)), or even EXISTS x (p(x) AND q(x))—has the effect of converting the predicate concerned into something else (viz., a proposition). Thus, there’s a clear logical difference between the two concepts. (Though I should add that, at least in the database context, the quantifiers can in fact be defined in terms of certain connectives. I’ll explain this point later, in the section “More on Quantification.”) End of aside.

By way of another example, consider the dyadic predicate “x is taller than y.” If we quantify existentially over x, we obtain:

EXISTS x ( x is taller than y )

This statement isn’t a proposition, because it isn’t unequivocally either true or false; in fact, it’s a monadic predicate—it has a single parameter, y. Suppose we invoke this predicate with argument Steve. We obtain:

EXISTS x ( x is taller than Steve )

This statement is a proposition (and if there exists at least one person—Arnold, say—who’s taller than Steve, then it evaluates to TRUE). But another way to obtain a proposition from the original predicate is to quantify over both parameters. For example:

EXISTS x ( EXISTS y ( x is taller than y ) )

This statement is indeed a proposition; it evaluates to FALSE only if nobody is taller than anybody and to TRUE otherwise (think about it!).

There are several lessons to be learned from this simple example:

To obtain a proposition from an n-adic predicate by quantification alone, it’s necessary to quantify over every parameter. More generally, if we quantify over m parameters (m ≤ n), we obtain a k-adic predicate, where k = n – m.
Let’s focus on existential quantification only for the moment. Then there are apparently two different propositions we can obtain in the example by “quantifying over everything”:

EXISTS x ( EXISTS y ( x is taller than y ) )
EXISTS y ( EXISTS x ( x is taller than y ) )

However, I hope you can see these two propositions both say the same thing: “There exist two persons x and y such that x is taller than y.” More generally, in fact, it’s easy to see that a series of like quantifiers—all existential or all universal—can be written in any sequence we choose without changing the overall meaning. By contrast, with unlike quantifiers, the sequence matters (see the bullet item immediately following).
When we “quantify over everything,” each individual quantifier can be either existential or universal. In the example, therefore, there are six distinct propositions that can be obtained by fully quantifying, and I’ve listed them below. (Actually there are eight, but two of them can be ignored by virtue of the previous bullet item.) I’ve also shown a precise natural language interpretation in each case. Note that those interpretations are all logically different!—in particular, some of them evaluate to TRUE and some to FALSE. Please note, however, that I’ve had to assume in connection with certain of those evaluations that there does exist at least one person “in the universe,” as it were. I’ll come back to this assumption in the section “More on Quantification,” later.

EXISTS x ( EXISTS y ( x is taller than y ) )

Meaning: Somebody is taller than somebody; TRUE, unless everybody is the same height.

EXISTS x ( FORALL y ( x is taller than y ) )

Meaning: Somebody is taller than everybody (that particular somebody included!); clearly FALSE.

FORALL x ( EXISTS y ( x is taller than y ) )

Meaning: Everybody is taller than somebody; clearly FALSE.

EXISTS y ( FORALL x ( x is taller than y ) )

Meaning: Somebody is shorter than everybody (that particular somebody included); clearly FALSE. Note: Actually I’m cheating a little bit here, because I haven’t said what I mean by “shorter.” But I could have done—i.e., I could have stated explicitly, somehow, that the predicates “x is taller than y” and “y is shorter than x” are logically equivalent—and for present purposes I’ll assume I’ve done so.

FORALL y ( EXISTS x ( x is taller than y ) )

Meaning: Everybody is shorter than somebody; clearly FALSE.

FORALL x ( FORALL y ( x is taller than y ) )

Meaning: Everybody is taller than everybody; clearly FALSE.

Last (I apologize for the repetition, but the point is important): Even though five out of six of the foregoing propositions do all evaluate to the same truth value, FALSE, it doesn’t follow that they all mean the same thing, and indeed they don’t; in fact, no two of them do.

Free and Bound Variables

What I’ve so far been calling parameters are more usually known in logic as free variables—and quantifying over a free variable, using either EXISTS or FORALL, converts that free variable into what’s called a bound variable. For example, consider again the 2-place predicate from the previous section:

x is taller than y

Here x and y are free variables. If we now quantify existentially over x,⁶ we obtain:

EXISTS x ( x is taller than y )

Now y is free (still) but x is bound. And if we now quantify existentially over y as well, we obtain:

EXISTS x EXISTS y ( x is taller than y )

Now x and y are both bound, and there are no free variables at all (the predicate has degenerated to a proposition).

Now, we already know that free variables correspond to parameters, in conventional programming terms. Bound variables, by contrast, don’t have an exact counterpart in conventional programming; instead, they’re just a kind of dummy—they serve only to link the predicate inside the parentheses to the quantifier outside. For example, consider the simple predicate (actually a proposition):

EXISTS x ( x > 3 )

This proposition merely asserts that there exists some integer greater than three. (I’m assuming here that the variable x is constrained to “range over” the set of integers. Again, I’ll come back to this question of “ranges” later.) Note, therefore, that the meaning of the proposition would remain totally unchanged if the two x’s were both replaced by some other variable y. In other words, the proposition

EXISTS y ( y > 3 )

is semantically and logically identical to the one just shown.

Now consider the predicate:

EXISTS x ( x > 3 ) AND x < 0

Here there are three x’s—but they don’t all mean the same thing. The first two are bound, and can be replaced by (say) y without changing the overall meaning; but the third is free and can’t be replaced with impunity. Thus, of the following two predicates, the first is equivalent to the one just shown and the second isn’t:

EXISTS y ( y > 3 ) AND x < 0
EXISTS y ( y > 3 ) AND y < 0

As this example demonstrates, the terminology of free vs. bound “variables” doesn’t really refer to variables per se, but rather to variable occurrences—occurrences of references to variables within some predicate, to be precise. In the predicate EXISTS y (y > 3) AND y < 0, for example, it’s the first two occurrences of the reference to y that are bound, and the third such occurrence that’s free. Despite this state of affairs, it’s usual (perhaps regrettably) to talk about free and bound variables as such,⁷ even though such talk is really quite sloppy. Be on your guard for confusion in this area!

To close this section, I remark that we can now (re)define a proposition to be a predicate in which all of the variables are bound: equivalently, one that involves no free variables.

RELATIONAL CALCULUS

Essentially everything I’ve discussed in this chapter so far maps very directly into the relational calculus. Let’s look at a simple example—a relational calculus representation of the query “Get supplier number and status for suppliers in Paris who supply part P2.” Here first for comparison purposes is an algebraic formulation:

( S WHERE CITY = 'Paris' ) { SNO , STATUS }
MATCHING ( SP WHERE PNO = 'P2' )

And here’s a relational calculus equivalent:

RANGEVAR SX RANGES OVER S ;
RANGEVAR SPX RANGES OVER SP ;

{ SX.SNO , SX.STATUS }
WHERE SX.CITY = 'Paris' AND
EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = 'P2' )

Explanation:

The first two lines are definitions, defining SX and SPX to be range variables that range over S and SP, respectively. What those definitions mean is that, at any given time, permitted values of SX are tuples in the relation that’s the value of relvar S at that time; likewise, permitted values of SPX are tuples in the relation that’s the value of relvar SP at that time.
The remaining lines are the actual query. Observe that they take the following generic form:

proto tuple WHERE predicate

This expression overall is the relational calculus version of a relational expression (i.e., an expression that denotes a relation), and it evaluates to a relation containing all and only those possible values of the proto tuple that satisfy the predicate (i.e., make it evaluate to TRUE). In the example, therefore, the result is a relation of degree two, containing every (SNO,STATUS) pair from relvar S such that (a) the corresponding city is Paris and (b) there exists a shipment in relvar SP with the same supplier number as the one in that (SNO,STATUS) pair and with part number P2.⁸

Note the use of dot qualified names in this example (in both the proto tuple and the predicate). I won’t go into details, however, because dot qualified names will be familiar to you from SQL, I’m sure. Indeed, SQL has a formulation of the query under discussion that’s very similar in general terms to the foregoing relational calculus formulation:

SELECT SX.SNO , SX.STATUS
FROM   S AS SX
WHERE  SX.CITY = 'Paris'
AND    EXISTS
     ( SELECT *
       FROM   SP AS SPX
       WHERE  SPX.SNO = SX.SNO
       AND    SPX.PNO = 'P2' )

As this example indicates:

First, SQL does support range variables, though it doesn’t usually refer to them by that name:⁹ The specifications S AS SX and SP AS SPX serve to define such variables, and those variables are then explicitly referenced elsewhere in the overall expression by means of dot qualified names such as SX.SNO, SPX.SO, and so on. Note: In practice, such AS specifications and such explicit range variable references are often omitted, at least in simple queries. I’ll explain exactly how such omissions are possible in Chapter 12, when I discuss range variables in SQL in more detail.
Second, and perhaps more important, SQL also supports EXISTS. However, that support is somewhat indirect. To be specific, let sq be a subquery; then EXISTS sq is a boolean expression (and so represents a predicate), and it evaluates to FALSE if the table denoted by sq is empty and TRUE otherwise.¹⁰ Note: The table expression tx in parentheses that constitutes sq will usually, though not invariably, be of the form SELECT * FROM ... WHERE ..., and the WHERE clause will usually, though not invariably, include some reference to some “outer” table, meaning sq will typically be a correlated subquery specifically. In the foregoing example, S is that outer table, and it’s referenced by means of the range variable SX. Again, see Chapter 12 for further explanation.

Aside: There’s a certain irony here, though. As we saw in Chapter 4, SQL, because it supports nulls, is based on what’s called three-valued logic, 3VL (instead of the conventional two-valued logic I’m discussing in this chapter, which is what the relational model is based on). In 3VL, the existential quantifier can return three different results: TRUE, FALSE, and UNKNOWN (where UNKNOWN is “the third truth value”; again, see Chapter 4). But SQL’s EXISTS operator always returns TRUE or FALSE, never UNKNOWN. For example, EXISTS (tx) will return TRUE, not UNKNOWN, if tx evaluates to a table containing nothing but nulls (I’m speaking a trifle loosely here); yet UNKNOWN is the logically correct result.¹¹ As a consequence, (a) SQL’s EXISTS isn’t a faithful implementation of the existential quantifier of 3VL, and (b) once again, therefore, SQL queries sometimes return the wrong answer. Example 3 in the next chapter is a case in point. End of aside.

Let’s look at another example. Consider the query “Get names of suppliers who supply all parts.” I’ll assume we have the same range variables SX and SPX as before, but I’ll also define another one (PX) ranging over P:

RANGEVAR PX RANGES OVER P ;

{ SX.SNAME } WHERE FORALL PX ( EXISTS SPX ( SPX.SNO = SX.SNO AND
SPX.PNO = PX.PNO ) )

In somewhat stilted natural language: “Get names of suppliers such that, for all parts, there exists a shipment with the same supplier number as the supplier and the same part number as the part.” Note: As you probably know, SQL has no direct support for FORALL. For that reason, I won’t show an SQL analog of this example here—I’ll come back to it later, in the section “More on Quantification.” I will point out, however, that there’s a logical difference between the foregoing calculus expression and this one, where the quantifiers have been switched:

{ SX.SNAME } WHERE EXISTS SPX ( FORALL PX ( SPX.SNO = SX.SNO AND
SPX.PNO = PX.PNO ) )

Exercise: What does this latter expression mean? And do you think the query is a “sensible” one?

One more example (“Get names of suppliers who supply at least one red part”):

{ SX.SNAME } WHERE EXISTS PX ( PX.COLOR = 'Red' AND
EXISTS SPX ( SPX.SNO = SX.SNO AND
SPX.PNO = PX.PNO ) )

I’m assuming here that we have the same range variables available to us as we had in the earlier examples; in fact, I’ll continue to make that same assumption throughout the rest of the chapter.

By the way, here’s another possible formulation of the foregoing query:

{ SX.SNAME } WHERE EXISTS PX ( EXISTS SPX ( PX.COLOR = 'Red' AND
SPX.SNO = SX.SNO AND
SPX.PNO = PX.PNO ) )

In this latter formulation, the predicate in the WHERE clause is in what’s called “prenex normal form,” meaning, loosely, that the quantifiers all appear at the beginning. Here’s a precise definition of this concept (observe that the definition involves some recursion):

Definition: A predicate is in prenex normal form (PNF) if and only if (a) it’s quantifier free (i.e., it contains no quantifiers at all) or (b) it’s of the form EXISTS x (p) or FORALL x (p), where p is in PNF in turn. In other words, a PNF predicate takes the form

Q1 x1 ( Q2 x2 ( ... ( Qn xn ( q ) ) ... ) )

where (a) n ≥ 0; (b) each Qi (i = 1, 2, ..., n) is either EXISTS or FORALL; and (c) the predicate q—which is sometimes called the matrix—is quantifier free.

Prenex normal form isn’t more or less correct than any other form, but with a little practice it does tend to become the most natural formulation, and the easiest to write, in many cases (not all).

More on Range Variables

From what I’ve said in this section so far, it should be clear that range variables in the relational calculus serve as the free and bound variables that are required by formal logic. As I mentioned earlier, those variables always have to range over some set of permissible values; in the relational calculus context specifically, that set is always the body of some relation (usually but not necessarily the relation that’s the current value of some relvar). Note: It follows that a given range variable always denotes some tuple. For that reason, the relational calculus is sometimes known more specifically as the tuple calculus, and the variables themselves as tuple variables. This latter usage can be confusing, however, since the term tuple variable already has a somewhat different and more conventional meaning (see Chapter 2). For such reasons, I won’t adopt that usage in this book.¹²

Now I can say a little more about the syntax of relational calculus expressions:

First of all, a proto tuple is a commalist of items enclosed in braces, in which each item is either a range attribute reference—possibly with an associated AS specification to introduce a new attribute name—or a range variable reference. (There are other possibilities too, but I’ll limit my attention to just these cases until further notice. See Example 5 below.) Note: It’s usual to omit the braces if the commalist contains just a single item, but I’ll generally show them in my examples even when they’re not actually required, for clarity.
A range attribute reference is an expression of the form R.A, where A is an attribute of the relation that range variable R ranges over; SX.SNO is an example. A range variable reference is just a range variable name, like SX, and it’s shorthand for a commalist of range attribute references, one for each attribute of the relation the range variable ranges over.
Let some range attribute reference involving range variable R appear, explicitly or implicitly, within some proto tuple. Then the predicate in the corresponding WHERE clause can, and usually will, contain at least one free range attribute reference involving R—where by “free range attribute reference involving R” I mean a range attribute reference of the form R.A that’s not within the scope of any quantifier for which R is the bound variable.
The WHERE clause is optional; omitting it is equivalent to specifying WHERE TRUE.

More Sample Queries

I’ll give a few more examples of relational calculus queries, in order to illustrate a few more points or possibilities; however, I’m definitely not trying to be exhaustive in my treatment. For simplicity, I’ll omit the RANGEVAR definitions that would be needed in practice and will just assume that SX, SY, etc., have been defined as range variables over S; PX, PY, etc., have been defined as range variables over P; and SPX, SPY, etc., have been defined as range variables over SP. Please note that the formulations shown aren’t the only ones possible, in general. I’ll leave it as another exercise for you to show equivalent SQL formulations in each case.

Example 1: Get all pairs of supplier numbers such that the suppliers concerned are colocated.

{ SA := SX.SNO , SB := SY.SNO }
WHERE SX.CITY = SY.CITY AND SX.SNO < SY.SNO

Note the introduction of result attribute names SA and SB in this example. Incidentally, this example provides a good illustration of the point that some queries are more easily formulated in the calculus than they are in the algebra (if you recall, an algebraic formulation for this query, rather more complicated than the calculus formulation just shown, was given in the section “Formulating Expressions One Step at a Time” in Chapter 6).

Example 2: Get names of suppliers who supply at least one Paris part.

{ SX.SNAME } WHERE EXISTS SPX ( EXISTS PX ( SX.SNO = SPX.SNO AND
SPX.PNO = PX.PNO AND
PX.CITY = 'Paris' ) )

Example 3: Get names of suppliers who supply at least one part supplied by supplier S2.

{ SX.SNAME } WHERE EXISTS SPX ( EXISTS SPY ( SX.SNO = SPX.SNO AND
SPX.PNO = SPY.PNO AND
SPY.SNO = 'S2' ) )

Example 4: Get names of suppliers who don’t supply part P2.

{ SX.SNAME } WHERE NOT ( EXISTS SPX ( SPX.SNO = SX.SNO AND
SPX.PNO = 'P2' ) )

The outer parentheses in this example (i.e., the ones enclosing the expression following NOT) might not be needed in practice; indeed, I’ll often omit such parentheses in later examples.

Incidentally, the predicate in the foregoing formulation isn’t in prenex normal form, precisely because of that opening NOT. It would be possible to replace it by one that is, like this—

{ SX.SNAME } WHERE FORALL SPX ( SPX.SNO ≠ SX.SNO OR SPX.PNO ≠ 'P2' )

—but I don’t think this alternative formulation is as “natural” as the non PNF version; that is, I think this example illustrates the point that a PNF formulation isn’t always the one that comes most readily to mind.

Example 5: For each shipment, get full shipment details, including total shipment weight.

{ SPX , SHIPWT := PX.WEIGHT * SPX.QTY } WHERE PX.PNO = SPX.PNO

Note the use of a computational expression in the proto tuple here. An algebraic version of this example would involve EXTEND, and probably image relations also.

Example 6: For each part, get the part number and the total shipment quantity.

{ PX.PNO , TOTQ := SUM ( SPX WHERE SPX.PNO = PX.PNO , QTY ) }

This example illustrates the use of an aggregate operator invocation within the proto tuple (it’s also the first example to omit the WHERE clause). Incidentally, note that the following expression, though syntactically legal, would not be a correct formulation of the query (why not?):

{ PX.PNO , TOTQ := SUM ( SPX.QTY WHERE SPX.PNO = PX.PNO ) }

Answer: Because duplicate quantities would be eliminated before the sum is computed.

Example 7: Get cities that store more than five red parts.

{ PX.CITY }
WHERE COUNT ( PY WHERE PY.CITY = PX.CITY AND PY.COLOR = 'Red' ) > 5

Sample Constraints

Now I’d like to give some examples of the use of relational calculus in formulating constraints. The first eight are based on, and use the same numbering as, the constraint examples in Chapter 8. I’ll assume the availability of range variables as in the previous subsection. Please note again that the formulations shown aren’t the only ones possible, in general.

Example 1: Status values must be in the range 1 to 100 inclusive.

CONSTRAINT CX1 FORALL SX ( SX.STATUS ≥ 1 AND SX.STATUS ≤ 100 ) ;

Note: SQL allows a constraint like this one to be simplified by (in effect) eliding both the explicit use of (a) the range variable and (b) more important, the explicit universal quantification. To be specific, we can specify a base table constraint—see Chapter 8—as part of the definition of base table S that looks like this:

CONSTRAINT CX1 CHECK ( STATUS >= 1 AND STATUS <= 100 )

Similar remarks apply to subsequent examples also.

Example 2: Suppliers in London must have status 20.

CONSTRAINT CX2 FORALL SX ( IF SX.CITY = 'London'
THEN SX.STATUS = 20 ) ;

Example 3: No two tuples in relvar S have the same supplier number (i.e., {SNO} is a key, or rather a superkey, for relvar S).

CONSTRAINT CX3 FORALL SX ( FORALL SY ( IF SX.SNO = SY.SNO THEN
                                          SX.SNAME = SY.SNAME AND
                                          SX.STATUS = SY.STATUS AND
                                          SX.CITY = SY.CITY ) ) ;

This formulation isn’t very elegant, to say the least! I’ll come back to this example and give a better formulation of it in the next section.

Example 4: Whenever two tuples in relvar S have the same supplier number, they also have the same city (in other words, the functional dependency {SNO} → {CITY} holds in relvar S).

CONSTRAINT CX4 FORALL SX ( FORALL SY ( IF SX.SNO = SY.SNO
THEN SX.CITY = SY.CITY ) ) ;

As noted in Chapter 8, this constraint is actually a logical consequence of the fact that {SNO} is a superkey for relvar S. If this latter constraint is stated, therefore, constraint CX4 needn’t be.

Example 5: No supplier with status less than 20 can supply part P6.

CONSTRAINT CX5 FORALL SX ( IF SX.STATUS < 20 THEN
NOT EXISTS SPX ( SPX.SNO = SX.SNO AND
SPX.PNO = 'P6' ) ) ;

Example 6: Every supplier number in relvar SP must appear in relvar S.

CONSTRAINT CX6 FORALL SPX ( EXISTS SX ( SX.SNO = SPX.SNO ) ) ;

As with Example 3, I’ll have more to say about this example in the next section.

Example 7: No supplier number appears in both relvar LS and relvar NLS.

CONSTRAINT CX7 FORALL LX ( FORALL NX ( LX.SNO ≠ NX.SNO ) ) ;

LX and NX range over LS and NLS, respectively.

Example 8: Supplier S1 and part P1 must never be in different cities.

CONSTRAINT CX8 FORALL SX ( FORALL PX
( IF SX.SNO = 'S1' AND PX.PNO = 'P1' THEN SX.CITY = PX.CITY ) ) ;

Example 9: There must always be at least one supplier. (There’s no counterpart to this example in Chapter 8.)

CONSTRAINT CX9 EXISTS SX ( TRUE ) ;

The expression EXISTS SX (TRUE) evaluates to FALSE if and only if SX ranges over an empty relation. (By contrast, the expression EXISTS SX (FALSE) always evaluates to FALSE. Conversely, the expression FORALL SX (FALSE) evaluates to TRUE if and only if SX ranges over an empty relation—see the discussion of empty ranges in the next section—while the expression FORALL SX (TRUE) always evaluates to TRUE.)

MORE ON QUANTIFICATION

There are a number of further issues I need to discuss regarding quantification in particular.

We Don’t Need Both Quantifiers

It’s easy to see that any predicate that can be expressed in terms of EXISTS can be expressed in terms of FORALL instead and vice versa. By way of example, consider the following predicate once again:

EXISTS x ( x is taller than Steve )

(“Somebody is taller than Steve”; of course, this predicate is in fact a simple proposition). Another way to say the same thing is:

NOT ( FORALL x ( NOT ( x is taller than Steve ) ) )

(“It is not the case that nobody is taller than Steve”). More generally, in fact, the predicate

EXISTS x ( p ( x ) )

is logically equivalent to the predicate

NOT ( FORALL x ( NOT ( p ( x ) ) ) )

(where the predicate p might legitimately involve other parameters in addition to x). Likewise, the predicate

FORALL x ( p ( x ) )

is logically equivalent to the predicate

NOT ( EXISTS x ( NOT ( p ( x ) ) ) )

(where, again, the predicate p might legitimately involve other parameters in addition to x).

It follows from all of the above that a formal language doesn’t need to support both EXISTS and FORALL explicitly. But it’s very desirable to support them both in practice. The reason is that some problems are “more naturally” formulated in terms of EXISTS, while others are “more naturally” formulated in terms of FORALL instead. For example, SQL supports EXISTS but not FORALL; as a consequence, certain queries are quite awkward to formulate in SQL. Consider again the query “Get suppliers who supply all parts,” which can be expressed in relational calculus quite simply as follows:

{ SX } WHERE FORALL PX ( EXISTS SPX
( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) )

In SQL, by contrast, the query has to look something like this:

SELECT SX.*
FROM   S AS SX
WHERE  NOT EXISTS
     ( SELECT *
       FROM   P AS PX
       WHERE  NOT EXISTS
            ( SELECT *
              FROM   SP AS SPX
              WHERE  SX.SNO = SPX.SNO
              AND    SPX.PNO = PX.PNO ) )

(“Get suppliers SX such that there does not exist a part PX such that there does not exist a shipment SPX linking that supplier SX to that part PX”). Well, single negation is bad enough (users often have trouble with it); double negation, as in this example, is much worse.

Empty Ranges

Consider again the fact that the predicates

EXISTS x ( p ( x ) )

and

NOT ( FORALL x ( NOT ( p ( x ) ) ) )

are logically equivalent. As we know, the bound variable x in each of these predicates must range over some set of permissible values. Suppose now that the set in question is empty; it might, for example, be the set of persons over fifty feet tall (or in the database context, more realistically, it might be the set of tuples in a relvar that’s currently empty). Then:

The expression EXISTS x (p(x)) evaluates to FALSE, because “there is no x”—i.e., there’s no value available to be substituted for x in order to make the expression true. Note carefully that these remarks are valid regardless of what p(x) happens to be. For example, “There exists a person over fifty feet tall who works for IBM” evaluates to FALSE (unsurprisingly).
It follows that the negation NOT EXISTS x (p(x)) evaluates to TRUE—again, regardless of what p(x) happens to be. For example, “There doesn’t exist a person over fifty feet tall who works for IBM”—more colloquially, “No person over fifty feet tall works for IBM”— evaluates to TRUE (again unsurprisingly).
But NOT EXISTS x (p(x)) is equivalent to FORALL x (NOT (p(x))), and so this latter expression also evaluates to TRUE—once again, regardless of what p(x) happens to be.
But if the predicate p(x) is arbitrary, then so is the predicate NOT (p(x)). And so we have the following possibly surprising result: The expression FORALL x (...) evaluates to TRUE if there are no x’s, regardless of what appears inside the parentheses. For example, “All persons over fifty feet tall do work for IBM” also evaluates to TRUE—because, to say it again, there aren’t any persons over fifty feet tall.

One implication of the foregoing state of affairs is that certain queries will produce a result that you might not expect (if you don’t know logic, that is). For example, the query discussed earlier—

{ SX } WHERE FORALL PX ( EXISTS SPX ( SPX.SNO = SX.SNO AND
SPX.PNO = PX.PNO ) )

(“Get suppliers who supply all parts”)—will return all suppliers if there aren’t any parts.

Incidentally, the foregoing example serves as a good illustration of the point that while logic is certainly necessary as a foundation for database systems, it might not be sufficient. For example, how do you think an only child should respond to the question “Are your siblings all boys?” The logically correct answer is, of course, yes (though I observe that yes is the logically correct answer to the question “Are your siblings all girls?” as well). In practice, however, we would surely expect some more informative response, along the lines of “Well, actually I don’t have any siblings.” In other words—and now reverting to database systems as such—it might be nice if the system, as well as simply giving its responses as such, could also explain those responses if and when asked to do so.

Defining EXISTS and FORALL

As you might have already realized, EXISTS and FORALL can be defined as an iterated OR and an iterated AND, respectively. I’ll consider EXISTS first. Let p(x) be a predicate with a parameter x and let x range over the set X = {x1,x2,...,xn}. Then

EXISTS x ( p ( x ) )

is a predicate, and it’s defined to be equivalent to (and hence shorthand for) the predicate

p ( x1 ) OR p ( x2 ) OR ... OR p ( xn ) OR FALSE

Observe in particular that this expression evaluates to FALSE if X is empty (equivalently, if n = 0), as we already know. By way of example, let p(x) be “x has a moon” and let X be the set {Mercury, Venus, Earth, Mars}. Then the predicate EXISTS x (p(x)) becomes “EXISTS x (x has a moon),” and it’s shorthand for

( Mercury has a moon ) OR ( Venus has a moon ) OR
( Earth has a moon ) OR ( Mars has a moon ) OR FALSE

which evaluates to TRUE because, e.g., “Mars has a moon” is true. Similarly,

FORALL x ( p ( x ) )

is a predicate, and it’s defined to be equivalent to (and hence shorthand for) the predicate

p ( x1 ) AND p ( x2 ) AND ... AND p ( xn ) AND TRUE

And this expression evaluates to TRUE if X is empty (again, as we already know). By way of example, let p(x) and X be as in the EXISTS example above. Then the predicate FORALL x (p(x)) becomes “FORALL x (x has a moon),” and it’s shorthand for

( Mercury has a moon ) AND ( Venus has a moon ) AND
( Earth has a moon ) AND ( Mars has a moon ) AND TRUE

which evaluates to FALSE because, e.g., “Venus has a moon” is false.

So we see that defining EXISTS and FORALL as iterated OR and AND, respectively, means that every predicate that involves quantification is equivalent to one that doesn’t. Thus, you might be wondering, not without some justification, just what this business of quantification is really all about ... Why all the fuss? The answer is as follows: We can define EXISTS and FORALL as iterated OR and AND only because the sets we have to deal with are—thankfully— always finite (because we’re operating in the realm of computers and computers are finite in turn). In pure logic, where there’s no such restriction, those definitions aren’t valid.¹³

Perhaps I should add that, even though we’re always dealing with finite sets and EXISTS and FORALL are thus merely shorthand, they’re extremely useful shorthand! For my part, I certainly wouldn’t want to have to formulate queries and the like purely in terms of AND and OR, without being able to use the quantifiers. Also (and much more to the point, perhaps), the quantifiers allow us to formulate queries without having to know the precise content of the database at any given time—which wouldn’t be the case if we always had to use the explicit iterated OR and AND equivalents.¹⁴

Other Kinds of Quantifiers

While it’s true that the EXISTS and FORALL quantifiers are far and away the most important ones in practice, they aren’t the only ones possible. There’s no a priori reason, for example, why we shouldn’t allow quantifiers of the form

there exist at least three x’s such that

a majority of x’s are such that

an odd number of x’s are such that

(and so on). One fairly important special case is there exists exactly one x such that. I’ll use the keyword UNIQUE for this one. Here are some examples:

UNIQUE x ( x is taller than Arnold )

Meaning: Exactly one person is taller than Arnold; probably FALSE.

UNIQUE x ( x has social security number y )

Meaning: Exactly one person has social security number y (y is a parameter). We can’t assign a truth value to this example because it’s a (monadic) predicate and not a proposition.

FORALL y ( UNIQUE x ( x has social security number y ) )

Meaning: Everybody has a unique social security number (I’m assuming here that y ranges over the set of all social security numbers actually assigned, not all possible ones). Exercise: Does this predicate—which is in fact a proposition—evaluate to TRUE?

As another exercise, what does the following predicate mean?

FORALL x ( UNIQUE y ( x has social security number y ) )

Here’s how UNIQUE might be used in the formulation of constraints. Recall the formulation I gave earlier for constraint CX3 (“every supplier has a unique supplier number”):

A much better formulation would clearly be as follows:

CONSTRAINT CX3 FORALL SX ( UNIQUE SY ( SX.SNO = SY.SNO ) ) ;

(“For all suppliers SX, there’s exactly one supplier SY with the same supplier number.”) For example, if SX denotes the tuple for supplier S4, say, then SY must also denote the tuple for supplier S4—in other words, SX and SY must denote the very same tuple—in order for the constraint to be satisfied.

By way of another example, recall the following constraint: “Every supplier number in relvar SP must appear in relvar S.” Here’s the formulation I gave previously:

CONSTRAINT CX6 FORALL SPX ( EXISTS SX ( SX.SNO = SPX.SNO ) ) ;

However, I hope you can see a more accurate formulation is:

CONSTRAINT CX6 FORALL SPX ( UNIQUE SX ( SX.SNO = SPX.SNO ) ) ;

In other words, for a given tuple in relvar SP, we want there to be not at least one (EXISTS), but exactly one (UNIQUE), corresponding tuple in relvar S. The previous formulation “works” because there’s an additional constraint in effect: viz., that {SNO} is a key for relvar S. But the revised formulation is closer to what we really want to say.

Now, SQL does support UNIQUE (sort of), though its support is even more indirect than its support for EXISTS is. To be specific, let sq be a subquery; then UNIQUE sq is a boolean expression, and it evaluates to FALSE if the table denoted by sq contains any duplicate rows and TRUE otherwise. Note that it follows from this definition that the operator certainly returns TRUE if its argument table has either just one row or no rows at all.¹⁵ And it further follows that, whereas the logic expression

UNIQUE x ( p ( x ) )

means “There exists exactly one argument value a corresponding to the parameter x such that p(a) evaluates to TRUE,” the (very approximate!) SQL analog—

UNIQUE ( SELECT k FROM T AS ... WHERE p ( x ) )

—where k denotes an arbitrary constant value, say the integer 0—means “Given an argument value a corresponding to the parameter x, there exists at most one row in the pertinent table T such that p(a) evaluates to TRUE.” For example, given our usual sample value for relvar S, the SQL expression

UNIQUE ( SELECT 0 FROM S AS SX WHERE SX.CITY = 'Athens' )

returns TRUE, while the SQL expression

UNIQUE ( SELECT 0 FROM S AS SX WHERE SX.CITY = 'Paris' )

returns FALSE.¹⁶

All of that being said, I won’t attempt to give an SQL formulation here for constraint CX6 that uses UNIQUE—I’ll leave it to Chapter 11 (see Example 10 in that chapter).

As you can see, the foregoing examples are designed to exploit the fact that SQL retains duplicates in the result of a SELECT expression if DISTINCT isn’t specified. Of course, I’ve suggested elsewhere in this book—in Chapter 4, to be specific—that DISTINCT should “always” be specified. In contexts like the one under discussion, however, DISTINCT must definitely not be specified (right?).

Of course, I don’t mean to suggest that the argument expression in an SQL UNIQUE invocation must always be of the form “SELECT k FROM ...,” where k denotes some constant. By way of a counterexample, here repeated from Chapter 8 is one possible SQL formulation of the constraint that distinct suppliers must have distinct supplier numbers:

CREATE ASSERTION CX3 CHECK ( UNIQUE ( SELECT SNO FROM S ) ) ;

Recall now that SQL also uses the keyword UNIQUE in key constraints. For example, the CREATE TABLE for table S includes the following specification:

UNIQUE ( SNO )

You can think of this specification as shorthand for the following (which could be part of a more general base table constraint or a CREATE ASSERTION statement):

CHECK ( UNIQUE ( SELECT SNO FROM S ) )

Aside: To repeat something I said (in a footnote) in Chapter 8, what the standard actually says in this connection is as follows (more or less): “The constraint UNIQUE (SNO) is not satisfied if and only if EXISTS (SELECT * FROM S WHERE NOT (UNIQUE (SELECT SNO FROM S))) evaluates to TRUE.” Well, it seems to me this definition could surely be simplified, thus: “The constraint UNIQUE (SNO) is satisfied if and only if UNIQUE (SELECT SNO FROM S) evaluates to TRUE.” Now, I dare say there’s a good reason for what seems to me the standard’s excessive circumlocution here, but whatever it is certainly escapes me. Perhaps it has to do with nulls, in which case I’m not interested. End of aside.

SQL also uses the keyword UNIQUE in MATCH expressions. Here’s an example (“Get suppliers who supply exactly one part”):¹⁷

SELECT SX.*
FROM   S AS SX
WHERE  SX.SNO MATCH UNIQUE
     ( SELECT SPX.SNO
       FROM   SP AS SPX )

But this usage too is basically just shorthand. For example, the example just shown is equivalent to the following—

SELECT SX.*
FROM   S AS SX
WHERE  UNIQUE ( SELECT SPX.SNO             /* i.e., there's AT   */
                FROM   SP AS SPX           /* MOST one shipment  */
                WHERE  SPX.SNO = SX.SNO )  /* for supplier SX    */
AND    EXISTS ( SELECT SPX.SNO             /* ... and there's    */
                FROM   SP AS SPX           /* also AT LEAST one  */
                WHERE  SPX.SNO = SX.SNO )

Incidentally, note that the UNIQUE invocation here is indeed of the form “UNIQUE (SELECT k FROM ...)” where k denotes a constant value, because the boolean expression in the inner WHERE clause makes SPX.SNO constant with respect to “the current row” of table S, and hence with respect to each evaluation of the outer WHERE clause. Of course, SPX.SNO denotes different constants with respect to different evaluations of this latter clause.

SOME EQUIVALENCES

In this section I offer a few remarks regarding certain equivalences that might have already occurred to you (indeed, I’ve touched on some of them myself from time to time at earlier points). First of all, recall the Tutorial D IS_EMPTY operator, which I introduced in Chapter 3 and made heavy use of in Chapter 8. If the system supports that operator, then there’s no logical need for it to support the quantifiers, thanks to the following equivalences:

EXISTS x ( p ) ≡ NOT ( IS_EMPTY ( X WHERE p ) )
FORALL x ( p ) ≡ IS_EMPTY ( X WHERE NOT ( p ) )

(I’m assuming here that the variable x ranges over a set called X.)

Actually, SQL’s support for EXISTS—and FORALL, such as it is—is based on exactly the foregoing equivalences. The fact is, SQL’s EXISTS isn’t really a quantifier, as such, at all, because it doesn’t involve any bound variables. Instead, it’s an operator, in the conventional sense of that term: a monadic operator of type BOOLEAN, to be precise. Like any monadic operator invocation, an invocation of EXISTS in SQL is evaluated by first evaluating the expression that denotes its sole argument, and then applying the operator per se—in this case EXISTS—to the result of that evaluation. Thus, given the expression EXISTS (tx), where tx is a table expression, the system first evaluates tx to obtain a table t; then it applies EXISTS to t, returning TRUE if t is nonempty and FALSE otherwise. (At least, that’s the conceptual algorithm; various optimizations are possible, but they’re irrelevant to the present discussion.)

And now I can explain why SQL doesn’t directly support FORALL. The reason is that representing the universal quantifier by means of an operator with syntax of the form FORALL (tx)—where tx is again a table expression—couldn’t possibly make sense. For example, consider the hypothetical SQL expression

FORALL ( SELECT * FROM S WHERE CITY = 'Paris' )

What could such an expression possibly mean? It certainly couldn’t mean anything like “All suppliers are in Paris,” because—loosely speaking—the argument to which that hypothetical FORALL operator is being applied isn’t all suppliers, it’s all suppliers in Paris.

In fact, however, we don’t need the quantifiers anyway if the system supports the aggregate operator COUNT, thanks to the following further equivalences:

EXISTS x ( p ) ≡ COUNT ( X WHERE p ) > 0
FORALL x ( p ) ≡ COUNT ( X WHERE p ) = COUNT ( X )
UNIQUE x ( p ) ≡ COUNT ( X WHERE p ) = 1

Now, I’m certainly not a fan of the idea of replacing quantified expressions by expressions involving COUNT invocations, but it would be wrong of me not to mention the possibility.

Aside: Although this book generally has little to say on performance, I should at least point out that the foregoing equivalences (the ones involving COUNT, I mean) could lead to performance problems. For example, consider the following expression, which is an SQL formulation of the query “Get suppliers who supply at least one part”:

SELECT *
FROM   S
WHERE  EXISTS
     ( SELECT *
       FROM   SP
       WHERE  SP.SNO = S.SNO )

Now, here’s another formulation that’s logically equivalent to the foregoing:

SELECT *
FROM   S
WHERE  ( SELECT COUNT ( * )
         FROM   SP
         WHERE  SP.SNO = S.SNO ) > 0

But we don’t really want the system to perform the complete count that’s apparently being requested here and then check to see whether that count is greater than zero; rather, we want it to stop counting, for any given supplier, as soon as it finds the first shipment for that supplier. In other words, we’d really like some optimization to be done. Writing code that effectively requires a certain optimization to be done is usually not a good idea! Recommendation: Be careful over the use of COUNT, therefore; in particular, don’t use it where EXISTS would be more logically correct. End of aside.

Relational Completeness

Every operator of the relational algebra has a precise definition in terms of logic. (I didn’t call this point out explicitly before, but it should be obvious that the definitions I gave in Chapters 6 and 7 for join and the rest can be stated, perhaps a little more precisely, in terms of logic as described in the present chapter.) It follows as a direct consequence that, for every expression of the relational algebra, there’s an expression of the relational calculus that’s logically equivalent to—i.e., has the same semantics as—that algebraic expression. In other words, the relational calculus is at least as “powerful” (better: expressive) as the relational algebra: Anything that can be expressed in the algebra can also be expressed in the calculus.

Now, it might not be obvious, but actually the opposite is true too—that is, for every expression of the calculus, there’s an expression of the algebra that’s logically equivalent to that calculus expression. Thus, the algebra is at least as expressive as the calculus, and so the two formalisms are logically equivalent: Both are what’s called relationally complete.¹⁸ Relational completeness is a basic measure of the expressive capability of a language; if a language is relationally complete, it means (among other things, and speaking a trifle loosely) that queries of arbitrary complexity can be formulated without having to resort to iterative loops or branching. In other words, it’s relational completeness that allows end users—at least in principle, though possibly not in practice—to access the database directly, without having to go through the potential bottleneck of the IT department.

The Importance of Consistency

I have a small piece of unfinished business to attend to. Recall my claim in Chapter 8 that any proposition whatsoever (even obviously false ones like 1 = 0) can be shown to be “true” in an inconsistent system. Now I can elaborate on that claim.

I’ll start with a really simple example. Suppose (a) relvar S is currently nonempty; (b) there’s a constraint to the effect that there must always be at least one part; but (c) relvar P is in fact currently empty (there’s the inconsistency). Now consider the relational calculus query:

{ SX } WHERE EXISTS PX ( TRUE )

Or if you prefer SQL:

SELECT *
FROM   S
WHERE  EXISTS
     ( SELECT *
       FROM   P )

Now, if this query is evaluated directly, the result will be empty, because the expression in the WHERE clause evauates to FALSE. Alternatively, if the system (or the user) observes that there’s a constraint that says that EXISTS PX (TRUE) must evaluate to TRUE—or, in SQL, that SELECT * FROM P must return a nonempty result—that WHERE clause can be replaced by one saying simply WHERE TRUE, and the result will then be all suppliers. At least one of these results must be wrong! In a sense, in fact, they’re both wrong; given an inconsistent database, there simply isn’t—there can’t be—any well defined notion of correctness, and any answer is as good (or bad) as any other. Indeed, this state of affairs should be self-evident: If I tell you some proposition p is both true and false, and then ask you whether some proposition that relies on p in some way is true, there’s simply no right answer you can give me.

In case you’re still not convinced, consider the following slightly more realistic SQL example (under the same assumptions as before):

SELECT DISTINCT
CASE WHEN EXISTS ( SELECT * FROM P ) THEN x ELSE y END
FROM S

This expression will return either x or y—more precisely, it will return a table containing a row containing either x or y—depending, in effect, on whether or not the EXISTS invocation is replaced by just TRUE. Now consider that x and y can each be essentially anything at all ... For example, x might be an SQL expression denoting the total weight of all parts, while y might be the literal 0—in which case executing the query could easily lead to the erroneous conclusion that the total part weight is null instead of zero.

CONCLUDING REMARKS

It’s my strong belief that database professionals in general, and SQL practitioners in particular, should have some familiarity with the basic concepts of predicate logic (or relational calculus—it comes to the same thing). I’d like to conclude by trying to justify this position.

My basic point is simply that a knowledge of logic helps you think precisely (and in our field, the importance of thinking precisely is surely paramount). In particular, it forces you to appreciate the significance of proper quantification. Natural language is so often imprecise; however, careful consideration of what quantifiers are needed allows you to pin down the meaning of what can otherwise be very imprecise natural language statements. By way of example, you might like to meditate on exactly what Abraham Lincoln meant—or might have meant, or thought he might have meant, or might have thought he meant—when he famously said: “You can fool some of the people some of the time, and some of the people all the time, but you cannot fool all the people all of the time.”

Now, I’m well aware there are many who disagree with me here; that is, there are many who feel ordinary mortals shouldn’t have to grapple with a subject as abstruse as logic seems to be. In effect, they claim that logic is just too difficult for most people to deal with. Now, that claim might be true in general (logic is a big subject). But you don’t need to understand the whole of logic for the purpose at hand; in fact, I doubt whether you need much more than what I’ve covered in this chapter. And the benefits are so huge! I made essentially the same point in another book—Logic and Databases: The Roots of Relational Theory (Trafford, 2007)—and I’d like to quote the concluding remarks from that earlier discussion here:

Surely it’s worth investing a little effort up front in becoming familiar with [basic logic] in order to avoid the problems associated with ambiguous business rules. Ambiguity in business rules leads to implementation delays at best or implementation errors at worst (possibly both). And such delays and errors certainly have costs associated with them, costs that are likely to outweigh those initial learning costs many times over. In other words, framing business rules properly is a serious matter, and it requires a certain level of technical competence.

As you can see, these remarks are set in the context of business rules specifically, but I think they’re of wider applicability—as I plan to demonstrate in the next chapter.

EXERCISES

10.1 As noted in the body of the chapter, there are exactly 16 dyadic connectives. Show the corresponding truth tables. How many monadic connectives are there?

10.2 Let p and q stand for arbitrary propositions. Prove that

( ( NOT p ) AND ( p OR q ) ) IMPLIES q

is a tautology. (Recall from Chapter 4 that a tautology in logic is an expression that’s guaranteed to evaluate to TRUE, regardless of the values of any variables involved. Likewise, a contradiction in logic is an expression that’s guaranteed to evaluate to FALSE, regardless of the values of any variables involved.)

10.3 Again let p and q denote arbitrary propositions. Prove that the following is a tautology:

NOT ( p AND q ) ≡ ( NOT p ) OR ( NOT q )

10.4 (Repeated from the body of the chapter, but reworded here.) (a) Prove that all of the monadic and dyadic connectives can be expressed in terms of suitable combinations of NOT and either AND or OR; (b) prove also that they can all be expressed in terms of just a single connective.

10.5 Consider the predicate “x is a star.” (a) First, if the argument the sun is substituted for x, does the predicate become a proposition? If not, why not? And what about the argument the moon? (b) Second, if the argument the sun is substituted for x, is the predicate satisfied? If not, why not? And what about the argument the moon?

10.6 Consider the predicate “x has two moons.” If the argument Jupiter is substituted for x, is the predicate satisfied? Justify your answer.

10.7 Here’s constraint CX1 once again from Chapter 8:

CONSTRAINT CX1 IS_EMPTY ( S WHERE STATUS < 1 OR STATUS > 100 ) ;

Now, in Chapter 8, I said the relvar name “S”, in the expression IS_EMPTY (S WHERE ...), was acting as a designator. But isn’t it actually a parameter? If not, what’s the difference?

10.8 (Repeated from the body of the chapter.) What query does the following expression represent? And do you think that query is a “sensible” one?

{ SX.SNAME } WHERE EXISTS SPX ( FORALL PX ( SPX.SNO = SX.SNO AND
SPX.PNO = PX.PNO ) )

10.9 (Repeated from the body of the chapter.) Give SQL analogs of the relational calculus expressions in the subsection “More Sample Queries” in the body of the chapter.

10.10 Prove that AND and OR are associative.

10.11 Let p(x) and q be predicates in which x does and does not appear, respectively, as a free variable. Which of the following statements are valid?¹⁹ I remind you that the symbol “⇒” means implies; the symbol “≡” means is equivalent to. Note too that A ⇒ B and B ⇒ A are together the same as A ≡ B (in other words, (A ⇒ B AND B ⇒ A) ≡ (A ≡ B) is a tautology).

EXISTS x ( q ) ≡ q
FORALL x ( q ) ≡ q
EXISTS x ( p(x) AND q ) ≡ EXISTS x ( p(x) ) AND q
FORALL x ( p(x) AND q ) ≡ FORALL x ( p(x) ) AND q
FORALL x ( p(x) ) ⇒ EXISTS x ( p(x) )
EXISTS x ( TRUE ) ≡ TRUE
FORALL x ( FALSE ) ≡ FALSE
UNIQUE x ( p(x) ) ⇒ EXISTS x ( p(x) )
UNIQUE x ( p(x) ) ⇒ FORALL x ( p(x) )
FORALL x ( p(x) ) AND EXISTS x ( p(x) ) ⇒ UNIQUE x ( p(x) )
FORALL x ( p(x) ) AND UNIQUE x ( p(x) ) ⇒ EXISTS x ( p(x) )

10.12 Let p(x,y) be a predicate with free variables x and y. Which of the following statements are valid?

EXISTS x EXISTS y ( p(x,y) ) ≡ EXISTS y EXISTS x ( p(x,y) )
FORALL x FORALL y ( p(x,y) ) ≡ FORALL y FORALL x ( p(x,y) )
FORALL x ( p(x,y) ) ≡ NOT EXISTS x ( NOT p(x,y) )
EXISTS x ( p(x,y) ) ≡ NOT FORALL x ( NOT p(x,y) )
EXISTS x FORALL y ( p(x,y) ) ≡ FORALL y EXISTS x ( p(x,y) )
EXISTS y FORALL x ( p(x,y) ) ⇒ FORALL x EXISTS y ( p(x,y) )

10.13 Where possible and reasonable, give relational calculus solutions to exercises from Chapters 6-9.

10.14 Consider this query: “Get cities in which either a supplier or a part is located.” Can this query be expressed in the relational calculus? If not, why not?

10.15 Is SQL relationally complete? Note: To prove it is, you need to show that for every expression of the relational algebra there exists a semantically equivalent expression in SQL. Alternatively, to prove it isn’t, you need to show there exists at least one expression of the relational algebra for which no such SQL equivalent exists.

10.16 Is prenex normal form always achievable?

ANSWERS

First of all, in the body of the chapter I asked which of the following natural language sentences were legal propositions. My own answers are as follows (but note that some of them, at least, might be open to debate):

Bach is the greatest musician who ever lived. Yes. Subsidiary exercise: Is the proposition true?
What’s the time? No. It doesn’t make sense to ask “Is it true that p?” where p is “What’s the time?”
Supplier S2 is located in some city, x. My own answer here is yes, but you might well argue that it should be no. In fact, the example is a good illustration of the ambiguity problem discussed near the beginning of the chapter; my answer is based on the assumption that the phrase “some city, x” could be abbreviated to just “some city” without changing the overall meaning of the sentence,²⁰ but you might reasonably argue the opposite. Compare and contrast “We both have the same favorite author, x” (see below).
Some countries have a female president. Yes.
All politicians are corrupt. Yes.
Supplier S1 is located in Paris. Yes (though it’s false, given our usual sample values).
We both have the same favorite author, x. No. Here I’m assuming that the phrase “the same favorite author, x” couldn’t be abbreviated to just “the same favorite author” without changing the overall meaning of the sentence (but you could reasonably argue the opposite). If my assumption is correct, then x is a variable (better: a parameter), and until we know what argument is to be substituted for that parameter—i.e., until we know what that x stands for—we can’t assign a truth value to the sentence. But when we do know—i.e., when we substitute Jane Austen, say, for x—then the sentence does become a proposition.
Nothing is heavier than lead. Yes.
It will rain tomorrow. No. It doesn’t make sense to ask “Is it true that it will rain tomorrow?”—at least, not if you’re expecting an answer (yes or no) that’s guaranteed to be unequivocally correct.
Supplier S6’s city is unknown. Yes. See Appendix C for further discussion of propositions like this one.

10.1 See the answer to Exercise 4.10 in Chapter 4.

10.2 Consider the following truth table:

p │ q │ NOT p │ p OR q │ x AND y │ (x AND y) IMPLIES q
  │   │ (= x) │ (= y)  │         │
──┼───┼───────┼────────┼─────────┼────────────────────
T │ T │   F   │   T    │    F    │          T
T │ F │   F   │   T    │    F    │          T
F │ T │   T   │   T    │    T    │          T
F │ F │   T   │   F    │    F    │          T

Since the final column contains T (true) in every position, the given expression is indeed a tautology.

10.3 This is one of De Morgan’s laws. It’s proved in Chapter 11.

10.4 First of all, it’s easy to see that we don’t need both AND and OR, because

p AND q ≡ NOT ( NOT ( p ) OR NOT ( q ) )

and

p OR q ≡ NOT ( NOT ( p ) AND NOT ( q ) )

These equivalences are easily established by means of truth tables, as in the answer to Exercise 10.2 above. What they show is that each of AND and OR can be defined in terms of the other, together with NOT. It follows that we can freely use both AND and OR in what follows.

Now consider the connectives involving a single proposition p. Let c(p) be the connective under consideration. Then the possibilities are as follows:

c(p)  ≡  p                  /* identity     */
c(p)  ≡  NOT ( p )          /* NOT          */
c(p)  ≡  p OR NOT ( p )     /* always TRUE  */
c(p)  ≡  p AND NOT ( p )    /* always FALSE */

Now consider the connectives involving two propositions p and q. Let c(p,q) be the connective under consideration. Then the possibilities are as follows:

c(p,q)  ≡  p
c(p,q)  ≡  q
c(p,q)  ≡  NOT ( p )
c(p,q)  ≡  NOT ( q )
c(p,q)  ≡  p AND q
c(p,q)  ≡  p OR q
c(p,q)  ≡  p AND NOT ( q )
c(p,q)  ≡  p OR NOT ( q )
c(p,q)  ≡  NOT ( p ) AND q
c(p,q)  ≡  NOT ( p ) OR q
c(p,q)  ≡  NOT ( p ) AND NOT ( q )
c(p,q)  ≡  NOT ( p ) OR NOT ( q )
c(p,q)  ≡  p AND NOT ( p ) AND q AND NOT ( q )
c(p,q)  ≡  p OR NOT ( p ) OR q OR NOT ( q )
c(p,q)  ≡  ( NOT ( p ) OR q ) AND ( NOT ( q ) OR p )
c(p,q)  ≡  ( NOT ( p ) AND q ) OR ( NOT ( q ) AND p )

As a subsidiary exercise, and in order to convince yourself that the foregoing definitions do indeed cover all of the possibilities, you might like to construct the corresponding truth tables (and compare them with those given in the answer to Exercise 4.10 in Chapter 4).

Turning to part (b) of the exercise: Actually there are two such primitives, NOR and NAND, often denoted by a down arrow, “↓” (the Peirce arrow) and a vertical bar, “|” (the Sheffer stroke), respectively. Here are the truth tables:

NOR│ T F    NAND │ T F
───┼─────   ─────┼─────
T │ F F      T  │ F T
F │ F T      F  │ T T

As these tables suggest, p↓q (“p NOR q”) is equivalent to NOT (p OR q) and p|q (“p NAND q”) is equivalent to NOT (p AND q). In what follows, I’ll concentrate on NOR (I’ll leave NAND to you). Observe that this connective can helpfully be thought of as “neither nor” (“neither the first operand nor the second is true”). I now show how to define NOT, OR, and AND in terms of this operator:

NOT ( p )   ≡  p ↓ p
p AND q     ≡  ( p ↓ p ) ↓ ( q ↓ q )
p OR q      ≡  ( p ↓ q ) ↓ ( p ↓ q )

For example, let’s take a closer look at the case of p AND q (I’ll leave the other two cases to you):

p │ q │ p↓p │ q↓q │ (p↓p)↓(q↓q)
──┼───┼─────┼─────┼────────────
T │ T │  F  │  F  │      T
T │ F │  F  │  T  │      F
F │ T │  T  │  F  │      F
F │ F │  T  │  T  │      F

This truth table shows that (p↓p)↓(q↓q) is equivalent to p AND q, because its first, second, and final columns are identical to the corresponding columns in the truth table for AND:

p │ q │ p AND q
──┼───┼─────────
T │ T │    T
T │ F │    F
F │ T │    F
F │ F │    F

Since we’ve already seen that all of the other connectives can be expressed in terms of NOT, AND, and OR, the overall conclusion follows.

10.5 (a) “The sun is a star” and “the moon is a star” are both propositions, though the first is true and the second false. (b) The sun satisfies the predicate, the moon doesn’t.

10.6 This exercise points up the question of ambiguity once more. If the predicate means x has exactly two moons, then clearly Jupiter doesn’t satisfy it. But if it means x has at least two moons (and maybe more), then Jupiter does satisfy it. Once again, then, we see how logic forces us—or does its best to force us, at any rate—into thinking clearly and saying exactly what we mean.

10.7 A parameter can be replaced by any argument whatsoever, just so long as it’s of the right type. A designator isn’t, and in fact can’t be, replaced by anything at all; instead—just like a variable reference in a programming language, in fact—it simply “designates,” or denotes, the value of the pertinent variable at the pertinent time (i.e., when the constraint is checked, in the case at hand).

10.8 “Get names of suppliers such that there exists a shipment—a single shipment, that is— that links them to every part.” The query probably isn’t very sensible. To be specific, (a) if relvar P contains exactly one tuple and relvar SP contains n tuples, linking n distinct suppliers to that single part, then it’ll return the names of those n suppliers; (b) if relvar P doesn’t contain exactly one tuple, then it’ll return an empty result.

10.9 The following SQL expressions are deliberately meant to be as close to their relational counterparts (as given in the body of the chapter) as possible. See Chapter 11 for further discussion.

Example 1: Get all pairs of supplier numbers such that the suppliers concerned are colocated.

SELECT SX.SNO AS SA , SY.SNO AS SB
FROM   S AS SX , S AS SY
WHERE  SX.CITY = SY.CITY
AND    SX.SNO < SY.SNO

Example 2: Get names of suppliers who supply at least one red part.

SELECT DISTINCT SX.SNAME
FROM   S AS SX
WHERE  EXISTS
     ( SELECT *
       FROM   SP AS SPX
       WHERE  EXISTS
            ( SELECT *
              FROM   P AS PX
              WHERE  PX.COLOR = 'Red'
              AND    SX.SNO = SPX.SNO
              AND    SPX.PNO = PX.PNO ) )

Example 3: Get names of suppliers who supply at least one part supplied by supplier S2.

SELECT DISTINCT SX.SNAME
FROM   S AS SX
WHERE  EXISTS
     ( SELECT *
       FROM   SP AS SPX
       WHERE  EXISTS
            ( SELECT *
              FROM   SP AS SPY
              WHERE  SX.SNO = SPX.SNO
              AND    SPX.PNO = SPY.PNO
              AND    SPY.SNO = 'S2' ) )

Example 4: Get names of suppliers who don’t supply part P2.

SELECT DISTINCT SX.SNAME
FROM   S AS SX
WHERE  NOT EXISTS
     ( SELECT *
       FROM   SP AS SPX
       WHERE  SPX.SNO = SX.SNO
       AND    SPX.PNO = 'P2' )

Example 5: For each shipment, get full shipment details, including total shipment weight.

SELECT SPX.* , PX.WEIGHT * SPX.QTY AS SHIPWT
FROM P AS PX , SP AS SPX
WHERE PX.PNO = SPX.PNO

Example 6: For each part, get the part number and the total shipment quantity.

SELECT PX.PNO , ( SELECT COALESCE ( SUM ( SPX.QTY ) , 0 )
FROM SP AS SPX
WHERE SPX.PNO = PX.PNO ) AS TOTQ
FROM P AS PX

Example 7: Get cities that store more than five red parts.

SELECT DISTINCT PX.CITY
FROM   P AS PX
WHERE ( SELECT COUNT ( * )
        FROM   P AS PY
        WHERE  PY.CITY = PX.CITY
        AND    PY.COLOR = 'Red' ) > 5

10.10 The following truth table shows (how, exactly?) that AND is associative; the proof for OR is analogous.

p │ q │ r │ p AND q │ (p AND q) AND r │ q AND r │ p AND (q AND r)
──┼───┼───┼─────────┼─────────────────┼─────────┼────────────────
T │ T │ T │    T    │        T        │    T    │       T
T │ T │ F │    F    │        F        │    F    │       F
T │ F │ T │    F    │        F        │    F    │       F
T │ F │ F │    T    │        T        │    T    │       T
F │ T │ T │    F    │        F        │    F    │       F
F │ T │ F │    F    │        F        │    F    │       F
F │ F │ T │    F    │        F        │    F    │       F
F │ F │ F │    F    │        F        │    F    │       F

Note: The formal name for AND is conjunction, and its operands (i.e., the terms being ANDed together) are called conjuncts. Similarly, the formal name for OR is disjunction, and its operands are called disjuncts. Further, since (a) AND and OR are commutative as well as associative and (b) they both possess an identity value, we can define n-adic versions, as follows:

AND {p1,p2,...,pn} is defined to be equivalent to

( p1 ) AND ( p2 ) AND ... AND ( pn ) AND TRUE

If none of the conjuncts (i.e., the p’s) involves any ANDs, the expression overall is said to be in conjunctive normal form (CNF).
OR {p1,p2,...,pn} is defined to be equivalent to

( p1 ) OR ( p2 ) OR ... OR ( pn ) OR FALSE

If none of the disjuncts (i.e., the p’s) involves any ORs, the expression overall is said to be in disjunctive normal form (DNF).

Note: It’s clear from these definitions that if n = 1, then n-adic AND and OR both return p1; if n = 0, they return TRUE and FALSE, respectively.

10.11 a. Not valid (suppose x ranges over an empty set and q is TRUE; then EXISTS x (q) is FALSE). b. Not valid (suppose x ranges over an empty set and q is FALSE; then FORALL x (q) is TRUE). c. Valid. d. Valid. e. Not valid (suppose x ranges over an empty set; then FORALL x (p(x)) is TRUE but EXISTS x (p(x)) is FALSE, and TRUE ⇒ FALSE is FALSE). f. Not valid (suppose x ranges over an empty set; then EXISTS x (TRUE) is FALSE). g. Not valid (suppose x ranges over an empty set; then FORALL x (FALSE) is TRUE). h. Valid. i. Not valid (e.g., the fact that exactly one integer is equal to zero doesn’t imply that all integers are equal to zero). j. Not valid (e.g., the fact that all days are 24 hours long and the fact there exists at least one day that’s 24 hours long don’t together imply that exactly one day is 24 hours long). k. Valid. Note: (Valid!) equivalences and implications like those under discussion here (and in the next exercise) can be used as a basis for a set of calculus expression transformation rules, much like the algebraic expression transformation rules discussed in Chapter 6. See Chapter 11 for further discussion.

10.12 a. Valid. b. Valid. c. Valid. d. Valid. e. Not valid (e.g., saying that for every integer y there exists a greater integer x isn’t the same as saying there exists an integer x that’s greater than all integers y). f. Valid.

10.13 I give solutions only in those cases—in fact, in just a few cases, from Chapters 6 and 7 only—where there’s some significant point to be made regarding the solution in question. For cross reference purposes, I show the numbers of the original exercises in italics. Exercises from Chapter 6:

6.12 The following relational calculus expressions denote TABLE_DEE and TABLE_DUM, respectively:

{ } WHERE TRUE

{ } WHERE FALSE

And this expression denotes the projection of the current value of relvar S on no attributes:

{ } WHERE EXISTS ( SX )

The relational calculus isn’t usually considered as having a direct counterpart to Tutorial D’s r {ALL BUT ...}, but there’s no reason in principle why it shouldn’t.

6.15 The relational calculus isn’t usually considered as having a direct counterpart to Tutorial D’s D_UNION or I_MINUS, but there’s no reason in principle why it shouldn’t. Note: The relational calculus counterpart to regular UNION is illustrated in the answer to Exercise 10.14 below.

10.13 (cont.) Exercises from Chapter 7:

7.1

d. { PX } WHERE SUM ( SPX WHERE SPX.PNO = PX.PNO , QTY ) < 500

e. { PX } WHERE EXISTS ( SX WHERE SX.CITY = PX.CITY )

f. { PX , SCT := COUNT ( SPX WHERE SPX.PNO = PX.PNO ) }

7.8 A relational calculus analog of the Tutorial D expression SP GROUP { } AS X is:

{ SPX , X := { } }

7.11 { SX , PNO_REL := { SPX.PNO } WHERE SPX.SNO = SX.SNO }

7.12 In practice we need analogs of the conventional INSERT, DELETE, and UPDATE (and relational assignment) operators that are in keeping with a calculus style rather than an algebraic one (and this observation is true regardless of whether we’re talking about relvars with RVAs, as in the present context, or without them). Further details are beyond the scope of the present book, but in any case are straightforward. No further answer provided.

10.14 Recall from the body of the chapter that the set over which a range variable ranges is always the body of some relation—usually but not always the relation that’s the current value of some relvar (emphasis added). In this example, the range variable ranges over what is, in effect, a union:

RANGEVAR CX RANGES OVER { SX.CITY } , { PX.CITY } ;

{ CX } WHERE TRUE

Note that the definition of range variable CX makes use of range variables SX and PX, which I assume to have been previously defined.

10.15 In order to show that SQL is relationally complete, it’s sufficient to show that (a) there exist SQL expressions for each of the algebraic operators restrict, project, product, union, and difference (because, as noted in Chapter 6, all of the other algebraic operators discussed in this book can be defined in terms of these five),²¹ and (b) the operands to those SQL expressions can be arbitrarily complex SQL expressions in turn. So let’s give it a try.

First of all, as we know, SQL effectively does support the relational algebra RENAME operator, thanks to the availability of the optional AS specification on items in the SELECT clause.²² We can therefore ensure that tables do all have proper column names, and hence that the operands to product, union, and difference in particular satisfy the requirements of the algebra with respect to such naming. Furthermore—provided those naming requirements are indeed satisfied—the SQL column name inheritance rules in fact coincide with those of the algebra as described in Chapter 6.

Here then are SQL expressions corresponding approximately to the five primitive operators mentioned above:

Algebra	SQL
R WHERE bx	SELECT * FROM R WHERE bx
R { A , B , ... , C }	SELECT DISTINCT A , B , ... , C FROM R
R1 TIMES R2	SELECT * FROM R1 , R2
R1 UNION R2	SELECT * FROM R1 UNION CORRESPONDING SELECT * FROM R2
R1 MINUS R2	SELECT * FROM R1 EXCEPT CORRESPONDING SELECT * FROM R2

Moreover, (a) R, R1, and R2 in the SQL expressions shown above are all table expressions (in the sense in which I’ve been using this latter term throughout this book up to this point), and (b) if we take any of those expressions and enclose it in parentheses, what results is a table expression in turn.²³ It follows that SQL is indeed relationally complete. Or is it? Unfortunately, the answer is no. The reason is that there’s a slight (?) glitch in the foregoing argument—SQL fails to support projection on no columns at all, because it doesn’t support empty commalists in the SELECT clause. As a consequence, it doesn’t support TABLE_DEE or TABLE_DUM, and therefore it isn’t relationally complete after all ... but it “nearly” is.

10.16 No, it isn’t—though textbooks on logic usually claim the opposite, and in practice it’s “usually” achievable. Here’s an example of a relational calculus expression where the predicate in the WHERE clause can’t be replaced by a PNF equivalent:

SX WHERE EXISTS SPX ( SPX.SNO = SX.SNO ) OR SX.CITY = 'Athens'

The paper “A Remark on Prenex Normal Form” (see Appendix G) explains the situation in detail.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for SQL and Relational Theory

Create new playlist

Sign In

Sign Up

Chapter 10

WHY DO WE NEED LOGIC?

SIMPLE AND COMPOUND PROPOSITIONS

Connectives

A Remark on Commutativity

Contrapositives

SIMPLE AND COMPOUND PREDICATES

Rules of Inference

QUANTIFICATION

Free and Bound Variables

RELATIONAL CALCULUS

More on Range Variables

More Sample Queries

Sample Constraints

MORE ON QUANTIFICATION

We Don’t Need Both Quantifiers

Empty Ranges

Defining EXISTS and FORALL

Other Kinds of Quantifiers

SOME EQUIVALENCES

Relational Completeness

The Importance of Consistency

CONCLUDING REMARKS

EXERCISES

ANSWERS

Table of Contents for
SQL and Relational Theory