Appendix A. A Little Bit of Logic

In Chapter 1, I mentioned that there’s an alternative to the relational algebra called the relational calculus. Relational calculus is essentially an applied form of predicate calculus, tailored to the needs of relational databases. Queries and constraints, for example, can be formulated using relational calculus instead of relational algebra; indeed, sometimes it’s easier to come up with a calculus formulation (on the other hand, sometimes the opposite is true). And it’s desirable, in my opinion, for database professionals to have at least a nodding acquaintance with the basic concepts of the calculus as well as the algebra; hence this appendix, which offers a brief and very informal introduction to such matters.

By the way, it follows from the preceding paragraph that a relational language can be based on either the algebra or the calculus. So which is SQL based on? The answer, regrettably, is partly both and partly neither. When it was first designed, SQL was specifically intended to be different from both the algebra and the calculus; indeed, such a goal was the prime motivation for the introduction of the SQL “IN subquery” construct. As time went by, however, it turned out that certain features of both the algebra and the calculus were needed after all, and the language grew to accommodate them. The situation today is thus that some aspects of SQL are “algebra-like,” some are “calculus-like,” and some are neither.

Propositions

Recall from Chapter 4 that a proposition is a statement that’s unequivocally either true or false. Given some specified set of propositions, we can combine propositions from that set in a variety of ways to form further propositions, using the connectives NOT, AND, OR, IF...THEN...(or IMPLIES), and IF AND ONLY IF (or BI-IMPLIES, or IS EQUIVALENT TO, or simply IFF). For example, here are some of the many propositions that can be formed from the set of propositions (a) Jupiter is a star, (b) Mars has two moons, and (c) Venus is between Earth and Mercury:

(Jupiter is a star) OR (Mars has two moons)

(Jupiter is a star) AND (Jupiter is a star)

(Venus is between Earth and Mercury) AND NOT (Jupiter is a star)

IF (Jupiter is a star) THEN (Mars has two moons)

Clearly, the connectives can be regarded as logical operators: they take propositions as their input and return another proposition as their output. (I’ve used parentheses in the examples to make the scope of the connectives clear. In practice, we adopt certain precedence rules that allow us to omit many of the parentheses that might otherwise be needed. Of course, it’s never wrong to include them, even when they’re logically unnecessary.)

A proposition that involves no connectives at all is called a simple proposition; a proposition that isn’t simple is called compound . The truth value of a compound proposition can be determined from the truth values of the constituent simple propositions in accordance with the familiar truth tables:

image with no caption

For space reasons I’ve abbreviated TRUE and FALSE to just T and F, respectively. By way of example, consider the following compound proposition:

IF (Mars has two moons) THEN (Venus is between Earth and Mercury)

This proposition is of the form IF p THEN q (or p IMPLIES q). Both constituent simple propositions evaluate to TRUE; the truth table for IMPLIES therefore shows us that the overall proposition evaluates to TRUE also. However, a point arises here that sometimes causes confusion. In the real world, the fact that Venus is between Earth and Mercury obviously has nothing to do with whether Mars has two moons. In logic, however, we simply define IF p THEN q to be true if q is true, regardless of whether p is true; in fact, if p is false, we define IF p THEN q to be true, regardless of the truth value of q. Thus, for example, IF (Jupiter is a star) THEN (Earth has two moons) is true! You can interpret this state of affairs as saying, in effect, that if you’ll believe a falsehood (p), then you’ll believe anything (q).

Note

If this discussion leaves you with a slightly unpleasant taste in your mouth, you’re not alone. But logicians much better than I have struggled with the problem of trying to justify the fact that IF p THEN q is true if p is false and q is true. The issue isn’t worth fighting over for the purposes of this appendix.

It follows from the foregoing that the proposition p IMPLIES q (or IF p THEN q) is logically equivalent to the proposition (NOT p) or q—it evaluates to FALSE if and only if p evaluates to TRUE and q to FALSE. This equivalence illustrates the point that the connectives NOT, AND, OR, IMPLIES, and IFF aren’t all primitive. In fact, all possible logical connectives involving either one or two propositions can be expressed in terms of suitable combinations of NOT and either AND or OR. (Exercise: Check this claim.) Perhaps even more remarkably, all such connectives can in fact be expressed in terms of just one primitive. Can you find it?

Predicates

Consider the following statements:

x is a star

x has two moons

x has n moons

x is between Earth and y

x is between y and z

These statements aren’t propositions, because they aren’t unequivocally either true or false. And the reason they aren’t is that they involve parameters (also known as placeholders or free variables). For example, the statement "x is a star” involves the parameter x, and we obviously can’t say whether it’s true or false unless we’re told what that x stands for (at which point we’re no longer dealing with the given statement anyway but a different one instead—see the next paragraph).

Now, we can substitute arguments for the parameters and thereby obtain propositions from the statements. For example, if we substitute the argument The Sun for the parameter x in "x is a star,” we obtain “The Sun is a star.” And this statement is indeed a proposition because it’s unequivocally either true or false (in fact, of course, it’s true). But the original statement as such (”x is a star”) is, to say it again, not itself a proposition. Rather, it’s a predicate , which—as you’ll recall from Chapter 4—is a truth-valued function; that is, it’s a function that, when invoked, returns a truth value. Like all functions, a predicate has a set of parameters; when it’s invoked, arguments are substituted for the parameters; substituting arguments for the parameters effectively converts the predicate into a proposition; and we say the arguments satisfy the predicate if and only if that proposition is true. For example, the argument The Sun satisfies the predicate "x is a star,” whereas the argument The Moon does not.

As an aside, I remind you from Chapter 4 that logicians speak not of invoking a predicate but rather of instantiating it. In fact, for reasons that need not concern us here, their concept of instantiation is slightly more general than that of the familiar notion of function invocation. However, I’ll stay with the terminology of invocation in this appendix.

I also remind you from Chapter 4 that a proposition can be regarded as a degenerate predicate. To be precise, it’s a predicate for which the corresponding set of parameters is empty (and the function thus always returns the same result, either TRUE or FALSE, every time it’s invoked). In other words, all propositions are predicates, but most predicates aren’t propositions.

Now consider the example "x has n moons.” Obviously, this predicate involves two parameters, x and n. Substituting the arguments Mars for x and 2 for n yields a true proposition; substituting the arguments Earth for x and 2 for n yields a false one.

Predicates can conveniently be classified according to the cardinality of their set of parameters. That is, we often speak of an n-place predicate, meaning a predicate with exactly n parameters; for example, "x is between y and z" is a 3-place predicate, while "x has n moons” is a 2-place predicate. A proposition is a 0-place predicate.

Note

An n-place predicate is also called an n-adic predicate. If n = 1, the predicate is said to be monadic; if n = 2, it’s said to be dyadic.

Given a set of predicates, we can combine predicates from that set in a variety of ways to form further predicates, using the logical connectives already discussed—NOT, AND, OR, and so forth (in other words, those connectives are logical operators that operate on predicates in general, not just on the special predicates that happen to be propositions). A predicate that involves no connectives is called simple ; a predicate that isn’t simple is called compound . Here’s a trivial example of a compound predicate:

(x is a star) OR (x is between Earth and y)

This predicate is also dyadic— not because it involves two simple predicates, but because it involves two parameters, x and y.

Quantification

I showed in the previous section that one way to get a proposition from a predicate is to invoke it with an appropriate set of arguments. But there’s another way, too, and that’s by means of quantification. Let p(x) be a monadic predicate (I show the single parameter x explicitly, for clarity). Then:

The expression:

   EXISTS x ( p ( x ) )

is a proposition, and it means: “There exists at least one possible argument value a corresponding to the parameter x such that p(a) evaluates to TRUE” (in other words, the argument value a satisfies predicate p). For example, if p is the predicate "x is a logician,” then:

    EXISTS x ( x is a logician )

is a proposition—one that evaluates to TRUE, as it happens (for example, take a to be Bertrand Russell).

The expression:

    FORALL 
            x ( p ( x ) )

is a proposition, and it means: “All possible argument values a corresponding to the parameter x are such that p(a) evaluates to TRUE” (in other words, all such argument values a satisfy predicate p). For example, if again p is the predicate "x is a logician,” then:

    FORALL x ( x is a logician )

is a proposition—one that evaluates to FALSE, as it happens (for example, take a to be George W. Bush).

Observe that it’s sufficient to produce a single example to show the truth of the EXISTS proposition and a single counterexample to show the falsity of the FORALL proposition. Observe too in both cases that the parameter must be constrained to “range over” some set of permissible values (the set of all people, in the example). I’ll come back to this point later, in the section "Database Constraints.”

The term used in logic for the expressions EXISTS x and FORALL x in the foregoing discussion is quantifiers (the term derives from the verb to quantify, which simply means to express as a quantity— that is, to say how much of something there is or how many somethings there are). Quantifiers of the form EXISTS...are said to be existential ; quantifiers of the form FORALL...are said to be universal . (And in logic texts, EXISTS is usually represented by a backward E and FORALL by an upside-down A. I use the keywords EXISTS and FORALL for typographical reasons—not to mention readability.)

By way of another example, let q be the dyadic predicate "x is taller than y.” If we “quantify existentially over x,” we obtain:

    EXISTS x ( x is taller than y )

Now, this statement is not a proposition, because it isn’t unequivocally either true or false; in fact, it’s a monadic predicate—it has a single parameter, y. Suppose we invoke this predicate with argument Steve. We obtain:

    EXISTS x ( x is taller than Steve )

This statement is a proposition (and if there exists at least one person—Arnold, say—who’s taller than Steve, then it evaluates to TRUE, of course). But another way to obtain a proposition from the original predicate q is to quantify over both of the parameters x and y. For example:

    EXISTS x ( EXISTS y ( x is taller than y ) )

This statement is indeed a proposition; it evaluates to FALSE only if nobody is taller than anybody else and to TRUE otherwise (think about it!).

There are several lessons to be learned from this example:

  • To obtain a proposition from an n-adic predicate by quantification, it’s necessary to quantify over every parameter. More generally, if we quantify over m parameters (m n), then we obtain a k-adic predicate, where k = n-m.

  • Let’s focus on existential quantification only for the moment. Then there are apparently two different propositions we can obtain in the example by “quantifying over everything”:

        EXISTS x ( EXISTS y ( x is taller than y) )
        EXISTS y ( EXISTS x ( x is taller than y) )
    

    It should be clear, however, that these two propositions both mean the same thing: “There exist two persons x and y such that x is taller than y.” More generally, in fact, it’s easy to see that a series of like quantifiers (all existential or all universal) can be written in any sequence we choose without changing the overall meaning of the predicate or proposition in question. We therefore allow unnecessary parentheses to be dropped, as here:

        EXISTS y EXISTS x ( x is taller than y )
    

    By contrast, with unlike quantifiers, the sequence matters, in general (see the next point).

  • Of course, we can use either EXISTS or FORALL in each of the necessary quantifiers when “quantifying over everything.” In the example, therefore, there are six distinct propositions that can be obtained by fully quantifying, and I’ve listed them below. (Actually there are eight, but two of those eight can be eliminated by virtue of the previous point.) I’ve also shown a precise natural-language interpretation in each case. Note that those interpretations are all logically different! Please note too, however, that I’ve had to assume in connection with those interpretations that there do exist at least two people “in the universe,” as it were. I’ll come back to this assumption later (in a discussion of empty ranges in the section "More on Quantification“).

        EXISTS x ( EXISTS y ( x is taller than y ) )
    

    Meaning: Somebody is taller than somebody else; TRUE, unless everybody is the same height.

        EXISTS x ( FORALL y ( x is taller than y ) )
    

    Meaning: Somebody is taller than everybody (that particular somebody included!); clearly FALSE.

        FORALL x ( EXISTS y ( x is taller than y ) )
    

    Meaning: Everybody is taller than somebody; clearly FALSE.

        EXISTS y ( FORALL x ( x is taller than y ) )
    

    Meaning: Somebody is shorter than everybody (that particular somebody included!); clearly FALSE.

        FORALL y ( EXISTS x ( x is taller than y ) )
    

    Meaning: Everybody is shorter than somebody; clearly FALSE.

        FORALL x ( FORALL y ( x is taller than y ) )
    

    Meaning: Everybody is taller than everybody; clearly FALSE.

  • Finally (at the risk of belaboring the obvious), even though five out of six of the foregoing propositions all evaluate to the same truth value, FALSE, it doesn’t follow that they all mean the same thing, and indeed they don’t; in fact, no two of them do.

Free and Bound Variables

I said earlier that another term for what I’ve been calling parameters is free variables. Quantifying over a free variable converts it into a bound variable . For example, consider again the 2-place predicate q from the previous section:

            x is taller than y
         

The parameters x and y here are free variables. If we now “quantify existentially” over x, we obtain:

    EXISTS x ( x is taller than y )

Here, y is still free, but x is now bound. And in the “fully quantified” form:

    EXISTS x EXISTS y ( x is taller than y )

x and y are both bound, and there are no free variables at all (the predicate has degenerated to a proposition).

Now, we already know that free variables correspond to parameters, in conventional programming terms. Bound variables, by contrast, don’t have an exact counterpart in conventional programming terms at all; instead, they serve as a kind of dummy— they serve only to link the predicate inside the parentheses to the quantifier outside. For example, consider the simple predicate (actually a proposition):

    EXISTS x ( x > 3 )

This proposition merely asserts that there exists some integer greater than three. (I’m assuming that x here is constrained to “range over” the set of integers. Again, this is a point I’ll come back to in the section "Database Constraints.”) Note, therefore, that the meaning of the proposition would remain totally unchanged if the two x’s were both replaced by some other variable y. In other words, the proposition:

    EXISTS y ( y > 3 )

is semantically identical to the one just shown.

Now consider the predicate:

    EXISTS x ( x > 3 ) AND x < 0

Here there are three x’s, but they don’t all denote the same thing. The first two are bound and can be replaced by, say, y without changing the overall meaning; but the third is free and cannot be replaced with impunity. Thus, of the following two predicates, the first is equivalent to the one just shown and the second is not:

    EXISTS y ( y > 3 ) AND x < 0
    EXISTS y ( y > 3 ) AND y < 0

To close this section, I remark that we can now (re)define a proposition to be a predicate in which all of the variables are bound.

More on Quantification

This section discusses a number of miscellaneous issues regarding quantification.

We Don’t Need Both Quantifiers

You might already know that we don’t actually need both EXISTS and FORALL. The reason is that any statement that can be expressed in terms of EXISTS can always be expressed in terms of FORALL instead and vice versa. By way of example, consider again the predicate:

    EXISTS x ( x is taller than Steve )

(“Somebody is taller than Steve”). Another way to say the same thing is:

    NOT ( FORALL x ( NOT ( x is taller than Steve ) ) )

(“It is not the case that nobody is taller than Steve”). More generally, in fact, the statement:

    EXISTS x ( p ( x ) )

is logically equivalent to the following statement:

    NOT ( FORALL x ( NOT ( p ( x ) ) )

(where the predicate p might legitimately involve other parameters in addition to x). Likewise, the statement:

    FORALL x ( p ( x ) )

is logically equivalent to the statement:

    NOT ( EXISTS x ( NOT ( p ( x ) ) )

(where, again, the predicate p might legitimately involve other parameters in addition to x).

It follows from the foregoing that a formal language need not explicitly support both EXISTS and FORALL. But it’s very desirable to support both in practice. The reason is that some problems are “more naturally” formulated in terms of EXISTS and others are “more naturally” formulated in terms of FORALL. For example, SQL supports EXISTS but not FORALL; as a consequence, certain queries are quite awkward to formulate in SQL. For example, consider the query “Get suppliers who supply all parts,” which can be expressed in logic quite simply as follows:

               s WHERE FORALL p EXISTS sp ( s.SNO = sp.SNO AND sp.PNO = p.PNO )

(“Get suppliers s such that, for all parts p, there exists a shipment sp linking that supplier s to that part p“; the variables s, p, and sp are to be understood as ranging over the current values of relvars S, P, and SP, respectively). In SQL, by contrast, the query looks something like this:

    SELECT s.*
    FROM   S AS s
    WHERE  NOT EXISTS
         ( SELECT p.*
           FROM   P AS p
           WHERE  NOT EXISTS
                ( SELECT sp.*
                  FROM   SP AS sp
                  WHERE  s.SNO = sp.SNO
                  AND    sp.PNO = p.PNO ) )

(“Get suppliers s such that there does not exist a part p such that there does not exist a shipment sp linking that supplier s to that part p“). Well, single negation is bad enough (many users have difficulty with it); double negation, as in this SQL query, is much worse!

Empty Ranges

Consider again the fact that the statements:

    EXISTS x ( p ( x ) )

and:

NOT ( FORALL x ( NOT ( p ( x ) ) )

are logically equivalent. As I’ve indicated several times already, the bound variable x in each of these statements must “range over” some set of permissible values. Suppose now that the set in question is empty (it might, for example, be the set of people over 50 feet tall). Then:

  • The quantifier EXISTS x clearly evaluates to FALSE (because no such x exists).

  • The statement EXISTS x (p(x))—“There exists an x that makes p(x) evaluate to TRUE”—thus evaluates to FALSE as well, regardless of what p(x) happens to be. For example, the statement “There exists a person over 50 feet tall who works for IBM” evaluates to FALSE (unsurprisingly).

  • It follows that the negation NOT EXISTS x (p(x)), which is equivalent to the statement FORALL x (NOT (p(x)), evaluates to TRUE—again, regardless of what p(x) happens to be. For example, the statement “All persons over 50 feet tall don’t work for IBM”—or, more idiomatically, “No person over 50 feet tall works for IBM”—evaluates to TRUE (again unsurprisingly).

  • But if the predicate p(x) is arbitrary, then so is the predicate NOT (p(x)). And therefore we have the following possibly surprising result: the statement FORALL x ( . . . ) evaluates to TRUE if there are no x’s, regardless of what appears inside the parentheses. For example, the statement “All persons over 50 feet tall do work for IBM” also evaluates to TRUE(!)—because, to say it again, there aren’t any persons over 50 feet tall.

One implication of this state of affairs is that certain queries produce a result that you might not expect (if you don’t know logic, that is). For example, the following query:

               s WHERE FORALL p
          ( IF p.COLOR = 'Purple'
            THEN EXISTS sp ( s.SNO = sp.SNO AND sp.PNO = p.PNO ) )

(“Get suppliers who supply all purple parts”) will return all suppliers if there aren’t any purple parts. Exercise: Show an SQL formulation of this query.

Defining EXISTS and FORALL

You might have already realized that EXISTS and FORALL can be defined as an iterated OR and an iterated AND , respectively. I’ll consider EXISTS first. Let p(x) be a predicate with a parameter x and let x range over the set X = {x1, x2, . . . , xn}. Then:

    EXISTS x ( p ( x ) )

is a predicate, and it’s defined to be equivalent to (and hence shorthand for) the predicate:

    FALSE OR p ( x1 ) OR p ( x2 ) OR ... OR p ( xn )

Observe in particular that this expression evaluates to FALSE if X is empty (as we already know). By way of example, let p(x) be "x has a moon” and let X be the set {Mercury, Venus, Earth, Mars}. Then the predicate EXISTS x (p(x)) becomes “EXISTS x (x has a moon),” and it’s shorthand for:

    FALSE OR ( Mercury has a moon ) OR ( Venus has a moon )
          OR ( Earth has a moon )   OR ( Mars has a moon )

which evaluates to TRUE because (for example) “Mars has a moon” is true. Similarly:

    FORALL x ( p ( x ) )

is a predicate, and it’s defined to be equivalent to (and hence shorthand for) the predicate:

    TRUE AND p ( x1 ) AND p ( x2 ) AND ... AND p ( xn )

And this expression evaluates to TRUE if X is empty (again, as we already know). By way of example, let p(x) and X be as in the EXISTS example above. Then the predicate FORALL x (p(x)) becomes “FORALL x (x has a moon),” and it’s shorthand for:

    TRUE AND ( Mercury has a moon ) AND ( Venus has a moon )
         AND ( Earth has a moon )   AND ( Mars has a moon )

which evaluates to FALSE because (for example) “Venus has a moon” is false.

As an aside, let me remark that—as the examples clearly demonstrate—defining EXISTS and FORALL as iterated OR and AND, respectively, means that every predicate that involves quantification is equivalent to one that doesn’t. Thus, you might be wondering, not without some justification, just what this business of quantification is really all about. Why all the fuss? The answer is as follows: we can define EXISTS and FORALL as iterated OR and AND only because the sets we have to deal with are—thankfully—always finite (because we’re operating in the realm of computers, and computers in turn are finite, of course). In pure predicate logic, where there’s no such restriction, those definitions aren’t valid.

Perhaps I should add that, even though we’re always dealing with finite sets and EXISTS and FORALL are thus merely shorthand, they’re extremely useful shorthand! For my part, I certainly wouldn’t want to have to formulate queries and the like purely in terms of AND and OR, without being able to use the quantifiers.

Other Kinds of Quantifiers

While it’s true that EXISTS and FORALL are the most important quantifiers in practice, they aren’t the only ones possible. There’s no a priori reason, for example, why we shouldn’t allow quantifiers of the form:

    there exist at least three x's such that

or:

    a majority of x's are such that

or:

    an odd number of x's are such that

(and so on). One fairly important special case is there exists exactly one x such that. I’ll use the keyword UNIQUE for this one. Here are some examples:

    UNIQUE x ( x is taller than Arnold )

Meaning: Exactly one person is taller than Arnold; probably FALSE.

    UNIQUE x ( x has social security number y )

Meaning: Exactly one person has social security number y (y is a parameter, of course). We can’t assign a truth value to this example, because it’s a predicate, not a proposition.

    FORALL y UNIQUE x ( x has social security number y )

Meaning: Everybody has a unique social security number. (I’m assuming here that y ranges over the set of all social security numbers actually assigned, not all possible ones. Incidentally, does this predicate—which is in fact a proposition, of course—evaluate to TRUE?)

As another exercise, what does the following predicate mean?

    FORALL x UNIQUE y ( x has social security number y )

Database Constraints

Now let me show how the ideas I’ve been discussing can be used in the formulation of database constraints, using examples from Chapter 6. Please note that the formulations shown aren’t the only ones possible, in general.

Example 1

Status values must be in the range 1 to 100 inclusive.

    CONSTRAINT C1
       FORALL s ( IF s 
                      S THEN s.STATUS ≥ 0 AND s.STATUS ≤ 100 ) ;

(“For all possible values s, if s is a tuple in relvar S, then the STATUS value within that tuple must be in the specified range”; note the dot-qualified attribute references, and recall from Chapter 5 that "s S” can be read as "s [is] in S”).

Now, I’m sure you can see that many constraints are going to involve a proposition of the general form just illustrated:

    FORALL t ( IF t 
             
            R THEN p )

(where p is some predicate). We can simplify such propositions slightly by defining the following as a shorthand:

    FORALL t 
             
            R ( p )

The bound variable t here is said to range over (the current value of) relvar R, and the expression FORALL t R is said to be a range-coupled quantifier. Using this shorthand, constraint C1 becomes:

    CONSTRAINT C1
       FORALL s 
             S ( s.STATUS ≥ 0 AND s.STATUS ≤ 100 ) ;

In the same kind of way, it’s helpful to define the expression:

    EXISTS t 
             
            R ( p )

to be shorthand for the following:

    EXISTS t ( ( t 
             
            R ) AND ( p ) )

In fact, we can (and I will) simplify matters still further, by allowing the ranges for bound variables to be defined external to the expressions that use those bound variables. For example:

            s RANGES OVER { S }

Now I can reduce constraint C1 to just:[*]

    CONSTRAINT C1
       FORALL s ( s.STATUS ≥ 0 AND s.STATUS ≤ 100 ) ;

For the remainder of this section (and the next), I’ll assume the following range definitions are in effect:

            s RANGES OVER { S }
    x RANGES OVER { S }
    y RANGES OVER { S }
    p RANGES OVER { P }
    sp RANGES OVER { SP }

I’ll also introduce additional bound variables as and when necessary.

Example 2

Suppliers in London must have status 20.

    CONSTRAINT C2
       FORALL s ( IF s.CITY = 'London' THEN s.STATUS = 20 ) ;
Example 3

Supplier numbers are unique.

    CONSTRAINT C3
       FORALL x
                     , y ( IF x.SNO = y.SNO THEN x.SNAME  = y.SNAME
                                      AND  x.STATUS = y.STATUS
                                      AND  x.CITY   = y.CITY ) ;

The quantifier FORALL x,y is shorthand for FORALL x FORALL y.

Example 4

No supplier with status less than 20 can supply part P6.

    CONSTRAINT C4
       FORALL s ( IF s.STATUS < 20 THEN
          FORALL sp ( IF sp.SNO = s.SNO THEN
                         sp.PNO ≠ PNO('P6') ) ) ;
Example 5

Every shipment must have a supplier.

    CONSTRAINT C5
       FORALL sp ( EXISTS s ( sp.SNO = s.SNO ) ) ;
Example 6

No supplier number appears in both relvar LS and relvar NLS.

    CONSTRAINT C6
       FORALL m 
                      LS FORALL n 
                      NLS ( m.SNO ≠ n.SNO ) ;
Example 7

Supplier S1 and part P1 must never be in different cities.

    CONSTRAINT C7
       NOT ( EXISTS s EXISTS p ( s.SNO = SNO('S1') AND
                                 p.PNO = PNO('P1') AND
                                 s.CITY ≠ p.CITY ) ) ;

I’ll show a few more examples (not based on examples in Chapter 6) in order to illustrate some additional points.

Example 8

There must always be at least one supplier.

    CONSTRAINT C8 EXISTS s ( TRUE ) ;
Example 9

All parts are either blue or green, and there exists at least one part.

    CONSTRAINT C9
       FORALL p ( p.COLOR = 'Blue' OR p.COLOR = 'Green' ) AND
       EXISTS p ( TRUE ) ;

Note the use of two distinct bound variables here, both called p.

Example 10

All parts are either blue or green, or there exists at least one red part.

    CONSTRAINT C10
       FORALL p ( p.COLOR = 'Blue' OR p.COLOR = 'Green' ) OR
       EXISTS p ( p.COLOR = 'Red' ) ;

Queries

In this section I’ll show how the calculus notation can be used to formulate queries, using examples from Chapter 5. Again, please note that the formulations shown aren’t the only ones possible, in general.

Example 1

Get suppliers in Paris.

                     s WHERE s.CITY = 'Paris'
Example 2

Get suppliers who supply at least one part.

                     s WHERE EXISTS sp ( sp.SNO = s.SNO )
Example 3

Get all supplier name/status/city combinations.

                     s.SNAME, s.STATUS, s.CITY

We could harmlessly add WHERE TRUE to the foregoing formulation.

Example 4

Get the join of suppliers and shipments.

                     s, sp.PNO, sp.QTY WHERE s.SNO = sp.SNO
Example 5

Get the union of supplier cities and part cities.

                     u RANGES OVER { s.CITY, p.CITY }
    u
                  

Bound variable u is defined to range over the union of supplier cities and part cities.

Example 6

Get suppliers who supply no parts at all.

                     s WHERE NOT ( EXISTS sp ( sp.SNO = s.SNO ) )
Example 7

Get parts with their weight in grams.

                     p, ( p.WEIGHT * 454 ) AS GMWT
Example 8

For each supplier, get the supplier number and a count of the number of parts that supplier supplies.

                     s.SNO, COUNT ( sp WHERE sp.SNO = s.SNO ) AS P_COUNT

Some Equivalences

I’ll finish up this appendix with a few remarks on certain equivalences that might have already occurred to you. First, recall the IS_EMPTY operator, which I introduced in Chapter 5 and made much use of in Chapter 6. Well, if the system supports that operator, there’s no logical need for it to support the quantifiers, thanks to the following equivalences:

    EXISTS x ( p)    NOT ( IS_EMPTY ( x WHERE p ) )

and:

    FORALL x ( p )    IS_EMPTY ( x WHERE NOT ( p) )

In fact, the SQL support for quantifiers, such as it is, is based on exactly the foregoing equivalences. To be specific, SQL supports an EXISTS operator— not really a quantifier as such, because it doesn’t involve any bound variables—that’s defined as follows: the operator invocation EXISTS(tx), where tx is a table expression, returns TRUE if and only if the table t denoted by the expression tx is nonempty.[*] By contrast, SQL doesn’t support a FORALL operator, because (a) we don’t need both EXISTS and FORALL, and (b) more to the point, syntax of the form FORALL(tx) where tx is a table expression really makes no sense. For example, what could an expression like FORALL (SELECT * FROM S) possibly mean?

In a similar fashion, we don’t absolutely need the quantifiers anyway if the system supports the aggregate operator COUNT. This is because of the following equivalences:

    EXISTS x ( p )    COUNT ( x WHERE p ) > 0

and:

    FORALL x ( p )    COUNT ( x WHERE p ) = COUNT ( x )

Now, I’m certainly not a fan of the idea of replacing quantified expressions by expressions involving COUNT invocations—though sometimes we have to, if we’re in a pure algebraic framework—but it would be wrong of me not to mention the possibility.

Summary

In the introduction to this appendix, I said it was my position that database professionals should have at least a nodding acquaintance with the basic concepts of relational calculus (or predicate logic—it comes to the same thing). I’d like to conclude by trying to justify that position.

My basic point is simply that a knowledge of logic helps you think precisely (and in our field, the importance of thinking precisely is surely paramount). In particular, it forces you to appreciate the significance of proper quantification. Natural language is so often imprecise; however, careful consideration of what quantification is needed allows you to pin down the meaning of what can otherwise be very imprecise natural-language statements. By way of example, you might like to meditate on exactly what Abraham Lincoln meant—or might have meant, or thought he might have meant, or might have thought he meant—when he famously said: “You can fool some of the people all of the time, and all of the people some of the time, but you can not fool all of the people all of the time.”

Of course, I’m well aware there are many people who disagree with me here; that is, there are many people who feel ordinary mortals shouldn’t have to grapple with a subject as abstruse as logic seems to be. In effect, they claim that logic is just too difficult for most people to deal with. Now, that claim might be true in general (logic is a big subject), but you don’t need to understand the whole of logic for the purpose at hand; in fact, I doubt whether you need much more than what I’ve covered in this appendix. And the benefits are so huge! I made essentially the same point in a 2004 article—“The Logic of Business Rules,” www.dbdebunk.com (June 2004), on which much of the present appendix is based—and I’d like to quote the concluding remarks from that article here:

Surely it’s worth investing a little effort up front in becoming familiar with [the material in this appendix] in order to avoid the problems associated with [business rules that are stated ambiguously]. Ambiguity in business rules leads to implementation delays at best or implementation errors at worst (possibly both). And such delays and errors certainly have costs associated with them, costs that are likely to outweigh those initial learning costs many times over. In other words, framing business rules properly is a serious matter, and it requires a certain level of technical competence.

As you can see, these remarks are set in the context of business rules specifically, but I think they’re of wider applicability.



[*] SQL allows a simple constraint like this one to be reduced still further by (in effect) eliding the quantifier. To be precise, we can specify a “CHECK constraint” as part of the definition of base table S that looks like this: CONSTRAINT C1 CHECK (STATUS >= 0 AND STATUS <= 100). We can even elide that “CONSTRAINT C1” as well, if we want.

[*] There’s a certain irony here, though. As we saw in Chapter 3, because SQL supports nulls, it’s based on what’s called three-valued logic (instead of the conventional two-valued logic the relational model is based on). In three-valued logic, the existential quantifier can return three different results: TRUE, FALSE, and what’s usually called UNKNOWN (as indeed it is, in SQL). But SQL’s EXISTS operator always returns TRUE or FALSE, never UNKNOWN. As a consequence, (a) SQL’s EXISTS is not a faithful implementation of the existential quantifier of three-valued logic, and (b) once again, SQL queries sometimes return the wrong answer.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset