Chapter 12

Miscellaneous SQL Topics

I explained that we are calling the White Paper “Open Government” because you always dispose of the difficult bit in the title.
It does less harm there than on the statute books.

—Sir Humphrey Appleby, in Open Government
(first episode of the BBC TV series Yes Minister, by Antony Jay and Jonathan Lynn, 1981)

This last chapter is something of a potpourri; it discusses a few SQL features and related matters that, for one reason or another, don’t fit very neatly into any of the previous chapters. For purposes of reference, it also gives a simplified BNF grammar for SQL table expressions and boolean expressions.

Also, this is as good a place as any to define two terms you need to watch out for. The terms in question are implementation defined and implementation dependent, and they’re both used heavily in the SQL standard. Here are the definitions:

Definition: An implementation defined feature is one whose semantics can vary from one implementation to another, but do at least have to be specified for any individual implementation. In other words, the implementation is free to decide how it will implement the feature in question, but the result of that decision must be documented. An example is the maximum length of a character string.

Definition: An implementation dependent feature, by contrast, is one whose semantics can vary from one implementation to another and don’t even have to be specified for any individual implementation. In other words, the term effectively means undefined; the implementation is free to decide how it will implement the feature in question, and the result of that decision doesn’t even have to be documented (it might vary from release to release, or even more frequently). An example is the full effect of an ORDER BY clause, if the specifications in that clause fail to specify a total ordering. By way of example, consider the effect of the SQL expression SELECT SNO FROM S ORDER BY CITY on our usual suppliers relation. As noted in Chapter 7, this expression can return the five supplier numbers in any of the following sequences (and which particular sequence you get with any particular product or at any particular time is implementation dependent):

  • S5 , S1 , S4 , S2 , S3

  • S5 , S4 , S1 , S2 , S3

  • S5 , S1 , S4 , S3 , S2

  • S5 , S4 , S1 , S3 , S2

SELECT *

Use of the “SELECT *” form of the SQL SELECT clause is acceptable in situations where the specific columns involved, and their left to right ordering, are both irrelevant—for example, in an EXISTS invocation. In particular, it’s probably acceptable at the outermost level of a SELECT – FROM – WHERE expression in what the standard calls “direct” (i.e., interactive) SQL, or in other words in an interactive query. It can be dangerous in other situations, however, because the meaning of that “*” can change if (e.g.) new columns are added to an existing table. Recommendation: Be on the lookout for such situations and try to avoid them. In particular, don’t use “SELECT *” at the outermost level in a cursor definition—instead, always specify the pertinent columns explicitly, by name. A similar remark applies to view definitions also. (On the other hand, if you adopt the strategy suggested under the discussion of column naming in Chapter 3 of always accessing the database via views—the “operate via views” strategy—then it might be safe to use “SELECT *” wherever you like, other than in the definitions of those views themselves.)

EXPLICIT TABLES

An explicit table in SQL is an expression of the form TABLE T, where T is the name of a base table or view or an “introduced name” (see the discussion of WITH in Chapter 6). It’s logically equivalent to the following:

( SELECT * FROM T )

Here’s a fairly complicated example that makes use of explicit tables (“Get all parts—but if the city is London, show it as Oslo and show the weight as double”):

WITH t1 AS ( SELECT PNO , PNAME , COLOR , WEIGHT , CITY
             FROM   P
             WHERE  CITY = 'London' ) ,
     t2 AS ( SELECT PNO , PNAME , COLOR , WEIGHT , CITY ,
                    2 * WEIGHT AS NEW_WEIGHT , 'Oslo' AS NEW_CITY
             FROM t1 ) ,
     t3 AS ( SELECT PNO , PNAME , COLOR ,
                    NEW_WEIGHT AS WEIGHT , NEW_CITY AS CITY
             FROM   t2 ) ,
     t4 AS ( TABLE P EXCEPT CORRESPONDING TABLE t1 )

TABLE t4 UNION CORRESPONDING TABLE t3

DOT QUALIFICATION

References to column names in SQL can usually be dot qualified by the name of the applicable range variable (see the next section). As you know, however, SQL does allow that qualifier to be omitted in many situations, in which case an implicit qualifier is assumed by default. But:

  • The SQL rules regarding implicit qualification aren’t always easy to understand, especially if the overall table expression involves any nested subqueries or explicit joins.1 As a result, it isn’t always obvious what a particular unqualified name refers to.

  • What’s unambiguous today might be ambiguous tomorrow (e.g., if new columns are added to an existing table).

  • In Chapter 3 I recommended, strongly, that columns that represent the same kind of information be given the same name whenever possible. If that recommendation is followed, then unqualified names will often be ambiguous anyway, and dot qualification will therefore be required.

So a good general rule is: When in doubt, qualify. Unfortunately, however, there are certain contexts in which such qualification isn’t allowed. The contexts in question are, loosely, ones in which the name serves as a reference to the column per se, rather than to the data contained in that column. Here’s a partial list of such contexts (note the last two in particular):

  • A column definition within a base table definition (CREATE TABLE, also ALTER TABLE)

  • A key or foreign key specification

  • The column name commalist, if specified (but it shouldn’t be—see Chapter 8), in CREATE VIEW

  • The column name commalist, if specified (but it usually shouldn’t be—see the next section), following the definition of a range variable

  • The column name commalist in JOIN ... USING

  • The column name commalist, if specified (and it should be—see Chapter 5), on INSERT

  • The left side of a SET assignment on UPDATE

It might help to note that most of the contexts listed above are ones in which no range variable, as such, is available for dot qualification use anyway. The point is, however, that an unsuspecting user might expect to be able to use table names as qualifiers in these contexts,2 on the grounds—I suppose—that SQL often uses table names as if they were range variable names anyway, as explained in the section immediately following.

RANGE VARIABLES

As we saw in Chapter 10, a range variable in the relational model is a variable—a variable in the sense of logic, that is, not the usual programming language sense—that ranges over the set of tuples in some relation (or the set of rows in some table, in SQL terms). In SQL, such variables are defined by means of AS specifications in the context of either a FROM clause or an explicit JOIN (see the BNF grammar, later). Here’s a simple example of the FROM case:

SELECT SX.SNO
FROM   S AS SX
WHERE  SX.STATUS > 15

SX here is a range variable that ranges over table S; in other words, its permitted values are rows of table S. You can think of the SELECT expression overall as being evaluated as follows. First, the range variable takes on one of its permitted values, say the row for supplier S1. Is the status value in that row greater than 15? If it is, then supplier number S1 appears in the result. Next, the range variable moves on to another row of table S, say the row for supplier S2; again, if the status value in that row is greater than 15, then the relevant supplier number appears in the result. And so on, exhaustively, until variable SX has taken on all of its permitted values.

Note: SQL calls a name such as SX in the example a correlation name. However, it doesn’t seem to have a term for the thing that such a name names; certainly there’s no such thing in SQL as a “correlation.” (Note in particular that the term doesn’t necessarily have anything to do with correlated subqueries, which are discussed in the next section.) I prefer the term range variable.3

Incidentally, it’s worth noting that SQL requires SELECT expressions always to be formulated in terms of range variables; if no such variables are specified explicitly, it assumes the existence of implicit ones with the same names as the corresponding tables. For example, the SELECT expression

SELECT SNO
FROM   S
WHERE  STATUS > 15

—arguably a more “natural” SQL formulation of the example discussed above—is treated as shorthand for this expression (note the text in bold italics):

SELECT S.SNO
FROM   S AS S
WHERE  S.STATUS > 15

In this latter formulation, the “S” dot qualifiers and the “S” in the specification “AS S” do not denote table S; rather, they denote a range variable called S that ranges over the table with the same name.4

Now, the BNF grammar defined later in this chapter refers to the items in the commalist in a FROM clause—i.e., the items following the keyword FROM itself—as table specifications.5 The expressions denoting the table operands in an explicit JOIN are also table specifications. So let ts be such a table specification. Then, if the portion of ts consisting of a table expression as such in fact consists of a table subquery (see the next section), then ts must include an associated AS specification—even if the range variable introduced by that AS specification is never explicitly mentioned anywhere else in the overall expression. Here’s a JOIN example:

( SELECT SNO , CITY FROM S ) AS temp1
  NATURAL JOIN
( SELECT PNO , CITY FROM P ) AS temp2

Here’s another example (this one is repeated from Chapter 7):

SELECT PNO , GMWT
FROM ( SELECT PNO , WEIGHT * 454 AS GMWT
       FROM   P ) AS temp
WHERE  GMWT > 7000.0

For interest, here’s this latter example repeated with all implicit qualifiers made explicit:

SELECT temp.PNO , temp.GMWT
FROM ( SELECT P.PNO , P.WEIGHT * 454 AS GMWT
       FROM   P ) AS temp
WHERE  temp.GMWT > 7000.0

Note: A range variable definition in SQL can always optionally include a column name commalist that defines column names for the table the range variable ranges over, as in the following example (see the last two lines):

SELECT temp.SNO , temp.SNAME , temp.STATUS, temp.SCITY ,
       temp.PNO , temp.PNAME , temp.COLOR , temp.WEIGHT , temp.PCITY
FROM ( SELECT *
       FROM   S JOIN P
       ON     S.CITY > P.CITY )
       AS temp
        ( SNO , SNAME , STATUS , SCITY ,
          PNO , PNAME , COLOR , WEIGHT , PCITY )

The introduced column names here (SNO, SNAME, STATUS, SCITY, PNO, PNAME, COLOR, WEIGHT, and PCITY) effectively rename columns SNO, SNAME, STATUS, S.CITY, PNO, PNAME, COLOR, WEIGHT, and P.CITY, respectively (see the explanation of JOIN ... ON in Chapter 6).6 However, it shouldn’t be necessary to introduce column names in this way very often if other recommendations in this book are followed.

Recommendation: Favor the use of explicit range variables, especially in “complex” expressions—they can aid clarity, and sometimes they can save keystrokes.7 Be aware, however, that SQL’s name scoping rules for such variables can be quite hard to understand (but this is true regardless of whether the variables in question are explicit or implicit).

Caveat: As noted in Chapter 10, many SQL texts refer to range variable names, or correlation names, as aliases—sometimes, more specifically, table aliases—and describe them as if they were just alternative names for the tables they range over. Here’s a typical quote: “You couldn’t write this query without using aliases because the table names are identical.” But such a characterization seriously misrepresents the true state of affairs (indeed, it betrays a serious lack of understanding of what’s really going on), and it’s strongly deprecated on that account. Be on your guard against this sloppy manner of speaking (and/or thinking).

SUBQUERIES

A subquery in SQL is a table expression, tx say, enclosed in parentheses; if the table denoted by tx is t, the table denoted by the subquery is t also. Note, however, that (as mentioned in Chapters 1 and 6) the expression tx can’t be an explicit JOIN expression. Thus, for example,

( A NATURAL JOIN B )

isn’t a legal subquery.8 By contrast, the following expression is a legal subquery:

( SELECT * FROM A NATURAL JOIN B )

Subqueries fall into three broad categories, though the syntax is the same in every case. The details, partly repeated from earlier chapters, are as follows:

  • A table subquery is a subquery that’s neither a row subquery nor a scalar subquery.

  • A row subquery is a subquery appearing in a position where a row expression is logically required. Let rsq be such a subquery; then rsq must denote a table with just one row. Let the table in question be t, and let the single row in t be r; then rsq behaves as if it denoted that row r (in other words, t is coerced to r). Note: If rsq doesn’t denote a table with just one row, then (a) if it denotes a table with more than one row, an error is raised; (b) if it denotes a table with no rows at all, then that table is treated as if it contained just one row, where the row in question contains a null in every column position.

  • A scalar subquery is a subquery appearing in a position where a scalar expression is logically required. Let ssq be such a subquery; then ssq must denote a table with just one row and just one column. Let the table in question be t, let the single row in t be r, and let the single value in r be v; then ssq behaves as if it denoted that value v (in other words, t is coerced to r, and then r is coerced to v). Note: If ssq doesn’t denote a table with just one row and just one column, then (a) if it denotes a table with more than one column, an error is raised (probably at compile time); (b) if it denotes a table with one column but more than one row, an error is raised (probably at run time); (c) if it denotes a table with one column and no rows at all, then that table is treated as if it contained just one row, where the row in question contains a single null.

The following examples involve, in order, a table subquery, a row subquery, and a scalar subquery:

SELECT SNO
FROM   S
WHERE  CITY IN
     ( SELECT CITY               /* table subquery */
       FROM   P
       WHERE  COLOR = 'Red' )

UPDATE S
SET  ( STATUS , CITY ) =
     ( SELECT STATUS , CITY      /* row subquery */
       FROM   S
       WHERE  SNO = 'S1' )
WHERE  CITY = 'Paris' ;

SELECT SNO
FROM   S
WHERE  CITY =
     ( SELECT CITY               /* scalar subquery */
       FROM   P
       WHERE  PNO = 'P1' )

Next, a correlated subquery is a special kind of table, row, or scalar subquery; to be specific, it’s a subquery that includes what’s called a “correlated reference” to some “outer” table. In the following example, the parenthesized expression following the keyword IN is a correlated subquery—a correlated table subquery, in fact—because it includes a correlated reference to the outer table S (the query is “Get names of suppliers who supply part P1,” and the correlated reference, viz., S.SNO, appears in the very last line):

SELECT DISTINCT S.SNAME
FROM   S
WHERE  'P1' IN
       ( SELECT PNO             /* correlated table subquery */
         FROM   SP
         WHERE  SP.SNO = S.SNO )

As noted in Chapter 11, correlated subqueries are often contraindicated from a performance point of view, because—conceptually, at any rate—they have to be evaluated once for each row in the outer table instead of just once and for all. (In the example, if the overall expression is evaluated as stated, the subquery will be evaluated n times, where n is the number of rows in table S.) For that reason, it’s a good idea to avoid correlated subqueries if possible. In the case at hand, it’s very easy to reformulate the query to achieve this goal:

SELECT DISTINCT S.SNAME
FROM   S
WHERE  SNO IN
     ( SELECT SNO          /* noncorrelated table subquery */
       FROM   SP
       WHERE  PNO = 'P1' )

Here’s another example showing the use of a correlated subquery (a correlated scalar subquery, in the SELECT clause this time), repeated from the section “Summarization” in Chapter 7:

SELECT S.SNO , ( SELECT COUNT ( PNO )
                 FROM   SP
                 WHERE  SP.SNO = S.SNO ) AS PCT
FROM   S

The query is “For each supplier, get the supplier number and a count of the number of parts supplied by that supplier.” Given the sample values in Fig. 1.1 in Chapter 1, the result looks like this:

┌─────┬─────┐
SNO PCT
├═════┼─────┤
S1     6
S2     2
S3     1
S4     3
S5     0
└─────┴─────┘

Finally, a “lateral” subquery is a special kind of correlated subquery. To be specific, it’s a correlated subquery that (a) appears in a FROM clause specifically and (b) includes a reference to an “outer” table that’s defined by a table specification appearing earlier within that same FROM clause. For example, here’s another possible formulation of the query just illustrated (“For each supplier, get the supplier number and the number of parts supplied by that supplier”):

SELECT S.SNO , temp.PCT
FROM   S , LATERAL ( SELECT COUNT ( PNO ) AS PCT
                     FROM   SP
                     WHERE  SP.SNO = S.SNO ) AS temp

The purpose of the keyword LATERAL is to tell the system that the subquery to which it’s prefixed is correlated with something previously mentioned in the very same FROM clause (in the example, that “lateral” subquery yields exactly one value—namely, the applicable count—for each SNO value in table S).

Now, there’s something going on here that you might be forgiven for finding a bit confusing. The items in a FROM clause are table specifications, and so they denote tables. In the example, though, the particular table specification that begins with the keyword LATERAL—more precisely, what remains of that table specification if the keyword LATERAL is removed—looks more like what might be called a scalar specification, or more precisely a scalar subquery; certainly it could used as such, should the context demand such an interpretation (e.g., in a SELECT clause, as indeed we saw above in the previous formulation of this query). In fact, however, it’s a table subquery. The table it denotes, for a given value of S.SNO, is called temp; that table has just one column, called PCT, and just one row, and hence in fact contains a single scalar value. Then the expression temp.PCT in the SELECT clause causes that scalar value to become the contribution of table temp to the applicable result row (just as the expression S.SNO in that same SELECT clause causes the applicable SNO value to become the contribution of table S to that result row).

Following on from the foregoing rather complicated explanation, I feel bound to add that it’s not exactly clear why “lateral” subqueries are needed anyway. Indeed, as we already know, the foregoing example can easily be reformulated in such a way as to avoid the apparent need for any such thing. Here again is that reformulation:

SELECT S.SNO , ( SELECT COUNT ( PNO )
                 FROM   SP
                 WHERE  SP.SNO = S.SNO ) AS PCT
FROM   S

Briefly, what’s happened here is that the subquery has moved from the FROM clause to the SELECT clause; it still refers to something else in the same clause (S.SNO, to be specific), but now the keyword LATERAL is apparently no longer needed. However, do note what’s happened to the specification AS PCT, which appeared inside the subquery in the LATERAL formulation but has now moved outside (this point was discussed in more detail in an aside in the section “Summarization” in Chapter 7).

Finally: I’ve defined the term subquery; perhaps it’s time to define the term query, too!— even though I’ve used that term ubiquitously throughout previous chapters. So here goes: A query is a retrieval request. In the SQL context, in other words, it’s either a table expression— though such expressions can also be used in contexts other than queries per se—or a statement, such as a SELECT statement in “direct” (i.e., interactive) SQL, that asks for such an expression to be evaluated. Note: The term is sometimes used (though not in this book!) to refer to an update request also. It’s also used on occasion to refer to the natural language version of some retrieval or update request.

“POSSIBLY NONDETERMINISTIC” EXPRESSIONS

As we saw in Chapter 2, an SQL table expression is “possibly nondeterministic” if it might give different results on different evaluations, even if the database hasn’t changed in the interim. Here’s the standard’s own definition:

A <query expression> or <query specification> is possibly nondeterministic if an implementation might, at two different times where the state of the SQL-data is the same, produce results that differ by more than the order of the rows due to General Rules that specify implementation dependent behavior.

Actually this definition is a trifle odd, inasmuch as tables—which is what <query expressions>s and <query specifications>s are supposed to produce—aren’t supposed to have an ordering to their rows anyway. But let’s overlook this detail; the important point is that, as noted in Chapter 2, “possibly nondeterministic” expressions aren’t allowed in integrity constraints,9 a state of affairs that could have serious practical implications if true.

The standard’s rules for labeling a given table expression “possibly nondeterministic” are quite complex, and full details are beyond the scope of the present discussion. However, a table expression tx is certainly considered to be “possibly nondeterministic” if any of the following is true:10

  • tx is a union, intersection, or difference, and the operand tables include a column of type character string.

  • tx is a SELECT expression, the SELECT item commalist in that SELECT expression includes an item (C say) of type character string, and at least one of the following is true:

    1. The SELECT item commalist is preceded by the keyword DISTINCT.

    2. C involves a MAX or MIN invocation.

    3. tx directly includes a GROUP BY clause and C is one of the grouping columns.

  • tx is a SELECT expression that directly includes a HAVING clause and the boolean expression in that HAVING clause includes either (a) a reference to a grouping column of type character string or (b) a MAX or MIN invocation in which the argument is of type character string.

  • tx is a JOIN expression and either or both of the operand expressions is possibly nondeterministic.

Note, however, that these rules are certainly stronger than they need be. For example, suppose that (a) NO PAD applies to the pertinent collation and (b) no two characters from the pertinent character set are “distinct, considered equal” according to that collation. Then, e.g., SELECT MAX(C) FROM T, where column C of table T is of the character string type in question, is surely well defined.

EMPTY SETS

The empty set is the set containing no elements. This concept is both ubiquitous and extremely important in the relational world, but SQL commits a number of errors in connection with it. Unfortunately there isn’t much you can do about most of those errors, but you should at least be aware of them. Here they are (this is probably not a complete list):

  • A VALUES expression isn’t allowed to contain an empty row expression commalist.

  • The SQL “set functions” all return null if their argument is empty (except for COUNT(*) and COUNT, which correctly return zero in such a situation).

  • If a scalar subquery evaluates to an empty table, that empty table is coerced to a null.

  • If a row subquery evaluates to an empty table, that empty table is coerced to a row of all nulls.

  • If the set of grouping columns and the table being grouped are both empty, GROUP BY produces a result containing just one (necessarily empty) group, whereas it should produce a result containing no groups at all.

  • A key can’t be an empty set of columns (nor can a foreign key, a fortiori).

  • A table can’t have an empty heading.

  • A SELECT item commalist can’t be empty.

  • A FROM item commalist can’t be empty.

  • The set of common columns for UNION CORRESPONDING, INTERSECT CORRESPONDING, and EXCEPT CORRESPONDING can’t be empty (though it can be for NATURAL JOIN).

  • A row can’t have an empty set of components.

A SIMPLIFIED BNF GRAMMAR

For purposes of reference, it seems appropriate to close this chapter, and the main part of this book, with a simplified BNF grammar for the standard dialect of SQL—not for the whole of the language, of course, but at least for SQL table expressions and boolean expressions.11 The grammar is deliberately conservative, in that it fails to define as valid certain expressions that are so, according to the SQL standard. (However, I don’t believe it defines as valid any expressions that aren’t so according to that standard.) To be more specific, constructs that I’ve previously advised you not to use—including in particular everything to do with nulls and 3VL—are deliberately omitted; so too are certain somewhat esoteric features (e.g., recursive queries). Also, for reasons explained in Chapter 1, almost all of the syntactic categories in what follows have names that differ somewhat from their counterparts in the standard. The following simplifying abbreviations are used:

exp     for       expression
spec    for       specification

All syntactic categories of the form <... name> are assumed to be <identifier>s and are defined no further here. The category <scalar exp> is also left undefined, though it might help to recall in particular that:

  • A scalar subquery is a legal scalar expression.

  • Most “row expressions” that occur in practice are actually scalar expressions.

  • Boolean expressions are scalar expressions too.

Table Expressions

As you can see, the grammar in this subsection begins with a production for <with exp>, a construct not mentioned (at least, not as such) in the body of the book. I introduce this syntactic category in order to capture (among other things) the fact that join expressions can’t appear without being nested inside some other table expression—but it does mean that the construct referred to throughout earlier parts of the book as a table expression doesn’t directly correspond to anything defined in the grammar! (I mean, there’s no production for a syntactic category called <table exp>.) I apologize if you find this state of affairs confusing, but it’s the kind of thing that always happens when you try to define a grammar for a language that violates orthogonality.12

Note that the productions defined in this subsection agree with the SQL standard in giving INTERSECT higher precedence than UNION; thus, for example, the table expression t1 INTERSECT t2 UNION t3 is understood as (t1 INTERSECT t2) UNION t3 and not as t1 INTERSECT (t2 UNION t3). But it’s probably better always to specify parentheses explicitly in such expressions, anyway.

<with exp>
    ::=   [ <with spec> ] <nonjoin exp>

<with spec>
    ::=   WITH <name intro commalist>

<name intro>
    ::=   <table name> AS <table subquery>

<table subquery>
    ::=   <subquery>

<subquery>
    ::=   ( <nonjoin exp> )

<nonjoin exp>
    ::=   <nonjoin term>
        | <nonjoin exp> UNION [ DISTINCT ]
                              [ CORRESPONDING ] <nonjoin term>
        | <nonjoin exp> EXCEPT [ DISTINCT ]
                               [ CORRESPONDING ] <nonjoin term>

<nonjoin term>
    ::=   <nonjoin primary>
        | <nonjoin term> INTERSECT [ DISTINCT ]
                       [ CORRESPONDING ] <nonjoin primary>

<nonjoin primary>
    ::=   TABLE <table name>
        | <table selector>
        | <select exp>
        | ( <nonjoin exp> )

<table selector>
    ::=   VALUES <row exp commalist>

<row exp>
    ::=   <scalar exp>
        | <row selector>
        | <row subquery>

<row selector>
    ::=   [ ROW ] ( <scalar exp commalist> )

<row subquery>
    ::=   <subquery>

<select exp>
    ::=   SELECT [ DISTINCT ] [ * | <select item commalist> ]
             FROM <table spec commalist>
              [ WHERE <boolean exp> ]
                [ GROUP BY <column name commalist> ]
                  [ HAVING <boolean exp> ]

The <column name commalist> in the GROUP BY clause can optionally be enclosed in parentheses. If it is, then (and only then)—unlike all other commalists mentioned in this grammar—it can also be empty.

<select item>
    ::=   <scalar exp> [ AS <column name> ]
        | <range variable name>.*

<table spec>
    ::=   <table name> [ AS <range variable name> ]
        | [ LATERAL ] <table subquery> AS <range variable name>
        | <join exp>
        | ( <join exp> )

<join exp>
    ::=   <table spec> CROSS JOIN <table spec>
        | <table spec> NATURAL JOIN <table spec>
        | <table spec> JOIN <table spec> ON <boolean exp>
        | <table spec> JOIN <table spec>
                       USING ( <column name commalist> )

Boolean Expressions

Note that the productions defined in this subsection agree with the SQL standard in giving AND higher precedence than OR; thus, for example, the boolean expression c1 AND c2 OR c3 is understood as (c1 AND c2) OR c3 and not as c1 AND (c2 OR c3). But it’s probably better always to specify parentheses explicitly in such expressions, anyway.

<boolean exp>
    ::=   <boolean term>
        | <boolean exp> OR <boolean term>

<boolean term>
    ::=   <boolean factor>
        | <boolean term> AND <boolean factor>

<boolean factor>
    ::=   [ NOT ] <boolean primary>

<boolean primary>
    ::=   <boolean literal>
        | <boolean variable name>
        | <boolean column name>
        | <condition>
        | ( <boolean exp> )

<boolean literal>
    ::=   TRUE | FALSE

<condition>
    ::=   <simple comparison exp>
        | <between exp>
        | <like exp>
        | <in exp>
        | <match exp>
        | <all or any exp>
        | <exists exp>
        | <unique exp>

<simple comparison exp>
    ::=   <row exp> <simple comp op> <row exp>

<simple comp op>
    ::=   = | < | <= | > | >= | <>

<between exp>
    ::=   <row exp> [ NOT ] BETWEEN <row exp> AND <row exp>

<like exp>
    ::=   <scalar exp> [ NOT ] LIKE <scalar exp> [ ESCAPE <scalar exp> ]

The <scalar exp>s must denote character strings. For ESCAPE, that string must be of length one.

<in exp>
    ::=   <row exp> [ NOT ] IN <table subquery>
        | <row exp> [ NOT ] IN ( <row exp commalist> )

<match exp>
    ::=   <row exp> MATCH [ UNIQUE ] <table subquery>

<all or any exp>
    ::=   <row exp> <scalar comp op> <all or any> <table subquery>

<all or any>
    ::=   ALL | ANY | SOME

<exists exp>
    ::=   EXISTS <table subquery>

<unique exp>
    ::=   UNIQUE <table subquery>

EXERCISES

12.1 According to the BNF grammar given in the body of the chapter, which of the following are legal as “stand alone” table expressions—in other words, as “with expressions,” which is to say, table expressions not nested inside another table expression—and which not, syntactically speaking? (A and B are table names, and you can assume the tables they denote satisfy the requirements for the operator in question in each case.)

  1. A NATURAL JOIN B

  2. A INTERSECT B

  3. SELECT * FROM A NATURAL JOIN B

  4. SELECT * FROM A INTERSECT B

  5. SELECT * FROM ( A NATURAL JOIN B )

  6. SELECT * FROM ( A INTERSECT B )

  7. SELECT * FROM ( SELECT * FROM A INTERSECT SELECT * FROM B )

  8. SELECT * FROM ( A NATURAL JOIN B ) AS C

  9. SELECT * FROM ( A INTERSECT B ) AS C

  10. TABLE A NATURAL JOIN TABLE B

  11. TABLE A INTERSECT TABLE B

  12. SELECT * FROM A INTERSECT SELECT * FROM B

  13. ( SELECT * FROM A ) INTERSECT ( SELECT * FROM B )

  14. ( SELECT * FROM A ) AS AA INTERSECT ( SELECT * FROM B ) AS BB

What do you conclude from this exercise? Perhaps I should remind you that, relationally speaking, intersection is a special case of natural join.

12.2 Take another look at the expressions in Exercise 12.1. In which of those expressions would it be syntactically legal to replace A or B or both by “table literals” (i.e., appropriate VALUES expressions)?

12.3 Let X and Y both be of the same character string type and be subject to the same collation; let PAD SPACE apply to that collation (not recommended, of course); and let X and Y have the values '42' and '42 ', respectively (note the trailing space in the second of these). Then we know from Chapter 2 that although X and Y are clearly distinct, the expression X = Y gives TRUE. But what about the expression X LIKE Y?

12.4 Given our usual sample values, what do the following expressions return?

  1. SELECT DISTINCT STATUS
    FROM   S
    WHERE  STATUS BETWEEN 10 AND 30

  2. SELECT DISTINCT CITY
    FROM   S
    WHERE  CITY LIKE 'L%'

  3. SELECT DISTINCT CITY
    FROM   S
    WHERE  CITY BETWEEN 'Paris' AND 'Athens'

12.5 The following is intended to be an SQL expression of type BOOLEAN. Is it legal?

( SELECT CITY FROM S WHERE STATUS < 20 )
  =
( SELECT CITY FROM P WHERE WEIGHT = 14.0 )

12.6 In the body of the chapter I recommended circumspection in the use of asterisk notation in the SELECT clause. For brevity, however, I didn’t always follow my own advice in this respect in earlier chapters. Take a look through those chapters and see if you think any of my uses of the asterisk notation were unsafe.

12.7 Consider any SQL product available to you. Does that product support (a) the UNIQUE operator, (b) explicit tables, (c) lateral subqueries, (d) possibly nondeterministic expressions?

12.8 With regard to possibly nondeterministic expressions, recall that SQL prohibits the use of such expressions in integrity constraints. Take another look at the examples in Chapter 8 (and/or the answers to those exercises). Do any of those examples or answers involve any possibly nondeterministic expressions? If so, what can be done about it?

12.9 Throughout this book I’ve taken the term SQL to refer to the official standard version of that language specifically (though my treatment of the standard has deliberately been a long way from being exhaustive). But every product on the market departs from the standard in various ways, either by omitting some standard features or by introducing proprietary features of its own or (almost certainly in practice) both. Again, consider any SQL product available to you. Identify as many departures from the standard in that product as you can.

ANSWERS

12.1 For convenience I repeat the original expressions (or would-be expressions) below:

  1. A NATURAL JOIN B : Illegal

  2. A INTERSECT B : Illegal

  3. SELECT * FROM A NATURAL JOIN B : Legal

  4. SELECT * FROM A INTERSECT B : Illegal

  5. SELECT * FROM ( A NATURAL JOIN B ) : Legal

  6. SELECT * FROM ( A INTERSECT B ) : Illegal

  7. SELECT * FROM ( SELECT * FROM A INTERSECT SELECT * FROM B ) : Illegal

  8. SELECT * FROM ( A NATURAL JOIN B ) AS C : Illegal

  9. SELECT * FROM ( A INTERSECT B ) AS C : Illegal

  10. TABLE A NATURAL JOIN TABLE B : Illegal

  11. TABLE A INTERSECT TABLE B : Legal

  12. SELECT * FROM A INTERSECT SELECT * FROM B : Legal

  13. ( SELECT * FROM A ) INTERSECT ( SELECT * FROM B ) : Legal

  14. ( SELECT * FROM A ) AS AA INTERSECT ( SELECT * FROM B ) AS BB : Illegal

You were also asked what you conclude from this exercise. One thing I conclude is that the rules are very difficult to remember (to say the least). In particular, SQL expressions involving INTERSECT can’t always be transformed straightforwardly into their NATURAL JOIN counterparts. I remark also that if we replace INTERSECT by NATURAL JOIN in the last two expressions, then the legal one becomes illegal and vice versa! That’s because, believe it or not, the expressions

( SELECT * FROM A )

and

( SELECT * FROM B )

are considered to be subqueries in the context of NATURAL JOIN but not that of INTERSECT. (In other words, a subquery is a SELECT expression enclosed in parentheses, loosely speaking, but a SELECT expression enclosed in parentheses isn’t necessarily a subquery.)

12.2 The effects are as follows: Expression b. was previously illegal but becomes legal; expressions c., e., k., l., and m. were previously legal but become illegal; and the others were all illegal anyway and remain so. What do you conclude from this exercise?

12.3 It gives FALSE. Note, therefore (to spell the point out), it’s possible in SQL for two values to be “equal” and yet not “like” each other! (Lewis Carroll, where are you?)

12.4 Expression a. gives:

┌────────┐
STATUS
├════════┤
     10
     20
     30
└────────┘

(The point here is that BETWEEN is inclusive, not exclusive, and so 10 and 30 are both included in the result. Does this state of affairs accord with your own intuitive understanding of the meaning of between?) Expression b. gives:

┌────────┐
CITY   
├════════┤
London
└────────┘

And expression c. gives:

┌────────┐
CITY   
├════════┤
└────────┘

London isn’t included in the result. The reason is that the expression

y BETWEEN x AND z

is shorthand for

x <= y AND y <= z

The problem here is that the natural language expression “y is between x and z” is symmetric in x and z (i.e., switching x and z has no effect on the meaning), while the same is not true for the SQL expression “y BETWEEN x AND z.” In a nutshell, BETWEEN in SQL doesn’t mean the same as between in natural language.

12.5 First of all, observe that both comparand expressions are subqueries, and they therefore evaluate to tables. Now, those tables both have exactly one column, a fact that can be determined at compile time. What’s more, given our usual sample values, they also both have exactly one row; the subqueries are therefore scalar subqueries, and the overall comparison is thus legal (a double coercion occurs on both sides, and the net effect is that two scalar values are compared). But suppose the WHERE clause in the second subquery had specified 12.0 instead of 14.0. Given our usual sample values, the comparison overall would then no longer be legal (it would fail at run time), because the second subquery would now be a table subquery instead of a scalar one.

12.6 No answer provided.

12.7 No answer provided.

12.8 No answer provided.

12.9 No answer provided.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset