A subquery in SQL is a table expression, tx say, enclosed in parentheses; if the table denoted by tx is t, the table denoted by the subquery is t also. Note, however, that (as mentioned in Chapter 1 and Chapter 6) the expression tx can’t be an explicit JOIN expression. Thus, for example,
(A
NATURAL JOINB
)
isn’t a legal subquery.[169] By contrast, the following expression is a legal subquery:
( SELECT * FROMA
NATURAL JOINB
)
Subqueries fall into three categories (though the syntax is the same in every case). The details, partly repeated from earlier chapters, are as follows:
A table subquery is a subquery that’s neither a row subquery nor a scalar subquery.
A row subquery is a subquery appearing in a position where a row expression is expected. Let rsq be such a subquery; then rsq must denote a table with just one row. Let the table in question be t, and let the single row in t be r; then rsq behaves as if it denoted that row r (in other words, t is coerced to r). Note: If rsq doesn’t denote a table with just one row, then (a) if it denotes a table with n rows (n > 1), an error is raised; (b) if it denotes a table with no rows at all, then that table is treated as if it contained just one row, where the row in question contains a null in every column position.
A scalar subquery is a subquery appearing in a position where a scalar expression is expected. Let ssq be such a subquery; then ssq must denote a table with just one row and just one column. Let the table in question be t, let the single row in t be r, and let the single value in r be v; then ssq behaves as if it denoted that value v (in other words, t is coerced to r, and then r is coerced to v). Note: If ssq doesn’t denote a table with just one row and just one column, then (a) if it denotes a table with m columns (m > 1), an error is raised (probably at compile time); (b) if it denotes a table with one column and n rows (n > 1), an error is raised (probably at run time); (c) if it denotes a table with one column and no rows at all, then that table is treated as if it contained just one row, where the row in question contains a single null.
The following examples involve, in order, a table subquery, a row subquery, and a scalar subquery:
SELECT SNO FROM S WHERE CITY IN ( SELECT CITY/* table subquery */
FROM P WHERE COLOR = 'Red' ) UPDATE S SET ( STATUS , CITY ) = ( SELECT STATUS , CITY/* row subquery */
FROM S WHERE SNO = 'S1' ) WHERE CITY = 'Paris' ; SELECT SNO FROM S WHERE CITY = ( SELECT CITY/* scalar subquery */
FROM P WHERE PNO = 'P1' )
Next, a correlated
subquery is a special kind of (table, row, or scalar) subquery; to be specific, it’s a subquery that includes a reference to some “outer” table. In the following example, the parenthesized expression following the keyword IN is a correlated subquery, because it includes a reference to the outer table S (the query is “Get names of suppliers who supply part P1”):
SELECT DISTINCT S.SNAME
FROM S
WHERE 'P1' IN
( SELECT PNO /* correlated subquery */
FROM SP
WHERE SP.SNO = S.SNO )
As noted in Chapter 11, correlated subqueries are often contraindicated from a performance point of view, because—conceptually, at any rate—they have to be evaluated once for each row in the outer table instead of just once and for all. (In the example, if the overall expression is evaluated as stated, the subquery will be evaluated n times, where n is the number of rows in table S.) For that reason, it’s a good idea to avoid correlated subqueries if possible. In the case at hand, it’s very easy to reformulate the query to achieve this goal:
SELECT DISTINCT S.SNAME
FROM S
WHERE SNO IN
( SELECT SNO /* noncorrelated subquery */
FROM SP
WHERE PNO = 'P1' )
Finally, a “lateral” subquery is a special kind of correlated subquery. To be specific, it’s a correlated subquery that (a) appears in a FROM clause and (b) includes a reference to an “outer” table that’s defined by a table specification appearing earlier in that same FROM clause. For example, consider the query “For each supplier, get the supplier number and the number of parts supplied by that supplier.” Here’s one possible formulation of that query in SQL:
SELECT S.SNO , TEMP.PCT FROM S , LATERAL ( SELECT COUNT ( PNO ) AS PCT FROM SP WHERE SP.SNO = S.SNO ) AS TEMP
The purpose of the keyword LATERAL is to tell the system that the subquery to which it’s prefixed is correlated with something previously mentioned in the very same FROM clause (in the example, that “lateral” subquery yields exactly one value—namely, the applicable count—for each SNO value in table S). Given the sample values in Figure 1-1 in Chapter 1, the result looks like this:
|
|
|
|
|
|
|
|
|
|
|
|
Now, there’s something going on here that you might be forgiven for finding a bit confusing. The items in a FROM clause are table specifications, and so they denote tables. In the example, though, the particular table specification that begins with the keyword LATERAL—more precisely, what remains of that table specification if the keyword LATERAL is removed—looks more like what might be called a scalar specification, or more precisely a scalar subquery; certainly it could used as such, should the context demand such an interpretation (e.g., in a SELECT clause). In fact, however, it’s a table subquery. The table it denotes, for a given value of S.SNO, is called TEMP; that table has just one column, called PCT, and just one row, and hence in fact contains a single scalar value. Then the expression TEMP.PCT in the SELECT clause causes that scalar value to become the contribution of table TEMP to the applicable result row (just as the expression S.SNO in that same SELECT clause causes the applicable SNO value to become the contribution of table S to that result row).
Following on from the foregoing rather complicated explanation, I feel bound to add that it’s not exactly clear why “lateral” subqueries are needed anyway. Certainly the foregoing example can easily be reformulated in such a way as to avoid the “need” (?) for any such thing:
SELECT S.SNO , ( SELECT COUNT ( PNO ) FROM SP WHERE SP.SNO = S.SNO ) AS PCT FROM S
The subquery has moved from the FROM clause to the SELECT clause; it still refers to something else in the same clause (S.SNO, to be specific), but now the keyword LATERAL is no longer needed (?). However, do note what’s happened to the specification AS PCT, which appeared inside the subquery in the LATERAL formulation but has now moved outside (this point is discussed in more detail in an aside in the section SUMMARIZATION in Chapter 7).
Finally: I’ve defined the term subquery; perhaps it’s time to define the term query, too!—even though I’ve used that term ubiquitously throughout previous chapters. So here goes: A query is a retrieval request; in the SQL context, in other words, it’s either a table expression—though such expressions can also be used in contexts other than queries per se—or a statement, such as a SELECT statement in “direct” (i.e., interactive) SQL, that asks for such an expression to be evaluated. Note: The term is sometimes used (though not in this book!) to refer to an update request also. It’s also used to refer to the natural language version of some retrieval or update request.