A correlated subquery is an inner subquery whose information is referenced by the main, outer query such that the inner query may be thought of as being executed repeatedly. In this chapter, we discuss correlated subqueries
in detail. We discuss existence queries (EXISTS
) and correlation as well as NOT EXISTS
. We also take a look at SQL’s universal and existential qualifiers. Before discussing correlated subqueries in detail however, let’s make sure that you understand what constitutes a noncorrelated subquery.
A noncorrelated subquery is a subquery that is independent of the outer query. In other words, the subquery could be executed on its own. The following is an example of a query that is not correlated:
SELECT s.sname FROM Student s WHERE s.stno IN (SELECT gr.student_number FROM Grade_report gr WHERE gr.grade = 'A')
The first part of the preceding query (the first three lines) is the main, outer query, and the second part (the part in parentheses) is the subquery (also referred to as an inner, nested, or embedded query). To demonstrate that this subquery is an independent entity, you could run it by itself:
SELECT gr.student_number FROM Grade_report gr WHERE gr.grade = 'A'
which would produce the following output (17 rows):
student_number -------------- 2 3 8 8 10 14 20 129 142 129 34 49 123 125 126 127 142 (17 row(s) affected)
The preceding subquery is thought of as being evaluated first, creating the set of student numbers who have As. Then, the subquery’s result set is used to determine which rows (tuples) in the main query will be SELECT
ed. So, the full query results in the following output (14 rows):
sname -------------------- Lineas Mary Brenda Richard Lujack Donald Lynette Susan Holly Sadie Jessica Steve Cedric Jerry (14 row(s) affected)
As stated at the beginning of the chapter, a correlated subquery is an inner subquery whose information is referenced by the main, outer query such that the inner query may be thought of as being executed repeatedly.
Correlated subqueries present a different execution scenario to the database manipulation language (DML) than do ordinary, noncorrelated subqueries. The correlated subquery cannot stand alone, as it depends on the outer query; therefore, completing the subquery prior to execution of the outer query is not an option. The efficiency of the correlated subquery varies; it may be worthwhile to test the efficiency of correlated subqueries versus joins or sets.
One situation in which you cannot avoid correlation is the “for all” query, which is discussed later in this chapter.
To illustrate how a correlated subquery works, the following is an example of the non-correlated subquery from the previous section revised as a correlated subquery:
SELECT s.sname FROM Student s WHERE s.stno IN (SELECT gr.student_number FROM Grade_report gr WHERE gr.student_number = s.stno AND gr.grade = 'A')
This query produces the following output (14 rows), which is the same as the output of the noncorrelated subquery (shown earlier):
sname -------------------- Lineas Mary Brenda Richard Lujack Donald Lynette Susan Holly Sadie Jessica Steve Cedric Jerry (14 row(s) affected)
In this example, the inner query (the part in parentheses) references the outer one—observe the use of s.stno
in the WHERE
clause of the inner query. Rather than thinking of this query as creating a set of student numbers that have As, each row from the outer query can be considered to be SELECT
ed individually and tested against all rows of the inner query one at a time until it is determined whether a given student number is in the inner set and whether that student earned an A.
This query was illustrated with and without correlation. You might think that a correlated subquery is less efficient than doing a simple subquery, because the simple subquery is done once, whereas the correlated subquery is done once for each outer row. However, the internal handling of how the query executes depends on the SQL and the optimizer for that database engine.
The correlated subquery acts like a nested DO
loop in a programming language, where the first row from the Student
table is SELECT
ed and tested against all the rows in the Grade_report
table, and then the second Student
row is SELECT
ed and tested against all rows in the Grade_report
table. The following is the DO
loop in pseudocode:
LOOP1: For each row in Student s DO LOOP2: For each row in Grade_report gr DO IF (gr.student_number = s.stno) THEN IF (gr.grade = 'B') THEN TRUE END LOOP2; IF TRUE, THEN Student row is SELECTed END LOOP1
Correlated queries are often written so that the question in the inner query is one of existence. For example, suppose you want to find the names of students who have taken a computer science (COSC
) class and have earned a grade of B in that course. This query can be written in several ways. For example, you can write it as a noncorrelated subquery as follows:
SELECT s.sname FROM Student s WHERE s.stno IN (SELECT gr.student_number FROM Grade_report gr, Section WHERE Section.section_id = gr.section_id AND Section.course_num LIKE 'COSC%' AND gr.grade = 'B')
This query produces the following output (17 rows):
sname -------------------- Lineas Mary Brenda Lujack Reva Harley Chris Lynette Hillary Phoebe Holly George Cramer Fraiser Francis Lindsay Stephanie (17 row(s) affected)
You can think of this query as first forming the set of student numbers of students who have made Bs in COSC
courses—the inner query result set. In the inner query, you must have both the Grade_report
table (for the grades) and the Section
table (for the course numbers). Once you form this set of student numbers (by completing the inner query), the outer query looks through the Student
table and SELECT
s only those students who are in the inner query set.
This query could also be done by creating a double-nested subquery containing two IN
s, or it could be written using a three-table join.
Had we chosen to write the query with an unnecessary correlation, it might look like this:
SELECT s.sname FROM Student s WHERE s.stno IN (SELECT gr.student_number FROM Grade_report gr, Section WHERE Section.section_id = gr.section_id AND Section.course_num LIKE 'COSC%' AND gr.student_number = s.stno AND gr.grade = 'B')
The output of this query would be the same as the previous query. In this case, the use of the Student
table in the subquery is unnecessary. Although correlation is unnecessary, this example is included to show the following:
When correlation is necessary
How to untangle unnecessarily correlated queries
How you might migrate your thought process toward correlation, should it be necessary
First, let’s look at situations in which the correlation of a subquery is necessary, and introduce a new predicate: EXISTS
.
In situations in which the correlation of a subquery is necessary, you can write the correlated subquery with the EXISTS
predicate, which looks like this:
SELECT s.sname FROM Student s WHERE EXISTS (SELECT 1 FROM Grade_report gr, Section WHERE Section.section_id = gr.section_id AND Section.course_num LIKE 'COSC%' AND gr.student_number = s.stno AND gr.grade = 'B')
The output of this query would be the same as the output (17 rows) of both of the previous queries.
Let’s dissect this query. The EXISTS
predicate says, “Choose the row from the Student
table in the outer query if the subquery is true (that is, if a row in the subquery exists that satisfies the condition in the subquery WHERE
clause).” Because no actual result set is formed, "SELECT
1” is used as a “dummy” result set to indicate that the subquery is true (1 is returned) or false (no rows are returned). In the noncorrelated case, we tied the student number in the Student
table to the inner query by the IN
predicate as follows:
SELECT s.stno FROM Student s WHERE s.stno IN (SELECT "student number ...)
When using the EXISTS
predicate, we do not use any column of the Student
table, but rather are seeking only to find whether the subquery WHERE
can be satisfied.
We have indicated that we are using EXISTS
with (SELECT 1...
). Using the EXISTS
predicate, the subquery does not form a result set per se, but rather causes EXISTS
to returns true or false. The use of SELECT *
in the inner query is common among SQL programmers. However, from an “internal” standpoint, SELECT *
causes the SQL engine to check the data dictionary unnecessarily. As the actual result of the inner query is not important, it is strongly suggested that you use SELECT 'X'
(or SELECT 1 ...
) instead of SELECT * ...
so that a constant is SELECT
ed instead of some “sensible” entry. The SELECT 'X' ..
or SELECT 1 ...
is simply more efficient.
In the EXISTS
case, we do not specify any columns to be SELECT
ed in the inner query’s result set; rather, we use a dummy result--SELECT 'X'
(or we could use SELECT 1
). If the subquery WHERE
is satisfied, it returns true
, and if the inner query is not satisfied, it selects nothing, then the subquery returns false
. The EXISTS
predicate forces us to correlate the query. To illustrate that correlation is usually necessary with EXISTS
, consider the following query:
SELECT s.sname FROM Student s WHERE EXISTS (SELECT 'X' FROM Grade_report gr, Section t WHERE t.section_id = gr.section_id AND t.course_num LIKE 'COSC%' AND gr.grade = 'B')
This query produces 48 rows of output (of which we show the first 20 rows):
sname -------------------- Lineas Mary Zelda Ken Mario Brenda Romona Richard Kelly Lujack Reva Elainie Harley Donald Chris Jake Lynette Susan Monica Bill. . . (48 row(s) affected)
This query uses EXISTS
, but has no correlation. This syntax infers that for each student row, we test the joined Grade_report
and Section
tables to see whether there is a course number like COSC
and a grade of B (which, of course, there is). We unnecessarily ask the subquery question over and over again. The result from this latter, uncorrelated EXISTS
query is the same as the following:
SELECT s.sname FROM Student s
The point is that the correlation is usually necessary when we use EXISTS
.
Consider another example in which a correlation could be used. Suppose that we want to find the names of all students who have three or more Bs. A first pass at a query might be something like this:
SELECT s.sname FROM Student s WHERE "something" IN (SELECT "something" FROM Grade_report WHERE "count of grade = 'B'" > 2)
This query can be done with a HAVING
clause, as you saw previously (Chapter 9), but we want to show how to do this in yet another way. Suppose we arrange the subquery to use the student number (stno
) from the Student
table as a filter and count in the subquery only when a row in the Grade_report
table correlates to that student. The query (this time with an implied EXISTS
) looks like this:
SELECT s.sname FROM Student s WHERE 2 < (SELECT COUNT(*) FROM Grade_report gr WHERE gr.student_number = s.stno AND gr.grade = 'B')
which results in the following output (8 rows):
sname -------------------- Lineas Mary Lujack Reva Chris Hillary Phoebe Holly (8 row(s) affected)
Although there is no EXISTS
in this query, it is implied. The syntax of the query does not allow an EXISTS
, but the sense of the query is "WHERE EXISTS
a COUNT
of 2 which is less than...” In this correlated subquery, we have to examine the Grade_report
table for each member of the Student
table to see whether the student has more than two Bs. We test the entire Grade_report
table for each student row in the outer query.
If it were possible, a subquery without the correlation would be more desirable, because it would appear simpler to understand. The overall query might be as follows:
SELECT s.sname FROM Student s WHERE s.stno IN (subquery that defines a set of students who have made 3 Bs)
Therefore, we might attempt to write the following query:
SELECT s.sname FROM Student s WHERE s.stno IN (SELECT gr.student_number FROM Grade_report gr WHERE gr.grade = 'B')
However, as the following output (27 rows) shows, this query would give us only students who earned at least one B:
sname -------------------- Lineas Mary Zelda Ken Mario Brenda Kelly Lujack Reva Harley Chris Lynette Hillary Phoebe Holly Sadie Jessica Steve Cedric George Cramer Fraiser Francis Smithly Sebastian Lindsay Stephanie (27 row(s) affected)
To get a list of students who have earned at least three Bs, we could try the following query:
SELECT s.sname FROM Student s WHERE s.stno IN (SELECT gr.student_number, COUNT(*) FROM Grade_report gr WHERE gr.grade = 'B' GROUP BY gr.student_number HAVING COUNT(*) > 2)
However, this approach does not work, because the subquery cannot have two columns in its result set unless the main query has two columns in the WHERE .. IN
.
Here, the subquery must have only gr.student_number
to match s.stno
. So, we might try to construct an inline view, as shown in the following query:
SELECT s.sname FROM Student s WHERE s.stno IN (SELECT vi.student_number FROM (SELECT student_number, ct = COUNT(*) FROM Grade_report gr WHERE gr.grade = 'B' GROUP BY student_number HAVING COUNT(*) > 2) AS vi)
This is an example of the inline view, discussed in Chapter 6. This query succeeds in SQL Server 2005, producing the following output (8 rows):
sname -------------------- Lineas Mary Lujack Reva Chris Hillary Phoebe Holly (8 row(s) affected)
As you can see, several ways exist to query the database with SQL. In this case, the correlated subquery may be the easiest to see and perhaps the most efficient.
A simple example of converting from IN
to EXISTS
--uncorrelated to correlated (or vice versa)--would be to move the set test in the WHERE .. IN
of the uncorrelated subquery to the WHERE
of the EXISTS
in the correlated query.
As an example, consider the following uncorrelated subquery:
SELECT * FROM Student s WHERE s.stno IN (SELECT g.student_number FROM Grade_report g WHERE grade = 'B')
The following is the same query written as a correlated subquery:
SELECT * FROM Student s WHERE EXISTS (SELECT g.student_number FROM Grade_report g WHERE grade = 'B' AND s.stno = g.student_number)
This query produces 27 rows of output (of which we show the first 15 rows):
STNO SNAME MAJOR CLASS BDATE ------ -------------------- ----- ------ ----------------------- 2 Lineas ENGL 1 1980-04-15 00:00:00 3 Mary COSC 4 1978-07-16 00:00:00 5 Zelda COSC NULL 1978-02-12 00:00:00 6 Ken POLY NULL 1980-07-15 00:00:00 7 Mario MATH NULL 1980-08-12 00:00:00 8 Brenda COSC 2 1977-08-13 00:00:00 13 Kelly MATH 4 1980-08-12 00:00:00 14 Lujack COSC 1 1977-02-12 00:00:00 15 Reva MATH 2 1980-06-10 00:00:00 19 Harley POLY 2 1981-04-16 00:00:00 24 Chris ACCT 4 1978-02-12 00:00:00 34 Lynette POLY 1 1981-07-16 00:00:00 121 Hillary COSC 1 1977-07-16 00:00:00 122 Phoebe ENGL 3 1980-04-15 00:00:00 123 Holly POLY 4 1981-01-15 00:00:00. . . (27 row(s) affected)
This example gives you a pattern to move from one kind of query to the other kind and to test the efficiency of both kinds of queries. Both of the preceding queries should produce the same output.
As with the IN
predicate, which has a NOT IN
compliment, EXISTS
may also be used with NOT
. In some situations, the predicates EXISTS
and NOT EXISTS
are vital. For example, if we ask a “for all” question, it must be answered by “existence"--actually, the lack thereof (that is, “not existence”). In logic, the statement, “find x for all y” is logically equivalent to “do not find x where there does not exist a y.” Or, there is no x for no y. Or, you cannot find an x when there is no y.
In SQL, there is no “for all” predicate. Instead, SQL uses the idea of “for all” logic with NOT EXISTS
. (A word of caution, however—SQL is not simply a logic exercise, as you will see.) In this section, we look at how EXISTS
and NOT EXISTS
work in SQL. In the following section, we address the “for all” problem.
Consider the following query:
SELECT s.sname FROM Student s WHERE EXISTS (SELECT 'X' FROM Grade_report gr WHERE s.stno = gr.student_number AND gr.grade = 'C')
which produces the following output (24 rows):
sname -------------------- Zelda Ken Mario Brenda Richard Reva Donald Jake Susan Monica Bill Sadie Jessica Steve Alan Rachel Smithly Sebastian Losmith Genevieve Thornton Gus Benny Lionel (24 row(s) affected)
For this correlated subquery, “student names” are SELECT
ed when:
The student is enrolled in a section (WHERE s.stno = gr.student_number
)
The same student has a grade of C (note the correlation in the WHERE
clause in the inner query)
Both statements must be true for the student row to be SELECT
ed. Recall that we use SELECT 1
or SELECT 'X'
in our inner query, because we want the subquery to return something if the subquery is true. The actual value of the “something” does not matter. true
means something is returned; false
means nothing was returned from the subquery. Therefore, SELECT .. EXISTS
“says” SELECT .. WHERE true
. The inner query is true if any row is SELECT
ed in the inner query.
Now consider the preceding query with a NOT EXISTS
in it instead of EXISTS
for students who do not have a grade of C:
SELECT s.sname FROM Student s WHERE NOT EXISTS (SELECT 'X' FROM Grade_report gr WHERE s.stno = gr.student_number AND gr.grade = 'C')
This query produces the following output (24 rows):
sname -------------------- Lineas Mary Romona Kelly Lujack Elainie Harley Chris Lynette Smith Hillary Phoebe Holly Brad Cedric George Jerry Cramer Fraiser Harrison Francis Lindsay Stephanie Jake (24 row(s) affected)
In this query, we are still SELECT
ing with the pattern SELECT .. WHERE true
because all SELECT
s with EXISTS
work that way. But, the twist is that the subquery has to be false to be SELECT
ed with NOT EXISTS
. If the subquery is false, then NOT EXISTS
is true and the outer row is SELECT
ed.
Now, logic implies that if either s.stno <> gr.student_number
or gr.grade <> 'C'
, then the subquery “fails"--that is, it is false for that student row. As the subquery is false, the NOT EXISTS
would return a true
for that row. Unfortunately, this logic is not quite what happens. Recall that we characterized the correlated subquery as follows:
LOOP1: For each row in Student s DO LOOP2: For each row in Grade_report DO IF (gr.student_number = s.stno) THEN IF (gr.grade = 'C') THEN TRUE END LOOP2; IF TRUE, THEN student row is SELECTed END LOOP1
Note that LOOP2
is completed before the next student is tested. In other words, just because a student number exists that is not equal, it will not cause the subquery to be false. Rather, the entire subquery table is parsed and the logic is more like this:
For the case .. WHERE EXISTS s.stno = gr.student_number ...
, is there a gr.grade = 'C'
? If, when the student numbers are equal, no C can be found, then the subquery returns no rows—it is false for that student row. So, with NOT EXISTS
, we will SELECT
students who have student numbers equal in the Grade_report
and Student
tables, but who have no C in the Grade_report
table. The point about “no C in the Grade_report
table” can be answered true
only by looking at all the rows in the inner query and finding no C for that student.
In SQL, “for all” and “for each” are the universal qualifiers, whereas “there exists” is the existential qualifier. As mentioned in the preceding section, SQL does not have a “for all” predicate; however, logically, the following relationship exists:
For all x, WHERE P(x) is true ... |
which is logically the same as the following:
There does not exist an x, WHERE P(x) is not true. |
A “for all” type SQL query is less straightforward than the other queries we have used, because it involves a double-nested, correlated subquery using the NOT EXISTS
predicate. The next section shows an example.
To show a “for all” type SQL query, we will use another table in our Student_course
database—a table called Cap
(for “capability”). This table has names of students who have multiple foreign-language capabilities. We begin by looking at the table by typing the following query:
SELECT * FROM Cap ORDER BY name
This query produces the following output (18 rows):
NAME LANGU --------- ------- BRENDA FRENCH BRENDA CHINESE BRENDA SPANISH JOE CHINESE KENT CHINESE LUJACK SPANISH LUJACK FRENCH LUJACK GERMAN LUJACK CHINESE MARY JO FRENCH MARY JO GERMAN MARY JO CHINESE MELANIE FRENCH MELANIE CHINESE RICHARD SPANISH RICHARD FRENCH RICHARD CHINESE RICHARD GERMAN (18 row(s) affected)
Suppose that we want to find out which languages are spoken by all students (for which we would ask the question, “For each language, does it occur with all students?”). Although this manual exercise would be very difficult for a large table, for our practice table, we can answer the question by displaying and manually counting in the table ordered by language.
To see how to answer a question of the type--"Which languages are spoken by all students?"--for a much larger table where sorting and examining the result would be tedious, we will construct a query. After showing the query, we will dissect the result. Following is the query to answer our question:
SELECT name, langu FROM Cap x WHERE NOT EXISTS (SELECT 'X' FROM Cap y WHERE NOT EXISTS (SELECT 'X' FROM Cap z WHERE x.langu = z.langu AND y.name = z.name))
As you will see, all the for all/for each questions follow this double-nested, correlated NOT EXISTS
pattern.
This query produces the following output (7 rows):
name langu --------- ------- BRENDA CHINESE RICHARD CHINESE LUJACK CHINESE MARY JO CHINESE MELANIE CHINESE JOE CHINESE KENT CHINESE (7 row(s) affected)
To SELECT
a “language” spoken by all students, the query proceeds as follows:
SELECT
a row in Cap (x)
(outer query).
For that row, begin SELECT
ing each row again in Cap (y)
(middle query).
For each of the middle query rows, we want the inner query (Cap z
) to be true for all cases of the middle query (remember that true
is translated to false
by the NOT EXISTS
). As each inner query is satisfied (it is true
), it forces the middle query to continue looking for a match—to look at all cases and eventually conclude false
(evaluate to false
overall). If the middle query is false
, the outer query sees true
because of its NOT EXISTS
.
To make the middle query (y)
find false
, all the inner query (z)
occurrences must be true
; that is, the languages from the outer query must exist with all names from the middle one (y
) in the inner one (z
). For an eventual “match,” every row in the middle query for an outer query row must be false
(that is, every row in the inner query is true
).
These steps are explained in further detail in the next example, in which we use a smaller table, so that the explanation is easier to understand.
Suppose that we have the simpler table Cap1
(see Table 10-1) when attempting to answer the question “Which languages are spoken by all students?”
The table Cap1
does not exist in the Student_course
database. You will have to create it. Keep the column names and types similar to the table Cap
.
The query will be similar to the one used in the previous section:
SELECT name, language FROM Cap1 x WHERE NOT EXISTS (SELECT 'X' FROM Cap1 y WHERE NOT EXISTS (SELECT 'X' FROM Cap1 z WHERE x.language = z. language AND y.name = z.name)) ORDER BY language
This query produces the following output:
name language ------------ ------------ Joe Hindi Mary Hind (2 row(s) affected)
The following is a step-by-step explanation of how this query would work in Table 10-1 (Cap1):
The row <Joe, Hindi> is SELECT
ed by the outer query (x)
.
The row <Joe, Hindi> is SELECT
ed by the middle query (y)
.
The row <Joe, Hindi> is SELECT
ed by the inner query (z)
.
The inner query is true:
X.LANGUAGE = Hindi Z.LANGUAGE = Hindi Y.NAME = Joe Z.NAME = Joe
Because the inner query returns a row (is true
), the NOT EXISTS
of the middle query translates this to false
and continues with the next row in the middle query. The middle query SELECT
s <Mary, Hindi> and the inner query begins again with <Joe, Hindi> seeing:
X.LANGUAGE = Hindi Z.LANGUAGE = Hindi Y.NAME = Mary Z.NAME = Joe
This is false
, so the inner query SELECT
s a second row <Mary, Hindi>:
X.LANGUAGE = Hindi Z.LANGUAGE = Hindi Y.NAME = Mary Z.NAME = Mary
This is true
, so the inner query is true
. (Notice that the X.LANGUAGE
has not changed yet; the outer query [X]
is still on the first row.)
Because the inner query returns a row (is true
), the NOT EXISTS
of the middle query translates this to false
and continues with the next row in the middle query.
The middle query now SELECT
s <Mary, French> and the inner query begins again with <Joe, Hindi> seeing:
X.LANGUAGE = Hindi Z.LANGUAGE = Hindi Y.NAME = Mary Z.NAME = Joe
This is false
, so the inner query SELECT
s a second row <Mary, Hindi>:
X.LANGUAGE = Hindi Z.LANGUAGE = Hindi Y.NAME = Mary Z.NAME = Mary
This is true
, so the inner query is true
.
Because the inner query is true
, the NOT EXISTS
of the middle query again converts this true
to false
and wants to continue, but the middle query is out of rows. Thus the middle query is false
.
Because the middle query is false
, and because we are testing
"SELECT distinct name, language FROM Cap1 x WHERE NOT EXISTS (SELECT 'X' FROM Cap1 y ...",
the false
from the middle query is translated to true
for the outer query and the row <Joe, Hindi> is SELECT
ed for the result set. Note that “Hindi” occurs with both “Joe” and “Mary.”
The second row in the outer query will repeat the previous steps for <Mary, Hindi>. The value “Hindi” will be seen to occur with both “Joe” and “Mary” as <Mary, Hindi> is added to the result set.
The third row in the outer query begins with <Mary, French>. The middle query SELECT
s <Joe, Hindi> and the inner query SELECT
s <Joe, Hindi>. The inner query sees the following:
X.LANGUAGE = French Z.LANGUAGE = Hindi Y.NAME = Joe Z.NAME = Mary
This is false
, so the inner query SELECT
s a second row, <Mary, Hindi>:
X.LANGUAGE = French Z.LANGUAGE = Hindi Y.NAME = Joe Z.NAME = Mary
This is false
, so the inner query SELECT
s a third row, <Mary, French>:
X.LANGUAGE = French Z.LANGUAGE = French Y.NAME = Joe Z.NAME = Mary
This is also false
. The inner query returns no rows (fails). The inner query evaluates to false
, which causes the middle query to returns rows (see true
) because of the NOT EXISTS
. Because the middle query sees true
, it is finished and evaluated to true
. Because the middle query evaluates to true
, the NOT EXISTS
in the outer query changes this to false
and X.LANGUAGE = French
fails because X.LANGUAGE = French
did not occur with all the values of NAME
.
Consider again the “for all” query presented in Example 2:
SELECT name, language FROM Cap1 x WHERE NOT EXISTS (SELECT 'X' FROM Cap1 y WHERE NOT EXISTS (SELECT 'X' FROM Cap1 z WHERE x.language = z. language AND y.name = z.name)) ORDER BY language
A clue as to what a query of this kind means can be found in the inner query where the outer query is tested. In the phrase that says WHERE
x.language
=
z. language
...
, the x.language
is where the query is testing which language occurs for all names.
This query is a SQL realization of a relational division exercise. Relational division is a “for all” operation just like that illustrated earlier. In relational algebra, the query must be set up into a divisor, dividend, and quotient in this pattern:
Quotient (B) ← Dividend(A, B) divided by Divisor (A). |
If the question is “What language for all names?” then the Divisor, A, is names
, and the Quotient, B, is language
. It is most prudent to set up SQL like relational algebra with a two-column table (like Cap
or Cap1
) for the Dividend and then treat the Divisor and the Quotient appropriately. Our query will have the column for language, x.language
, in the inner query, as language
will be the quotient. We have chosen to also report name
in the result set.
Note that the preceding query is completely different from the following query, which asks, “Which students speak all languages?”:
SELECT DISTINCT name, language FROM Cap1 x WHERE NOT EXISTS (SELECT 'X' FROM Cap1 y WHERE NOT EXISTS (SELECT 'X' FROM Cap1 z WHERE y.language = z.language AND x.name = z.name)) ORDER BY language
This query produces the following output:
name language ------------ ------------ Mary French Mary Hindi (2 row(s) affected)
Note that the inner query contains x.name
, which means the question was “Which names occur for all languages?” or, put another way, “Which students speak all languages?” The “all” goes with languages for x.name
.
In this chapter, we discussed the correlated subquery, noncorrelated subquery, EXISTS
, and NOT EXISTS
. We described situations where the correlation of a subquery is necessary and can be written with the EXISTS
predicate, and other times when EXISTS
can be used, even with no correlation. We also introduced loops and discussed how the “for all” and “for each” are used in SQL.
What is a noncorrelated subquery?
Which type of subquery can be executed on its own?
Which part of a query is evaluated first, the query or the subquery?
What are correlated subqueries?
What does the EXISTS
predicate do?
What are considered universal qualifiers?
Is correlation necessary when we use EXISTS
? Why?
Explain how the “for all” type SQL query involves a double-nested correlated subquery using the NOT EXISTS
predicate.
Unless specified otherwise, use the Student_course
database to answer the following questions. Also, use appropriate column headings when displaying your output.
List the names of students who have received Cs. Do this in three ways: (a) as a join, (b) as an uncorrelated subquery, and (c) as a correlated subquery. Show both results and account for any differences.
In section "Existence Queries and Correlation,” you were asked to find the names of students who have taken a computer science class and earned a grade of B. We noted that it could be done in several ways. One query could look like this:
SELECT s.sname FROM Student s WHERE s.stno IN (SELECT gr.student_number FROM Grade_report gr, Section WHERE Section.section_id = gr.section_id AND Section.course_num LIKE 'COSC___ _' AND gr.grade = 'B')
Redo this query, putting the finding of the COSC
course in a correlated subquery. The query should be as follows:
The Student
table uncorrelated subquery to the Grade_report
table, correlated EXISTS
to the Section
table.
In the section "SQL Universal and Existential Qualifiers,” we illustrated both an existence query:
SELECT s.sname FROM Student s WHERE EXISTS (SELECT 'X' FROM Grade_report gr WHERE Student.stno = gr.student_number AND gr.grade = 'C')
and a NOT EXISTS
version:
SELECT s.sname FROM Student s WHERE NOT EXISTS (SELECT 'X' FROM Grade_report gr WHERE Student.stno = gr.student_number AND gr.grade = 'C')
Show that the EXISTS
version is the complement of the NOT EXISTS
version—count the rows in the EXISTS
result, the rows in the NOT EXISTS
result, and the rows in the Student
table. Also, devise a query to give the same result with IN
and NOT..IN
.
Discover whether all students take courses by counting the students, and then count those students whose student numbers are in the Grade_report
table and those whose student numbers are not in the table. Use IN
and then NOT..IN
, and then use EXISTS
and NOT EXISTS
. How many students take courses and how many students do not?
Find out which students have taken courses but who have not taken COSC
courses. Create a set of student names and courses from the Student
, Grade_report
, and Section
tables (use the prefix COSC
to indicate computer science courses). Then, use NOT..IN
to “subtract” from that set another set of student names of students (who take courses) who have taken COSC
courses. For this set difference, use NOT..IN
.
Change NOT..IN
to NOT EXISTS
(with other appropriate changes) and explain the result. The “other appropriate changes” include adding the correlation and the change of the result column in the subquery set.
There exists a table called Plants
. List the table and then find out what company or companies have plants in all cities. Verify your result manually.
Run the following query and print the result:
SELECT distinct name, langu FROM Cap x WHERE NOT EXISTS (SELECT 'X' FROM Cap y WHERE NOT EXISTS (SELECT 'X' FROM Cap z WHERE X.langu =Z.langu AND Y.name=Z.name))
Save the query (e.g., save forall) and hand in the result.
Recreate the Cap
table (e.g., call it some other name, such as LANG1
). To do this, first create the table and then use the INSERT
statement with the sub select option (INSERT INTO LANG1 AS SELECT * FROM Cap
).
Add a new person to your table who speaks only BENG
.
Recall your previous SELECT
(get for all).
CHANGE
the table from CAP
to LANG1
(for all occurrences, use CHANGE
/Cap
/lang1
/ repeatedly, assuming that you called your table LANG1
).
Start the new query (the one you just created with LANG1
in it).
How is this result different from the situation in which Newperson
was not in LANG1
? Provide an explanation of why the query did what it did.
The Department_to_major
table is a list of four-letter department codes with the department names. In Chapter 8, Exercise 7 (hereafter referred to as Exercise 8-7), you created a table called Secretary
, which should now have data like this:
Secretary |
|
dCode |
Name |
ACCT |
Beryl |
COSC |
Kaitlyn |
ENGL |
David |
HIST |
Christina |
BENG |
Fred |
Null |
Brenda |
In Exercise 8-7, you did the following:
Create a query that lists the names of departments that have secretaries (use IN
and the Secretary
table in a subquery). Save this query as q8_7a.
Create a query that lists the names of departments that do not have secretaries (use NOT..IN
). Save this query as q8_7b.
Add one more row to the Secretary
table that contains <null,'Brenda'>. (This could be a situation in which you have hired Brenda but have not yet assigned her to a department.)
Recall q8_7a and rerun it.
Recall q8_7b and rerun it.
We remarked in Exercise 8-7 that the NOT..IN
predicate has problems with nulls: the behavior of NOT..IN
when nulls exist may surprise you. If nulls may exist in the subquery, then NOT..IN
should not be used. If you use NOT..IN
in a subquery, you must ensure that nulls will not occur in the subquery or you must use some other predicate, such as NOT EXISTS
. Perhaps the best advice is to avoid NOT..IN
.
Here, we repeat Exercise 8-7 using NOT EXISTS
:
Reword query q8_7a to use EXISTS
. You will have to correlate the inner and outer queries. Save this query as q10_7aa.
Reword query q8_7b to use NOT EXISTS
. You will have to correlate the inner and outer queries. Save this query as q10_7bb. You should not have a phrase IS NOT NULL
in your NOT EXISTS
query.
Rerun q8_9a with and without <null, Brenda>.
Rerun q8_9b with and without <null, Brenda>.
Note the difference in behavior versus the original question. List the names of those departments that do or do not have secretaries. The point here is to encourage you to use NOT EXISTS
in a correlated subquery, rather than NOT..IN
.