In Chapter 4, we looked at how data can be retrieved from multiple tables using joins. In this chapter, we discuss how data can also be retrieved from multiple tables by using set operations
. We look at the set operations available in SQL Server 2005. Because not all the SQL set operations are explicitly available in SQL Server 2005, we will also look at the IN
predicate and its negation, NOT..IN
, which are ways around the explicit set operations. In the final section of this chapter, we look at the UNION
operation in relation to the join operation, and how the UNION
operation can be used to get the results of some joins.
A set is a collection of objects. In relational databases, a table can be regarded as a set of rows. Elements in a set do not have to be ordered. In relational databases, rows do not have to be ordered as they are entered or stored. Set operations are used in SQL to retrieve data from multiple sets,
and include a binary union, binary intersection
and binary set difference
. A result set is obtained in SQL from the result of a SELECT
.
A binary union is a set operation on two sets, the result of which contains all the elements of both sets. A binary intersection generates values in common between two sets. And, a binary set difference generates values in one set less those contained in another set.
Three explicit set operations are used in SQL: UNION
, INTERSECT
, and MINUS
(for set difference). SQL Server 2005 allows the explicit use of the UNION
and INTERSECT
operations. Because the MINUS
set operation cannot be explicitly used in SQL Server 2005, we will illustrate the MINUS
operation by using the very common IN
predicate and its negation, NOT..IN
, which enable us to accomplish the same result as using INTERSECT
and MINUS
.
The format of a set statement is as follows:
set OPERATOR
set
where OPERATOR
is a UNION
, INTERSECT
or MINUS
, and where “set” is defined by a SELECT
.
First we will discuss the UNION
operator; the INTERSECT
operator will be discussed later in the chapter.
The following is the syntax for a general form of an UNION
:
SELECT * FROM TableA UNION SELECT * FROM TableB
Set statements allow us to combine two distinct sets of data (two result sets) only if we insure union compatibility, as explained in the next section.
Union compatibility, the commonly used SQL terminology for set compatibility, means that when using set operations, the two sets (in this case, the results of two SELECT
s) being unioned have to have the same number of similar columns and the columns have to have compatible data types. Next we will explain what compatible data types means, and we will return to the issue of “similar” columns in a later section.
So what does “compatible” data types mean? The data types of the columns of the two sets being unioned do not necessarily have to be exactly the same, meaning that they may differ in length and even type, but they have to be “well-matched.” For union compatibility, the three basic data types are numeric, string, and dates. All numeric columns are compatible with one another, all string columns are compatible with one another, and all date columns are compatible with one another. For numbers, SQL will convert integers, floating-point numbers, and decimals into a numeric data type, to make them compatible with one another. So any numeric column (for example, integers) can be unioned with any other numeric column (for example, decimals). Likewise, any fixed-length character column and any variable-length character column will be converted to a character data type, and take on the larger size of the character columns being unioned. Similarly, date columns will be combined to a date data type.
Union compatibility can happen in several ways:
By unioning two tables or views that have identical columns (which implies the same domains as well).
By taking two subsets from a table and combining them.
By using two views from two tables respectively with the columns chosen so that they are compatible.
For the data type precedence rules, refer to the "Data Type Precedence" section in Chapter 6.
In SQL Server 2005, a binary union is performed with the UNION
set operation. A UNION
takes the result sets from two (or more) queries and returns all rows from the results sets as a single result set (removing the duplicates). In this section, we illustrate how a UNION
works; although there are other ways to retrieve this information, we are showing the UNION
alternative.
Suppose that we want to find the names of all students who are computer science (COSC) majors, along with all students who are MATH
majors from the Student
table, we may write the following query that uses the UNION
set operator:
SELECT sname FROM Student WHERE major = 'COSC' UNION SELECT sname FROM Student WHERE major = 'MATH'
The two sets being unioned must have the same number of columns in the result sets of the SELECT
clauses.
While executing the UNION
, SQL first executes the first part of the query:
SELECT sname FROM Student WHERE major = 'COSC'
This part virtually produces the following 10 rows of output:
sname -------------------- Mary Zelda Brenda Lujack Elainie Jake Hillary Brad Alan Jerry (10 row(s) affected)
Then SQL executes the second part of the query:
SELECT sname FROM Student WHERE major = 'MATH'
This part virtually produces the following 7 rows of output:
sname -------------------- Mario Kelly Reva Monica Sadie Stephanie Jake (7 row(s) affected)
SQL then combines the two virtual sets of results (the UNION
operation), which includes throwing out any duplicates (an extra “Jake,” in this case), leaving us with the following 16 rows of output:
sname -------------------- Alan Brad Brenda Elainie Hillary Jake Jerry Kelly Lujack Mario Mary Monica Reva Sadie Stephanie Zelda (16 row(s) affected)
Prior to SQL Server 7, SQL Server always returned the result of a UNION
in sorted order. This was so because the UNION
eliminated duplicate rows using a sorting strategy. The ordering was simply a by-product of the sorting to eliminate duplicates. Newer versions of SQL Server, however, have several alternative strategies available for removing duplicates, so there is no guarantee of any particular order when you use UNION
. If you would like to order the output, you should explicitly use ORDER BY
at the end of your last SELECT
statement.
The maximum number of rows possible when a UNION
is used is the sum of the number of rows in the two result sets (or tables) in the two SELECT
clauses.
Earlier, we mentioned that for a union to be successful, there has to be union compatibility, and the two sets being unioned have to have similar columns. So what does similar columns mean?
If we wrote the earlier UNION
example like this:
SELECT major FROM Student WHERE major = 'COSC' UNION SELECT sname FROM Student WHERE major = 'MATH'
We would get a result set, but would the result set (output) be valid? The answer is no. You are trying to union majors and student names. These are not similar columns (though the data types of the two columns are compatible), and it does not make sense to union two different types of columns. So, before performing a union operation, you have to be very careful that you union like columns, and not “apples and oranges.”
In SQL Server 2005, a group of SELECT
statements can also be used to union constants or variables:
SELECT col1=100, col2=200 UNION SELECT col1=400, col2=500 UNION SELECT col1=100*3, col2=200*3 UNION SELECT 900, 400
This query will produce:
col1 col2 ----------- ----------- 100 200 300 600 400 500 900 400 (4 row(s) affected)
Note that the output here happens to be sorted by the first column.
UNION ALL
works exactly like UNION
, but does not expunge duplicates or sort the results. UNION ALL
is more efficient in execution (because UNION ALL
does not have to expunge the duplicates), and occasionally you may need to keep duplicates (just to keep all occurrences or records), in which case you can use UNION ALL
.
The following is the same query previously shown for UNION
, but using UNION ALL
instead of UNION
:
SELECT sname FROM Student WHERE major = 'COSC' UNION ALL SELECT sname FROM Student WHERE major = 'MATH'
This query results in 17 unsorted rows, including one duplicate, Jake; using UNION
produced 16 rows with no duplicates:
sname -------------------- Mary Zelda Brenda Lujack Elainie Jake Hillary Brad Alan Jerry Mario Kelly Reva Monica Sadie Stephanie Jake (17 row(s) affected)
As has been mentioned earlier, in order to successfully UNION
or UNION ALL
result sets, the result sets being unioned have to have the same number of columns. That is, all queries in a UNION
or UNION ALL
operation must return the same number of columns. But what if all the queries being used in the UNION
or UNION ALL
do not return the same number of columns?
If we want to union two result sets that do not have the same number of columns, we have to use NULL
(or other) values in the column-places as place holders. For example, from our Student_course
database, if we want to union the Course
table and the Prereq
table with all the columns, under normal circumstances, this would not be possible, because the Course
table has four columns and the Prereq
table has only two. Therefore, to perform a UNION ALL
operation, we would have to place NULL
values or some other values in the columns that will be empty, as follows (this example uses NULL
as a place holder):
SELECT c.*, NULL FROM Course c WHERE c.credit_hours = 4 UNION ALL SELECT NULL, p.course_number, NULL, NULL, p.prereq FROM Prereq p
This query produces the following 18 rows of output:
COURSE_NAME COURSE_NUMBER CREDIT_HOURS OFFERING_DEPT -------------------- ------------- ------------ ------------- -------- INTRO TO COMPUTER SC COSC1310 4 COSC NULL DATA STRUCTURES COSC3320 4 COSC NULL ADA - INTRODUCTION COSC5234 4 COSC NULL CALCULUS 1 MATH1501 4 MATH NULL SOCIALISM AND COMMUN POLY4103 4 POLY NULL POLITICS OF CUBA POLY5501 4 POLY NULL NULL ACCT3333 NULL NULL ACCT2220 NULL CHEM3001 NULL NULL CHEM2001 NULL COSC3320 NULL NULL COSC1310 NULL COSC3380 NULL NULL COSC3320 NULL COSC3380 NULL NULL MATH2410 NULL COSC5234 NULL NULL COSC3320 NULL ENGL1011 NULL NULL ENGL1010 NULL ENGL3401 NULL NULL ENGL1011 NULL ENGL3520 NULL NULL ENGL1011 NULL MATH5501 NULL NULL MATH2333 NULL POLY2103 NULL NULL POLY1201 NULL POLY5501 NULL NULL POLY4103 (18 row(s) affected)
We can also use other values (other than NULL
) as placeholders, as shown here:
SELECT c.*, COU_NUM = 'XXXXXXXXXXXX' FROM Course c WHERE c.credit_hours = 4 UNION ALL SELECT 'XXXXXXXXXXXXX', p.course_number, 00000000000, 'XXXXXXXXXXXXX', p.prereq FROM Prereq p
This query gives the same output as the previous query, but this time we have used a series of Xs and 0s as placeholders instead of NULL
(we have 18 rows of output):
COURSE_NAME COURSE_NUMBER CREDIT_HOURS OFFERING_DEPT COU_NUM -------------------- ------------- ------------ ------------- ------------ INTRO TO COMPUTER SC COSC1310 4 COSC XXXXXXXXXXXX DATA STRUCTURES COSC3320 4 COSC XXXXXXXXXXXX ADA - INTRODUCTION COSC5234 4 COSC XXXXXXXXXXXX CALCULUS 1 MATH1501 4 MATH XXXXXXXXXXXX SOCIALISM AND COMMUN POLY4103 4 POLY XXXXXXXXXXXX POLITICS OF CUBA POLY5501 4 POLY XXXXXXXXXXXX XXXXXXXXXXXXX ACCT3333 0 XXXXXXXXXXXXX ACCT2220 XXXXXXXXXXXXX CHEM3001 0 XXXXXXXXXXXXX CHEM2001 XXXXXXXXXXXXX COSC3320 0 XXXXXXXXXXXXX COSC1310 XXXXXXXXXXXXX COSC3380 0 XXXXXXXXXXXXX COSC3320 XXXXXXXXXXXXX COSC3380 0 XXXXXXXXXXXXX MATH2410 XXXXXXXXXXXXX COSC5234 0 XXXXXXXXXXXXX COSC3320 XXXXXXXXXXXXX ENGL1011 0 XXXXXXXXXXXXX ENGL1010 XXXXXXXXXXXXX ENGL3401 0 XXXXXXXXXXXXX ENGL1011 XXXXXXXXXXXXX ENGL3520 0 XXXXXXXXXXXXX ENGL1011 XXXXXXXXXXXXX MATH5501 0 XXXXXXXXXXXXX MATH2333 XXXXXXXXXXXXX POLY2103 0 XXXXXXXXXXXXX POLY1201 XXXXXXXXXXXXX POLY5501 0 XXXXXXXXXXXXX POLY4103 (18 row(s) affected)
NULL
does not have a data type, so it can be used as a placeholder for both numeric and character columns. But when using other values as placeholders, the data types have to match. Hence we used 'XX...'
(in the query with the single quotes) for the character columns, and 000
s (in the query without quotes) for the numeric columns.
Although SQL Server 2005 does not have the MINUS
(difference) operator, it does have an IN
predicate and its negation, the NOT..IN
, which enables us to create differences. Let us look at this predicate from a set point of view. If we find the objects from set A that are not in set B, we have found the difference of set A and B (A - B).
The following is a simple example of an IN
predicate with constants in a SELECT
statement:
SELECT sname, class FROM Student WHERE class IN (3,4)
In this example, IN (3,4)
is called a subquery-set, where (3, 4) is the set in which we are testing membership. This query says: “Find all student names from the Student
table where the class
is in the set (3, 4).” It produces the following 17 rows of output:
sname class -------------------- ------ Mary 4 Kelly 4 Donald 4 Chris 4 Jake 4 Susan 3 Monica 3 Phoebe 3 Holly 4 Rachel 3 Jerry 4 Cramer 3 Harrison 4 Francis 4 Losmith 3 Gus 3 Benny 4 (17 row(s) affected)
The preceding query produces the same output as the following query:
SELECT sname, class FROM Student WHERE class = 3 OR class = 4
In other words, the IN(3,4)
means belonging to either set (3) OR set (4), as shown by the WHERE class = 3 OR class = 4
.
We can expand the IN
predicate’s subquery-set part to be an actual query. For example, consider the following query:
SELECT Student.sname FROM Student WHERE Student.stno IN (SELECT g.student_number FROM Grade_report g WHERE g.grade = 'A')
Note the following about this query:
WHERE Student.stno
references the name of the column in the Student
table.
g.student_number
is the column name in the Grade_report
table.
stno
in the Student
table and student_number
in the Grade_report
table have the same domain.
Note also that you must retrieve the information from the same domains for purposes of union compatibility.
The preceding query produces the following 14 rows of output:
sname -------------------- Lineas Mary Brenda Richard Lujack Donald Lynette Susan Holly Sadie Jessica Steve Cedric Jerry (14 row(s) affected)
You could view the preceding query as a result derived from the intersection of the sets A and B, where set A is the set of student numbers in the student set (from the Student
table) and set B is the set of student numbers in the grade set (from the Grade_report
table) that have As.
To make this command behave like a set operator (as if it were an INTERSECT
operator), you can add the qualifier DISTINCT
to the result set as follows:
SELECT DISTINCT (Student.sname) FROM Student WHERE Student.stno IN (SELECT DISTINCT (g.student_number) FROM Grade_report g WHERE g.grade = 'A')
This query produces the following 14 rows of output:
sname -------------------- Brenda Cedric Donald Holly Jerry Jessica Lineas Lujack Lynette Mary Richard Sadie Steve Susan (14 row(s) affected)
Here, SQL Server 2005 sorts the results for you and does not return duplicates.
From a set point of view, an INTERSECT
means if we find objects from set A that are also in set B (and vice versa), we have found the intersection of sets A and B. SQL Server 2005 has an INTERSECT
operator.
The following query is the previous query written using an INTERSECT
(but we displayed student numbers instead of student names):
SELECT s.stno FROM Student s INTERSECT SELECT g.student_number FROM Grade_report g WHERE g.grade = 'A'
This query gives the following 14 rows of output:
stno ------------- 2 3 8 10 14 20 34 49 123 125 126 127 129 142 (14 row(s) affected)
In this query, we had to display student numbers (stno
) instead of the student names (sname
) because of the set compatibility issue discussed earlier. INTERSECT
is a set operator, so the two sets being intersected have to have the same number of columns and the columns have to have compatible data types.
Another example of the use of the INTERSECT
operator would be, for example, if we wanted to find all the students who had dependents, in which case we could type:
SELECT s.stno FROM Student s INTERSECT SELECT d.pno FROM Dependent d
This query would give the following 19 rows of output:
stno ------------------ 2 10 14 17 20 34 62 123 126 128 132 142 143 144 145 146 147 153 158 (19 row(s) affected)
Though the INTERSECT
operator gives us the right answer, in some ways the IN
as a subquery (discussed earlier) is better to use, because when SQL Server 2005 performs the INTERSECT
, it selects sets based on what is mentioned in the SELECT
statements. So, for example, if we wanted the student names in addition to the student numbers, and we typed:
SELECT s.stno, s.sname FROM Student s INTERSECT SELECT d.pno, relationship FROM Dependent d
The query would not work.
Here we would have to use an IN
with a subquery as discussed earlier:
SELECT s.stno, s.sname FROM Student AS s WHERE (s.stno IN (SELECT pno FROM Dependent AS d))
giving us the following 19 rows of output:
stno sname ---- ------------------ 2 Lineas 10 Richard 14 Lujack 17 Elainie 20 Donald 34 Lynette 62 Monica 123 Holly 126 Jessica 128 Brad 132 George 142 Jerry 143 Cramer 144 Fraiser 145 Harrison 146 Francis 147 Smithly 153 Genevieve 158 Thornton (19 row(s) affected)
The NOT..IN
is really a negated IN
predicate. If you use the NOT..IN
in your query, your query may perform poorly. The reason is that when NOT..IN
is used, no indexing can be used, because the NOT..IN
part of the query has to test the set with all values to find out what is not in the set. For smaller tables, no difference in performance will likely be detected. Nonetheless, we discuss how to use NOT..IN
in this section, to demonstrate the logical negative of the IN
predicate, which will help to complete your overall understanding of the SQL language. Instead of using NOT..IN
, it is often preferable to use NOT EXISTS
or outer join techniques, both of which are discussed later on.
Indexing is discussed in detail in Chapter 11.
Sometimes the NOT..IN
may seem to more easily describe the desired outcome or may be used for a set difference. For a simple example, consider the following query:
SELECT sname, class FROM Student WHERE class IN (1,3,4)
This query produces the following 28 rows of output:
sname class -------------------- ------ Lineas 1 Mary 4 Richard 1 Kelly 4 Lujack 1 Elainie 1 Donald 4 Chris 4 Jake 4 Lynette 1 Susan 3 Monica 3 Hillary 1 Phoebe 3 Holly 4 Steve 1 Brad 1 Rachel 3 George 1 Jerry 4 Cramer 3 Fraiser 1 Harrison 4 Francis 4 Losmith 3 Lindsay 1 Gus 3 Benny 4 (28 row(s) affected)
Contrast the preceding query to the following query:
SELECT sname, class FROM Student WHERE class NOT IN (2)
The output in this case is the same as the preceding output because the Student
table only has classes 1, 2, 3, and 4. If counts (results) did not “add up,” this would show that some value of class
was not 1, 2, 3, or 4.
As another example, suppose that you want the names of students who are not computer science (COSC) or math (MATH
) majors. The query would be:
SELECT sname, major FROM Student WHERE major NOT IN ('COSC','MATH')
which produces the following output (28 rows):
sname major -------------------- ----- Lineas ENGL Ken POLY Romona ENGL Richard ENGL Harley POLY Donald ACCT Chris ACCT Lynette POLY Susan ENGL Bill POLY Phoebe ENGL Holly POLY Jessica POLY Steve ENGL Cedric ENGL Rachel ENGL George POLY Cramer ENGL Fraiser POLY Harrison ACCT Francis ACCT Smithly ENGL Sebastian ACCT Losmith CHEM Genevieve UNKN Lindsay UNKN Gus ART Benny CHEM (28 row(s) affected)
The example output gave all majors other than COSC
and MATH
. But you must be very careful with the NOT..IN
predicate, because if nulls are present in the data, you may get odd answers with NOT..IN
.
As an example, consider the following table called Stumajor
:
name major -------------------- -------------------- Mary Biology Sam Chemistry Alice Art Tom NULL (4 row(s) affected)
The table Stumajor
has not been created for you in the Student_course
database. You have to create it, insert the records shown, and then run the queries that follow.
If you perform the following query:
SELECT * FROM Stumajor WHERE major IN ('Chemistry','Biology')
It produces the following output:
name major -------------------- -------------------- Mary Biology Sam Chemistry (2 row(s) affected)
If you perform the following query:
SELECT * FROM Stumajor WHERE major NOT IN ('Chemistry','Biology')
It produces the following output:
name major -------------------- -------------------- Alice Art (1 row(s) affected)
The value, null, is not equal to anything. You might expect that NOT..IN
would give you <Tom,null>
, but it does not. Why? Because nulls in the selection column (here, major
) are not matched with a NOT..IN
.
A NOT..IN
can also be used in a subquery. For example, assume that we have another table called Instructor
, as shown here:
iname teaches -------------------- -------------------- Richard COSC Subhash MATH Tapan BIOCHEM (3 row(s) affected)
The Instructor
table has not been created for you in the Student_course
database. You have to create it, insert the records shown, and then run the queries that follow.
Now, if we want to find all the departments that do not have instructors, we could type the following query:
SELECT * FROM department_to_major d WHERE d.dcode NOT IN (SELECT dcode FROM department_to_major d, instructor i WHERE d.dcode=i.teaches)
This query produces the following output (6 rows):
Dcode DNAME ----- -------------------- ACCT Accounting ART Art CHEM Chemistry ENGL English POLY Political Science UNKN NULL (6 row(s) affected)
Note that in this case, the NOT..IN
“behaved” correctly and reported the NULL
value for DNAME
!
Because SQL Server 2005 does not support the MINUS
predicate, we will show the set difference operation using a NOT..IN
with two examples.
Suppose that set A is the set of students in classes 2, 3, or 4 and set B is the set of students in class 2. We could use the NOT..IN
predicate to remove the students in set B from set A (a difference operation) by typing the following query:
SELECT sname, class FROM Student WHERE class IN (2,3,4) AND NOT class IN (2)
which produces the following output (17 rows):
sname class -------------------- ------ Mary 4 Kelly 4 Donald 4 Chris 4 Jake 4 Susan 3 Monica 3 Phoebe 3 Holly 4 Rachel 3 Jerry 4 Cramer 3 Harrison 4 Francis 4 Losmith 3 Gus 3 Benny 4 (17 row(s) affected)
To illustrate another difference operation, we will use views with the NOT..IN
to give the effect of a difference operation. Suppose for example, you wanted to find the names of those students who do not major in COSC
or MATH
but delete from that set those students who have made an A in some course.
First, using the NOT..IN
, we will create a view (view1
) of the names and majors of the students who are not COSC
or MATH
majors using the following query:
CREATE VIEW view1 AS SELECT sname, major FROM Student WHERE major NOT IN ('COSC', 'MATH')
View1
will have the same 28 rows of output as shown earlier in this chapter.
Then, using the IN
predicate, we will create another view (view2
) of names and majors of students who have received As using the following query:
CREATE VIEW view2 AS SELECT Student.sname, Student.major FROM Student WHERE Student.stno IN (SELECT g.student_number FROM Grade_report g WHERE g.grade = 'A')
If we type:
SELECT * FROM view2;
We get the following 14 rows of output:
sname major -------------------- ----- Lineas ENGL Mary COSC Brenda COSC Richard ENGL Lujack COSC Donald ACCT Lynette POLY Susan ENGL Holly POLY Sadie MATH Jessica POLY Steve ENGL Cedric ENGL Jerry COSC (14 row(s) affected)
Then, to find those students who are not majoring in COSC
or MATH
, and remove from that set those who made an A in some course, the difference operation could be approached using the NOT..IN
as follows, using the views created earlier:
SELECT sname FROM view1 WHERE sname NOT IN (SELECT sname FROM view2)
This query produces the following output (19 rows):
sname -------------------- Ken Romona Harley Chris Bill Phoebe Rachel George Cramer Fraiser Harrison Francis Smithly Sebastian Losmith Genevieve Lindsay Gus Benny (19 row(s) affected)
This query has the same effect as view1--view2
(all students who are not majoring in COSC
or MATH
, MINUS
students who made an A in some course).
In Chapter 4, we discussed joins. In this section, we discuss some differences between the two operations, the UNION
and the JOIN
. Although the UNION
operation and the JOIN
operation are similar in that they both combine two tables or sets of data, the approaches used by the two operations are different. We will first present an example of when a JOIN
may be used versus when a UNION
may be used, and then we will present some other differences between the UNION
and the JOIN
.
A JOIN
is very commonly used in queries. As we discussed previously (in Chapter 4), JOIN
s (specifically, equi-joins) involve a result set created based on tables where the tables are linked via some common column. The UNION
operator is mostly used to combine two sets of information where the genesis of the information is not as straightforward as in a join. Consider the following two examples.
Suppose that we wanted to find the names of students who took accounting courses. This is a straightforward join example. This type of query would involve joining the Student
, Section
, and Course
tables and selecting the student names from the result set. In this case though, we actually have to join the Student
table to the Grade_report
table first, and then join that result to the Section
table, because we cannot directly join the Student
table to the Section
table. Then, we join that combined result to the Course
table—so this ends up becoming a four-table join, with the Grade_report
table acting like a bridge between Student
and Section
. The JOIN
query would be:
SELECT DISTINCT(sname) FROM Course c JOIN (Section se JOIN (Student s JOIN Grade_report g ON s.stno = g.student_number) ON se.section_id = g.section_id) ON c.course_number = se.course_num AND c.course_name LIKE 'ACC%'
This query would give the following 20 rows of output:
sname --------------- Alan Bill Brad Brenda Cedric Chris Donald Hillary Holly Jessica Kelly Ken Mario Monica Phoebe Romona Sadie Steve Susan Zelda (20 row(s) affected)
Note that we had to use a DISTINCT
in the previous query, as the result of a JOIN
gives duplicates.
This example query could also be answered using subqueries, which are discussed later, but the point is that it is easy to see the relationship between the three (actually four) tables.
Suppose that we wanted to find something like the names of the students who take accounting courses and combine them with the names of students who also major in subjects that use overhead projectors in the courses they take. This could be done using a join with this database, but it would involve finding a join-path through most of the database. For a much larger database, it might be very impractical to consider such a large join. It would be easier to first find the set of names of students who take accounting courses (call this set A) and then find students who major in subjects that use projectors (set B), then union sets A and B. The UNION
approach allows us to simplify the problem and check intermediate results, so we will present this problem using a UNION
. Further, each part of the problem can be done with joins or subqueries as needed for efficiency and then the results finally unioned. Set operations allow us to create sets of results any way we can and then combine the result sets using set operations; UNION
is a set operation.
Following, we present the UNION
approach to doing this query. The first step is to do the parts individually. That is, first find the set of names of students who take accounting courses (this is the first half of the query before the UNION
). Once this is done, then do the second part individually; that is, find the students who major in subjects that use projectors. Once you have the result for both parts, UNION
the two results. We will not need the DISTINCT
here, as UNION
does not keep the duplicates. Here is a query that shows this approach:
SELECT sname FROM Course c JOIN (Section se JOIN (Student s JOIN Grade_report g ON s.stno = g.student_number) ON se.section_id = g.section_id) ON c.course_number = se.course_num AND c.course_name LIKE 'ACC%' UNION SELECT sname FROM Student s JOIN (Department_to_major d JOIN (Course c JOIN (Room r JOIN Section se ON r.room = se.room) ON se.course_num = c.course_number) ON c.offering_dept = d.dcode) ON s.major = d.dcode AND r.ohead = 'Y'
This query produces 30 rows:
sname -------------------- Alan Bill Brad Brenda Cedric Chris Cramer Donald Elainie Hillary Holly Jake Jerry Jessica Kelly Ken Lineas Lujack Mario Mary Monica Phoebe Rachel Richard Romona Sadie Smithly Steve Susan Zelda (30 row(s) affected)
In this section, we summarize our JOIN
/UNION
discussion with three abstract tables containing three rows each of symbolic data. Relations or tables are sets of rows.
We will first show the union. Assume that we have the following two tables.
Table A
ColumnA |
ColumnB |
ColumnC |
X1 |
Y1 |
Z1 |
X2 |
Y2 |
Z2 |
X3 |
Y3 |
Z3 |
Table B
ColumnA |
ColumnB |
ColumnC |
X4 |
Y4 |
Z4 |
X5 |
Y5 |
Z5 |
X6 |
Y6 |
Z6 |
A SQL UNION
can be shown would be:
SELECT * FROM TableA UNION SELECT * FROM TableB
which produces the following table as a result:
Table C
ColumnA |
ColumnB |
ColumnC |
X1 |
Y1 |
Z1 |
X2 |
Y2 |
Z2 |
X3 |
Y3 |
Z3 |
X4 |
Y4 |
Z4 |
X5 |
Y5 |
Z5 |
X6 |
Y6 |
Z6 |
Using a similar set of diagrams, the join operation could be shown as follows with the following two tables (joining TableA
and TableD
into TableE
):
Table A
ColumnA |
ColumnB |
ColumnC |
X1 |
Y1 |
Z1 |
X2 |
Y2 |
Z2 |
X3 |
Y3 |
Z3 |
Table D
ColumnA |
ColumnD |
ColumnE |
X1 |
D1 |
E1 |
X2 |
D2 |
E2 |
X3 |
D3 |
E3 |
Now, a SQL JOIN
would be:
SELECT * FROM TableA a JOIN TableD d ON a.ColumnA = d.ColumnA
Giving the following table:
Table E
TableA.ColumnA |
TableA.ColumnB |
TableA.ColumnC |
TableB.ColumnA |
TableB.ColumnD |
TableB.ColumnE |
X1 |
Y1 |
Z1 |
X1 |
D1 |
E1 |
X2 |
Y2 |
Z2 |
X2 |
D2 |
E2 |
X3 |
Y3 |
Z3 |
X3 |
D3 |
E3 |
Following are the major differences between UNION
s and JOIN
s:
In a UNION
, all the rows in the resulting tables (sets) being unioned have to be compatible; in a JOIN
, only the joining columns of the tables being joined have to be compatible—the other columns may be different.
In a UNION
, no “new” columns can be added to the final result of the UNION
; in a JOIN
, new columns can be added to the result of the JOIN
.
In a UNION
, the number of columns in the result set has to be the same as the number of columns in the sets being unioned; in a JOIN
, the number of columns in the result set may vary.
In Chapter 4, you read that the outer join adds rows to the result set that would otherwise be dropped from an inner join of both tables due to the join condition. Remember that an inner join (also known as an equi-join, ordinary join or regular join) combines two tables by finding common values on some column(s) common to the two tables. In an outer join, we are saying, “we want all the rows from one table and only the joined rows from the other.” In SQL Server 2005, the outer joins are in two classes—left and right, depending on how the query is written. A full outer join means that we want all rows from both tables being joined, and “fill in those rows where a join does not produce a result with nulls.” In SQL Server 2005, a UNION
can also be used to achieve this full outer join.
Some SQL languages do not directly support the full outer join, but SQL Server 2005 directly supports it.
In SQL Server 2005, you can create a full outer join by writing a union of the left outer join and the right outer join, like this:
SELECT with right outer join UNION SELECT with left outer join
The order of the left outer join and the right outer join does not matter and can be reversed. To illustrate the workings of the UNION
version of the full outer join, let us again use the table called Instructor
, created earlier in this chapter:
iname teaches -------------------- -------------------- Richard COSC Subhash MATH Tapan BIOCHEM
If we want to get a listing of all instructors and the names of the departments for which they teach (which will be done by a regular equi-join) plus a listing of the rest of the instructors, regardless of whether they belong to a department, plus a listing of the rest of the departments, regardless of whether they have instructors, we would write the following query to achieve the full outer join effect with a UNION
:
SELECT * FROM Department_to_major AS d LEFT JOIN Instructor AS I ON d.dcode=i.teaches UNION SELECT * FROM Department_to_major AS d RIGHT JOIN Instructor AS I ON d.dcode=i.teaches
This query produces the following output (9 rows):
Dcode DNAME iname teaches ----- -------------------- -------------------- -------------------- NULL NULL Tapan BIOCHEM ACCT Accounting NULL NULL ART Art NULL NULL CHEM Chemistry NULL NULL COSC Computer Science Richard COSC ENGL English NULL NULL MATH Mathematics Subhash MATH POLY Political Science NULL NULL UNKN NULL NULL (9 row(s) affected)
First, the LEFT JOIN
was done, outer joining the department_to_major
table and the Instructor
table (so that all the rows of the department_to_major
table were added to the result set). Then, a RIGHT JOIN
was done, again joining the department_to_major
table to the Instructor
table (but this time all the rows of the Instructor
table were added to the result set). Finally, a UNION
of the two results sets was performed, creating the effect of a full outer join (where the rows from both the tables were added back after the join).
In this chapter, we discussed the set operators available in SQL Server 2005. After reading this chapter, you should have an appreciation of how and when to use UNION
s and INTERSECT
s, and how to handle the difference problem, although SQL Server 2005 does not have an explicit MINUS
operator. Oftentimes queries can be approached in more than one way. In several places, we also showed how the same queries could also be approached without the use of set operators.
What are the major differences between the UNION
operation and the JOIN
operation?
What is the major difference between the UNION
and the UNION ALL
?
What major set operator does SQL Server 2005 not have? How can these problems be resolved?
What does union compatibility mean?
What data types are union-compatible?
What is the maximum number of rows that can result from a UNION
of two tables—one with 5 rows and the other with 6 rows?
What is the maximum number of rows that can result from a JOIN
of two tables—one with 5 rows and the other with 6 rows?
How can a UNION
be used to implement an outer join? Explain.
Does SQL Server 2005 support the MINUS
operation? How can this be resolved? Give examples.
What is a full outer join? Does SQL Server 2005 directly support a full outer join?
Do you need the same number of columns to perform a union?
Do you need the same data types to perform a union?
Do you need the same number of columns to perform a join?
From the examples given in the chapter, what does the UNION JOIN
appear to do?
If a VARCHAR
column were unioned with a CHAR
column, what would the resulting column be? (Hint: refer to the "Data Type Precedence" section in Chapter 6.)
What does set compatibility mean?
What is the maximum number of rows that can result from a INTERSECT
of two tables—one with 5 rows and the other with 6 rows?
Do you need the same number of columns to perform an INTERSECT
operation?
Do you need the same data types to perform an INTERSECT
operation?
Unless specified otherwise, use the Student_course
database to answer the following questions. Also, use appropriate column headings when displaying your output.
In this exercise, you’ll test the UNION
statement. Having seen how the UNION
statement works, demonstrate some permutations to see what will work “legally” and what won’t. First, create two tables as follows:
Table 1 | |
A |
B |
x1 |
y1 |
r1 |
s1 |
Table 2 | |||
A |
B |
C |
D |
x2 |
y2 |
z2 |
w2 |
r2 |
s2 |
t2 |
u2 |
Make the type of As and Bs CHAR(2)
. Let the type of C in Table2
be VARCHAR(2)
and D in Table2
be VARCHAR(3)
.
Try the following statements and note the results:
SELECT * FROM Table1 UNION SELECT * FROM Table2 SELECT * FROM Table1 UNION SELECT A,B FROM Table2 SELECT * FROM Table1 UNION SELECT B,A FROM Table1 SELECT * FROM Table1 UNION SELECT A,C FROM Table2 SELECT * FROM Table1 UNION SELECT A,D FROM Table2 CREATE VIEW viewx AS SELECT A,B FROM Table2 SELECT * FROM Table1 UNION SELECT * FROM viewx
Feel free to experiment with any other combinations that you deem appropriate or that you wonder about.
Create and print the result of a query that generates the names, class, and course numbers of students who have earned Bs in computer science courses. Store this query as Q7_2. Then, revise Q7_2 to delete from the result set those students who are sophomores (class = 2). Use NOT..IN
to select those students who are sophomores.
Find the names, grades, and course numbers of students who have earned As in computer science or math courses. Join the Section
and Grade_report
tables (be careful to not create the Cartesian product). Then, UNION
the set of “course numbers COSC% and A” with the set of “course number MATH% and A.”
Hint: Start with the query to get names, grades, and course numbers for COSC% and A, and then turn this into a view. Do the same for MATH% and A, and then execute the UNION
statement as follows (using your view names):
SELECT * FROM view1a UNION SELECT * FROM view1b
Find the names and majors of students who have made a C in any course. Make the “who have made a C in any course” a subquery for which you use IN
.
A less-obvious example of a difference query is to find a difference that is not based on simple, easy-to-get sets. Suppose that set A is the set of student names who have made As and Bs in computer science (COSC) courses. Suppose further that set B is the set of students who have taken math courses (regardless of what grade they earned).
Then, set A minus set B would contain names of students who have made As or Bs in computer science courses, less those who have taken math courses. Similarly, set B minus set A would be the set of students who took math courses, less those who took COSC courses and made an A or a B in some COSC course.
Build these queries into set difference queries as views based on student numbers and execute them, as follows:
Write a query that gives the student number, name, course, and grade for each set. Save each query as Q7_5a and Q7_5b.
Reconstruct each query into a view of just student numbers, verify that it works, and then create views to create set A and set B. Verify that you have the same number of tuples in set A as you have in Q7_5a, and the same number of tuples in set B as you have in Q7_5b.
Display the student numbers of students in each set difference—show (set A minus set B) and (set B minus set A). Look at the original queries, Q7_5a and Q7_5b, to verify your result.
Create two tables, T1
and T2
, that contain a name
and a salary
column. In the first table, order the columns by name
, and then by salary
. In the second table, order the columns by salary
, and then by name
. Use the same data types for each - VARCHAR(20)
, NUMBER
, for example. Populate the tables with two tuples each.
Can you UNION
the two tables in the preceding question with the following query?
SELECT * FROM T1 UNION SELECT * FROM T2
Why or why not? If not, can you force the union of the two tables? Illustrate how. Be sure to DROP
the tables when you are finished.
Using the Instructor
table you created in this chapter (as well as the tables supplied in the Student_course
database), find the following (use the UNION
or INTERSECT
operator if you feel it is appropriate):
All departments that have instructors. First do this using an IN
predicate, and then using a regular join.
Find all students who are also instructors.
Find all instructors who are not students.
Find all students who are not instructors.
Find all students as well as instructors.
Using the Student
table, find all the students who major in math and are seniors. Hint: Use the INTERSECT
operator for this.
De Morgan's Theorem.In the binary case, DeMorgan’s Theorem tells us that [not(A and B)] = [not(A) or not(B)]
. For example, suppose that A is the set of rows where students are juniors and B is the set of rows where students are females. And suppose that you were asked the question, “Find the students who are not (female and juniors).” Clearly this is the set [not(A and B)]
. You can answer this question by finding the set of students who are not juniors [not(A)]
and then or-ing this with the set of students who are not females [not(B)]
. At times it is easier to find one or the other of the results via a query, and the point here is that the two methods of finding a result is equivalent.
Question: Find the result set for all sections that are offered in building 13 and call this set A. Find the result set for all sections that are offered in building 36 and call this set B. Construct the SQL to find the following result sets:
The result of set A OR
set B (use WHERE
building = 13 or building = 36).
The result of the complement of (a): NOT
(set A OR
set B).
The result of NOT(set A) AND NOT(set B)
.
The count of all rows in the Section
table.
Is the count in d = a + b? Is the result of c the same as the result of b? Explain why or why not in each case (Hint: You may apply the De Morgan’s Theorem which states that NOT(set A or set B) = NOT(set A)
and NOT(set b)
.