7. Set Operations

Introducing Set Operations

A set is a collection of objects. In relational databases, a table can be regarded as a set of rows. Elements in a set do not have to be ordered. In relational databases, rows do not have to be ordered as they are entered or stored. Set operations are used in SQL to retrieve data from multiple sets, and include a binary union, binary intersection and binary set difference . A result set is obtained in SQL from the result of a SELECT.

A binary union is a set operation on two sets, the result of which contains all the elements of both sets. A binary intersection generates values in common between two sets. And, a binary set difference generates values in one set less those contained in another set.

Three explicit set operations are used in SQL: UNION, INTERSECT, and MINUS (for set difference). SQL Server 2005 allows the explicit use of the UNION and INTERSECT operations. Because the MINUS set operation cannot be explicitly used in SQL Server 2005, we will illustrate the MINUS operation by using the very common IN predicate and its negation, NOT..IN, which enable us to accomplish the same result as using INTERSECT and MINUS.

The format of a set statement is as follows:

   set OPERATOR set

where OPERATOR is a UNION, INTERSECT or MINUS, and where “set” is defined by a SELECT.

First we will discuss the UNION operator; the INTERSECT operator will be discussed later in the chapter.

The following is the syntax for a general form of an UNION:

    SELECT *
    FROM   TableA
    UNION
    SELECT *
    FROM TableB

Set statements allow us to combine two distinct sets of data (two result sets) only if we insure union compatibility, as explained in the next section.

Union Compatibility

Union compatibility, the commonly used SQL terminology for set compatibility, means that when using set operations, the two sets (in this case, the results of two SELECTs) being unioned have to have the same number of similar columns and the columns have to have compatible data types. Next we will explain what compatible data types means, and we will return to the issue of “similar” columns in a later section.

So what does “compatible” data types mean? The data types of the columns of the two sets being unioned do not necessarily have to be exactly the same, meaning that they may differ in length and even type, but they have to be “well-matched.” For union compatibility, the three basic data types are numeric, string, and dates. All numeric columns are compatible with one another, all string columns are compatible with one another, and all date columns are compatible with one another. For numbers, SQL will convert integers, floating-point numbers, and decimals into a numeric data type, to make them compatible with one another. So any numeric column (for example, integers) can be unioned with any other numeric column (for example, decimals). Likewise, any fixed-length character column and any variable-length character column will be converted to a character data type, and take on the larger size of the character columns being unioned. Similarly, date columns will be combined to a date data type.

Tip

For union compatibility, the three basic data types are numeric, string, and dates.

Union compatibility can happen in several ways:

By unioning two tables or views that have identical columns (which implies the same domains as well).
By taking two subsets from a table and combining them.
By using two views from two tables respectively with the columns chosen so that they are compatible.

Tip

For the data type precedence rules, refer to the "Data Type Precedence" section in Chapter 6.

The UNION Operation

In SQL Server 2005, a binary union is performed with the UNION set operation. A UNION takes the result sets from two (or more) queries and returns all rows from the results sets as a single result set (removing the duplicates). In this section, we illustrate how a UNION works; although there are other ways to retrieve this information, we are showing the UNION alternative.

Suppose that we want to find the names of all students who are computer science (COSC) majors, along with all students who are MATH majors from the Student table, we may write the following query that uses the UNION set operator:

    SELECT sname
    FROM Student
    WHERE major = 'COSC'
      UNION
    SELECT sname
    FROM Student
    WHERE major = 'MATH'

Tip

The two sets being unioned must have the same number of columns in the result sets of the SELECT clauses.

While executing the UNION, SQL first executes the first part of the query:

    SELECT sname
    FROM Student
    WHERE major = 'COSC'

This part virtually produces the following 10 rows of output:

    sname
    --------------------
    Mary
    Zelda
    Brenda
    Lujack
    Elainie
    Jake
    Hillary
    Brad
    Alan
    Jerry

    (10 row(s) affected)

Then SQL executes the second part of the query:

    SELECT sname
    FROM Student
    WHERE major = 'MATH'

This part virtually produces the following 7 rows of output:

    sname
    --------------------
    Mario
    Kelly
    Reva
    Monica
    Sadie
    Stephanie
    Jake

    (7 row(s) affected)

SQL then combines the two virtual sets of results (the UNION operation), which includes throwing out any duplicates (an extra “Jake,” in this case), leaving us with the following 16 rows of output:

    sname
    --------------------
    Alan
    Brad
    Brenda
    Elainie
    Hillary
    Jake
    Jerry
    Kelly
    Lujack
    Mario
    Mary
    Monica
    Reva
    Sadie
    Stephanie
    Zelda

    (16 row(s) affected)

Prior to SQL Server 7, SQL Server always returned the result of a UNION in sorted order. This was so because the UNION eliminated duplicate rows using a sorting strategy. The ordering was simply a by-product of the sorting to eliminate duplicates. Newer versions of SQL Server, however, have several alternative strategies available for removing duplicates, so there is no guarantee of any particular order when you use UNION. If you would like to order the output, you should explicitly use ORDER BY at the end of your last SELECT statement.

Tip

The maximum number of rows possible when a UNION is used is the sum of the number of rows in the two result sets (or tables) in the two SELECT clauses.

Similar Columns in Unions

Earlier, we mentioned that for a union to be successful, there has to be union compatibility, and the two sets being unioned have to have similar columns. So what does similar columns mean?

If we wrote the earlier UNION example like this:

    SELECT major
    FROM Student
    WHERE major = 'COSC'
      UNION
    SELECT sname
    FROM Student
    WHERE major = 'MATH'

We would get a result set, but would the result set (output) be valid? The answer is no. You are trying to union majors and student names. These are not similar columns (though the data types of the two columns are compatible), and it does not make sense to union two different types of columns. So, before performing a union operation, you have to be very careful that you union like columns, and not “apples and oranges.”

Unioning Constants or Variables

In SQL Server 2005, a group of SELECT statements can also be used to union constants or variables:

    SELECT col1=100, col2=200
    UNION
    SELECT col1=400, col2=500
    UNION
    SELECT col1=100*3, col2=200*3
    UNION
    SELECT 900, 400

This query will produce:

    col1        col2
    ----------- -----------
    100         200
    300         600
    400         500
    900         400

    (4 row(s) affected)

Note that the output here happens to be sorted by the first column.

The UNION ALL Operation

UNION ALL works exactly like UNION, but does not expunge duplicates or sort the results. UNION ALL is more efficient in execution (because UNION ALL does not have to expunge the duplicates), and occasionally you may need to keep duplicates (just to keep all occurrences or records), in which case you can use UNION ALL.

The following is the same query previously shown for UNION, but using UNION ALL instead of UNION:

    SELECT sname
    FROM Student
    WHERE major = 'COSC'
      UNION ALL
    SELECT sname
    FROM Student
    WHERE major = 'MATH'

This query results in 17 unsorted rows, including one duplicate, Jake; using UNION produced 16 rows with no duplicates:

    sname
    --------------------
    Mary
    Zelda
    Brenda
    Lujack
    Elainie
    Jake
    Hillary
    Brad
    Alan
    Jerry
    Mario
    Kelly
    Reva
    Monica
    Sadie
    Stephanie
    Jake

    (17 row(s) affected)

Handling UNION and UNION ALL Situations with an Unequal Number of Columns

As has been mentioned earlier, in order to successfully UNION or UNION ALL result sets, the result sets being unioned have to have the same number of columns. That is, all queries in a UNION or UNION ALL operation must return the same number of columns. But what if all the queries being used in the UNION or UNION ALL do not return the same number of columns?

If we want to union two result sets that do not have the same number of columns, we have to use NULL (or other) values in the column-places as place holders. For example, from our Student_course database, if we want to union the Course table and the Prereq table with all the columns, under normal circumstances, this would not be possible, because the Course table has four columns and the Prereq table has only two. Therefore, to perform a UNION ALL operation, we would have to place NULL values or some other values in the columns that will be empty, as follows (this example uses NULL as a place holder):

    SELECT c.*, NULL
    FROM Course c
    WHERE c.credit_hours = 4
    UNION ALL
    SELECT NULL, p.course_number, NULL, NULL, p.prereq
    FROM Prereq p

This query produces the following 18 rows of output:

    COURSE_NAME          COURSE_NUMBER CREDIT_HOURS OFFERING_DEPT
    -------------------- ------------- ------------ ------------- --------
    INTRO TO COMPUTER SC COSC1310      4            COSC          NULL
    DATA STRUCTURES      COSC3320      4            COSC          NULL
    ADA - INTRODUCTION   COSC5234      4            COSC          NULL
    CALCULUS 1           MATH1501      4            MATH          NULL
    SOCIALISM AND COMMUN POLY4103      4            POLY          NULL
    POLITICS OF CUBA     POLY5501      4            POLY          NULL
    NULL                 ACCT3333      NULL         NULL          ACCT2220
    NULL                 CHEM3001      NULL         NULL          CHEM2001
    NULL                 COSC3320      NULL         NULL          COSC1310
    NULL                 COSC3380      NULL         NULL          COSC3320
    NULL                 COSC3380      NULL         NULL          MATH2410
    NULL                 COSC5234      NULL         NULL          COSC3320
    NULL                 ENGL1011      NULL         NULL          ENGL1010
    NULL                 ENGL3401      NULL         NULL          ENGL1011
    NULL                 ENGL3520      NULL         NULL          ENGL1011
    NULL                 MATH5501      NULL         NULL          MATH2333
    NULL                 POLY2103      NULL         NULL          POLY1201
    NULL                 POLY5501      NULL         NULL          POLY4103

    (18 row(s) affected)

We can also use other values (other than NULL) as placeholders, as shown here:

    SELECT c.*, COU_NUM = 'XXXXXXXXXXXX'
    FROM Course c
    WHERE c.credit_hours = 4
    UNION ALL
    SELECT 'XXXXXXXXXXXXX', p.course_number, 00000000000, 'XXXXXXXXXXXXX', p.prereq
    FROM Prereq p

This query gives the same output as the previous query, but this time we have used a series of Xs and 0s as placeholders instead of NULL (we have 18 rows of output):

    COURSE_NAME          COURSE_NUMBER CREDIT_HOURS OFFERING_DEPT COU_NUM
    -------------------- ------------- ------------ ------------- ------------
    INTRO TO COMPUTER SC COSC1310      4            COSC          XXXXXXXXXXXX
    DATA STRUCTURES      COSC3320      4            COSC          XXXXXXXXXXXX
    ADA - INTRODUCTION   COSC5234      4            COSC          XXXXXXXXXXXX
    CALCULUS 1           MATH1501      4            MATH          XXXXXXXXXXXX
    SOCIALISM AND COMMUN POLY4103      4            POLY          XXXXXXXXXXXX
    POLITICS OF CUBA     POLY5501      4            POLY          XXXXXXXXXXXX
    XXXXXXXXXXXXX        ACCT3333      0            XXXXXXXXXXXXX ACCT2220
    XXXXXXXXXXXXX        CHEM3001      0            XXXXXXXXXXXXX CHEM2001
    XXXXXXXXXXXXX        COSC3320      0            XXXXXXXXXXXXX COSC1310
    XXXXXXXXXXXXX        COSC3380      0            XXXXXXXXXXXXX COSC3320
    XXXXXXXXXXXXX        COSC3380      0            XXXXXXXXXXXXX MATH2410
    XXXXXXXXXXXXX        COSC5234      0            XXXXXXXXXXXXX COSC3320
    XXXXXXXXXXXXX        ENGL1011      0            XXXXXXXXXXXXX ENGL1010
    XXXXXXXXXXXXX        ENGL3401      0            XXXXXXXXXXXXX ENGL1011
    XXXXXXXXXXXXX        ENGL3520      0            XXXXXXXXXXXXX ENGL1011
    XXXXXXXXXXXXX        MATH5501      0            XXXXXXXXXXXXX MATH2333
    XXXXXXXXXXXXX        POLY2103      0            XXXXXXXXXXXXX POLY1201
    XXXXXXXXXXXXX        POLY5501      0            XXXXXXXXXXXXX POLY4103

    (18 row(s) affected)

NULL does not have a data type, so it can be used as a placeholder for both numeric and character columns. But when using other values as placeholders, the data types have to match. Hence we used 'XX...' (in the query with the single quotes) for the character columns, and 000s (in the query without quotes) for the numeric columns.

The IN and NOT..IN Predicates

Although SQL Server 2005 does not have the MINUS (difference) operator, it does have an IN predicate and its negation, the NOT..IN, which enables us to create differences. Let us look at this predicate from a set point of view. If we find the objects from set A that are not in set B, we have found the difference of set A and B (A - B).

Using IN

The following is a simple example of an IN predicate with constants in a SELECT statement:

    SELECT  sname, class
    FROM    Student
    WHERE   class IN (3,4)

In this example, IN (3,4) is called a subquery-set, where (3, 4) is the set in which we are testing membership. This query says: “Find all student names from the Student table where the class is in the set (3, 4).” It produces the following 17 rows of output:

    sname                class
    -------------------- ------
    Mary                 4
    Kelly                4
    Donald               4
    Chris                4
    Jake                 4
    Susan                3
    Monica               3
    Phoebe               3
    Holly                4
    Rachel               3
    Jerry                4
    Cramer               3
    Harrison             4
    Francis              4
    Losmith              3
    Gus                  3
    Benny                4

    (17 row(s) affected)

The preceding query produces the same output as the following query:

    SELECT  sname, class
    FROM    Student
    WHERE   class = 3 OR class = 4

In other words, the IN(3,4) means belonging to either set (3) OR set (4), as shown by the WHERE class = 3 OR class = 4.

Using IN as a subquery

We can expand the IN predicate’s subquery-set part to be an actual query. For example, consider the following query:

    SELECT Student.sname
    FROM   Student
    WHERE  Student.stno IN
      (SELECT  g.student_number
       FROM    Grade_report g
       WHERE   g.grade = 'A')

Tip

Subqueries will be discussed at length in the next chapter.

Note the following about this query:

WHERE Student.stno references the name of the column in the Student table.
g.student_number is the column name in the Grade_report table.
stno in the Student table and student_number in the Grade_report table have the same domain.

Note also that you must retrieve the information from the same domains for purposes of union compatibility.

The preceding query produces the following 14 rows of output:

    sname
    --------------------
    Lineas
    Mary
    Brenda
    Richard
    Lujack
    Donald
    Lynette
    Susan
    Holly
    Sadie
    Jessica
    Steve
    Cedric
    Jerry

    (14 row(s) affected)

You could view the preceding query as a result derived from the intersection of the sets A and B, where set A is the set of student numbers in the student set (from the Student table) and set B is the set of student numbers in the grade set (from the Grade_report table) that have As.

To make this command behave like a set operator (as if it were an INTERSECT operator), you can add the qualifier DISTINCT to the result set as follows:

    SELECT DISTINCT (Student.sname)
    FROM  Student
    WHERE Student.stno IN
      (SELECT DISTINCT (g.student_number)
       FROM Grade_report g
       WHERE g.grade = 'A')

This query produces the following 14 rows of output:

    sname
    --------------------
    Brenda
    Cedric
    Donald
    Holly
    Jerry
    Jessica
    Lineas
    Lujack
    Lynette
    Mary
    Richard
    Sadie
    Steve
    Susan

    (14 row(s) affected)

Here, SQL Server 2005 sorts the results for you and does not return duplicates.

The INTERSECT Operator

From a set point of view, an INTERSECT means if we find objects from set A that are also in set B (and vice versa), we have found the intersection of sets A and B. SQL Server 2005 has an INTERSECT operator.

The following query is the previous query written using an INTERSECT (but we displayed student numbers instead of student names):

    SELECT s.stno
    FROM Student s
    INTERSECT
    SELECT g.student_number
    FROM Grade_report g
    WHERE g.grade = 'A'

This query gives the following 14 rows of output:

    stno
    -------------
    2
    3
    8
    10
    14
    20
    34
    49
    123
    125
    126
    127
    129
    142

    (14 row(s) affected)

In this query, we had to display student numbers (stno) instead of the student names (sname) because of the set compatibility issue discussed earlier. INTERSECT is a set operator, so the two sets being intersected have to have the same number of columns and the columns have to have compatible data types.

Another example of the use of the INTERSECT operator would be, for example, if we wanted to find all the students who had dependents, in which case we could type:

    SELECT s.stno
    FROM Student s
    INTERSECT
    SELECT d.pno
    FROM Dependent d

This query would give the following 19 rows of output:

    stno
    ------------------
    2
    10
    14
    17
    20
    34
    62
    123
    126
    128
    132
    142
    143
    144
    145
    146
    147
    153
    158

    (19 row(s) affected)

Though the INTERSECT operator gives us the right answer, in some ways the IN as a subquery (discussed earlier) is better to use, because when SQL Server 2005 performs the INTERSECT, it selects sets based on what is mentioned in the SELECT statements. So, for example, if we wanted the student names in addition to the student numbers, and we typed:

    SELECT s.stno, s.sname
    FROM Student s
    INTERSECT
    SELECT d.pno, relationship
    FROM Dependent d

The query would not work.

Here we would have to use an IN with a subquery as discussed earlier:

    SELECT s.stno, s.sname
    FROM Student AS s
    WHERE (s.stno IN
    (SELECT pno
    FROM Dependent AS d))

giving us the following 19 rows of output:

    stno    sname
    ----    ------------------
    2       Lineas
    10      Richard
    14      Lujack
    17      Elainie
    20      Donald
    34      Lynette
    62      Monica
    123     Holly
    126     Jessica
    128     Brad
    132     George
    142     Jerry
    143     Cramer
    144     Fraiser
    145     Harrison
    146     Francis
    147     Smithly
    153     Genevieve
    158     Thornton

    (19 row(s) affected)

Using NOT..IN

The NOT..IN is really a negated IN predicate. If you use the NOT..IN in your query, your query may perform poorly. The reason is that when NOT..IN is used, no indexing can be used, because the NOT..IN part of the query has to test the set with all values to find out what is not in the set. For smaller tables, no difference in performance will likely be detected. Nonetheless, we discuss how to use NOT..IN in this section, to demonstrate the logical negative of the IN predicate, which will help to complete your overall understanding of the SQL language. Instead of using NOT..IN, it is often preferable to use NOT EXISTS or outer join techniques, both of which are discussed later on.

Tip

Indexing is discussed in detail in Chapter 11.

Sometimes the NOT..IN may seem to more easily describe the desired outcome or may be used for a set difference. For a simple example, consider the following query:

    SELECT sname, class
    FROM   Student
    WHERE  class IN (1,3,4)

This query produces the following 28 rows of output:

    sname                class
    -------------------- ------
    Lineas               1
    Mary                 4
    Richard              1
    Kelly                4
    Lujack               1
    Elainie              1
    Donald               4
    Chris                4
    Jake                 4
    Lynette              1
    Susan                3
    Monica               3
    Hillary              1
    Phoebe               3
    Holly                4
    Steve                1
    Brad                 1
    Rachel               3
    George               1
    Jerry                4
    Cramer               3
    Fraiser              1
    Harrison             4
    Francis              4
    Losmith              3
    Lindsay              1
    Gus                  3
    Benny                4

    (28 row(s) affected)

Contrast the preceding query to the following query:

    SELECT sname, class
    FROM   Student
    WHERE  class NOT IN (2)

The output in this case is the same as the preceding output because the Student table only has classes 1, 2, 3, and 4. If counts (results) did not “add up,” this would show that some value of class was not 1, 2, 3, or 4.

As another example, suppose that you want the names of students who are not computer science (COSC) or math (MATH) majors. The query would be:

    SELECT sname, major
    FROM   Student
    WHERE  major NOT IN ('COSC','MATH')

which produces the following output (28 rows):

    sname                major
    -------------------- -----
    Lineas               ENGL
    Ken                  POLY
    Romona               ENGL
    Richard              ENGL
    Harley               POLY
    Donald               ACCT
    Chris                ACCT
    Lynette              POLY
    Susan                ENGL
    Bill                 POLY
    Phoebe               ENGL
    Holly                POLY
    Jessica              POLY
    Steve                ENGL
    Cedric               ENGL
    Rachel               ENGL
    George               POLY
    Cramer               ENGL
    Fraiser              POLY
    Harrison             ACCT
    Francis              ACCT
    Smithly              ENGL
    Sebastian            ACCT
    Losmith              CHEM
    Genevieve            UNKN
    Lindsay              UNKN
    Gus                  ART
    Benny                CHEM

    (28 row(s) affected)

The example output gave all majors other than COSC and MATH. But you must be very careful with the NOT..IN predicate, because if nulls are present in the data, you may get odd answers with NOT..IN.

As an example, consider the following table called Stumajor:

    name                 major
    -------------------- --------------------
    Mary                 Biology
    Sam                  Chemistry
    Alice                Art
    Tom                  NULL

    (4 row(s) affected)

Tip

The table Stumajor has not been created for you in the Student_course database. You have to create it, insert the records shown, and then run the queries that follow.

If you perform the following query:

    SELECT *
    FROM Stumajor
    WHERE major IN ('Chemistry','Biology')

It produces the following output:

    name                 major
    -------------------- --------------------
    Mary                 Biology
    Sam                  Chemistry

    (2 row(s) affected)

If you perform the following query:

    SELECT *
    FROM Stumajor
    WHERE major NOT IN ('Chemistry','Biology')

It produces the following output:

    name                 major
    -------------------- --------------------
    Alice                Art

    (1 row(s) affected)

The value, null, is not equal to anything. You might expect that NOT..IN would give you <Tom,null>, but it does not. Why? Because nulls in the selection column (here, major) are not matched with a NOT..IN.

Using NOT..IN in a subquery

A NOT..IN can also be used in a subquery. For example, assume that we have another table called Instructor, as shown here:

    iname                teaches
    -------------------- --------------------
    Richard              COSC
    Subhash              MATH
    Tapan                BIOCHEM

    (3 row(s) affected)

Tip

The Instructor table has not been created for you in the Student_course database. You have to create it, insert the records shown, and then run the queries that follow.

Now, if we want to find all the departments that do not have instructors, we could type the following query:

    SELECT *
    FROM department_to_major d
    WHERE d.dcode NOT IN
      (SELECT dcode
      FROM department_to_major d, instructor i
      WHERE d.dcode=i.teaches)

This query produces the following output (6 rows):

    Dcode DNAME
    ----- --------------------
    ACCT  Accounting
    ART   Art
    CHEM  Chemistry
    ENGL  English
    POLY  Political Science
    UNKN  NULL

    (6 row(s) affected)

Note that in this case, the NOT..IN “behaved” correctly and reported the NULL value for DNAME!

The Difference Operation

Because SQL Server 2005 does not support the MINUS predicate, we will show the set difference operation using a NOT..IN with two examples.

Example 1

Suppose that set A is the set of students in classes 2, 3, or 4 and set B is the set of students in class 2. We could use the NOT..IN predicate to remove the students in set B from set A (a difference operation) by typing the following query:

    SELECT sname, class
    FROM   Student
    WHERE class IN (2,3,4)
      AND NOT class IN (2)

which produces the following output (17 rows):

    sname                class
    -------------------- ------
    Mary                 4
    Kelly                4
    Donald               4
    Chris                4
    Jake                 4
    Susan                3
    Monica               3
    Phoebe               3
    Holly                4
    Rachel               3
    Jerry                4
    Cramer               3
    Harrison             4
    Francis              4
    Losmith              3
    Gus                  3
    Benny                4

    (17 row(s) affected)

Example 2

To illustrate another difference operation, we will use views with the NOT..IN to give the effect of a difference operation. Suppose for example, you wanted to find the names of those students who do not major in COSC or MATH but delete from that set those students who have made an A in some course.

First, using the NOT..IN, we will create a view (view1) of the names and majors of the students who are not COSC or MATH majors using the following query:

    CREATE VIEW view1 AS
    SELECT sname, major
    FROM   Student
    WHERE  major NOT IN ('COSC', 'MATH')

View1 will have the same 28 rows of output as shown earlier in this chapter.

Then, using the IN predicate, we will create another view (view2) of names and majors of students who have received As using the following query:

    CREATE VIEW view2 AS
    SELECT Student.sname, Student.major
    FROM   Student
    WHERE  Student.stno IN
      (SELECT g.student_number
       FROM   Grade_report g
       WHERE  g.grade = 'A')

If we type:

    SELECT *
    FROM view2;

We get the following 14 rows of output:

    sname                major
    -------------------- -----
    Lineas               ENGL
    Mary                 COSC
    Brenda               COSC
    Richard              ENGL
    Lujack               COSC
    Donald               ACCT
    Lynette              POLY
    Susan                ENGL
    Holly                POLY
    Sadie                MATH
    Jessica              POLY
    Steve                ENGL
    Cedric               ENGL
    Jerry                COSC

    (14 row(s) affected)

Then, to find those students who are not majoring in COSC or MATH, and remove from that set those who made an A in some course, the difference operation could be approached using the NOT..IN as follows, using the views created earlier:

    SELECT sname
    FROM view1
    WHERE sname NOT IN
      (SELECT sname
      FROM view2)

This query produces the following output (19 rows):

    sname
    --------------------
    Ken
    Romona
    Harley
    Chris
    Bill
    Phoebe
    Rachel
    George
    Cramer
    Fraiser
    Harrison
    Francis
    Smithly
    Sebastian
    Losmith
    Genevieve
    Lindsay
    Gus
    Benny

    (19 row(s) affected)

This query has the same effect as view1--view2 (all students who are not majoring in COSC or MATH, MINUS students who made an A in some course).

The Union and the Join

In Chapter 4, we discussed joins. In this section, we discuss some differences between the two operations, the UNION and the JOIN. Although the UNION operation and the JOIN operation are similar in that they both combine two tables or sets of data, the approaches used by the two operations are different. We will first present an example of when a JOIN may be used versus when a UNION may be used, and then we will present some other differences between the UNION and the JOIN.

When a JOIN May Be Used Versus When a UNION May Be Used

A JOIN is very commonly used in queries. As we discussed previously (in Chapter 4), JOINs (specifically, equi-joins) involve a result set created based on tables where the tables are linked via some common column. The UNION operator is mostly used to combine two sets of information where the genesis of the information is not as straightforward as in a join. Consider the following two examples.

Example 1: A straightforward join operation

Suppose that we wanted to find the names of students who took accounting courses. This is a straightforward join example. This type of query would involve joining the Student, Section, and Course tables and selecting the student names from the result set. In this case though, we actually have to join the Student table to the Grade_report table first, and then join that result to the Section table, because we cannot directly join the Student table to the Section table. Then, we join that combined result to the Course table—so this ends up becoming a four-table join, with the Grade_report table acting like a bridge between Student and Section. The JOIN query would be:

    SELECT DISTINCT(sname)
    FROM Course c JOIN (Section se JOIN
    (Student s JOIN Grade_report g
    ON s.stno = g.student_number)
    ON se.section_id = g.section_id)
    ON c.course_number = se.course_num
    AND c.course_name LIKE 'ACC%'

This query would give the following 20 rows of output:

    sname
    ---------------
    Alan
    Bill
    Brad
    Brenda
    Cedric
    Chris
    Donald
    Hillary
    Holly
    Jessica
    Kelly
    Ken
    Mario
    Monica
    Phoebe
    Romona
    Sadie
    Steve
    Susan
    Zelda

    (20 row(s) affected)

Note that we had to use a DISTINCT in the previous query, as the result of a JOIN gives duplicates.

This example query could also be answered using subqueries, which are discussed later, but the point is that it is easy to see the relationship between the three (actually four) tables.

Example 2: A not-so-straightforward query

Suppose that we wanted to find something like the names of the students who take accounting courses and combine them with the names of students who also major in subjects that use overhead projectors in the courses they take. This could be done using a join with this database, but it would involve finding a join-path through most of the database. For a much larger database, it might be very impractical to consider such a large join. It would be easier to first find the set of names of students who take accounting courses (call this set A) and then find students who major in subjects that use projectors (set B), then union sets A and B. The UNION approach allows us to simplify the problem and check intermediate results, so we will present this problem using a UNION. Further, each part of the problem can be done with joins or subqueries as needed for efficiency and then the results finally unioned. Set operations allow us to create sets of results any way we can and then combine the result sets using set operations; UNION is a set operation.

Following, we present the UNION approach to doing this query. The first step is to do the parts individually. That is, first find the set of names of students who take accounting courses (this is the first half of the query before the UNION). Once this is done, then do the second part individually; that is, find the students who major in subjects that use projectors. Once you have the result for both parts, UNION the two results. We will not need the DISTINCT here, as UNION does not keep the duplicates. Here is a query that shows this approach:

    SELECT sname
    FROM Course c JOIN (Section se JOIN
    (Student s JOIN Grade_report g
    ON s.stno = g.student_number)
    ON se.section_id = g.section_id)
    ON c.course_number = se.course_num
    AND c.course_name LIKE 'ACC%'
    UNION
    SELECT sname
    FROM Student s JOIN
    (Department_to_major d
    JOIN (Course c JOIN
    (Room r JOIN Section se
    ON r.room = se.room)
    ON se.course_num = c.course_number)
    ON c.offering_dept = d.dcode)
    ON s.major = d.dcode
    AND r.ohead = 'Y'

This query produces 30 rows:

    sname
    --------------------
    Alan
    Bill
    Brad
    Brenda
    Cedric
    Chris
    Cramer
    Donald
    Elainie
    Hillary
    Holly
    Jake
    Jerry
    Jessica
    Kelly
    Ken
    Lineas
    Lujack
    Mario
    Mary
    Monica
    Phoebe
    Rachel
    Richard
    Romona
    Sadie
    Smithly
    Steve
    Susan
    Zelda

    (30 row(s) affected)

A Summary of the Other Differences Between the UNION and the JOIN

In this section, we summarize our JOIN/UNION discussion with three abstract tables containing three rows each of symbolic data. Relations or tables are sets of rows.

We will first show the union. Assume that we have the following two tables.

Table A

ColumnA	ColumnB	ColumnC
X1	Y1	Z1
X2	Y2	Z2
X3	Y3	Z3

Table B

ColumnA	ColumnB	ColumnC
X4	Y4	Z4
X5	Y5	Z5
X6	Y6	Z6

A SQL UNION can be shown would be:

    SELECT * FROM TableA
    UNION
    SELECT * FROM TableB

which produces the following table as a result:

Table C

ColumnA	ColumnB	ColumnC
X1	Y1	Z1
X2	Y2	Z2
X3	Y3	Z3
X4	Y4	Z4
X5	Y5	Z5
X6	Y6	Z6

Using a similar set of diagrams, the join operation could be shown as follows with the following two tables (joining TableA and TableD into TableE):

Table A

ColumnA	ColumnB	ColumnC
X1	Y1	Z1
X2	Y2	Z2
X3	Y3	Z3

Table D

ColumnA	ColumnD	ColumnE
X1	D1	E1
X2	D2	E2
X3	D3	E3

Now, a SQL JOIN would be:

    SELECT *
    FROM TableA a JOIN TableD d
    ON a.ColumnA = d.ColumnA

Giving the following table:

Table E

TableA.ColumnA	TableA.ColumnB	TableA.ColumnC	TableB.ColumnA	TableB.ColumnD	TableB.ColumnE
X1	Y1	Z1	X1	D1	E1
X2	Y2	Z2	X2	D2	E2
X3	Y3	Z3	X3	D3	E3

Following are the major differences between UNIONs and JOINs:

In a UNION, all the rows in the resulting tables (sets) being unioned have to be compatible; in a JOIN, only the joining columns of the tables being joined have to be compatible—the other columns may be different.
In a UNION, no “new” columns can be added to the final result of the UNION; in a JOIN, new columns can be added to the result of the JOIN.
In a UNION, the number of columns in the result set has to be the same as the number of columns in the sets being unioned; in a JOIN, the number of columns in the result set may vary.

A UNION Used to Implement a Full Outer Join

In Chapter 4, you read that the outer join adds rows to the result set that would otherwise be dropped from an inner join of both tables due to the join condition. Remember that an inner join (also known as an equi-join, ordinary join or regular join) combines two tables by finding common values on some column(s) common to the two tables. In an outer join, we are saying, “we want all the rows from one table and only the joined rows from the other.” In SQL Server 2005, the outer joins are in two classes—left and right, depending on how the query is written. A full outer join means that we want all rows from both tables being joined, and “fill in those rows where a join does not produce a result with nulls.” In SQL Server 2005, a UNION can also be used to achieve this full outer join.

Tip

Some SQL languages do not directly support the full outer join, but SQL Server 2005 directly supports it.

In SQL Server 2005, you can create a full outer join by writing a union of the left outer join and the right outer join, like this:

    SELECT with right outer join
    UNION
    SELECT with left outer join

The order of the left outer join and the right outer join does not matter and can be reversed. To illustrate the workings of the UNION version of the full outer join, let us again use the table called Instructor, created earlier in this chapter:

    iname                teaches
    -------------------- --------------------
    Richard              COSC
    Subhash              MATH
    Tapan                BIOCHEM

If we want to get a listing of all instructors and the names of the departments for which they teach (which will be done by a regular equi-join) plus a listing of the rest of the instructors, regardless of whether they belong to a department, plus a listing of the rest of the departments, regardless of whether they have instructors, we would write the following query to achieve the full outer join effect with a UNION:

    SELECT *
    FROM Department_to_major AS d LEFT JOIN Instructor AS I
    ON d.dcode=i.teaches
        UNION
    SELECT *
    FROM Department_to_major AS d RIGHT JOIN Instructor AS I
    ON d.dcode=i.teaches

This query produces the following output (9 rows):

    Dcode DNAME                iname                teaches
    ----- -------------------- -------------------- --------------------
    NULL  NULL                 Tapan                BIOCHEM
    ACCT  Accounting           NULL                 NULL
    ART   Art                  NULL                 NULL
    CHEM  Chemistry            NULL                 NULL
    COSC  Computer Science     Richard              COSC
    ENGL  English              NULL                 NULL
    MATH  Mathematics          Subhash              MATH
    POLY  Political Science    NULL                 NULL
    UNKN                       NULL                 NULL

    (9 row(s) affected)

First, the LEFT JOIN was done, outer joining the department_to_major table and the Instructor table (so that all the rows of the department_to_major table were added to the result set). Then, a RIGHT JOIN was done, again joining the department_to_major table to the Instructor table (but this time all the rows of the Instructor table were added to the result set). Finally, a UNION of the two results sets was performed, creating the effect of a full outer join (where the rows from both the tables were added back after the join).

Summary

In this chapter, we discussed the set operators available in SQL Server 2005. After reading this chapter, you should have an appreciation of how and when to use UNIONs and INTERSECTs, and how to handle the difference problem, although SQL Server 2005 does not have an explicit MINUS operator. Oftentimes queries can be approached in more than one way. In several places, we also showed how the same queries could also be approached without the use of set operators.

Review Questions

What are the major differences between the UNION operation and the JOIN operation?
What is the major difference between the UNION and the UNION ALL?
What major set operator does SQL Server 2005 not have? How can these problems be resolved?
What does union compatibility mean?
What data types are union-compatible?
What is the maximum number of rows that can result from a UNION of two tables—one with 5 rows and the other with 6 rows?
What is the maximum number of rows that can result from a JOIN of two tables—one with 5 rows and the other with 6 rows?
How can a UNION be used to implement an outer join? Explain.
Does SQL Server 2005 support the MINUS operation? How can this be resolved? Give examples.
What is a full outer join? Does SQL Server 2005 directly support a full outer join?
Do you need the same number of columns to perform a union?
Do you need the same data types to perform a union?
Do you need the same number of columns to perform a join?
From the examples given in the chapter, what does the UNION JOIN appear to do?
If a VARCHAR column were unioned with a CHAR column, what would the resulting column be? (Hint: refer to the "Data Type Precedence" section in Chapter 6.)
What does set compatibility mean?
What is the maximum number of rows that can result from a INTERSECT of two tables—one with 5 rows and the other with 6 rows?
Do you need the same number of columns to perform an INTERSECT operation?
Do you need the same data types to perform an INTERSECT operation?

Exercises

Unless specified otherwise, use the Student_course database to answer the following questions. Also, use appropriate column headings when displaying your output.

In this exercise, you’ll test the UNION statement. Having seen how the UNION statement works, demonstrate some permutations to see what will work “legally” and what won’t. First, create two tables as follows:

Table 1
A	B
x1	y1
r1	s1

Table 2
A	B	C	D
x2	y2	z2	w2
r2	s2	t2	u2

Make the type of As and Bs CHAR(2). Let the type of C in Table2 be VARCHAR(2) and D in Table2 be VARCHAR(3).

Try the following statements and note the results:

    SELECT * FROM Table1 UNION SELECT * FROM Table2
    SELECT * FROM Table1 UNION SELECT A,B FROM Table2
    SELECT * FROM Table1 UNION SELECT B,A FROM Table1
    SELECT * FROM Table1 UNION SELECT A,C FROM Table2
    SELECT * FROM Table1 UNION SELECT A,D FROM Table2
    CREATE VIEW viewx AS
    SELECT A,B
    FROM Table2
    SELECT *
    FROM Table1
      UNION
    SELECT *
    FROM viewx

Feel free to experiment with any other combinations that you deem appropriate or that you wonder about.

Create and print the result of a query that generates the names, class, and course numbers of students who have earned Bs in computer science courses. Store this query as Q7_2. Then, revise Q7_2 to delete from the result set those students who are sophomores (class = 2). Use NOT..IN to select those students who are sophomores.
Find the names, grades, and course numbers of students who have earned As in computer science or math courses. Join the Section and Grade_report tables (be careful to not create the Cartesian product). Then, UNION the set of “course numbers COSC% and A” with the set of “course number MATH% and A.”
Hint: Start with the query to get names, grades, and course numbers for COSC% and A, and then turn this into a view. Do the same for MATH% and A, and then execute the UNION statement as follows (using your view names):
```
    SELECT *
    FROM view1a
      UNION
    SELECT *
    FROM view1b
```
Find the names and majors of students who have made a C in any course. Make the “who have made a C in any course” a subquery for which you use IN.
A less-obvious example of a difference query is to find a difference that is not based on simple, easy-to-get sets. Suppose that set A is the set of student names who have made As and Bs in computer science (COSC) courses. Suppose further that set B is the set of students who have taken math courses (regardless of what grade they earned).
Then, set A minus set B would contain names of students who have made As or Bs in computer science courses, less those who have taken math courses. Similarly, set B minus set A would be the set of students who took math courses, less those who took COSC courses and made an A or a B in some COSC course.
Build these queries into set difference queries as views based on student numbers and execute them, as follows:
1. Write a query that gives the student number, name, course, and grade for each set. Save each query as Q7_5a and Q7_5b.
2. Reconstruct each query into a view of just student numbers, verify that it works, and then create views to create set A and set B. Verify that you have the same number of tuples in set A as you have in Q7_5a, and the same number of tuples in set B as you have in Q7_5b.
3. Display the student numbers of students in each set difference—show (set A minus set B) and (set B minus set A). Look at the original queries, Q7_5a and Q7_5b, to verify your result.
Create two tables, T1 and T2, that contain a name and a salary column. In the first table, order the columns by name, and then by salary. In the second table, order the columns by salary, and then by name. Use the same data types for each - VARCHAR(20), NUMBER, for example. Populate the tables with two tuples each.
Can you UNION the two tables in the preceding question with the following query?
```
    SELECT *
    FROM T1
      UNION
    SELECT *
    FROM T2
```
Why or why not? If not, can you force the union of the two tables? Illustrate how. Be sure to DROP the tables when you are finished.
Using the Instructor table you created in this chapter (as well as the tables supplied in the Student_course database), find the following (use the UNION or INTERSECT operator if you feel it is appropriate):
1. All departments that have instructors. First do this using an IN predicate, and then using a regular join.
2. Find all students who are also instructors.
3. Find all instructors who are not students.
4. Find all students who are not instructors.
5. Find all students as well as instructors.
Using the Student table, find all the students who major in math and are seniors. Hint: Use the INTERSECT operator for this.

Optional Exercise

De Morgan's Theorem.In the binary case, DeMorgan’s Theorem tells us that [not(A and B)] = [not(A) or not(B)]. For example, suppose that A is the set of rows where students are juniors and B is the set of rows where students are females. And suppose that you were asked the question, “Find the students who are not (female and juniors).” Clearly this is the set [not(A and B)]. You can answer this question by finding the set of students who are not juniors [not(A)] and then or-ing this with the set of students who are not females [not(B)]. At times it is easier to find one or the other of the results via a query, and the point here is that the two methods of finding a result is equivalent.
Question: Find the result set for all sections that are offered in building 13 and call this set A. Find the result set for all sections that are offered in building 36 and call this set B. Construct the SQL to find the following result sets:
1. The result of set A OR set B (use WHERE building = 13 or building = 36).
2. The result of the complement of (a): NOT(set A OR set B).
3. The result of NOT(set A) AND NOT(set B).
4. The count of all rows in the Section table.
Is the count in d = a + b? Is the result of c the same as the result of b? Explain why or why not in each case (Hint: You may apply the De Morgan’s Theorem which states that NOT(set A or set B) = NOT(set A) and NOT(set b).

Table of Contents for
7. Set Operations

Chapter 7. Set Operations

Introducing Set Operations

Union Compatibility

Tip

Tip

The UNION Operation

Tip

Tip

Similar Columns in Unions

Unioning Constants or Variables

The UNION ALL Operation

Handling UNION and UNION ALL Situations with an Unequal Number of Columns

The IN and NOT..IN Predicates

Using IN

Using IN as a subquery

Tip

The INTERSECT Operator

Using NOT..IN

Tip

Tip

Using NOT..IN in a subquery

Tip

The Difference Operation

Example 1

Example 2

The Union and the Join

When a JOIN May Be Used Versus When a UNION May Be Used

Example 1: A straightforward join operation

Example 2: A not-so-straightforward query

A Summary of the Other Differences Between the UNION and the JOIN

A UNION Used to Implement a Full Outer Join

Tip

Summary

Review Questions

Exercises

Optional Exercise

Table of Contents for 7. Set Operations

Create new playlist

Sign In

Sign Up

Chapter 7. Set Operations

Introducing Set Operations

Union Compatibility

Tip

Tip

The UNION Operation

Tip

Tip

Similar Columns in Unions

Unioning Constants or Variables

The UNION ALL Operation

Handling UNION and UNION ALL Situations with an Unequal Number of Columns

The IN and NOT..IN Predicates

Using IN

Using IN as a subquery

Tip

The INTERSECT Operator

Using NOT..IN

Tip

Tip

Using NOT..IN in a subquery

Tip

The Difference Operation

Example 1

Example 2

The Union and the Join

When a JOIN May Be Used Versus When a UNION May Be Used

Example 1: A straightforward join operation

Example 2: A not-so-straightforward query

A Summary of the Other Differences Between the UNION and the JOIN

A UNION Used to Implement a Full Outer Join

Tip

Summary

Review Questions

Exercises

Optional Exercise

Table of Contents for
7. Set Operations