Chapter 4. Combining Tables Vertically Using PROC SQL

Overview

Introduction

Suppose you are generating a report based on data from a health clinic. You want to display the results of individual patient stress tests taken in 1998 (which are stored in table A), followed by the results from stress tests taken in 1999 (which are stored in table B). Instead of combining the table rows horizontally, as you would in a PROC SQL join, you want to combine the table rows vertically (one on top of the other).

Introduction

When you need to select data from multiple tables and combine the tables vertically, PROC SQL can be an efficient alternative to using other SAS procedures or the DATA step. In a PROC SQL set operation, you use one of four set operators (EXCEPT, INTERSECT, UNION, and OUTER UNION) to combine tables (and views) vertically by combining the results of two queries:

proc sql;
      select *
         from a
      set-operator 
      select *
         from b;

Each set operator combines the query results in a different way.

In this chapter, you will learn how to use the various set operators, with or without the optional keywords ALL and CORR (CORRESPONDING), to combine the results of multiple queries.

Note

In this chapter, the references to tables are also applicable to views, unless otherwise noted. Δ

Objectives

In this chapter, you learn to

  • combine the results of multiple PROC SQL queries in different ways by using the set operators EXCEPT, INTERSECT, UNION, and OUTER UNION

  • modify the results of a PROC SQL set operation by using the keywords ALL and CORR (CORRESPONDING)

  • compare PROC SQL outer unions with other SAS techniques.

Prerequisites

Before beginning this chapter, you should complete the following chapters:

  • Chapter 1, "Performing Queries Using PROC SQL," on page 3

  • Chapter 2, "Performing Advanced Queries Using PROC SQL," on page 25

  • Chapter 3, "Combining Tables Horizontally Using PROC SQL," on page 79.

Understanding Set Operations

A set operation is a SELECT statement that contains

  • two groups of query clauses (each group beginning with a SELECT clause)

  • a set operator

  • one or both of the keywords ALL and CORR (CORRESPONDING).

Example

In the following PROC SQL step, the SELECT statement contains one set operation. The set operation uses the set operator UNION to combine the result of a query on the table Sasuser.Stress98 with the result of a query on the table Sasuser.Stress99.

   proc sql;
      select  * 
         from sasuser.stress98
      union 
      select  * 
         from sasuser.stress99;

You will learn the details about using each set operator later in this chapter.

Processing a Single Set Operation

PROC SQL evaluates a SELECT statement with one set operation as follows:

  1. Each query is evaluated to produce an intermediate (internal) result table.

  2. Each intermediate result table then becomes an operand linked with a set operator to form an expression (for example, Table1 UNION Table2).

  3. PROC SQL evaluates the entire expression to produce a single output result set.

Using Multiple Set Operators

A single SELECT statement can contain more than one set operation. Each additional set operation includes a set operator and a group of query clauses, as shown in the following example:

   proc sql;
      select * 
         from table1
      set-operator 
      select * 
         from table2
      set-operator 
      select * 
         from table3;

This SELECT statement uses two set operators to link together three queries. Regardless of the number of set operations in a SELECT statement, the statement contains only one semicolon, which is placed after the last group of query clauses.

Example

The following PROC SQL step contains two set operators (both are OUTER UNION) that combine three queries:

   proc sql;
      select * 
         from sasuser.mechanicslevel1
      outer union 
      select
         from sasuser.mechanicslevel2
      outer union 
      select * 
         from sasuser.mechanicslevel3;

Processing Multiple Set Operations

When PROC SQL evaluates a SELECT statement that contains multiple set operations, an additional processing step (step 3 below) is required:

  1. Each query is evaluated to produce an intermediate (internal) result table.

  2. Each intermediate result table then becomes an operand linked with a set operator to form an expression (for example, Table1 UNION Table2).

  3. If the set operation contains more than two queries, then the result from the first two queries (enclosed in parentheses in the following examples) becomes an operand for the next set operator and operand. For example:

    • with two set operators: (Table1 UNION Table2) EXCEPT Table3

    • with three set operators: ((Table1 UNION Table2) EXCEPT Table3) INTERSECT Table4.

  4. PROC SQL evaluates the entire expression to produce a single output result set.

Note

When processing set operators, PROC SQL follows a default order of precedence, unless this order is overridden by parentheses in the expression(s). By default, INTERSECT is evaluated first. OUTER UNION, UNION, and EXCEPT all have the same level of precedence. Δ

Introducing Set Operators

Each of the four set operators EXCEPT, INTERSECT, UNION, and OUTER UNION selects rows and handles columns in a different way, as described below.

Note

In the following chart, Table 1 is the table that is referenced in the first query and Table 2 is the table that is referenced in the second query. Δ

Set Operator

Treatment of Rows

Treatment of Columns

Example

EXCEPT

Selects unique rows from the first table that are not found in the second table.

Introducing Set Operators

Overlays columns based on their position in the SELECT clause without regard to the individual column names.

proc sql;
   select *
      from table1
   except 
   select *
      from table2;

INTERSECT

Selects unique rows that are common to both tables.

Introducing Set Operators

Overlays columns based on their position in the SELECT clause without regard to the individual column names.

proc sql;
   select *
      from table1
   intersect 
   select *
      from table2;

UNION

Selects unique rows from one or both tables.

Introducing Set Operators

Overlays columns based on their position in the SELECT clause without regard to the individual column names.

proc sql;
   select *
      from table1
   union 
   select *
      from table2;

OUTER UNION

Selects all rows from both tables.

Introducing Set Operators

The OUTER UNION operator concatenates the results of the queries.

Does not overlay columns.

proc sql;
   select *
      from table1
   outer union 
   select *
      from table2;

Note

A set operator that selects only unique rows will display one occurrence of a given row in output. Δ

Processing Unique versus Duplicate Rows

When processing a set operation that displays only unique rows (a set operation that contains the set operator EXCEPT, INTERSECT, or UNION), PROC SQL makes two passes through the data, by default:

  1. PROC SQL eliminates duplicate (nonunique) rows in the tables.

  2. PROC SQL selects the rows that meet the criteria and, where requested, overlays columns.

For set operations that display both unique and duplicate rows, only one pass through the data (step 2 above) is required.

Combining and Overlaying Columns

You can use a set operation to combine tables that have different numbers of columns and rows or that have columns in a different order.

Three of the four set operators (EXCEPT, INTERSECT, and UNION) combine columns by overlaying them. (The set operator OUTER UNION does not overlay columns.)

By default, the set operators EXCEPT, INTERSECT, and UNION overlay columns based on the relative position of the columns in the SELECT clause. Column names are ignored. You control how PROC SQL maps columns in one table to columns in another table by specifying the columns in the appropriate order in the SELECT clause. The first column specified in the first query's SELECT clause and the first column specified in the second query's SELECT clause are overlaid, and so on.

When columns are overlaid, PROC SQL uses the column name from the first table (the table referenced in the first query). If there is no column name in the first table, the column name from the second table is used. When the SELECT clause contains an asterisk (*) instead of a list of column names, the set operation combines the tables (and, if applicable, overlays columns) based on the positions of the columns in the tables.

For example, the following set operation uses the set operator EXCEPT, so columns are overlaid. The SELECT clause in each query uses an asterisk (*), so the columns are overlaid based on their positions in the tables. The first column in table One (X)is overlaid on the first column in table Two (X), and so on.

Combining and Overlaying Columns

In order to be overlaid, columns in the same relative position in the two SELECT clauses must have the same data type. If they do not, PROC SQL generates a warning message in the SAS log and stops executing. For example, in the tables shown above, if the column One.X had a different data type than column Two.X, the SAS log would display the following error message.

Table 4.1. SAS Log

ERROR: Column 1 from the first contributor of EXCEPT is not the same type as its counterpart from the second.

Next, we will use the keywords ALL and CORR to modify the default action of the set operators.

Modifying Results by Using Keywords

To modify the behavior of set operators, you can use either or both of the keywords ALL and CORR immediately following the set operator:

proc sql;
      select *
         from table
      set-operator <all> <corr> 
      select *
         from table2;

The use of each keyword is described below.

Keyword

Action

Used When…

ALL

Makes only one pass through the data and does not remove duplicate rows.

You do not care if there are duplicates.

Duplicates are not possible.

ALL cannot be used with OUTER UNION.

CORR (or CORRESPONDING)

Compares and overlays columns by name instead of by position:

  • When used with EXCEPT, INTERSECT, and UNION, removes any columns that do not have the same name in both tables.

  • When used with OUTER UNION, overlays same-named columns and displays columns that have nonmatching names without overlaying.

If an alias is assigned to a column in the SELECT clause, CORR will use the alias instead of the permanent column name.

Two tables have some or all columns in common, but the columns are not in the same order.

In the remainder of this chapter, you will learn more about the use of each set operator, with and without the keywords ALL and CORR.

Using the EXCEPT Set Operator

The set operator EXCEPT does both of the following:

  • selects unique rows from the first table (the table specified in the first query) that are not found in the second table (the table specified in the second query)

  • overlays columns.

    Using the EXCEPT Set Operator

Consider how EXCEPT works when used alone and with the keywords ALL and CORR.

Using the EXCEPT Operator Alone

Suppose you want to display the unique rows in table One that are not found in table Two. The PROC SQL set operation that includes the EXCEPT operator, the tables One and Two, and the output of the set operation are shown below:

Using the EXCEPT Operator Alone

The set operator EXCEPT overlays columns by their position. In this output, the following columns are overlaid:

  • the first columns, One.X and Two.X, both of which are numeric

  • the second columns, One.A and Two.B, both of which are character.

The column names from table One are used, so the second column of output is named A rather than B.

Consider how PROC SQL selects rows from table One to display in output.

In the first pass, PROC SQL eliminates any duplicate rows from the tables. As shown below, there is one duplicate row: in table One, the second row is a duplicate of the first row. All remaining rows in table One are still candidates in PROC SQL's selection process.

Using the EXCEPT Operator Alone

In the second pass, PROC SQL identifies any rows in table One for which there is a matching row in table Two and eliminates them. There is one matching row in the two tables, as shown below, which is eliminated.

Using the EXCEPT Operator Alone

The five remaining rows in table One, the unique rows, are displayed in the output.

Using the EXCEPT Operator Alone

Using the Keyword ALL with the EXCEPT Operator

To select all rows in the first table (both unique and duplicate) that do not have a matching row in the second table, add the keyword ALL after the EXCEPT set operator. The modified PROC SQL set operation, the tables One and Two, and the output are shown below:

Using the Keyword ALL with the EXCEPT Operator

The output now contains six rows. PROC SQL has again eliminated the one row in table One (the fifth row) that has a matching row in table Two (the fourth row). Remember that when the keyword ALL is used with the EXCEPT operator, PROC SQL does not make an extra pass through the data to remove duplicate rows within table One. Therefore, the second row in table One, which is a duplicate of the first row, is now included in the output.

Using the Keyword CORR with the EXCEPT Operator

To display both of the following, add the keyword CORR after the set operator.

  • only columns that have the same name

  • all unique rows in the first table that do not appear in the second table.

The modified PROC SQL set operation, the tables One and Two, and the output are shown below:

Using the Keyword CORR with the EXCEPT Operator

X is the only column that has the same name in both tables, so X is the only column that PROC SQL examines and displays in the output.

In the first pass, PROC SQL eliminates the second and third rows of table One from the output because they are not unique within the table; they contain values of X that duplicate the value of X in the first row of table One. In the second pass, PROC SQL eliminates the first, fourth, and fifth rows of table One because each contains a value of X that matches a value of X in a row of table Two. The output displays the two remaining rows in table One, the rows that are unique in table One and that do not have a row in table Two that has a matching value of X.

Using the Keywords ALL and CORR with the EXCEPT Operator

If the keywords ALL and CORR are used together, the EXCEPT operator will display all unique and duplicate rows in the first table that do not appear in the second table, and will overlay and display only columns that have the same name. The modified PROC SQL set operation, the tables One and Two, and the output are shown below:

Using the Keywords ALL and CORR with the EXCEPT Operator

Once again, PROC SQL looks at and displays only the column that has the same name in the two tables: X. Because the ALL keyword is used, PROC SQL does not eliminate any duplicate rows in table One. Therefore, the second and third rows in table One, which are duplicates of the first row in table One, appear in the output. PROC SQL does eliminate the first, fourth, and fifth rows in table One from the output because for each one of these three rows there is a corresponding row in table Two that has a matching value of X.

As this example shows, when the ALL keyword is used with the EXCEPT operator, a row in table One cannot be eliminated from the output unless it has a separate matching row in table Two. Table One contains three rows in which the value of X is 1, but table Two contains only one row in which the value of X is 1. That one row in table Two causes the first of the three rows in table One that have a matching value of X to be eliminated from the output. However, table Two does not have two additional rows in which the value of X is 1, so the other two rows in table One are not eliminated, and do appear in the output.

Example: EXCEPT Operator

The EXCEPT operator can be used to solve a realistic business problem. Suppose you want to display the names of all new employees of a company. There is no table that contains information for only the new employees, so you will have to use data from the following two tables.

Table

Relevant Columns

Sasuser.Staffchanges lists information for all new employees and existing employees who have had a change in salary or job code

FirstName, LastName

Sasuser.Staffmaster lists information for all existing employees

FirstName, LastName

The relationship between these two tables is shown in the diagram below:

Example: EXCEPT Operator

The intersection of these two tables includes information for all existing employees who have had changes in job code or salary. The shaded portion, the portion of Sasuser.Staffchanges that does not overlap with Sasuser.Staffmaster, includes information for the people that you want: new employees.

To separate the new employees from the existing employees in Sasuser.Staffchanges, you create a set operation that displays all rows from the first table (Sasuser.Staffchanges) that do not exist in the second table (Sasuser.Staffmaster). The following PROC SQL step solves the problem:

  proc sql;
     select firstname, lastname
        from sasuser.staffchanges
     except all 
     select firstname, lastname
        from sasuser.staffmaster;

This PROC SQL set operation includes the operator EXCEPT and the keyword ALL. Although you do not want the output to contain duplicate rows, you already know that there are no duplicates in these two tables. Therefore, ALL is specified to prevent PROC SQL from making an extra pass through the data, which speeds up the processing of this query.

PROC SQL compares only the columns that are specified in the SELECT clauses, and these columns are compared in the order in which they are specified. The output displays the first and last names of the two new employees.

Example: EXCEPT Operator

Note

In a set operation that uses the EXCEPT operator, the order in which the tables are listed in the SELECT statement makes a difference. If the tables in this example were listed in the opposite order, the output would display all existing employees who have had no changes in salary or job code. Δ

Example: EXCEPT Operator in an In-Line View

This example is a variation of the preceding set operation. Suppose you want to display the number of existing employees who have had no changes in salary or job code. Once again, the query uses the following tables and columns.

Table

Relevant Columns

Sasuser.Staffchanges lists information for all new employees and existing employees who have had a change in salary or job code

FirstName, LastName

Sasuser.Staffmaster lists information for all existing employees

FirstName, LastName

The following PROC SQL query solves this problem:

  proc sql;
     select count(*) label='No. of Persons'
        from (select EmpID
                from sasuser.staffmaster
              except all
              select EmpID
                 from sasuser.staffchanges);

This PROC SQL query uses

  • the COUNT function with an asterisk (*) as an argument to count the number of employee IDs returned from the set operation

  • the set operator EXCEPT within an in-line view.

The in-line view returns a virtual table that contains employees who have had no changes in salary or job code. This virtual table is then passed to the COUNT(*) summary function, which counts the number of rows in the virtual table. The output shows that there are 144 existing employees who have had no changes in salary or job code.

Example: EXCEPT Operator in an In-Line View

Using the INTERSECT Set Operator

The set operator INTERSECT does both of the following:

  • selects unique rows that are common to both tables

  • overlays columns.

Using the INTERSECT Set Operator

The following example demonstrates how INTERSECT works when used alone and with the keywords ALL and CORR.

Using the INTERSECT Operator Alone

The INTERSECT operator compares and overlays columns in the sameway as the EXCEPT operator, by column position instead of column name. However, INTERSECT selects rows differently and displays in output the unique rows that are common to both tables. The following PROC SQL set operation uses the INTERSECT operator to combine the tables One and Two, which were introduced previously:

Using the INTERSECT Operator Alone

Tables One and Two have only one unique row in common and this row is displayed in the output. (This is the same row that was eliminated in the earlier example that contained the EXCEPT operator.)

Using the Keyword ALL with the INTERSECT Operator

Adding the keyword ALL to the preceding PROC SQL query prevents PROC SQL from making an extra pass through the data. If there were any rows common to tables One and Two that were duplicates of other common rows, they would also be included in output. However, as you have seen, there is only one common row in these tables. The modified PROC SQL query, the tables One and Two, and the output are shown below:

Using the Keyword ALL with the INTERSECT Operator

As before, there is just one row of output.

Using the Keyword CORR with the INTERSECT Operator

To display the unique rows that are common to the two tables based on the column name instead of the column position, add the CORR keyword to the PROC SQL set operation. The modified query, the tables One and Two, and the output are shown below:

Using the Keyword CORR with the INTERSECT Operator

X is the only column name that is common to both tables, so X is the only column that PROC SQL examines and displays in the output. In the first pass, PROC SQL eliminates the rows that are duplicated within each table: the second and third rows in table One contain the same value for X as the first row, and the fourth row in table Two contains the same value for X as the third row. In the second pass, PROC SQL eliminates any rows that are not common across tables: the fourth and fifth rows in table One and the fifth row in table Two do not have a matching value of X in the other table. The output displays the three rows with unique values of X that are also common to both tables.

Using the Keywords ALL and CORR with the INTERSECT Operator

If the keywords ALL and CORR are used together, the INTERSECT operator will display all unique and nonunique (duplicate) rows that are common to the two tables, based on columns that have the same name. The modified query, the tables One and Two, and the output are shown below:

Using the Keywords ALL and CORR with the INTERSECT Operator

PROC SQL examines and displays only the column with the same name, X. There are three common rows across the two tables, which are highlighted above, and these are the three rows that are displayed in the output.

Note that each of the tables contains at least one other row that duplicates a value of X in one of the common rows. For example, in the second and third rows in table One, the value of X is 1, as in one of the common rows. However, in order to be considered a common row and to be included in the output, every duplicate row in one table must have a separate duplicate row in the other table. In this example, there are no rows that have duplicate values and that are also common across tables. Therefore, in this example, the set operation with the keywords ALL and CORR generates the same output as with the keyword CORR alone.

Example: INTERSECT Operator

Now that you have seen how the INTERSECT set operator works with very small tables, we can use INTERSECT in a realistic business problem. Suppose you want to display the names of the existing employees who have changed their salary or job code. (This query is the opposite of the query that you solved with the EXCEPT operator.)

Once again, you will use the following tables.

Table

Relevant Columns

Sasuser.Staffchanges lists information for all new employees and existing employees who have had a change in salary or job code

FirstName, LastName

Sasuser.Staffmaster lists information for all existing employees

FirstName, LastName

The relationship between these two tables is shown in the diagram below:

Example: INTERSECT Operator

As shown in the earlier example with EXCEPT, the intersection of these two tables includes information for all existing employees who have had changes in job code or salary. It is the intersection of these two tables, shaded above, that you want to display.

To display the unique rows that are common to both tables, you use a PROC SQL set operation that contains INTERSECT. It is known that these tables contain no duplicates, so ALL is used to speed up query processing. The PROC SQL set operation is shown below:

  proc sql;
     select firstname, lastname
        from sasuser.staffchanges
     intersect all 
     select firstname, lastname
        from sasuser.staffmaster;

Note

In this PROC SQL step, which contains just one INTERSECT set operator, the order in which you list the tables in the SELECT statement does not make a difference. However, in a more complex PROC SQL step that contains multiple stacked INTERSECT set operators, it is important to think through the table order carefully, depending on when you want the non-matches to be eliminated.

The output shows that there are four existing employees who have changed their salary or job code. Δ

Example: INTERSECT Operator

Using the UNION Set Operator

The set operator UNION does both of the following:

  • selects unique rows from both tables together

  • overlays columns.

Using the UNION Set Operator

The following example demonstrates how UNION works when used alone and with the keywords ALL and CORR.

Using the UNION Operator Alone

To display all rows from the tables One and Two that are unique in the combined set of rows from both tables, use a PROC SQL set operation that includes the UNION operator:

Using the UNION Operator Alone

With the UNION operator, PROC SQL first concatenates and sorts the rows from the two tables, and eliminates any duplicate rows. In this example, two rows are eliminated: the second row in table One is a duplicate of the first row, and the fourth row in table Two matches the fifth row in table One. All remaining rows, the unique rows, are included in the output. The columns are overlaid by position.

Using the Keyword ALL with the UNION Operator

When the keyword ALL is added to the UNION operator, the output displays all rows from both tables, both unique and duplicate. The modified PROC SQL set operation, the tables One and Two, and the new output are shown below:

Using the Keyword ALL with the UNION Operator

When the ALL keyword is used, PROC SQL does not remove duplicates or sort the rows. The output now includes the two duplicate rows that were eliminated in the previous example: the second row in table One and the fourth row in table Two. Note that the rows are in a different order in this output than they were in the output from the previous set operation.

Using the Keyword CORR with the UNION Operator

To display all rows from the tables One and Two that are unique in the combined set of rows from both tables, based on columns that have the same name rather than the same position, add the keyword CORR after the set operator. The modified query, the tables One and Two, and the output are shown below:

Using the Keyword CORR with the UNION Operator

X is the only column name that is common to both tables, so X is the only column that PROC SQL examines and displays in the output. In the combined set of rows from the two tables, there are duplicates of the values 1, 2, and 3, and these duplicate rows are eliminated from the output. The output displays the six unique values of X.

Using the Keywords ALL and CORR with the UNION Operator

If the keywords ALL and CORR are used together, the UNION operator will display all rows in the two tables both unique and duplicate, based on the columns that have the same name. In this example, the output displays all 12 values for X, the one column that has the same name in both tables.

Using the Keywords ALL and CORR with the UNION Operator

Example: UNION Operator

The UNION operator can be used to solve a realistic business problem. Suppose you are generating a report based on data from a health clinic. You want to display the results of individual patient stress tests taken in 1998, followed by the results from stress tests taken in 1999. To do this, you will use the UNION operator to combine the tables Sasuser.Stress98 and Sasuser.Stress99. These two tables are similar in structure:

  • both tables contain nine columns that have the same names

  • each row contains data for an individual patient.

You are not sure whether the tables contain duplicate records, but you do not want duplicates in your output. Because the tables have the same column structure, you can overlay the columns by position and the CORR keyword is not necessary. The PROC SQL set operation and output are shown below:

Example: UNION Operator

Tip

If you can determine that these tables have no duplicate records, you could add the keyword ALL to speed up processing by avoiding an extra pass through the data.

Example: UNION Operator and Summary Functions

We can demonstrate another realistic business problem, to see how summary functions can be used with a set operator (in this case, UNION). Suppose you want to display the following summarized data for members of a frequent-flyer program: total points earned, total points used, and total miles traveled. All three values can be calculated from columns in the table Sasuser.Frequentflyers by using summary functions.

You might wonder why set operations are needed when only one table is involved. If you wanted to display the three summarized values horizontally, in three separate columns, you could solve the problem without a set operation, using the following simple SELECT statement:

proc sql;
   select sum(pointsearned) format=comma12.
          label='Total Points Earned',
          sum(pointsused) format=comma12.
          label='Total Points Used',
          sum(milestraveled) format=comma12.
          label='Total Miles Traveled'
      from sasuser.frequentflyers;
Example: UNION Operator and Summary Functions

Assume, however, that you want the three values to be displayed vertically in a single column. To generate this output, you create three different queries on the same table, and then use two UNION set operators to combine the three query results:

proc sql;
title 'Points and Miles Traveled';
title2 'by Frequent Flyers';
   select 'Total Points Traveled:',
          sum(MilesTraveled) format=comma12.
      from sasuser.frequentflyers
   union 
   select 'Total Points Earned:',
          sum(PointsEarned) format=comma12.
      from sasuser.frequentflyers 
   union 
   select 'Total Points Used:',
          sum(PointsUsed) format=comma12.
      from sasuser.frequentflyers;

Each SELECT clause defines two columns: a character constant as a label and the summarized value. The output is shown below.

Example: UNION Operator and Summary Functions

Note

The preceding program reads the same table three times, so it is not the most efficient way to solve this problem. Δ

Using the OUTER UNION Set Operator

The set operator OUTER UNION concatenates the results of the queries by

  • selecting all rows (both unique and nonunique) from both tables

  • not overlaying columns.

Using the OUTER UNION Set Operator

We can demonstrate how OUTER UNION works when used alone and with the keyword CORR. The ALL keyword is not used with OUTER UNION because this operator's default action is to include all rows in output.

Using the OUTER UNION Operator Alone

Suppose you want to display all rows from both of the tables One and Two, without overlaying columns. The PROC SQL set operation that includes the OUTER UNION operator, the two tables, and the output are shown below:

Using the OUTER UNION Operator Alone

In the output, the columns have not been overlaid. Instead, all four columns from both tables are displayed. Each row of output contains missing values in the two columns that correspond to the other table.

Using the Keyword CORR with the OUTER UNION Operator

The output from the preceding set operation contains two columns with the same name. To overlay the columns with a common name, add the CORR keyword to the set operation:

Using the Keyword CORR with the OUTER UNION Operator

The output from the modified set operation contains only three columns, because the two columns named X are overlaid.

Example: OUTER UNION Operator

There are many business situations that require two or more tables to be concatenated. For example, suppose you want to display the employee numbers, job codes, and salaries of all mechanics working for an airline. The mechanic job has three levels and there is a separate table containing data for the mechanics at each level: Sasuser.Mechanicslevel1, Sasuser.Mechanicslevel2, and Sasuser.Mechanicslevel3. These tables all contain the same three columns.

The following PROC SQL step uses two OUTER UNION operators to concatenate the tables, and the CORR keyword to overlay the columns that have common names:

Example: OUTER UNION Operator

Comparing Outer Unions and Other SAS Techniques

A PROC SQL set operation that uses the OUTER UNION operator is just one SAS technique that you can use to concatenate tables. Program 1, below, is the PROC SQL set operation that was shown earlier in this chapter. Program 2 uses a different SAS technique to concatenate the hypothetical tables One and Two.

Program 1: PROC SQL OUTER UNION Set Operation with CORR

  proc sql;
     create table three as
        select * from one
        outer union corr
        select * from two;
  quit;

Program 2: DATA Step, SET Statement, and PROC PRINT Step

  data three;
     set one two;
  run;
  proc print data=three noobs;
  run;

These two programs create the same table as output, as shown below.

Program 2: DATA Step, SET Statement, and PROC PRINT Step

When tables have a same-named column, the PROC SQL outer union will not produce the same output unless the keyword CORR is also used. CORR causes the same-named columns (in this example, the two columns named X) to be overlaid; without CORR, the OUTER UNION operator will include both of the same-named columns in the result set. The DATA step program will generate only one column X.

The two concatenation techniques shown above also vary in efficiency. A PROC SQL set operation generally requires more computer resources but might be more convenient and flexible than the DATA step equivalent.

Summary

This section contains the following:

  • a text summary of the material taught in this chapter

  • syntax for statements and option

  • a sample program

  • points to remember.

Text Summary

Understanding Set Operations

A set operation combines tables or views vertically (one on top of the other) by combining the results of two queries. A set operation is a SELECT statement that contains

  • two groups of query clauses (each group beginning with a SELECT clause)

  • one of the set operators EXCEPT, INTERSECT, UNION, and OUTER UNION

  • one or both of the keywords ALL and CORR (CORRESPONDING) as modifiers.

A single SELECT statement can contain multiple set operations.

When processing a set operation that displays only unique rows (a set operation that contains the set operator EXCEPT, INTERSECT, or UNION), PROC SQL makes two passes through the data, by default. For set operations that display both unique and duplicate rows, only one pass through the data is required.

For the set operators EXCEPT, INTERSECT, and UNION, columns are overlaid based on the relative position of the columns in the SELECT clause rather than by column name. In order to be overlaid, columns in the same relative position in the two SELECT clauses must have the same data type.

One or both keywords can be used to modify the default action of a set operator.

Using the EXCEPT Set Operator

The set operator EXCEPT selects unique rows from the first table (the table specified in the first query) that are not found in the second table (the table specified in the second query) and overlays columns. This set operation can be modified by using either or both of the keywords ALL and CORR.

Using the INTERSECT Set Operator

The set operator INTERSECT selects unique rows that are common to both tables and overlays columns. This set operation can be modified by using either or both of the keywords ALL and CORR.

Using the UNION Set Operator

The set operator UNION selects unique rows from both tables together and overlays columns. This set operation can be modified by using either or both of the keywords ALL and CORR.

Using the OUTER UNION Set Operator

The set operator OUTER UNION concatenates the results of two queries by selecting all rows (both unique and nonunique) from both tables and not overlaying columns. This set operation can be modified by using the keyword CORR.

Comparing Outer Unions and Other SAS Techniques

A PROC SQL set operation that uses the OUTER UNION set operator is not the only way to concatenate tables in SAS. Other SAS techniques can be used, such as a program that consists of a DATA step, a SET statement, and a PROC PRINT step.

Syntax

PROC SQL;

  SELECT column-1<, … column-n>

     FROM table-1 | view-1< , … table-n | view-n >

     <optional query clauses >

  set-operator <ALL> <CORR >

  SELECT column-1 <, … column-n>

     FROM table-1 | view-1< , … table-n | view-n >

     <optional query clauses>;

QUIT;

Sample Program

  proc sql;
     select firstname, lastname
        from sasuser.staffchanges
     intersect all
     select firstname, lastname
        from sasuser.staffmaster;
  quit;

Points to Remember

  • Regardless of the number of set operations in a SELECT statement, the statement contains only one semicolon, which is placed after the last group of query clauses.

  • In order to be overlaid, columns must have the same data type.

Quiz

Select the best answer for each question. After completing the quiz, check your answers using the answer key in the appendix.

  1. Which statement is false with respect to a set operation that uses the EXCEPT, UNION, or INTERSECT set operator without a keyword?

    1. Column names in the result set are determined by the first table.

    2. To be overlaid, columns must be of the same data type.

    3. To be overlaid, columns must have the same name.

    4. By default, only unique rows are displayed in the result set.

  2. The keyword ALL cannot be used with which of the following set operators?

    1. EXCEPT

    2. INTERSECT

    3. UNION

    4. OUTER UNION

  3. Which PROC SQL step combines the tables Summer and Winter to produce the output displayed below?

    Quiz
    1.    proc sql;
            select *
               from summer
            intersect all
            select *
               from winter;
    2.   proc sql;
           select *
              from summer
           outer union
           select *
              from winter;
    3.   proc sql;
           select *
              from summer
           union corr
           select *
              from winter;
    4.   proc sql;
           select *
              from summer
           union
           select *
              from winter;
  4. Which PROC SQL step combines tables but does not overlay any columns?

    1.   proc sql;
           select *
              from groupa
           outer union
           select *
              from groupb;
    2.   proc sql;
           select *
              from groupa as a
           outer union corr
           select *
              from groupb as b;
    3.   proc sql;
           select coalesce(a.obs, b.obs)
                  label='Obs', med, duration
              from groupa as a
              full join
              groupb as b
              on a.obs=b.obs;
    4.   proc sql;
           select *
              from groupa as a
           intersect
           select *
              from groupb as b;
  5. Which statement is false regarding the keyword CORRESPONDING?

    1. It cannot be used with the keyword ALL.

    2. It overlays columns by name, not by position.

    3. When used in EXCEPT, INTERSECT, and UNION set operations, it removes any columns not found in both tables.

    4. When used in OUTER UNION set operations, it causes same-named columns to be overlaid.

  6. Which PROC SQL step generates the following output from the tables Dogs and Pets?

    Quiz
    1.   proc sql;
           select name, price
              from pets
           except all
           select *
              from dogs;
    2.   proc sql;
           select name, price
              from pets
           except
           select *
              from dogs;
    3.   proc sql;
           select name, price
              from pets
           except corr all
           select *
              from dogs;
    4.   proc sql;
           select *
              from dogs
           except corr
           select name, price
              from pets;
  7. The PROG1 and PROG2 tables list students who took the PROG1 and PROG2 courses, respectively. Which PROC SQL step will give you the names of the students who took only the PROG1 class?

    Quiz
    1.   proc sql;
           select fname, lname
              from prog1
           intersect
           select fname, lname
              from prog2;
    2.   proc sql;
           select fname, lname
              from prog1
           except all
           select fname, lname
              from prog2;
    3.   proc sql;
           select *
              from prog2
           intersect corr
           select *
              from prog1;
    4.   proc sql;
           select *
              from prog2
           union
           select *
              from prog1;
  8. Which PROC SQL step will return the names of all the students who took PROG1, PROG2, or both classes?

    Quiz
    1.   proc sql;
           select fname, lname
              from prog1
           intersect
           select fname, lname
              from prog2;
    2.   proc sql;
           select fname, lname
              from prog1
           outer union corr
           select fname, lname
              from prog2;
    3.   proc sql;
           select fname, lname
              from prog1
           union
           select fname, lname
              from prog2;
    4.   proc sql;
           select fname, lname
              from prog1
           except corr
           select fname, lname
              from prog2;
  9. Which PROC SQL step will return the names of all the students who took both the PROG1 and PROG2 classes?

    Quiz
    1.   proc sql;
           select fname, lname
              from prog1
           union
           select fname, lname
              from prog2;
    2.   proc sql;
           select fname, lname
              from prog1
           except corr
           select fname, lname
              from prog2;
    3.   proc sql;
           select fname, lname
              from prog1
           intersect all
           select fname, lname
              from prog2;
    4.   proc sql;
           select fname, lname
              from prog1
           union corr
           select fname, lname
              from prog2;
  10. Which PROC SQL step will generate the same results as the following DATA step?

    Quiz
    1.   proc sql;
           select fname, lname
              from prog1
           outer union corr
           select fname, lname
              from prog2
           order by lname;
    2.   proc sql;
           select fname, lname
              from prog1
           union
           select fname, lname
              from prog2
           order by lname;
    3.   proc sql;
           select fname, lname
              from prog2
           outer union
           select fname, lname
              from prog1
           order by lname;
    4.   proc sql;
           select fname, lname
              from prog2
           union corr
           select fname, lname
              from prog1
           order by lname;
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset