Using the EXCEPT Set Operator

A Brief Overview

The set operator EXCEPT does both of the following:
  • selects unique rows from the first table (the table specified in the first query) that are not found in the second table
  • overlays columns
Figure 4.1 EXCEPT Set Operator Relationship
EXCEPT Set Operator Relationship

Example: Using the EXCEPT Operator Alone

Suppose you want to display the unique rows in table Col1 that are not found in table Col2. The PROC SQL set operation that includes the EXCEPT operator, the tables Col1 and Col2, and the output of the set operation are shown below.
proc sql;
   select *
      from certadv.col1 except
   select *
      from certadv.col2;
quit;
Output 4.3 PROC SQL Query Result Illustration
Tables Certadv.Col1, Certadv.Col2, and Output
The set operator EXCEPT overlays columns by their position. In this output, the following columns are overlaid:
  • the first columns, Col1.X and Col2.X, both of which are numeric
  • the second columns, Col1.A and Col2.B, both of which are character
The column names from table Col1 are used, so the second column of output is named A rather than B.
In the first pass, PROC SQL eliminates any duplicate rows from the tables. As shown below, there is one duplicate row: in table Col1, the second row is a duplicate of the first row. All remaining rows in table Col1 are still candidates in PROC SQL's selection process.
proc sql;
   select *
      from certadv.col1 except
   select *
      from certadv.col2;
quit;
Output 4.4 Tables Certadv.Col1 and Certadv.Col2
Tables Col1 and Col2
In the second pass, PROC SQL identifies any rows in table Col1 for which there is a matching row in table Col2 and eliminates them. The one matching row in the two tables, as shown below, is eliminated.
proc sql;
   select *
      from certadv.col1 except
   select *
      from certadv.col2;
quit;
Output 4.5 Tables Certadv.Col1 and Certadv.Col2
Tables Col1 and Col2
The five remaining rows in table Col1, the unique rows, are displayed in the output.
Output 4.6 PROC SQL Query Result: Overlaying Tables
PROC SQL Query Result: Overlaying Tables

Example: Using the Keyword ALL with the EXCEPT Operator

To select all rows in the first table, both unique and duplicate, that do not have a matching row in the second table, add the keyword ALL after the EXCEPT set operator. The modified PROC SQL set operation, the tables Col1 and Col2, and the output are shown below.
proc sql;
   select *
      from certadv.col1 except all
   select *
      from certadv.col2;
quit;
Output 4.7 PROC SQL Query Result Illustration
Tables Certadv.Col1, Certadv.Col2, and Output
The output now contains six rows. PROC SQL has again eliminated the one row in table Col1, the fifth row, that has a matching row in table Col2, the fourth row. Remember that when the keyword ALL is used with the EXCEPT operator, PROC SQL does not make an extra pass through the data to remove duplicate rows within table Col1. Therefore, the second row in table Col1, which is a duplicate of the first row, is now included in the output.

Example: Using the Keyword CORR with the EXCEPT Operator

Add the keyword CORR after the set operator to display both of the following:
  • only columns that have the same name
  • all unique rows in the first table that do not appear in the second table
The modified PROC SQL set operation, the tables Col1 and Col2, and the output are shown below:
proc sql;
   select *
      from certadv.col1 except corr
   select *
      from certadv.col2;
quit;
Output 4.8 PROC SQL Query Result Illustration
Tables Certadv.Col1, Certadv.Col2, and Output
X is the only column that has the same name in both tables, so X is the only column that PROC SQL examines and displays in the output.
In the first pass, PROC SQL eliminates the second and third rows of table Col1 from the output because they are not unique within the table. They contain values of X that duplicate the value of X in the first row of table Col1. In the second pass, PROC SQL eliminates the first, fourth, and fifth rows of table Col1 because each contains a value of X that matches a value of X in a row of table Col2.
The output displays the two remaining rows in table Col1, the rows that are unique in table Col1 and that do not have a row in table Col2 that has a matching value of X.

Example: Using the Keywords ALL and CORR with the EXCEPT Operator

If the keywords ALL and CORR are used together, the EXCEPT operator displays all unique and duplicate rows in the first table that do not appear in the second table, and overlays and displays only columns that have the same name.
The modified PROC SQL set operation, the tables Col1 and Col2, and the output are shown below:
proc sql;
   select *
      from certadv.col1 except all corr
   select *
      from certadv.col2;
quit;
Output 4.9 PROC SQL Query Result Illustration
Tables Certadv.Col1, Certadv.Col2, and Output
Once again, PROC SQL examines and displays only the column that has the same name in the two tables: X. Because the keyword ALL is used, PROC SQL does not eliminate any duplicate rows in table Col1. Therefore, the second and third rows in table Col1, which are duplicates of the first row in table Col1, appear in the output. PROC SQL does eliminate the first, fourth, and fifth rows in table Col1 from the output because for each one of these three rows there is a corresponding row in table Col2 that has a matching value of X.
When the keyword ALL is used with the EXCEPT operator, a row in table Col1 cannot be eliminated from the output unless it has a separate matching row in table Col2. Table Col1 contains three rows in which the value of X is 1, but table Col2 contains only one row in which the value of X is 1. That one row in table Col2 causes the first of the three rows in table Col1 that have a matching value of X to be eliminated from the output. However, table Col2 does not have two additional rows in which the value of X is 1. Therefore, the other two rows in table Col1 are not eliminated, and do appear in the output.

Example: EXCEPT Operator

Suppose you want to display the names of all new employees of a company. Because no table exists that contains information for only the new employees, you use data from the following two tables.
Table
Relevant Columns
Certadv.Staffchanges lists information for all new employees and existing employees who have had a change in salary or job code.
FirstName, LastName
Certadv.Staffmaster lists information for all existing employees.
FirstName, LastName
The relationship between these two tables is shown in the diagram below:
Figure 4.2 EXCEPT Operator Relationship
EXCEPT Operator Relationship Using a Venn Diagram
The intersection of these two tables includes information for all existing employees who have had changes in job code or salary. The shaded portion, the portion of Certadv.Staffchanges that does not overlap with Certadv.Staffmaster, includes information for the people that you want: new employees.
The following PROC SQL step separates the new employees from the existing employees in Certadv.Staffchanges to create a set operation that displays all rows from Certadv.Staffchanges that do not exist in the Certadv.Staffmaster.
proc sql;
   select firstname, lastname
      from certadv.staffchanges except all
   select firstname, lastname
      from certadv.staffmaster;
quit;
This PROC SQL set operation includes the operator EXCEPT and the keyword ALL. Although you do not want the output to contain duplicate rows, you already know that there are no duplicates in these two tables. Therefore, ALL is specified to prevent PROC SQL from making an extra pass through the data. This speeds up the processing of this query.
PROC SQL compares only the columns that are specified in the SELECT clauses, and these columns are compared in the order in which they are specified. The output displays the first and last names of the two new employees.
Output 4.10 PROC SQL Query Result: Using EXCEPT Operator and Keyword ALL
PROC SQL Query Result: Using EXCEPT Operator and ALL Keyword
Note: In a set operation that uses the EXCEPT operator, the order in which the tables are listed in the SELECT statement makes a difference. If the tables in this example were listed in the opposite order, the output would display all existing employees who have had no changes in salary or job code.
Last updated: October 16, 2019
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset