Chapter 13. Combining SAS Data Sets

Overview

Introduction

In SAS programming, a common task is to combine observations from two or more data sets into a new data set. Using the DATA step, you can combine data sets in several ways, including the following:

Method of Combining

Illustration

One-to-one reading

Creates observations that contain all of the variables from each contributing data set.

Combines observations based on their relative position in each data set.

Statement: SET

Introduction

Concatenating

Appends the observations from one data set to another.

Statement: SET

Introduction

Appending

Appending adds the observations in the second data set directly to the end of the original data set.

Procedure: APPEND

Introduction

Interleaving

Intersperses observations from two or more data sets, based on one or more common variables.

Statements: SET, BY

Introduction

Match-merging

Matches observations from two or more data sets into a single observation in a new data set according to the values of a common variable.

Statements: MERGE, BY

Introduction
Introduction

This chapter shows you how to combine SAS data sets using one-to-one reading, concatenating, interleaving, and match-merging. When you use the DATA step to combine data sets, you have a high degree of control in creating and manipulating data sets.

Objectives

In this chapter, you learn to

  • perform one-to-one reading of data sets

  • concatenate data sets

  • append data sets

  • interleave data sets

  • match-merge data sets

  • rename any like-named variables to avoid overwriting values

  • select only matched observations, if desired

  • predict the results of match-merging.

One-to-One Reading

If you completed Chapter 12, "Reading SAS Data Sets," on page 345, you learned how to use the SET statement to read an existing SAS data set. You can also use multiple SET statements in a DATA step to combine data sets. This is called one-to-one reading. In one-to-one reading, you can read different data sets, or you can read the same data set more than once, as if you were reading from separate data sets.

How One-to-One Reading Selects Data

When you perform one-to-one reading,

  • the new data set contains all the variables from all the input data sets. If the data sets contain variables that have the same names, the values that are read in from the last data set overwrite the values that were read in from earlier data sets.

  • the number of observations in the new data set is the number of observations in the smallest original data set. Observations are combined based on their relative position in each data set. That is, the first observation in one data set is joined with the first observation in the other, and so on. The DATA step stops after it has read the last observation from the smallest data set.

How One-to-One Reading Selects Data

How One-to-One Reading Works

Let's look at a simple case of one-to-one reading.

How One-to-One Reading Works
  1. The first SET statement reads one observation from data set C.

    How One-to-One Reading Works
  2. Then the second SET statement reads one observation from data set D. The value for Num in data set D overwrites the value for Num in data set C.

    How One-to-One Reading Works
  3. Next, the first SET statement reads the second observation from data set C.

    How One-to-One Reading Works
  4. Finally, the second SET statement reads the second observation from data set D, overwriting the value for Num in data set C. Because this is the last observation in the smallest data set, processing stops. The DATA step does not read the third observation in data set C.

    How One-to-One Reading Works

Now let's see how you might use one-to-one reading.

Example

Suppose you have basic patient data (ID, sex, and age) in the data set Clinic.Patients and want to combine it with other patient data (height and weight) for patients under age 60. The height and weight data is stored in the data set Clinic.Measure. Both data sets are sorted by the variable ID.

Notice that Clinic.Patients contains 7 observations in which the patient age is less than 60, and Clinic.Measure contains 6 observations.

Example

To subset observations from the first data set and combine them with observations from the second data set, you can submit the following program:

data clinic.one2one;
   set clinic.patients;
   if age<60;
   set clinic.measure;
run;

The resulting data set, Clinic.One2one, contains 6 observations (the number of observations read from the smallest data set, here Clinic.Measure). The last observation in Clinic.Patients is not read.

Example

Concatenating

Another way to combine SAS data sets with the SET statement is concatenating, which appends the observations from one data set to another data set. To concatenate SAS data sets, you specify a list of data set names in the SET statement.

How Concatenating Selects Data

When a program concatenates data sets, all of the observations are read from the first data set listed in the SET statement. Then all of the observations are read from the second data set listed, and so on, until all of the listed data sets have been read. The new data set contains all of the variables and observations from all of the input data sets.

How Concatenating Selects Data

Notice that A and C contain a common variable named Num:

  • Both instances of Num (or any common variable) must have the same type attribute, or SAS stops processing the DATA step and issues an error message stating that the variables are incompatible.

  • However, if the length attribute is different, SAS takes the length from the first data set that contains the variable. In this case, the length of Num in A determines the length of Num in Concat.

  • The same is true for the label, format, and informat attributes: If any of these attributes are different, SAS takes the attribute from the first data set that contains the variable with that attribute.

Example

The following DATA step creates Clinic.Concat by concatenating Clinic.Therapy1999 and Clinic.Therapy2000.

data clinic.concat;
   set clinic.therapy1999 clinic.therapy2000;
run;

Below is the listing of Clinic.Concat. The first 12 observations were read from Clinic.Therapy1999, and the last 12 observations were read from Clinic.Therapy2000.

Example

Appending

Another way to combine SAS data sets is to append one data set to another using the APPEND procedure. Although appending and concatenating are similar, there are some important differences between the two methods. Whereas the DATA step creates an entirely new data set when concatenating, PROC APPEND simply adds the observations of one data set to the end of a "master" (or BASE) data set. SAS does not create a new data set nor does it read the base data set when executing the APPEND procedure.

To append SAS data sets, you specify a BASE= data set, which is the data set to which observations are added and then specify a DATA= data set, which is the data set containing the observations that are added to the base data set. The data set specified with DATA= is the only one of the two data sets that SAS actually reads.

For example, the following PROC APPEND statement appends the observations in data set B to the end of data set A:

Appending

Requirements for the APPEND Procedure

The requirements for appending one data set to another are as follows:

  • Only two data sets can be used at a time in one step.

  • The observations in the base data set are not read.

  • The variable information in the descriptor portion of the base data set cannot change.

Notice that the final data set is the original data set and that no new data set was created.

Example

The following PROC APPEND statement appends the data set Clinic.Append to the base data set Clinic.Therapy. Notice that the two data sets are like-structured data sets: that is, both data sets have the same variable information.

proc append base=Clinic.Therapy data=Clinic.Append;
run;

Below is the listing of Clinic.Therapy. The first 12 observations already existed in Clinic.Therapy, and the last 12 observations were read from Clinic.Append and added to Clinic.Therapy.

Example

Using the FORCE Option with Unlike-Structured Data Sets

In order to use PROC APPEND with data sets that have unmatching variable definitions, you can use the FORCE option in the PROC APPEND statement.

The FORCE option is needed when the DATA= data set contains variables that meet any one of the following criteria:

  • They are not in the BASE= data set.

  • They are variables of a different type (for example, character or numeric).

  • They are longer than the variables in the BASE= data set.

If the length of a variable is longer in the DATA= data set than in the BASE= data set, SAS truncates values from the DATA= data set to fit them into the length that is specified in the BASE= data set.

If the type of a variable in the DATA= data set is different than in the BASE= data set, SAS replaces all values for the variable in the DATA= data set with missing values and keeps the variable type of the variable specified in the BASE= data set.

If the BASE= data set contains a variable that is not in the DATA= data set, the observations are appended, but the observations from the DATA= data set have a missing value for the variable that was not present in the DATA= data set. The FORCE option is not necessary in this case.

Example

For example:


proc append base=Clinic.Therapy
            data=Clinic.Append force;
run;
Example

Interleaving

If you use a BY statement when you concatenate data sets, the result is interleaving. Interleaving intersperses observations from two or more data sets, based on one or more common variables.

To interleave SAS data sets, specify a list of data set names in the SET statement, and specify one or more BY variables in the BY statement.

How Interleaving Selects Data

When SAS interleaves data sets, observations in each BY group in each data set in the SET statement are read sequentially, in the order in which the data sets and BY variables are listed, until all observations have been processed. The new data set includes all the variables from all the input data sets, and it contains the total number of observations from all input data sets.

How Interleaving Selects Data

Example

The following DATA step creates Clinic.Interlv by interleaving Clinic.Therapy1999 and Clinic.Therapy2000.

data clinic.interlv;
   set clinic.therapy1999 clinic.therapy2000;
   by month;
run;

Below is the listing of Clinic.Interlv. Notice that, unlike the previous example, observations are interleaved by month instead of being concatenated.

Example

Match-Merging

So far in this chapter, you've combined data sets based on the order of the observations in the input data sets. But sometimes you need to combine observations from two or more data sets into a single observation in a new data set according to the values of a common variable. This is called match-merging.

When you match-merge, you use a MERGE statement rather than a SET statement to combine data sets.

How Match-Merging Selects Data

Generally speaking, during match-merging SAS sequentially checks each observation of each data set to see whether the BY values match, then writes the combined observation to the new data set.

How Match-Merging Selects Data

Basic DATA step match-merging produces an output data set that contains values from all observations in all input data sets. (You can add statements and options to select only observations that match for two or more specific input data sets.)

If an input data set doesn't have any observations for a particular value of the same-named variable, then the observation in the output data set contains missing values for the variables that are unique to that input data set.

How Match-Merging Selects Data
How Match-Merging Selects Data

Example

Suppose you have sorted the data sets Clinic.Demog and Clinic.Visit as follows:

Example

You can then submit this DATA step to create Clinic.Merged by merging Clinic.Demog and Clinic.Visit according to values of the variable ID.

data clinic.merged; 
   merge clinic.demog clinic.visit; 
   by id; 
run;
proc print data=clinic.merged;
run;

Notice that all observations, including unmatched observations and observations that have missing data, are written to the output data set.

Example

Example: Merge in Descending Order

The example above illustrates merging two data sets that are sorted in ascending order of the BY variable ID. To sort the data sets in descending order and then merge them, you can submit the following program.

proc sort data=clinic.demog;
   by descending  id;
run;
proc sort data=clinic.visit;
   by descending  id;
run;
data clinic.merged;
   merge clinic.demog clinic.visit;
   by descending  id;
run;
proc print data=clinic.merged;
run;

Notice that you specify the DESCENDING option in the BY statements in both the PROC SORT steps and the DATA step. If you omit the DESCENDING option in the DATA step, you generate error messages about improperly sorted BY variables.

Now the data sets are merged in descending order of the BY variable ID.

Example: Merge in Descending Order

Match-Merge Processing

The match-merging examples in this chapter are straightforward. However, match-merging can be more complex, depending on your data and on the output data set that you want to create. To predict the results of match-merges correctly, you need to understand how the DATA step performs match-merges.

When you submit a DATA step, it is processed in two phases:

  • the compilation phase, in which SAS checks the syntax of the SAS statements and compiles them (translates them into machine code). During this phase, SAS also sets up descriptor information for the output data set and creates the program data vector (PDV), an area of memory where SAS builds your data set, one observation at a time.

    Match-Merge Processing
  • the execution phase, in which the DATA step reads data and executes any subsequent programming statements. When the DATA step executes, data values are read into the appropriate variables in the program data vector. From here, the variables are written to the output data set as a single observation.

    Match-Merge Processing

The following pages cover DATA step processing in greater detail. In those pages, you learn

  • how the DATA step sets up the new output data set

  • what happens when variables in different data sets have the same name

  • how the DATA step matches observations in input data sets

  • what happens when observations don't match

  • how missing values are handled.

The Compilation Phase: Setting Up the New Data Set

To prepare to merge data sets, SAS

  • reads the descriptor portions of the data sets that are listed in the MERGE statement

  • reads the rest of the DATA step program

  • creates the program data vector (PDV) for the merged data set

  • assigns a tracking pointer to each data set that is listed in the MERGE statement.

If variables that have the same name appear in more than one data set, the variable from the first data set that contains the variable (in the order listed in the MERGE statement) determines the length of the variable.

The Compilation Phase: Setting Up the New Data Set

The illustration above shows match-merging during the compilation phase. After reading the descriptor portions of the data sets Clients and Amounts, SAS

  1. creates a program data vector for the new Claims data set. The program data vector contains all variables from the two data sets. Note that although Name appears in both input data sets, it appears in the program data vector only once.

  2. assigns tracking pointers to Clients and Amounts.

The Execution Phase: Match-Merging Observations

After compiling the DATA step, SAS sequentially match-merges observations by moving the pointers down each observation of each data set and checking to see whether the BY values match.

  • If Yes, the observations are written to the PDV in the order in which the data sets appear in the MERGE statement. Values of any same-named variable are overwritten by values of the same-named variable in subsequent data sets. SAS writes the combined observation to the new data set and retains the values in the PDV until the BY value changes in all the data sets.

    The Execution Phase: Match-Merging Observations
  • If No, SAS determines which of the values comes first and writes the observation that contains this value to the PDV. Then the contents of the PDV are written

    The Execution Phase: Match-Merging Observations

When the BY value changes in all the input data sets, the PDV is initialized to missing.

The Execution Phase: Match-Merging Observations

The DATA step merge continues to process every observation in each data set until it has processed all observations in all data sets.

Handling Unmatched Observations and Missing Values

By default, all observations that are written to the PDV, including observations that have missing data and no matching BY values, are written to the output data set. (If you specify a subsetting IF statement to select observations, then only those that meet the IF condition are written.)

  • If an observation contains missing values for a variable, then the observation in the output data set contains the missing values as well. Observations that have missing values for the BY variable appear at the top of the output data set.

    Handling Unmatched Observations and Missing Values
  • If an input data set doesn't have a matching BY value, then the observation in the output data set contains missing values for the variables that are unique to that input data set.

    Handling Unmatched Observations and Missing Values

Renaming Variables

Sometimes you might have same-named variables in more than one input data set. In this case, DATA step match-merging overwrites values of the like-named variable in the first data set in which it appears with values of the like-named variable in subsequent data sets.

For example, Clinic.Demog contains the variable Date (date of birth), and Clinic.Visit also contains Date (date of the clinic visit in 1998). The DATA step below overwrites the date of birth with the date of the clinic visit.

data clinic.merged; 
   merge clinic.demog clinic.visit; 
   by id; 
run;
proc print data=clinic.merged;
run;

The following output shows the effects of overwriting the values of a variable in the Clinic.Merged data set. In most observations, the date is now the date of the clinic visit. In observation 11, the date is still the birth date because Clinic.Visit did not contain a matching ID value and did not contribute to the observation.

Renaming Variables

You now have a data set with values for Date that mean two different things: date of birth and date of clinic visit. Let's see how to prevent this problem.

To prevent overwriting, you can rename variables by using the RENAME= data set option in the MERGE statement.

In the following example, the RENAME= option renames the variable Date in Clinic.Demog to BirthDate, and it renames the variable Date in Clinic.Visit to VisitDate.

data clinic.merged;
   merge clinic.demog  (rename=(date=BirthDate))
         clinic.visit  (rename=(date=VisitDate));
   by id;
run;
proc print data=clinic.merged;
run;

The following output shows the effect of the RENAME= option.

Renaming Variables

Excluding Unmatched Observations

By default, DATA step match-merging combines all observations in all input data sets. However, you may want to select only observations that match for two or more specific input data sets.

Excluding Unmatched Observations

To exclude unmatched observations from your output data set, you can use the IN= data set option and the subsetting IF statement in your DATA step. In this case, you use

  • the IN= data set option to create and name a variable that indicates whether the data set contributed data to the current observation

  • the subsetting IF statement to check the IN= values and write to the merged data set only those observations that appear in the data sets for which IN= is specified.

Creating Temporary IN= Variables

Suppose you want to match-merge the data sets Clinic.Demog and Clinic.Visit and select only observations that appear in both data sets.

First, you use IN= to create two temporary variables, indemog and invisit. The IN= variable is a temporary variable that is available to program statements during the DATA step, but it is not included in the SAS data set that is being created.

The DATA step that contains the IN= options appears below. The first IN= creates the temporary variable indemog, which is set to 1 when an observation from Clinic.Demog contributes to the current observation. Otherwise, it is set to 0. Likewise, the value of invisit depends on whether Clinic.Visit contributes to an observation or not.

data clinic.merged;
   merge clinic.demog(in=indemog)
         clinic.visit(in=invisit
   rename=(date=BirthDate));
   by id;
run;
Creating Temporary IN= Variables

Selecting Matching Observations

Next, to select only observations that appear in both Clinic.Demog and Clinic.Visit, you specify a subsetting IF statement in the DATA step.

In the DATA step below, the subsetting IF statement checks the values of indemog and invisit and continues processing only those observations that meet the condition of the expression. Here the condition is that both Clinic.Demog and Clinic.Visit contribute to the observation. If the condition is met, the new observation is written to Clinic.Merged. Otherwise, the observation is deleted.

data clinic.merged;
   merge clinic.demog( in=indemog
                     rename=(date=BirthDate))
         clinic.visit (in=invisit
                      rename=(date=VisitDate));
   by id;
   if indemog=1 and invisit=1;
run;
proc print data=clinic.merged;
run;

In previous examples, Clinic.Merged contained 12 observations. In the output below, notice that only 10 observations met the condition in the IF expression.

Selecting Matching Observations

SAS evaluates the expression within an IF statement to produce a result that is either nonzero, zero, or missing. A nonzero or nonmissing result causes the expression to be true; a zero or missing result causes the expression to be false.

Thus, you can specify the subsetting IF statement from the previous example in either of the following ways. The first IF statement checks specifically for a value of 1. The second IF statement checks for a value that is neither missing nor 0 (which for IN= variables is always 1).

if  indemog=1 and  invisit=1;


if  indemog and  invisit;

Selecting Variables

As with reading raw data or reading SAS data sets, you can specify the variables you want to drop or keep by using the DROP= and KEEP= data set options.

For example, the DATA step below reads all variables from Clinic.Demog and all variables except Weight from Clinic.Visit. It then excludes the variable ID from Clinic.Merged after the merge processing is complete.

data clinic.merged  (drop=id);
   merge clinic.demog(in=indemog
                     rename=(date=BirthDate))
         clinic.visit(drop=weight in=invisit
                     rename=(date=VisitDate));
   by id;
   if indemog and invisit;
run;
proc print data=clinic.merged;
run;
Selecting Variables

Where to Specify DROP= and KEEP=

As you've seen in previous chapters, you can specify the DROP= and KEEP= options wherever you specify a SAS data set. When match-merging, you can specify these options in either the DATA statement or the MERGE statement, depending on whether or not you want to process values of the variables in that DATA step:

  • If you don't process certain variables and you don't want them to appear in the new data set, specify them in the DROP= option in the MERGE statement.

    merge clinic.demog(in=indemog
                      rename=(date=BirthDate))
          clinic.visit(drop=weight in=invisit
                      rename=(date=VisitDate));
  • If you do need to process a variable in the original data set (in a subsetting IF statement, for example), then you must specify the variable in the DROP= option in the DATA statement. Otherwise, the statement that uses the variable for processing causes an error.

    data clinic.merged (drop=id);

When used in the DATA statement, the DROP= option simply drops the variables from the new data set. However, they are still read from the original data set and are available for processing within the DATA step.

Additional Features

The DATA step provides a large number of other programming features for manipulating data when you combine data sets. For example, you can

  • use IF-THEN/ELSE logic to control processing based on one or more conditions

  • specify additional data set options

  • perform calculations

  • create new variables

  • process variables in arrays

  • use SAS functions

  • use special variables such as FIRST. and LAST. to control processing.

You can also combine SAS data sets in other ways:

  • You can perform one-to-one merging, which creates a data set that contains all of the variables and observations from each contributing data set. Observations are combined based on their relative position in each data set.

    One-to-one merging is the same as one-to-one reading, with two exceptions:

    • You use the MERGE statement instead of multiple SET statements.

    • The DATA step reads all observations from all data sets.

      data work.onemerge;
         merge clinic.demog clinic.visit;
      run;
  • You can perform a conditional merge, using DO loops or other conditional statements:

    data work.combine;
       set sales.pounds;
       do while(not(begin le date le last));
          set sales.rate;
       end;
       Dollars=(sales*1000)*rate;
    run;
    Additional Features
  • You can read the same data set in more than one SET statement:

    data work.combine(drop=totpay);
       if _n_=1 then do until(last);
          set sales.budget(keep=payroll) end=last;
          totpay+payroll;
          end;
       set sales.budget;
       Percent=payroll/totpay;
    run;

Chapter Summary

Text Summary

One-to-One Reading

You can combine data sets with one-to-one reading by including multiple SET statements in a DATA step. When you perform one-to-one reading, the new data set contains all the variables from all the input data sets. If the data sets contain same-named variables, the values that are read in from the last data set replace those that were read in from earlier ones. The number of observations in the new data set is the number of observations in the smallest original data set.

data one2one;
   set a;
   set b;
run;

Concatenating

To append the observations from one data set to another data set, you concatenate them by specifying the data set names in the SET statement. When SAS concatenates, data sets in the SET statement are read sequentially, in the order in which they are listed. The new data set contains all the variables and the total number of observations from all input data sets.

data concat;
   set a b;
run;

Appending

Another way to combine SAS data sets is to append one data set to another using the APPEND procedure. Although appending and concatenating are similar, there are some important differences between the two methods. The DATA step creates a new data set when concatenating, PROC APPEND adds the observations of one data set to the end of a "master" (or BASE) data set. SAS does not create a new data set nor does it read the base data set when executing the APPEND procedure.

Interleaving

If you use a BY statement when you concatenate data sets, the result is interleaving. Interleaving intersperses observations from two or more data sets, based on one or more common variables. Each input data set must be sorted or indexed in ascending order based on the BY variable(s). Observations in each BY group in each data set in the SET statement are read sequentially, in the order in which the data sets and BY variables are listed, until all observations have been processed. The new data set contains all the variables and the total number of observations from all input data sets.

data interlv;
   set a b;
   by num;
run;

Match-Merging

Sometimes you need to combine observations from two or more data sets into a single observation in a new data set according to the values of a same-named variable. This is match-merging, which uses a MERGE statement rather than a SET statement to combine data sets. Each input data set must be sorted or indexed in ascending order based on the BY variable(s). During match-merging, SAS sequentially checks each observation of each data set to see whether the BY values match, then writes the combined observation to the new data set.

data merged;
   merge a b;
   by num;
run;

Match-Merge Processing

To predict the results of match-merging correctly, you need to understand how the DATA step processes data in match-merges.

Compiling

To prepare to merge data sets, SAS

  1. reads the descriptor portions of the data sets that are listed in the MERGE statement

  2. reads the rest of the DATA step program

  3. creates the program data vector (PDV), an area of memory where SAS builds your data set one observation at a time

  4. assigns a tracking pointer to each data set that is listed in the MERGE statement.

If variables with the same name appear in more than one data set, the variable from the first data set that contains the variable (in the order listed in the MERGE statement) determines the length of the variable.

Executing

After compiling the DATA step, SAS sequentially match-merges observations by moving the pointers down each observation of each data set and checking to see whether the BY values match.

  • If Yes, the observations are written to the PDV in the order in which the data sets appear in the MERGE statement. Values of any same-named variable are overwritten by values of the same-named variable in subsequent data sets. SAS writes the combined observation to the new data set and retains the values in the PDV until the BY value changes in all the data sets.

  • If No, SAS determines which of the values comes first and writes the observation that contains this value to the PDV. Then the observation is written to the new data set.

When the BY value changes in all the input data sets, the PDV is initialized to missing. The DATA step merge continues to process every observation in each data set until it has processed all observations in all data sets.

Handling Unmatched Observations and Missing Values

All observations that are written to the PDV, including observations that have missing data and no matching BY values, are written to the output data set.

  • If an observation contains missing values for a variable, then the observation in the output data set contains the missing values as well. Observations that have missing values for the BY variable appear at the top of the output data set.

  • If an input data set doesn't have a matching BY value, then the observation in the output data set contains missing values for the variables that are unique to that input data set.

Renaming Variables

Sometimes you might have same-named variables in more than one input data set. In this case, match-merging overwrites values of the same-named variable in the first data set with values of the same-named variable in subsequent data sets. To prevent overwriting, use the RENAME= data set option in the MERGE statement to rename variables.

Excluding Unmatched Observations

By default, match-merging combines all observations in all input data sets. However, you might want to select only observations that match for two or more input data sets. To exclude unmatched observations, use the IN= data set option and the subsetting IF statement in your DATA step. The IN= data set option creates a variable to indicate whether the data set contributed data to the current observation. The subsetting IF statement then checks the IN= values and writes to the merged data set only observations that appear in the data sets for which IN= is specified.

Selecting Variable

You can specify the variables you want to drop or keep by using the DROP= and KEEP= data set options. When match-merging, you can specify these options in either the DATA statement or the MERGE statement, depending on whether or not you want to process values of the variables in that DATA step. When used in the DATA statement, the DROP= option simply drops the variables from the new data set. However, they are still read from the original data set and are available within the DATA step.

Syntax

One-to-One Reading

LIBNAME libref 'SAS-data-library';

DATA output-SAS-data-set;

       SET SAS-data-set-1;

       SET SAS-data-set-2;

RUN;

Concatenating

DATA output-SAS-data-set;

       SET SAS-data-set-1 SAS-data-set-2;

RUN;

Interleaving

PROC SORT DATA=SAS-data-set OUT=SAS-data-set;

       BY variable(s);

RUN;

DATA output-SAS-data-set;

       SET SAS-data-set-1 SAS-data-set-2;

       BY variable(s);

RUN;

Match-Merging

PROC SORT DATA=SAS-data-set OUT=SAS-data-set;

       BY variable(s);

RUN;

DATA output-SAS-data-set (DROP=variable(s) | KEEP=variable(s));

       MERGE SAS-data-set-1 SAS-data-set-2

       (RENAME=(old-variable-name=new-variable-name)

       IN= variable DROP=variable(s) | KEEP=variable(s));

     BY variable(s);

     IF expression;

RUN;

Sample Programs

One-to-One Reading

data clinic.one2one;
   set clinic.patients;
   if age<60;
   set clinic.measure;
run;

Concatenating

data clinic.concat;
   set clinic.therapy1999 clinic.therapy2000;
run;

Interleaving

data clinic.intrleav;
   set clinic.therapy1999 clinic.therapy2000;
   by month;
run;

Match-Merging

data clinic.merged(drop=id);
   merge clinic.demog(in=indemog
                     rename=(date=BirthDate))
         clinic.visit(drop=weight in=invisit
                     rename=(date=VisitDate));
   by id;
   if indemog and invisit;
run;

Points to Remember

  • You can rename any number of variables in each occurrence of the RENAME=option.

  • In match-merging, the IN= data set option can apply to any data set in the MERGE statement. The RENAME=, DROP=, and KEEP= options can apply to any data set in the DATA or MERGE statements.

  • Use the KEEP= option instead of the DROP= option if more variables are dropped than kept.

  • When you specify multiple data set options for a particular data set, enclose them in a single set of parentheses.

Chapter Quiz

Select the best answer for each question. After completing the quiz, check your answers using the answer key in the appendix.

  1. Which program will combine Brothers.One and Brothers.Two to produce Brothers.Three?

    Chapter Quiz
    1. data brothers.three;
         set brothers.one;
         set brothers.two;
      run;
    2. data brothers.three;
         set brothers.one brothers.two;
      run;
    3. data brothers.three;
         set brothers.one brothers.two;
         by varx;
      run;
    4. data brothers.three;
         merge brothers.one brothers.two;
         by varx;
      run;
  2. Which program will combine Actors.Props1 and Actors.Props2 to produce Actors.Props3?

    Chapter Quiz
    1. data actors.props3;
         set actors.props1;
         set actors.props2;
      run;
    2. data actors.props3;
         set actors.props1 actors.props2;
      run;
    3. data actors.props3;
         set actors.props1 actors.props2;
         by actor;
      run;
    4. data actors.props3;
         merge actors.props1 actors.props2;
         by actor;
      run;
  3. If you submit the following program, which new data set is created?

    Chapter Quiz
    data work.jobsatis;
       set work.dataone work.datatwo;
    run;
    1. Chapter Quiz
    2. Chapter Quiz
    3. Chapter Quiz
    4. none of the above

  4. If you concatenate the data sets below in the order shown, what is the value of Sale in observation 2 of the new data set?

    Chapter Quiz
    1. missing

    2. $30,000

    3. $40,000

    4. you cannot concatenate these data sets

  5. What happens if you merge the following data sets by the variable SSN?

    Chapter Quiz
    1. The values of Age in the 1st data set overwrite the values of Age in the 2nd data set.

    2. The values of Age in the 2nd data set overwrite the values of Age in the 1st data set.

    3. The DATA step fails because the two data sets contain same-named variables that have different values.

    4. The values of Age in the 2nd data set are set to missing.

  6. Suppose you merge data sets Health.Set1 and Health.Set2 below:

    Chapter Quiz

    Which output does the following program create?

    data work.merged;
       merge health.set1(in=in1) health.set2(in=in2);
       by id;
       if in1 and in2;
    run;
    proc print data=work.merged;
    run;
    1. Chapter Quiz
    2. Chapter Quiz
    3. Chapter Quiz
    4. none of the above

  7. The data sets Ensemble.Spring and Ensemble.Sum both contain a variable named Blue. How do you prevent the values of the variable Blue from being overwritten when you merge the two data sets?

    1. data ensemble.merged;
         merge ensemble.spring(in=blue)
               ensemble.summer;
         by fabric;
      run;
    2. data ensemble.merged;
         merge ensemble.spring(out=blue)
               ensemble.summer;
         by fabric;
      run;
    3. data ensemble.merged;
         merge ensemble.spring(blue=navy)
               ensemble.summer;
         by fabric;
      run;
    4. data ensemble.merged;
         merge ensemble.spring(rename=(blue=navy))
               ensemble.summer;
         by fabric;
      run;
  8. What happens if you submit the following program to merge Blood.Donors1 and Blood.Donors2, shown below?

    data work.merged;
       merge blood.donors1 blood.donors2;
       by id;
    run
    Chapter Quiz
    1. The Merged data set contains some missing values because not all observations have matching observations in the other data set.

    2. The Merged data set contains eight observations.

    3. The DATA step produces errors.

    4. Values for Units in Blood.Donors2 overwrite values of Units in Blood.Donors1.

  9. If you merge Company.Staff1 and Company.Staff2 below by ID, how many observations does the new data set contain?

    Chapter Quiz
    1. 4

    2. 5

    3. 6

    4. 9

  10. If you merge data sets Sales.Reps, Sales.Close, and Sales.Bonus by ID, what is the value of Bonus in the third observation in the new data set?

    Chapter Quiz
    1. $4,000

    2. $3,000

    3. missing

    4. can't tell from the information given

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset