Sometimes you might need to combine
data from three or more related SAS data sets in order to create one
new data set. For example, the three data sets listed below all contain
different data about a fictional airline's flights and airports.
Sasuser.Acities contains data about various airports, Sasuser.Revenue
contains data about the revenue generated by various flights, and
Sasuser.Expenses contains data about the expenses incurred by various
flights. The variables in each of these data sets are listed here.
Suppose you want to
create a new data set, named Sasuser.Alldata, that contains data from
each of these three input data sets. As shown below, the Sasuser.Alldata
data set contains the new variable Profit, which is calculated from
the revenue values that are stored in Sasuser.Revenue and the expense
values that are stored in Sasuser.Expenses.
You can specify any
number of input data sets in the MERGE statement as long as all input
data sets have a common BY variable. However, you can see from the
data set variable lists above that these three data sets do not have
one common variable. We will consider a method for performing a match-merge
on these three data sets.
Although the three data
sets Sasuser.Acities, Sasuser.Revenue,
and Sasuser.Expenses do not have a common BY variable, there are several
variables that are common to two of the three data sets. As shown
below, Date and FlightID are both common to Revenue and Expenses.
The variable Code in the Acities data set and the variable Dest in
the Revenue data set, while named differently, contain the same data
with the same type and length.
Notice that Code in
Acities and Dest in Revenue are listed as corresponding to one another
even though they have different names. When you are looking for common
variables between data sets, the variable names are not important
since they can be changed with the RENAME= option in the MERGE statement.
Instead, you should look for variables that record the same information
and that have the same type in each input data set. Common variables
do not need to have the same length, although you should remember
that the length of the variable in the first-listed data set will
determine the length of the variable in the output data set.
Note: Any variables that have the
same name in multiple data sets in the MERGE statement must also have
the same type. If any variables in different input data sets have
identical names but do not have identical types, ERROR and WARNING
messages are written to the SAS log, and the match-merge fails.
In this case, both Code
in Acities and Dest in Revenue record the three-letter abbreviation
of an airport.
Tip
You can use PROC CONTENTS
to view information about variables such as type, length, and description.
Since there are variables
that are common to two different pairs of the three data sets shown
above, you can combine these data sets into one data set by using
the MERGE statement in two subsequent DATA steps. That is, you can
perform one match-merge on two of the data sets to create one new
data set that combines data from both. Then you can perform another
match-merge on the new data set and the remaining original data set.
Consider the following example.