Chapter 14
Combining Data Vertically

Overview

Introduction

Objectives

Using a FILENAME Statement

Overview

Example

Using an INFILE Statement

Overview

Example

Assigning the Names of the Files to Be Read

Example

Using the COMPRESS Function

Example

Using the END= Option

Example

Using Date Functions

Example

Using the INTNX Function

Example

Appending SAS Data Sets

Overview

Example

Using the FORCE Option

Example

Appending Variables with Different Lengths

Example

Appending Variables with Different Types

Example

Additional Features

Storing Raw Data Filenames in a SAS Data Set

Storing Raw Data Filenames in an External File

Summary

Contents

Text Summary

Syntax

Sample Programs

Points to Remember

Quiz

Overview

Introduction

Combining data vertically refers to the process of concatenating or interleaving data. In some cases the data might be in SAS data sets. In other cases the data might be stored in raw data files.

In this chapter you learn how to create a SAS data set by concatenating multiple raw data files using the FILENAME and INFILE statements. You also learn how to concatenate SAS data sets using PROC APPEND.

Image

Objectives

In this chapter you learn to

• create a SAS data set from multiple raw data files using a FILENAME statement

• create a SAS data set from multiple raw data files using an INFILE statement with the FILEVAR= option

• append SAS data sets using the APPEND procedure.

Using a FILENAME Statement

Overview

You already know that you can use a FILENAME statement to associate a fileref with a single raw data file. You can also use a FILENAME statement to concatenate raw data files by assigning a single fileref to the raw data files that you want to combine.

General form, FILENAME statement:

FILENAME fileref('external-file' 'external-file2' ...'external-filen'),

where

fileref

is any SAS name that is eight characters or fewer.

external-file

is the physical name of an external file. The physical name is the name that is recognized by the operating environment.

CAUTION:
All of the file specifications must be enclosed in one set of parentheses.

When the fileref is specified in an INFILE statement, each raw data file that has been referenced can be sequentially read into a data set using an INPUT statement.

TIP If you are not familiar with the content and structure of your raw data files, you can use PROC FSLIST to view them.

Example

In the following program, the FILENAME statement creates the fileref Qtr 1, which references the raw data files Monthl.dat, Month2.dat, andMonth3.dat. The files are stored in the C:Sasuser directory in the Windows operating environment. In the DATA step, the INFILE statement identifies the fileref, and the INPUT statement describes the data, just as if Qtr 1 referenced a single raw data file.

filename qtr1 ('c:sasusermonth1.dat''c:sasusermonth2.dat'
               'c:sasusermonth3.dat'),
data work.firstqtr;
   infile qtr1;
   input Flight $ Origin $ Dest $
         Date : date9. RevCargo : comma15.2;
run;

Table 14.1 RAW Data File Month1.dat (first five records)

----+----10---+----20---+----30---+----40
IA10200 SYD HKG 01JAN2000 $191,187.00
IA10201 SYD HKG 01JAN2000 $169,653.00
IA10300 SYD CBR 01JAN2000 $850.00
IA10301 SYD CBR 01JAN2000 $970.00
IA10302 SYD CBR 01JAN2000 $1,030.00

Table 14.2 Raw Data File Month2.dat (first five records)

----+----10---+----20---+----30---+----40
IA10200 SYD HKG 01FEB2000 $177,801.00
IA10201 SYD HKG 01FEB2000 $174,891.00
IA10300 SYD CBR 01FEB2000 $1,070.00
IA10301 SYD CBR 01FEB2000 $1,310.00
IA10302 SYD CBR 01FEB2000 $850.00

Table 14.3 Raw Data File Month3.dat (first five records)

----+----10---+----20---+----30---+----40
IA10200 SYD HKG 01MAR2000 $181,293.00
IA10201 SYD HKG 01MAR2000 $173,727.00
IA10300 SYD CBR 01MAR2000 $1,150.00
IA10301 SYD CBR 01MAR2000 $910.00
IA10302 SYD CBR 01MAR2000 $1,170.00

The SAS log indicates that the raw data files referenced by Qtr 1 are sequentially read into the SAS data set Work.FirstQtr.

Table 14.4 SAS Log

9    filename qtr1 ('c:sasusermonth1.dat''c:sasusermonth2.dat'
10                  'c:sasusermonth3.dat'),

11   data work.firstqtr;
12      infile qtr1;
13      input Flight $ Origin $ Dest $
14      Date : date9. RevCargo : comma15.2;
15   run;

NOTE: The infile QTR1 is:
      File Name=c:sasusermonth1.dat,
      File List=('c:sasusermonth1.dat' 'c:sasusermonth2.dat'
                 'c:sasusermonth3.dat'),
      RECFM=V,LRECL=256

NOTE: The infile QTR1 is:
      File Name=c:sasusermonth2.dat,
      File List=('c:sasusermonth1.dat' 'c:sasusermonth2.dat'
                 'c:sasusermonth3.dat'),
      RECFM=V,LRECL=256

NOTE: The infile QTR1 is:
      File Name=c:sasusermonth3.dat,
      File List=('c:sasusermonth1.dat' 'c:sasusermonth2.dat'
                 'c:sasusermonth3.dat'),
      RECFM=V,LRECL=256

NOTE: 50 records were read from the infile QTR1.
      The minimum record length was 33.
      The maximum record length was 37.
NOTE: 50 records were read from the infile QTR1.
      The minimum record length was 33.
      The maximum record length was 37.
NOTE: 50 records were read from the infile QTR1.
      The minimum record length was 33.
      The maximum record length was 37.
NOTE: The data set WORK.FIRSTQTR has 150 observations
      and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           4.02 seconds
      cpu time            0.93 seconds

The following PROC PRINT output shows a portion of the observations in the Work.FirstQtr data set.

proc print
     data=work.firstqtr (firstobs=45 obs=55);
     format date date9.
            revcargo dollar11.2;
run;

Image

Using an INFILE Statement

Overview

You can make the process of concatenating raw data files more flexible by using an INFILE statement with the FILEVAR= option. The FILEVAR= option enables you to dynamically change the currently opened input file to a new input file.

General form, INFILE statement with the FILEVAR= option:

INFILE file-specification FILEVAR= variable;

where

FILEVAR= variable

names a variable whose change in value causes the INFILE statement to close the current input file and open a new input file.

variable

contains a character string that is a physical filename.

When you use an INFILE statement with the FILEVAR= option, the file specification is a placeholder, not an actual filename or a fileref that had been assigned previously to a file. SAS uses this placeholder for reporting processing information to the SAS log. The file specification must conform to the same rules as a fileref.

When the INFILE statement executes, it reads from the file that the FILEVAR= variable specifies. Like automatic variables, this variable is not written to the data set.

Example

Suppose you want to create a SAS data set that contains three months of data stored in three raw data files. The three months are the current month and the previous two months.

Image

In the following INFILE statement, temp is an arbitrarily named placeholder, not an actual filename or a fileref that had been assigned to a file previously. The FILEVAR= variable nextfile contains the name of the raw data file to be read (for example, Month9.dat, Monthl0.dat, or Month11.dat). A RUN statement is not included because the program is not complete.

data work.quarter;
   infile temp filevar=nextfile;
   input Flight $ Origin $ Dest $
         Date : date9. RevCargo : comma15.2;

Table 14.5 Raw Data File Month9.dat (first five records)

----+----10---+----20---+----30---+----40
IA10200 SYD HKG 01SEP2000 $189,441.00
IA10201 SYD HKG 01SEP2000 $175,473.00
IA10300 SYD CBR 01SEP2000 $1,370.00
IA10301 SYD CBR 01SEP2000 $710.00
IA10302 SYD CBR 01SEP2000 $1,210.00

Table 14.6 Raw Data File Month10.dat (first five records)

----+----10---+----20---+----30---+----40
IA10200 SYD HKG 01OCT2000 $182,457.00
IA10201 SYD HKG 01OCT2000 $160,923.00
IA10300 SYD CBR 01OCT2000 $1,030.00
IA10301 SYD CBR 01OCT2000 $870.00
IA10302 SYD CBR 01OCT2000 $770.00

Table 14.7 Raw Data File Month 11.dat (first five records)

----+----10---+----20---+----30---+----40
IA10200 SYD HKG 01NOV2000 $176,637.00
IA10201 SYD HKG 01NOV2000 $164,997.00
IA10300 SYD CBR 01NOV2000 $1,230.00
IA10301 SYD CBR 01NOV2000 $1,230.00
IA10302 SYD CBR 01NOV2000 $790.00

Note: You can also use multiple INFILE statements or operating system techniques to combine raw data files. However, this chapter discusses only the FILENAME statement and the INFILE statement with the FILEVAR= option.

Assigning the Names of the Files to Be Read

The next step is to assign the names of the three files to be read to the variable nextfile:

data work.quarter;
   infile temp filevar=nextfile;
   input Flight $ Origin $ Dest $
         Date : date9. RevCargo : comma15.2;

In this case, use the raw data files Month9.dat, Monthl0.dat, and Month11.dat. Notice that the titles of the raw data files are very similar. They each start with “Month” and are followed by numeric characters and the file extension .dat:

• Month9.dat

• Monthl0.dat

• Month11.dat

You can use an iterative DO loop and the PUT function to automatically change the values assigned to nextfile.

Example

In the following code, the DO statement creates the index variable i and assigns it the values of 9, 10, and 11. The assignment statement then assigns the name of the raw data file to nextfile using the current value of i and the PUT function.

Month9.dat, Monthl0.dat, and Month11.dat are stored in the C:Sasuser directory in the Windows operating environment. On the right side of the assignment statement, the text string c:sasusermonth is concatenated with the current value of i using the double exclamation point (!!) concatenation operator, c:sasusermonth is then concatenated with the text string .dat.

data work.quarter;
   do i = 9, 10, 11;
      nextfile="c:sasusermonth"
               !!put(i,2.)!!".dat";
      infile temp filevar=nextfile;
      input Flight $ Origin $ Dest $
            Date : date9. RevCargo : comma15.2;
   end;

The following table shows the value of nextfile as the value of i changes.

Image

TIP Depending on the characters available on your keyboard, the symbol you use as the concatenation operator can be a double vertical bar (), a double broken vertical bar (Image), or a double exclamation point (!!).

Using the COMPRESS Function

Note the space between month and 9 in c:sasusermonth 9.dat. You can eliminate the space by using the COMPRESS function.

Image

General form, COMPRESS function:

COMPRESS(source, <characters-to-remove>);

where

source

specifies a source string that contains the characters to remove.

characters-to-remove

specifies the character or characters that SAS removes from the source string.

Note: If the characters-to-remove is omitted, the COMPRESS function removes blank spaces from the source.

Example

In the following code, the COMPRESS function removes blank spaces from the value of nextfile:

data work.quarter;
   do i = 9, 10, 11;
      nextfile="c:sasusermonth"!!put(i,2.)!!".dat";
      nextfile=compress (nextfile,' '),
      infile temp filevar=nextfile;
      input Flight $ Origin $ Dest $
            Date : date9. RevCargo : comma15.2;
      end;

The COMPRESS function can be combined with the assignment statement for greater efficiency:

data work.quarter;
   do i = 9, 10, 11;
      nextfile="c:sasusermonth"!!compress(put(i,2.)!!".dat",' '),
      infile temp filevar=nextfile;
      input Flight $ Origin $ Dest $
            Date : date9. RevCargo : comma15.2;
      end;

With the addition of the COMPRESS function, when the value of i equals 9, nextfile is assigned the correct value, c:sasusermonth9.dat.

Image

You can add more statements to the program. An OUTPUT statement within the DO loop outputs each record to the SAS data set Work.Quarter and a STOP statement prevents an infinite loop of the DATA step.

data work.quarter;
   do i = 9, 10, 11;
      nextfile="c:sasusermonth"
               !!compress(put(i,2.)!!".dat",' '),
      infile temp filevar=nextfile;
      input Flight $ Origin $ Dest $ Date : date9.
            RevCargo : comma15.2;
     output;
   end;
   stop;

The program is almost complete. However, there are several other statements that need to be added in order to read all of the input data.

Using the END= Option

When you read past the last record in a raw data file, the DATA step normally stops processing. In this case, you need to read the last record in the first two raw data files. However, you do not want to read past the last record in either of those files because the DATA step will stop processing. You can use the END= option with the INFILE statement to determine when you are reading the last record in the last raw data file.

General form, INFILE statement with the END= option:

INFILE file-specification END=variable;

where

variable

names a variable that SAS sets to

0 when the current input data record is not the last record in the input file

1 when the current input record is the last record in the input file.

Note: Like automatic variables, the END= variable is not written to the SAS data set.

The END= option enables you to name a variable whose value is controlled by SAS. The value of the variable is 0 when you are not reading the last record in an input file and 1 when you are reading the last record in an input file. You can test the value of the END= variable to determine whether the DATA step should continue processing.

Example

The END= variable lastobs is created in the INFILE statement. The DO UNTIL statement conditionally executes until the value of lastobs equals 1. A RUN statement completes the program.

data work.quarter;
   do i = 9, 10, 11;
   nextfile="c:sasusermonth"
            !!compress(put(i,2.)!!".dat",' '),
         do until (lastobs);
         infile temp filevar=nextfile end=lastobs;
         input Flight $ Origin $ Dest $ Date : date9.
               RevCargo : comma15.2;
         output;
      end;
   end;
   stop;
run;

PROC PRINT output shows a portion of the observations in the SAS data set Work.Quarter. A LABEL statement is used to assign the descriptive label Month to the variable i. Notice that the variables nextfile and lastobs are not written to the data set.

proc print
     data=work.quarter
     (firstobs=45 obs=55) label;
   label i='Month';
   format date date9.
          revcargo dollar11.2;
run;

Image

Using Date Functions

You can make your program more flexible by eliminating the need to include explicit month numbers in your SAS statements. To create a program that will always read the current month and the previous two months, you can use date functions to obtain the month number of today's date to begin the quarter.

Example

In the following program, the MONTH and TODAY functions are used to obtain the value of the variable monthnum. The TODAY function returns the current date from the system clock as a SAS date value. The month number is then extracted from the current date using the MONTH function.

The value of midmon is calculated by subtracting 1 from the value of monthnum. The value of lastmon is then calculated by subtracting 1 from the values of midmon. The following table shows the values monthnum, midmon, and lastmon if the current date is October 22, 2002.

Image

In the previous example, the DO statement created the index variable i and assigned it the values of 9, 10, and 11. Here, the DO statement assigns i the values of monthnum, midmon, and lastmon:

data work.quarter (drop=monthnum midmon lastmon);
   monthnum=month(today());
   midmon=monthnum-1;
   lastmon=midmon-1;
   do i = monthnum, midmon, lastmon;
      nextfile="c:sasusermonth"
               !!compress(put(i,2.)!!".dat",' '),
      do until (lastobs);
         infile temp filevar=nextfile end=lastobs;
         input Flight $ Origin $ Dest $ Date : date9.
               RevCargo : comma15.2;
         output;
      end;
   end;
   stop;
run;

The following PROC PRINT output shows a portion of the observations in Work.Quarter.

proc print data=work.quarter
           (firstobs=45 obs=55) label;
   label i='Month';
   format date date9.
          revcargo dollar11.2;
run;

Image

Using the INTNX Function

In the previous example the current month was October. What happens if the current month is January or February?

Suppose the current date is February 16, 2003. Using the following program, the values for midmon (January) and lastmon (December) would be 7 and 0 respectively. Since there is no “0” month, the program would fail to read the third raw data file.

data work.quarter (drop=monthnum midmon lastmon);
   monthnum=month(today());
   midmon=monthnum-1;
   lastmon=midmon-1;
   do i = monthnum, midmon, lastmon;
      nextfile="c:sasusermonth"
               !!compress(put(i,2.)!!".dat",' '),
      do until (lastobs);
         infile temp filevar=nextfile end=lastobs;
         input Flight $ Origin $ Dest $ Date : date9.
               RevCargo : comma15.2;
         output;
      end;
   end;
   stop;
run;

Image

You can use the INTNX function with the TODAY and MONTH functions to correctly determine the values of midmon and lastmon for any given date. Remember that the INTNX function increments a date, time, or datetime value by a given interval or intervals, and returns a date, time, or datetime value.

Example

Suppose the current date is January 30, 2003. In the following program monthnum is assigned a value of 7 using the TODAY and MONTH functions. The INTNX function is used with the TODAY and MONTH functions to assign a value of 12 to midmon and a value of 11 to lastmon.

data work.quarter (drop=monthnum midmon lastmon);
   monthnum=month(today());
   midmon=month(intnx('month',today(),-1));
   lastmon=month(intnx('month',today(),-2));
   do i = monthnum, midmon, lastmon;
      nextfile="c:sasmonth"!!compress(put(i,2.)!!".dat",' '),
      do until (lastobs);
         infile temp filevar=nextfile end=lastobs;
         input Flight $ Origin $ Dest $ Date : date9.
               RevCargo : comma15.2;
         output;
      end;
   end;
   stop;
run;

Image

The following PROC PRINT output shows a portion of the observations in Work.Quarter.

proc print data=work.quarter
           (firstobs=45 obs=55) label;
   label i='Month';
   format date date9.
          revcargo dollar11.2;
run;

Image

Appending SAS Data Sets

Overview

Now that you have seen several methods for concatenating raw data files, we can consider how you can use the APPEND procedure to concatenate two SAS data sets.

General form, PROC APPEND:

PROC APPEND BASE=SAS-data-set DATA=SAS-data-set;

RUN;

where

BASE=SAS-data-set

names the data set to which you want to add observations.

DAT A=SAS-data-set

names the SAS data set containing observations that you want to append to the end of the SAS data set specified in the BASE= argument.

PROC APPEND reads only the data in the DATA= SAS data set, not the BASE= SAS data set. PROC APPEND concatenates data sets even though there might be variables in the BASE= data set that do not exist in the DATA= data set.

When the BASE= data set contains more variables than the DATA= data set, missing values for the additional variables are assigned to the observations that are read in from the DATA= data set and a warning message is written to the SAS log.

Example

The SAS data sets Work.Cap2001 and Work Capacity both contain the following variables: Caplst, CapBusiness, CapEcon, Dest, FlightID, Origin, and RouteID. However, the BASE= data set (Work.Cap2001) contains an additional variable, Date, that is not included in the DATA= data set (Work Capacity).

When the following program is submitted, the SAS log indicates that the variable Date was not found in the DATA= file.

proc append base=work.cap2001
            data=work.capacity;
run;

Table 14.8 SAS Log

NOTE: Appending WORK.CAPACITY to WORK.CAP2001.
WARNING: Variable Date was not found on DATA file.
NOTE: There were 50 observations read from the data set WORK.CAPACITY.
NOTE: 50 observations added.
NOTE: The data set WORK.CAP2001 has 100 observations and 8 variables.
NOTE: PROCEDURE APPEND used (Total process time):

PROC PRINT output of the appended version of Work.Cap2001 shows that missing values have been assigned to Date in the observations that were read in from the DATA= data set.

proc print data=work.cap2001
           (firstobs=45 obs=55);
run;

Image

Note: You can also use the DATA step SET statement to combine SAS data vertically. If multiple data set names appear in the SET statement, the resulting output data set is a concatenation of all the data sets listed. Unlike the APPEND procedure, the SET statement in the DATA step reads all observations in both input data sets in order to concatenate them. Therefore, the APPEND procedure is more efficient than the SET statement in the DATA step for concatenating data sets because it reads only the data in the DATA= data set.

In the following program, SAS reads all of the observations from Work.Cap2001, then all of the observations from Work Capacity.

data work.new;
   set work.cap2001 work.capacity;
run;

Note: You can also use the SQL procedure to combine SAS data vertically. For information about using the SQL procedure to combine data vertically, see “Combining Tables Vertically Using PROC SQL” on page 132.

Using the FORCE Option

In the previous example, the DATA= data set (Work.Capacity) contained fewer variables than the BASE= data set (Work.Cap2001). However, you might need to append data sets when the DATA= data set contains more variables than the BASE= data set.

You must use the FORCE option with the APPEND procedure to concatenate data sets when the DATA= data set contains variables that are not in the BASE= data set.

General form, PROC APPEND with the FORCE option:

PROC APPEND BASE=SAS-data-set DATA=SAS-data-set <FORCE>;

CAUTION:
Fhe FORCE option can cause loss of data due to truncation or dropping of variables.

When you use the FORCE option, the structure of the BASE= data set is used for the appended data set.

Example

Remember that the SAS data sets Work.Cap2001 and Work.Capacity both contain the following variables: Caplst, CapBusiness, CapEcon, Dest, FlightID, Origin, and RouteID. In this case, the DATA= data set (Work.Cap2001) contains an additional variable, Date, that is not included in the BASE= data set (Work.Capacity).

When the following program is submitted, the SAS log indicates that the data sets were not appended because the variable Date was not found in the BASE= file.

proc append base=work.capacity
            data=work.cap2001;
run;

Table 14.9 SAS Log

NOTE: Appending WORK.CAP2001 to WORK.CAPACITY.
WARNING: Variable Date was not found on BASE file.
ERROR: No appending done because of anomalies listed above.
        Use FORCE option to append these files.
NOTE: 0 observations added.
NOTE: The data set WORK.CAPACITY has 50 observations and 7 variables.
NOTE: Statements not processed because of errors noted above.
NOTE: PROCEDURE APPEND used (Total process time):
      real time           0.02 seconds
      cpu time            0.03 seconds
NOTE: The SAS System stopped processing this step because of errors.

When the FORCE option is used with PROC APPEND, the SAS log indicates that observations have been read from the DATA= data set, but that dropping or truncating will occur.

proc append base=work.capacity
            data=work.cap2001 force;
run;

Table 14.10 SAS Log

NOTE: Appending WORK.CAP2001 to WORK.CAPACITY.
WARNING: Variable Date was not found on BASE file.
NOTE: FORCE is specified, so dropping/truncating will occur.
NOTE: There were 50 observations read from the data set WORK.CAP2001.
NOTE: 50 observations added.
NOTE: The data set WORK.CAPACITY has 100 observations and 7 variables.
NOTE: PROCEDURE APPEND used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds

PROC PRINT output shows that the variable Date has been dropped from the appended version of Work.Capacity.

proc print
     data=work.capacity
         (firstobs=45 obs=55);
run;

Image

Appending Variables with Different Lengths

If the DATA= data set contains variables that are longer than the variables in the BASE= data set, the FORCE option must be used with PROC APPEND. Using the FORCE option enables you to append the data sets. However, the DATA= variable values will be truncated.

Example

In the SAS data set Work.Acities, the variable City has a length of 22. In the SAS data set Work.WestAust, City has a length of 50. You can use the CONTENTS procedure to view the attributes of the variables in each data set.

proc contents data=work.acities;
run;

Image

proc contents data=work.westaust;
run;

Image

When the following program is submitted, the SAS log indicates that the data sets were not appended due to different lengths for City in the BASE= and DATA= data sets.

proc append base=work.acities
            data=work.westaust;
run;

Table 14.11 SAS Log

NOTE: Appending WORK.WESTAUST to WORK.ACITIES.
WARNING: Variable City has different lengths on BASE and
         DATA files (BASE 22 DATA 50).
ERROR: No appending done because of anomalies listed above.
       Use FORCE option to append these files.
NOTE: 0 observations added.
NOTE: The data set WORK.ACITIES has 50 observations and 4 variables.
NOTE: Statements not processed because of errors noted above.
NOTE: PROCEDURE APPEND used (Total process time):
      real time           1.44 seconds
      cpu time            0.06 seconds
NOTE: The SAS System stopped processing this step because of errors.

When the FORCE option is used, the SAS log indicates that the data sets are appended, but that dropping or truncating will occur.

proc append base=work.acities
            data=work.westaust force;
run;

Table 14.12 SAS Log

NOTE: Appending WORK.WESTAUST to WORK.ACITIES.
WARNING: Variable City has different lengths on BASE and DATA files
         (BASE 22 DATA 50).
NOTE: FORCE is specified, so dropping/truncating will occur.
NOTE: There were 50 observations read from the data set WORK.WESTAUST.
NOTE: 50 observations added.
NOTE: The data set WORK.ACITIES has 100 observations and 4 variables.
NOTE: PROCEDURE APPEND used (Total process time):
      real time           1.44 seconds
      cpu time            0.06 seconds

PROC CONTENTS output for the appended version of Work.Acities shows that the variable City has retained a length of 22 from the BASE= data set. Also notice that the variable Code has retained the label Start Point from the BASE= data set.

proc contents
     data=work.acities;
run;

Image

PROC PRINT output shows that some of the values of City are truncated in the appended version of Work.Acities.

proc print
     data=work.acities
           (firstobs=45 obs=55);
run;

Image

Appending Variables with Different Types

If the DATA= data set contains a variable that does not have the same type as the corresponding variable in the BASE= data set, the FORCE option must be used with PROC APPEND. Using the FORCE option enables you to append the data sets. However, missing values are assigned to the DATA= variable values for the variable whose type did not match.

Example

In the SAS data set Work.Allemps, the variable Phone is a character variable. In the SAS data set Work.Newemps, Phone is a numeric variable. You can use PROC CONTENTS to view the attributes of the variables in each data set.

proc contents data=work.allemps;
run;

Image

proc contents data=work.newemps;
run;

Image

When the following program is submitted, the SAS log indicates that there is a type mismatch for the variable Phone and that data sets were not appended.

proc append base=work.allemps
            data=work.newemps;
run;

Table 14.13 SAS Log

NOTE: Appending WORK.NEWEMPS to WORK.ALLEMPS.
WARNING: Variable Phone not appended because of type mismatch.
ERROR: No appending done because of anomalies listed above.
       Use FORCE option to append these files.
NOTE: 0 observations added.
NOTE: The data set WORK.ALLEMPS has 550 observations and 5 variables.
NOTE: Statements not processed because of errors noted above.
NOTE: PROCEDURE APPEND used (Total process time):
      real time           0.08 seconds
      cpu time            0.01 seconds
NOTE: The SAS System stopped processing this step because of errors.

When the FORCE option is used, the SAS log indicates that the data sets are appended, but that the variable Phone is not appended due to the type mismatch.

proc append base=work.allemps
            data=work.newemps force;
run;

Table 14.14 SAS Log

NOTE: Appending WORK.NEWEMPS to WORK.ALLEMPS.
WARNING: Variable Phone not appended because of type mismatch.
NOTE: FORCE is specified, so dropping/truncating will occur.
NOTE: There were 19 observations read from the data set WORK.NEWEMPS.
NOTE: 19 observations added.
NOTE: The data set WORK.ALLEMPS has 569 observations and 5 variables.
NOTE: PROCEDURE APPEND used (Total process time):
      real time           0.05 seconds
      cpu time            0.02 seconds

PROC CONTENTS output for the appended version of Work.Allemps shows that the variable Phone has retained the type of character from the BASE= data set.

proc contents
     data=work.allemps;
run;

Image

PROC PRINT output of the appended version of Work.Allemps shows that the the values for Phone are missing in the records that were read in from the DATA= data set.

proc print
     data=work.allemps
           (firstobs=45 obs=55);
run;

Image

Additional Features

In addition to the methods for appending raw data files that were discussed earlier in this chapter, you can also append raw data files using a SAS data set or an external file that contains the names of the raw data files to be appended.

Storing Raw Data Filenames in a SAS Data Set

In the following program, five raw data files, Route1.dat, Route2.dat, Route3.dat, Route4.dat, and Route5.dat, are concatenated to create the SAS data set Work.NewRoute. The names of the raw data files are stored in the SAS data set Sasuser.Rawdata, which is referenced using a SET statement. The name of the FILEVAR= variable, readit, is the name of the variable in Sasuser.Rawdata whose value is the name of the file to be read.

data work.newroute;
  set sasuser.rawdata;
  infile in filevar = readit end = lastfile;
  do while(lastfile = 0);
     input @1 RouteID $7. @8 Origin $3. @11 Dest $3.
           @14 Distance 5. @19 Fare1st 4.
           @23 FareBusiness 4. @27 FareEcon 4.
           @31 FareCargo 5.;
     output;
  end;
run;

Image

Storing Raw Data Filenames in an External File

In the following program, Route1.dat, Route2.dat, Route3.dat, Route4.dat, and Route5.dat are also concatenated to create the SAS data set Work.NewRoute. In this example, the names of the raw data files are stored in the external file Rawdatafiles.dat, which is referenced in the first INFILE statement. The name of the FILEVAR= variable, readit, is the name of the variable read from Rawdatafiles.dat. The value of readit is the name of the raw data file to be read.

Table 14.15 Raw Data File Rawdatafiles.dat

1---+----10---+----20
route1.dat
route2.dat
route3.dat
route4.dat
route5.dat

data work.newroute;
   infile 'rawdatafiles.dat';
   input readit $10.;
   infile in filevar=readit end=lastfile;
   do while(lastfile = 0);
      input   @1 RouteID $7. @8 Origin $3. @11 Dest $3.
              @14 Distance 5. @19 Fare1st 4.
              @23 FareBusiness 4. @27 FareEcon 4.
              @31 FareCargo 5.;
      output;
   end;
run;

Summary

Contents

This section contains the following topics.

• “Text Summary” on page 523

• “Syntax” on page 524

• “Sample Programs” on page 525

• “Points to Remember” on page 526

Text Summary

Using a FILENAME Statement

You can use a FILENAME statement to concatenate raw data files by assigning a single fileref to the raw data files that you want to combine. When the fileref is specified in an INFILE statement, each raw data file that has been referenced can be sequentially read into a data set using an INPUT statement.

Using an INFILE Statement

You can make the process of concatenating raw data files more flexible by using an INFILE statement with the FILEVAR= option. The FILEVAR= option enables you to dynamically change the currently opened input file to a new input file. When the INFILE statement executes, it reads from the file that the FILEVAR= variable specifies.

In some cases, you might need to use the COMPRESS function to eliminate spaces in the filenames you generate.

When you read the last record in a raw data file, the DATA step normally stops processing. When you are concatenating raw data files, you do not want to read past the last record until you reach the end of the last input file. You can determine whether you are reading the last record in the last raw data file by using the END= option with the INFILE statement. You can then test the value of the END= variable to determine whether the DATA step should continue processing.

If you are working with date-related data, you might be able to make your program more flexible by eliminating the need to include explicit month numbers in your SAS statements. To create a program that will always read the current month and the previous two months, you can use the MONTH and TODAY functions to obtain the month number of today's date to begin the quarter. In some cases, you might need to use the INTNX function with the TODAY and MONTH functions to correctly determine the month numbers.

Appending SAS Data Sets

You can use PROC APPEND to concatenate two SAS data sets. PROC APPEND reads only the data in the DATA= SAS data set, not in the BASE= SAS data set. PROC APPEND concatenates data sets even though there might be variables in the BASE= data set that do not exist in the DATA= data set.

The FORCE option must be used if the DATA= data set contains variables that

• are not in the BASE= data set

• are longer than the variables in the BASE= data set

• do not have the same type as the variables in the BASE= data set.

The FORCE option can cause loss of data due to truncation or dropping of variables. The following table summarizes the consequences of using the FORCE option.

DATA= data set contains variables that...

FORCE required?

Consequences of using the FORCE option

are in the BASE=data set, but the BASE= data set has more variables

no

Missing values are assigned to the extra BASE= data set variables.

are not in the BASE= data set

yes

Extra DATA= data set variables are dropped.

are longer than the variables in the BASE= data set

yes

DATA= data set variable values are truncated.

do not have the same type as the variables in the BASE= data set

yes

Missing values are assigned to the DATA= data set variables with the data type mismatch.

Additional Features

You can also append raw data files using a SAS data set or an external file that contains the names of the raw data files to be appended.

Syntax

Combining Raw Data Files Using a FILENAME Statement

FILENAME fileref ('external-file 1 “external-file2' 'external-filen'),

DATA SAS-data-set;

INFILE file-specification;

INPUT variable <$> <&|:> <informat>;

RUN;

Combining Raw Data Files Using an INFILE Statement

DATA SAS-data=set;

DO index-variable=variablel, variable2, variablen;

variable = “text-string” !!PUT(index-variable,format)!!“text-string”;

variable = COMPRESS (variable,' '),

DO UNTIL( variable);

INFILE file-specification FILEVAR=variable END=variable;

INPUT variable <$> <&|:> <informat>;

OUTPUT;

END;

END;

STOP;

RUN;

Combining SAS Data Sets Using PROC APPEND

PROC APPEND BASE=SAS-data-set DATA=SAS-data-set <FORCE>;

RUN;

Sample Programs

Combining Raw Data Files Using a FILENAME Statement

   filename qtr1 ('c:datamonth1.dat''c:datamonth2.dat'
                  'c:datamonth3.dat'),
   data work.firstqtr;
      infile qtr1;'
      input Flight $ Origin $ Dest $
            Date : date9. RevCargo : comma15.2;
   run;

Combining Raw Data Files Using an INFILE Statement

   data quarter (drop=monthnum midmon lastmon);
      monthnum=month(today());
      midmon=month(intnx('month',today(),-1));
      lastmon=month(intnx('month',today(),-2));
      do i = monthnum, midmon, lastmon;
         nextfile="c:sasusermonth"
                  !!compress(put(i,2.)!!".dat",' '),
         do until (lastobs);
            infile temp filevar=nextfile end=lastobs;
            input Flight $ Origin $ Dest $ Date : date9.
                  RevCargo : comma15.2;
            output;
         end;
      end;
      stop;
   run;

Combining SAS Data Sets Using PROC APPEND

proc append base=work.acities
            data=work.airports force;
run;

Points to Remember

• When you use an INFILE statement with the FILEVAR= option, the file specification is just a placeholder, not an actual filename or a fileref that has been previously assigned to a file.

• Like automatic variables, the FILEVAR= variable and the END= variable are not written to the data set.

• Using the FORCE option with PROC APPEND can cause loss of data due to truncation or dropping of variables.

• When you use the FORCE option, the structure of the BASE= data set is used for the appended data set.

Quiz

Select the best answer for each question. After completing the quiz, check your answers using the answer key in the appendix.

  1. Which of the following statements associates the fileref OnSale with the raw data files London.dat, Paris.dat, and Zurich.dat? The files are stored in the C:RoutesNew directory in the Windows operating environment.

    1. filename onsale (c: outes ewlondon.dat,
         c: outes ewparis.dat,
         c: outes ewzurich.dat);

    2. filename onsale 'c: outes ewlondon.dat'
         'c: outes ewparis.dat'
         'c: outes ewzurich.dat';

    3. filename onsale ('c: outes ewlondon.dat'
         'c: outes ewparis.dat'
         'c: outes ewzurich.dat'),

    4. filename onsale 'c: outes ewlondon.dat
         c: outes ewparis.dat
         c: outes ewzurich.dat';

  2. Which of the following statements is true?

    1. The FILEVAR= option can be used to dynamically change the currently opened input file to a new physical file.

    2. The FILEVAR= variable is not written to the data set.

    3. The FILEVAR= variable must contain a character string that is a physical filename.

    4. all of the above

  3. Given the following program, which table correctly shows the corresponding values of the variables x and readfile?

    data work.revenue;
       do x = 8, 9, 10;
          readfile=compress("c:datamonth"
             !!put(x,2.)!!".dat",' '),
          do until (lastobs);
              infile temp filevar=readfile
                end=lastobs;
              input Date : date7. Location $
                    Sales : dollar10.2;
              output;
          end;
       end;
       stop;
    run;

    1. Image
    2. Image
    3. Image
    4. Image
  4. If the current date is March 30, 2003, which table correctly shows the corresponding values of the variables y1, y2, y3, and nextfile?

    data work.quarter (drop=monthnum midmon lastmon);
       y3=year(today());
       y2=y3-1;
       y1=y3-2;
       do i = y3, y2, y1;
          nextfile="c:dataY"!!put(i,4.)!!".dat";
          do until (lastobs);
             infile temp filevar=nextfile
               end=lastobs;
             input Flight $ Origin $ Dest $
                   Date : date9.;
             output;
          end;
       end;
       stop;
    run;

    1. Image
    2. Image
    3. Image
    4. Image
  5. What happens when SAS processes the last data record in an input file?

    1. The END= variable is set to 1.

    2. The END= variable is set to 0.

    3. The END= variable is set to the number of records in the input file.

    4. The END= variable is written to the SAS data set.

  6. Which program appends Work.London to Work.Flights?

    Data Set Description for Work.London

    Data Set Description for Work.Flights

    Image
    Image
    1. proc append base=work.london
                  data=work.flights;
      run;

    2. proc append data=work.london
                  base=work.flights;
      run;

    3. proc append data=work.london work.flights;
      run;

    4. proc append data=work.flights work.london;
      run;

  7. What happens when the following program is submitted?

    proc append base=staff.marketing
                data=staff.sales force;
    run;

    Data Set Description for Staff.Marketing

    Data Set Description for Staff.Sales

    Image
    Image
    1. The length of LastName is converted to 20 in Staff.Marketing.

    2. LastName is dropped from Staff.Marketing.

    3. Missing values are assigned to LastName observations that are read in from Staff.Sales.

    4. Some of the values of LastName might be truncated in the observations that are read in from Staff.Sales.

  8. Which program appends Work.April to Work.Y2003?

    Data Set Description for Work.Y2003

    Data Set Description for Work.April

    Image
    Image
    1. proc append base=work.y2003
                  data=work.april;
      run;

    2. proc append base=work.april
                  data=work.y2003 force;
      run;

    3. proc append base=work.y2003
                  data=work.april force;
      run;

    4. proc append base=work.april
                  data=work.y2003;
      run;

  9. What happens when the SAS data set Work.NewHires is appended to the SAS data set Work.Employees using PROC APPEND?

    Data Set Description for Work.Employees

    Data Set Description for Work.NewHires

    Image
    Image
    1. Missing values are assigned to Room for the observations that are read in from Work.NewHires.

    2. Missing values are assigned to Room for all of the observations in Work.Employees.

    3. Room is dropped from Work.Employees.

    4. The values of Name are truncated in the observations that are read in from Work.NewHires.

  10. You do not need to use the FORCE option with PROC APPEND when

    1. the DATA= data set contains variables that are not in the BASE= data set.

    2. the BASE= data set contains variables that are not in the DATA= data set.

    3. the variables in the DATA= data set are longer than the corresponding variables in the BASE= data set.

    4. the variables in the DATA= data set have a different type than the corresponding variables in the BASE= data set.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset