Chapter 20. Creating a Single Observation from Multiple Records

Overview

Introduction

Information for one observation can be spread out over several records. You can write multiple INPUT statements to read each record that comprises a single observation …

Introduction

…or,youcan write one INPUT statement that contains a line pointer control to specify the record(s) from which values are to be read.

Introduction

Objectives

In this chapter, you learn to

  • read multiple records sequentially and create a single observation

  • read multiple records non-sequentially and create a single observation.

Using Line Pointer Controls

You know that as SAS reads raw data values, it keeps track of its position with an input pointer. You have used column pointer controls and column specifications to determine the column placement of the input pointer.

Column Specifications

input Name $ 1-12 Age 15-16 Gender $ 18;

Column Pointer Controls

input Name $12. @15 Age 2. @18 Gender $1.;

But you can also position the input pointer on a specific record by using a line pointer control in the INPUT statement.

Using Line Pointer Controls

There are two types of line pointer controls.

  • The forward slash (/) specifies a line location that is relative to the current one.

  • The #n specifies the absolute number of the line to which you want to move the pointer.

First we'll look at the forward slash (/). Later in the chapter, you'll learn how to use the #n, and you will see how these two controls can be combined.

Reading Multiple Records Sequentially

The Forward Slash (/) Line Pointer Control

You use the forward slash (/) line pointer control to read multiple records sequentially. The / advances the input pointer to the next record. The / line pointer control only moves the input pointer forward and must be specified after the instructions for reading the values in the current record.

The single INPUT statement below reads the values for Lname and Fname in the first record, followed by the values for Department and JobCode in the second record. Then the value for Salary is read in the third record.

The Forward Slash (/) Line Pointer Control

Using the / Line Pointer Control

Take a closer look at using the forward slash (/) line pointer control in the following example.

The raw data file Memdata contains the mailing list of a professional organization. Your task is to combine the information for each member into one data set observation. We'll begin by reading each member's name, followed by the street address, and finally the city, state, and zip code.

  • As you write the instructions to read the values for Fname and Lname, notice that not all of the values for Lname begin in the same column. So, you should use standard list input to read these values.

    Using the / Line Pointer Control
  • Now you want to read the values for Address from the second record. The / line pointer control advances the input pointer to the next record. At this point the INPUT statement is incomplete, so you should not place a semicolon after the line pointer control.

    Using the / Line Pointer Control
  • You can use column input to read the values in the next record as one variable named Address. Then add a line pointer control to move the input pointer to the next record.

    Using the / Line Pointer Control
  • As you write the statements to read the values for City, notice that some of the values are longer than eight characters and contain embedded blanks. Also note that each value is followed by two consecutive blanks. To read these values, you should use modified list input with the ampersand (&) modifier.

    The values for State and the values for Zip do not begin in the same column. Therefore, you should use list input to read these values.

    Using the / Line Pointer Control

Sequential Processing of Multiple Records in the DATA Step

Now that you've learned the basics of using the / line pointer control, let's take a closer look at the sequential processing of multiple records in the DATA step.

The values in the first record are read, and the / line pointer control moves the input pointer to the second record.

Sequential Processing of Multiple Records in the DATA Step

The values for Address are read, and the second / line pointer control advances the input pointer to the third record.

Sequential Processing of Multiple Records in the DATA Step

The values for City, State, and Zip are read, and the INPUT statement is completely executed.

Sequential Processing of Multiple Records in the DATA Step

The values in the program data vector are written to the data set as the first observation.

Sequential Processing of Multiple Records in the DATA Step

Control returns to the top of the DATA step, and the variable values are reinitialized to missing.

Sequential Processing of Multiple Records in the DATA Step

During the second iteration, values for Fname and Lname are read beginning in column 1 of the fourth record.

Sequential Processing of Multiple Records in the DATA Step

The values for Address are read and the / line pointer control advances the input pointer to the fifth record.

Sequential Processing of Multiple Records in the DATA Step

The values for City, State, and Zip are read, and the INPUT statement is completely executed.

Sequential Processing of Multiple Records in the DATA Step

The values in the program data vector are written to the data set as the second observation.

Sequential Processing of Multiple Records in the DATA Step

After the data set is complete, PROC PRINT output for Perm.Members shows that a single observation contains the complete information for each member.

proc print data=perm.members;
run;
Sequential Processing of Multiple Records in the DATA Step

Number of Records Per Observation

Note that the raw data file must contain the same number of records for each observation that is being created.

For example, suppose that there are only two records for the second member. However, the INPUT statement is set up to read three records.

Number of Records Per Observation

The second member's name and address are read and assigned to corresponding variables. Then the input pointer advances to the next record, as directed by the INPUT statement, and the third member's name is read as a value for City.

The DATA step is still looking for a value for State and Zip, so the input pointer advances to the next record and reads the values for the member's address.

The PROC PRINT output for this data set illustrates the problem.

Number of Records Per Observation

So, before you write the INPUT statement, make sure the raw data file contains the same number of records for each observation. In this raw data file there are now three records for each observation.

Number of Records Per Observation
Number of Records Per Observation

Reading Multiple Records Non-Sequentially

The #n Line Pointer Control

You already know how to read multiple records sequentially by using the / line pointer control. Now let's look at reading multiple records non-sequentially by using the #n line pointer control.

The #n specifies the absolute number of the line to which you want to move the input pointer. The #n pointer control can read records in any order; therefore, it must be specified before the instructions for reading values in a specific record.

The INPUT statement below first reads the values for Department and JobCode in the second record, then the values for Lname and Fname in the first record. Finally, it reads the value for Salary in the third record.

The #n Line Pointer Control

Using the #n Line Pointer Control

Take a closer look at using the #n line pointer control in the following example.

The raw data file Patdata contains information about the patients of a small group of general surgeons.

The first three records contain a patient's name, address, city, state, and zip code. The fourth record contains the patient's ID number followed by the name of the primary physician.

Using the #n Line Pointer Control

Suppose you want to read each patient's information in the following order:

  1. ID number (ID)

  2. first name (Fname)

  3. last name (Lname)

  4. address (Address)

  5. city (City)

  6. state (State)

  7. zip (Zip)

  8. doctor (Doctor)

  • To read the values for ID in the 4th record, specify #4 before naming the variable and defining its attributes.

    Using the #n Line Pointer Control
  • To read the values for Fname and Lname in the first record, specify #1 before naming the variables and defining their attributes.

    Using the #n Line Pointer Control
  • Use the #n line pointer control to move the input pointer to the second record and read the value for Address.

    Using the #n Line Pointer Control
  • Now move the input pointer to the third record and read the values for City, State, and Zip, in that order.

    Using the #n Line Pointer Control
    Using the #n Line Pointer Control
  • Now you need to move the input pointer down to the fourth record to read the values for Doctor, which begin in column 7. Don't forget to add a semicolon at the end of the INPUT statement. A RUN statement completes the program.

    Using the #n Line Pointer Control

Execution of the DATA Step

The #n pointer controls in the program below cause four records to be read for each execution of the DATA step.

data perm.patients;
   infile patdata;
   input #4 ID $5.
         #1 Fname $ Lname $
         #2 Address $23.
         #3 City $ State $ Zip $
         #4 @7 Doctor $6.;
run;

The first time the DATA step executes, the first four records are read, and an observation is written to the data set.

Execution of the DATA Step

During the second iteration, the next four records are read, and the second observation is written to the data set, and so on.

Execution of the DATA Step

The PROC PRINT output of the data set shows how information that was spread over several records has been condensed into one observation.

proc print data=perm.patients noobs;
run;
Execution of the DATA Step

Combining Line Pointer Controls

The forward slash (/) line pointer control and the #n line pointer control can be used together in a SAS program to read multiple records both sequentially and non-sequentially.

For example, you could use both the / line pointer control and the #n line pointer control to read the variables in the raw data file Patdata in the following order:

1. ID            5. City 
2. Fname         6. State 
3. Lname         7. Zip 
4. Address       8. Doctor
Combining Line Pointer Controls
  • To read the values for ID in the fourth record, specify #4 before naming the variable and defining its attributes.

  • Specify #1 to move the input pointer back to the first record, where the values for Fname and Lname are read.

  • Because the next record to be read is sequential, you can use the / line pointer control after the variable Lname to move the input pointer to the second record, where the value for Address is read.

  • The / line pointer control in the next line directs the input pointer to the third record, where the values for City, State and Zip are read.

  • The final / line pointer control moves the input pointer back to the fourth record, where the value for Doctor is read.

Combining Line Pointer Controls
Combining Line Pointer Controls

Chapter Summary

Text Summary

Multiple Records per Observation

Information for one observation can be spread out over several records. You can write one INPUT statement that contains line pointer controls to specify the record(s) from which values are read.

Reading Multiple Records Sequentially

The forward slash (/) line pointer control is used to read multiple records sequentially. Each time a / pointer is encountered, the input pointer advances to the next line.

Reading Multiple Records Non-Sequentially

The #n line pointer control is used to read multiple records non-sequentially. The #n specifies the absolute number of the line to which you want to move the pointer.

Combining Line Pointer Controls

The / line pointer control and the #n line pointer control can be combined within a SAS program to read multiple records both sequentially and non-sequentially.

Syntax

LIBNAME libref 'SAS-data-library';

FILENAME fileref 'filename';

DATA SAS-data-set;

    INFILE file-specification;

    INPUT #n variable …

          #n @n variable … /

          variable … /

          variable … ;

RUN;

PROC PRINT DATA=SAS-data-set;

RUN;

Sample Program

libname perm 'c:
ecordsempdata';
filename personel 'c:
ecordsempdata
ew.dat';
data perm.emplist3;
   infile personel;
   input #2 Department $ 5-16
         #1 @16 ID $4. @1 Name $14. /
            JobCode 3. /
            Salary comma9.;
run;
proc print data=perm.emplist3;
run;

Points to Remember

  • When a file contains multiple records per observation, the file must contain the same number of records for each observation that is being created.

  • Because the / pointer control can only move forward, the pointer control is specified after the values in the current record are read.

  • The #n pointer control can read records in any order and must be specified before the variable names are defined.

  • A semicolon should be placed at the end of the complete INPUT statement.

Chapter Quiz

Select the best answer for each question. After completing the quiz, check your answers using the answer key in the appendix.

  1. You can position the input pointer on a specific record by using

    1. column pointer controls.

    2. column specifications.

    3. line pointer controls.

    4. line hold specifiers.

  2. Which pointer control is used to read multiple records sequentially?

    1. @n

    2. +n

    3. /

    4. all of the above

  3. Which pointer control can be used to read records non-sequentially?

    1. @n

    2. #n

    3. +n

    4. /

  4. Which SAS statement correctly reads the values for Fname, Lname, Address, City, State and Zip in order?

    Chapter Quiz
    1. input Fname $ Lname $ /
            Address $20. /
            City $ State $ Zip $;
    2.     input Fname $ Lname $ /;
            Address $20. /;
            City $ State $ Zip $;
    3. input / Fname $ Lname $
            / Address $20.
            City $ State $ Zip $;
    4. input / Fname $ Lname $;
            / Address $20.;
            City $ State $ Zip $;
  5. Which INPUT statement correctly reads the values for ID in the fourth record, then returns to the first record to read the values for Fname and Lname?

    Chapter Quiz
    1. input #4 ID $5.
               #1 Fname $ Lname $;
    2. input #4 ID $ 1-5
               #1 Fname $ Lname $;
    3. input #4 ID $
               #1 Fname $ Lname $;
    4. all of the above

  6. How many records will be read for each execution of the DATA step?

    Chapter Quiz
    data spring.sportswr;
       infile newitems;
       input #1 Item $ Color $
             #3 @8 Price comma6.
             #2 Fabric $
             #3 SKU $ 1-6;
    run;
    1. one

    2. two

    3. three

    4. four

  7. Which INPUT statement correctly reads the values for City, State, and Zip?

    Chapter Quiz
    1. input #3 City $ State $ Zip $;

    2. input #3 City & $11. State $ Zip $;

    3. input #3 City $11. +2 State $2. + 2 Zip $5.;

    4. all of the above

  8. Which program does not read the values in the first record as a variable named Item and the values in the second record as two variables named Inventory and Type?

    Chapter Quiz
    1. data perm.supplies;
         infile instock pad;
         input Item & $16. /
               Inventory 2. Type $8.;
      run;
    2. data perm.supplies;
         infile instock pad;
         input Item & $16.
               / Inventory 2. Type $8.;
      run;
    3. data perm.supplies;
         infile instock pad;
         input #1 Item & $16.
               Inventory 2. Type $8.;
      run;
    4. data perm.supplies;
         infile instock pad;
         input Item & $16.
               #2 Inventory 2. Type $8.;
      run;
  9. Which INPUT statement reads the values for Lname, Fname, Department and Salary (in that order)?

    Chapter Quiz
    1. input #1 Lname $ Fname $ /
            Department $12. Salary comma10.;
    2. input #1 Lname $ Fname $ /
            Department : $12. Salary : comma.;
    3. input #1 Lname $ Fname $
            #2 Department : $12. Salary : comma.;
    4. both b and c

  10. Which raw data file poses potential problems when you are reading multiple records for each observation?

    1. Chapter Quiz
    2. Chapter Quiz
    3. Chapter Quiz
    4. Chapter Quiz
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset