Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter 20. Creating a Single Observation from Multiple Records

Overview 599
- Introduction 599
- Objectives 600
Using Line Pointer Controls 600
Reading Multiple Records Sequentially 601
- The Forward Slash (/) Line Pointer Control 601
- Using the / Line Pointer Control 601
- Sequential Processing of Multiple Records in the DATA Step 603
- Number of Records Per Observation 607
Reading Multiple Records Non-Sequentially 608
- The #n Line Pointer Control 608
- Using the #n Line Pointer Control 609
- Execution of the DATA Step 610
Combining Line Pointer Controls 611
Chapter Summary 613
- Text Summary 613
  - Multiple Records per Observation 613
  - Reading Multiple Records Sequentially 613
  - Reading Multiple Records Non-Sequentially 613
  - Combining Line Pointer Controls 613
- Syntax 613
- Sample Program 613
- Points to Remember 614
Chapter Quiz 614

Overview

Introduction

Information for one observation can be spread out over several records. You can write multiple INPUT statements to read each record that comprises a single observation …

…or,youcan write one INPUT statement that contains a line pointer control to specify the record(s) from which values are to be read.

Objectives

In this chapter, you learn to

read multiple records sequentially and create a single observation
read multiple records non-sequentially and create a single observation.

Using Line Pointer Controls

You know that as SAS reads raw data values, it keeps track of its position with an input pointer. You have used column pointer controls and column specifications to determine the column placement of the input pointer.

Column Specifications	`input Name $ 1-12 Age 15-16 Gender $ 18;`
Column Pointer Controls	`input Name $12. @15 Age 2. @18 Gender $1.;`

But you can also position the input pointer on a specific record by using a line pointer control in the INPUT statement.

There are two types of line pointer controls.

The forward slash (/) specifies a line location that is relative to the current one.
The #n specifies the absolute number of the line to which you want to move the pointer.

First we'll look at the forward slash (/). Later in the chapter, you'll learn how to use the #n, and you will see how these two controls can be combined.

Reading Multiple Records Sequentially

The Forward Slash (/) Line Pointer Control

You use the forward slash (/) line pointer control to read multiple records sequentially. The / advances the input pointer to the next record. The / line pointer control only moves the input pointer forward and must be specified after the instructions for reading the values in the current record.

The single INPUT statement below reads the values for Lname and Fname in the first record, followed by the values for Department and JobCode in the second record. Then the value for Salary is read in the third record.

Using the / Line Pointer Control

Take a closer look at using the forward slash (/) line pointer control in the following example.

The raw data file Memdata contains the mailing list of a professional organization. Your task is to combine the information for each member into one data set observation. We'll begin by reading each member's name, followed by the street address, and finally the city, state, and zip code.

As you write the instructions to read the values for Fname and Lname, notice that not all of the values for Lname begin in the same column. So, you should use standard list input to read these values.
Now you want to read the values for Address from the second record. The / line pointer control advances the input pointer to the next record. At this point the INPUT statement is incomplete, so you should not place a semicolon after the line pointer control.
You can use column input to read the values in the next record as one variable named Address. Then add a line pointer control to move the input pointer to the next record.
As you write the statements to read the values for City, notice that some of the values are longer than eight characters and contain embedded blanks. Also note that each value is followed by two consecutive blanks. To read these values, you should use modified list input with the ampersand (&) modifier.
The values for State and the values for Zip do not begin in the same column. Therefore, you should use list input to read these values.

Sequential Processing of Multiple Records in the DATA Step

Now that you've learned the basics of using the / line pointer control, let's take a closer look at the sequential processing of multiple records in the DATA step.

The values in the first record are read, and the / line pointer control moves the input pointer to the second record.

Sequential Processing of Multiple Records in the DATA Step

The values for Address are read, and the second / line pointer control advances the input pointer to the third record.

The values for City, State, and Zip are read, and the INPUT statement is completely executed.

The values in the program data vector are written to the data set as the first observation.

Control returns to the top of the DATA step, and the variable values are reinitialized to missing.

During the second iteration, values for Fname and Lname are read beginning in column 1 of the fourth record.

The values for Address are read and the / line pointer control advances the input pointer to the fifth record.

The values for City, State, and Zip are read, and the INPUT statement is completely executed.

The values in the program data vector are written to the data set as the second observation.

After the data set is complete, PROC PRINT output for Perm.Members shows that a single observation contains the complete information for each member.

proc print data=perm.members;
run;

Number of Records Per Observation

Note that the raw data file must contain the same number of records for each observation that is being created.

For example, suppose that there are only two records for the second member. However, the INPUT statement is set up to read three records.

The second member's name and address are read and assigned to corresponding variables. Then the input pointer advances to the next record, as directed by the INPUT statement, and the third member's name is read as a value for City.

The DATA step is still looking for a value for State and Zip, so the input pointer advances to the next record and reads the values for the member's address.

The PROC PRINT output for this data set illustrates the problem.

So, before you write the INPUT statement, make sure the raw data file contains the same number of records for each observation. In this raw data file there are now three records for each observation.

Reading Multiple Records Non-Sequentially

The #n Line Pointer Control

You already know how to read multiple records sequentially by using the / line pointer control. Now let's look at reading multiple records non-sequentially by using the #n line pointer control.

The #n specifies the absolute number of the line to which you want to move the input pointer. The #n pointer control can read records in any order; therefore, it must be specified before the instructions for reading values in a specific record.

The INPUT statement below first reads the values for Department and JobCode in the second record, then the values for Lname and Fname in the first record. Finally, it reads the value for Salary in the third record.

Using the #n Line Pointer Control

Take a closer look at using the #n line pointer control in the following example.

The raw data file Patdata contains information about the patients of a small group of general surgeons.

The first three records contain a patient's name, address, city, state, and zip code. The fourth record contains the patient's ID number followed by the name of the primary physician.

Suppose you want to read each patient's information in the following order:

ID number (ID)
first name (Fname)
last name (Lname)
address (Address)
city (City)
state (State)
zip (Zip)
doctor (Doctor)

To read the values for ID in the 4th record, specify #4 before naming the variable and defining its attributes.
To read the values for Fname and Lname in the first record, specify #1 before naming the variables and defining their attributes.
Use the #n line pointer control to move the input pointer to the second record and read the value for Address.
Now move the input pointer to the third record and read the values for City, State, and Zip, in that order.
Now you need to move the input pointer down to the fourth record to read the values for Doctor, which begin in column 7. Don't forget to add a semicolon at the end of the INPUT statement. A RUN statement completes the program.

Execution of the DATA Step

The #n pointer controls in the program below cause four records to be read for each execution of the DATA step.

data perm.patients;
   infile patdata;
   input #4 ID $5.
         #1 Fname $ Lname $
         #2 Address $23.
         #3 City $ State $ Zip $
         #4 @7 Doctor $6.;

run;

The first time the DATA step executes, the first four records are read, and an observation is written to the data set.

During the second iteration, the next four records are read, and the second observation is written to the data set, and so on.

The PROC PRINT output of the data set shows how information that was spread over several records has been condensed into one observation.

proc print data=perm.patients noobs;
run;

Combining Line Pointer Controls

The forward slash (/) line pointer control and the #n line pointer control can be used together in a SAS program to read multiple records both sequentially and non-sequentially.

For example, you could use both the / line pointer control and the #n line pointer control to read the variables in the raw data file Patdata in the following order:

1. ID            5. City 
2. Fname         6. State 
3. Lname         7. Zip 
4. Address       8. Doctor

To read the values for ID in the fourth record, specify #4 before naming the variable and defining its attributes.
Specify #1 to move the input pointer back to the first record, where the values for Fname and Lname are read.
Because the next record to be read is sequential, you can use the / line pointer control after the variable Lname to move the input pointer to the second record, where the value for Address is read.
The / line pointer control in the next line directs the input pointer to the third record, where the values for City, State and Zip are read.
The final / line pointer control moves the input pointer back to the fourth record, where the value for Doctor is read.

Chapter Summary

Text Summary

Multiple Records per Observation

Information for one observation can be spread out over several records. You can write one INPUT statement that contains line pointer controls to specify the record(s) from which values are read.

Reading Multiple Records Sequentially

The forward slash (/) line pointer control is used to read multiple records sequentially. Each time a / pointer is encountered, the input pointer advances to the next line.

Reading Multiple Records Non-Sequentially

The #n line pointer control is used to read multiple records non-sequentially. The #n specifies the absolute number of the line to which you want to move the pointer.

Combining Line Pointer Controls

The / line pointer control and the #n line pointer control can be combined within a SAS program to read multiple records both sequentially and non-sequentially.

Syntax

LIBNAME libref 'SAS-data-library';

FILENAME fileref 'filename';

DATA SAS-data-set;

INFILE file-specification;

INPUT #n variable …

#n @n variable … /

variable … /

variable … ;

RUN;

PROC PRINT DATA=SAS-data-set;

RUN;

Sample Program

libname perm 'c:
ecordsempdata';
filename personel 'c:
ecordsempdata
ew.dat';
data perm.emplist3;
   infile personel;
   input #2 Department $ 5-16
         #1 @16 ID $4. @1 Name $14. /
            JobCode 3. /

            Salary comma9.;
run;
proc print data=perm.emplist3;
run;

Points to Remember

When a file contains multiple records per observation, the file must contain the same number of records for each observation that is being created.
Because the / pointer control can only move forward, the pointer control is specified after the values in the current record are read.
The #n pointer control can read records in any order and must be specified before the variable names are defined.
A semicolon should be placed at the end of the complete INPUT statement.

Chapter Quiz

Select the best answer for each question. After completing the quiz, check your answers using the answer key in the appendix.

You can position the input pointer on a specific record by using
1. column pointer controls.
2. column specifications.
3. line pointer controls.
4. line hold specifiers.
Which pointer control is used to read multiple records sequentially?
1. @n
2. +n
3. /
4. all of the above
Which pointer control can be used to read records non-sequentially?
1. @n
2. #n
3. +n
4. /

Which SAS statement correctly reads the values for Fname, Lname, Address, City, State and Zip in order?

input Fname $ Lname $ /
      Address $20. /
      City $ State $ Zip $;

    input Fname $ Lname $ /;
      Address $20. /;
      City $ State $ Zip $;

input / Fname $ Lname $
      / Address $20.
      City $ State $ Zip $;

input / Fname $ Lname $;
      / Address $20.;
      City $ State $ Zip $;

Which INPUT statement correctly reads the values for ID in the fourth record, then returns to the first record to read the values for Fname and Lname?
1. ```
input #4 ID $5.
         #1 Fname $ Lname $;
```
2. ```
input #4 ID $ 1-5
         #1 Fname $ Lname $;
```
3. ```
input #4 ID $
         #1 Fname $ Lname $;
```
4. all of the above

How many records will be read for each execution of the DATA step?

data spring.sportswr;
   infile newitems;
   input #1 Item $ Color $
         #3 @8 Price comma6.
         #2 Fabric $
         #3 SKU $ 1-6;

run;

one
two
three
four

Which INPUT statement correctly reads the values for City, State, and Zip?
1. input #3 City $ State $ Zip $;
2. input #3 City & $11. State $ Zip $;
3. input #3 City $11. +2 State $2. + 2 Zip $5.;
4. all of the above

Which program does not read the values in the first record as a variable named Item and the values in the second record as two variables named Inventory and Type?

data perm.supplies;
   infile instock pad;
   input Item & $16. /
         Inventory 2. Type $8.;
run;

data perm.supplies;
   infile instock pad;
   input Item & $16.
         / Inventory 2. Type $8.;
run;

data perm.supplies;
   infile instock pad;
   input #1 Item & $16.
         Inventory 2. Type $8.;
run;

data perm.supplies;
   infile instock pad;
   input Item & $16.
         #2 Inventory 2. Type $8.;
run;

Which INPUT statement reads the values for Lname, Fname, Department and Salary (in that order)?

input #1 Lname $ Fname $ /
      Department $12. Salary comma10.;

input #1 Lname $ Fname $ /
      Department : $12. Salary : comma.;

input #1 Lname $ Fname $
      #2 Department : $12. Salary : comma.;

both b and c

Which raw data file poses potential problems when you are reading multiple records for each observation?

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for 20. Creating a Single Observation from Multiple Records

Create new playlist

Sign In

Sign Up

Chapter 20. Creating a Single Observation from Multiple Records

Overview

Introduction

Objectives

Using Line Pointer Controls

Reading Multiple Records Sequentially

The Forward Slash (/) Line Pointer Control

Using the / Line Pointer Control

Sequential Processing of Multiple Records in the DATA Step

Number of Records Per Observation

Reading Multiple Records Non-Sequentially

The #n Line Pointer Control

Using the #n Line Pointer Control

Execution of the DATA Step

Combining Line Pointer Controls

Chapter Summary

Text Summary

Multiple Records per Observation

Reading Multiple Records Sequentially

Reading Multiple Records Non-Sequentially

Combining Line Pointer Controls

Syntax

Sample Program

Points to Remember

Chapter Quiz

Table of Contents for
20. Creating a Single Observation from Multiple Records