Information for one observation can be spread out over several records. You can write multiple INPUT statements to read each record that comprises a single observation …
…or,youcan write one INPUT statement that contains a line pointer control to specify the record(s) from which values are to be read.
You know that as SAS reads raw data values, it keeps track of its position with an input pointer. You have used column pointer controls and column specifications to determine the column placement of the input pointer.
Column Specifications |
|
Column Pointer Controls |
|
But you can also position the input pointer on a specific record by using a line pointer control in the INPUT statement.
There are two types of line pointer controls.
The forward slash (/) specifies a line location that is relative to the current one.
The #n specifies the absolute number of the line to which you want to move the pointer.
First we'll look at the forward slash (/). Later in the chapter, you'll learn how to use the #n, and you will see how these two controls can be combined.
You use the forward slash (/) line pointer control to read multiple records sequentially. The / advances the input pointer to the next record. The / line pointer control only moves the input pointer forward and must be specified after the instructions for reading the values in the current record.
The single INPUT statement below reads the values for Lname and Fname in the first record, followed by the values for Department and JobCode in the second record. Then the value for Salary is read in the third record.
Take a closer look at using the forward slash (/) line pointer control in the following example.
The raw data file Memdata contains the mailing list of a professional organization. Your task is to combine the information for each member into one data set observation. We'll begin by reading each member's name, followed by the street address, and finally the city, state, and zip code.
As you write the instructions to read the values for Fname and Lname, notice that not all of the values for Lname begin in the same column. So, you should use standard list input to read these values.
Now you want to read the values for Address from the second record. The / line pointer control advances the input pointer to the next record. At this point the INPUT statement is incomplete, so you should not place a semicolon after the line pointer control.
You can use column input to read the values in the next record as one variable named Address. Then add a line pointer control to move the input pointer to the next record.
As you write the statements to read the values for City, notice that some of the values are longer than eight characters and contain embedded blanks. Also note that each value is followed by two consecutive blanks. To read these values, you should use modified list input with the ampersand (&) modifier.
The values for State and the values for Zip do not begin in the same column. Therefore, you should use list input to read these values.
Now that you've learned the basics of using the / line pointer control, let's take a closer look at the sequential processing of multiple records in the DATA step.
The values in the first record are read, and the / line pointer control moves the input pointer to the second record.
The values for Address are read, and the second / line pointer control advances the input pointer to the third record.
The values for City, State, and Zip are read, and the INPUT statement is completely executed.
The values in the program data vector are written to the data set as the first observation.
Control returns to the top of the DATA step, and the variable values are reinitialized to missing.
During the second iteration, values for Fname and Lname are read beginning in column 1 of the fourth record.
The values for Address are read and the / line pointer control advances the input pointer to the fifth record.
The values for City, State, and Zip are read, and the INPUT statement is completely executed.
The values in the program data vector are written to the data set as the second observation.
After the data set is complete, PROC PRINT output for Perm.Members shows that a single observation contains the complete information for each member.
proc print data=perm.members; run;
Note that the raw data file must contain the same number of records for each observation that is being created.
For example, suppose that there are only two records for the second member. However, the INPUT statement is set up to read three records.
The second member's name and address are read and assigned to corresponding variables. Then the input pointer advances to the next record, as directed by the INPUT statement, and the third member's name is read as a value for City.
The DATA step is still looking for a value for State and Zip, so the input pointer advances to the next record and reads the values for the member's address.
The PROC PRINT output for this data set illustrates the problem.
So, before you write the INPUT statement, make sure the raw data file contains the same number of records for each observation. In this raw data file there are now three records for each observation.
You already know how to read multiple records sequentially by using the / line pointer control. Now let's look at reading multiple records non-sequentially by using the #n line pointer control.
The #n specifies the absolute number of the line to which you want to move the input pointer. The #n pointer control can read records in any order; therefore, it must be specified before the instructions for reading values in a specific record.
The INPUT statement below first reads the values for Department and JobCode in the second record, then the values for Lname and Fname in the first record. Finally, it reads the value for Salary in the third record.
Take a closer look at using the #n line pointer control in the following example.
The raw data file Patdata contains information about the patients of a small group of general surgeons.
The first three records contain a patient's name, address, city, state, and zip code. The fourth record contains the patient's ID number followed by the name of the primary physician.
Suppose you want to read each patient's information in the following order:
ID number (ID)
first name (Fname)
last name (Lname)
address (Address)
city (City)
state (State)
zip (Zip)
doctor (Doctor)
To read the values for ID in the 4th record, specify #4 before naming the variable and defining its attributes.
To read the values for Fname and Lname in the first record, specify #1 before naming the variables and defining their attributes.
Use the #n line pointer control to move the input pointer to the second record and read the value for Address.
Now move the input pointer to the third record and read the values for City, State, and Zip, in that order.
Now you need to move the input pointer down to the fourth record to read the values for Doctor, which begin in column 7. Don't forget to add a semicolon at the end of the INPUT statement. A RUN statement completes the program.
The #n pointer controls in the program below cause four records to be read for each execution of the DATA step.
data perm.patients; infile patdata; input #4 ID $5. #1 Fname $ Lname $ #2 Address $23. #3 City $ State $ Zip $ #4 @7 Doctor $6.;
run;
The first time the DATA step executes, the first four records are read, and an observation is written to the data set.
During the second iteration, the next four records are read, and the second observation is written to the data set, and so on.
The PROC PRINT output of the data set shows how information that was spread over several records has been condensed into one observation.
proc print data=perm.patients noobs; run;
The forward slash (/) line pointer control and the #n line pointer control can be used together in a SAS program to read multiple records both sequentially and non-sequentially.
For example, you could use both the / line pointer control and the #n line pointer control to read the variables in the raw data file Patdata in the following order:
1. ID 5. City 2. Fname 6. State 3. Lname 7. Zip 4. Address 8. Doctor
To read the values for ID in the fourth record, specify #4 before naming the variable and defining its attributes.
Specify #1 to move the input pointer back to the first record, where the values for Fname and Lname are read.
Because the next record to be read is sequential, you can use the / line pointer control after the variable Lname to move the input pointer to the second record, where the value for Address is read.
The / line pointer control in the next line directs the input pointer to the third record, where the values for City, State and Zip are read.
The final / line pointer control moves the input pointer back to the fourth record, where the value for Doctor is read.
Information for one observation can be spread out over several records. You can write one INPUT statement that contains line pointer controls to specify the record(s) from which values are read.
The forward slash (/) line pointer control is used to read multiple records sequentially. Each time a / pointer is encountered, the input pointer advances to the next line.
The #n line pointer control is used to read multiple records non-sequentially. The #n specifies the absolute number of the line to which you want to move the pointer.
LIBNAME libref 'SAS-data-library';
FILENAME fileref 'filename';
DATA SAS-data-set;
INFILE file-specification;
INPUT #n variable …
#n @n variable … /
variable … /
variable … ;
RUN;
PROC PRINT DATA=SAS-data-set;
RUN;
libname perm 'c: ecordsempdata'; filename personel 'c: ecordsempdata ew.dat'; data perm.emplist3; infile personel; input #2 Department $ 5-16 #1 @16 ID $4. @1 Name $14. / JobCode 3. /
Salary comma9.; run; proc print data=perm.emplist3; run;
When a file contains multiple records per observation, the file must contain the same number of records for each observation that is being created.
Because the / pointer control can only move forward, the pointer control is specified after the values in the current record are read.
The #n pointer control can read records in any order and must be specified before the variable names are defined.
A semicolon should be placed at the end of the complete INPUT statement.
Select the best answer for each question. After completing the quiz, check your answers using the answer key in the appendix.
You can position the input pointer on a specific record by using
column pointer controls.
column specifications.
line pointer controls.
line hold specifiers.
Which pointer control is used to read multiple records sequentially?
@n
+n
/
all of the above
Which pointer control can be used to read records non-sequentially?
@n
#n
+n
/
Which SAS statement correctly reads the values for Fname, Lname, Address, City, State and Zip in order?
input Fname $ Lname $ / Address $20. / City $ State $ Zip $;
input Fname $ Lname $ /; Address $20. /; City $ State $ Zip $;
input / Fname $ Lname $ / Address $20. City $ State $ Zip $;
input / Fname $ Lname $; / Address $20.; City $ State $ Zip $;
Which INPUT statement correctly reads the values for ID in the fourth record, then returns to the first record to read the values for Fname and Lname?
input #4 ID $5. #1 Fname $ Lname $;
input #4 ID $ 1-5 #1 Fname $ Lname $;
input #4 ID $ #1 Fname $ Lname $;
all of the above
How many records will be read for each execution of the DATA step?
data spring.sportswr; infile newitems; input #1 Item $ Color $ #3 @8 Price comma6. #2 Fabric $ #3 SKU $ 1-6;
run;
one
two
three
four
Which INPUT statement correctly reads the values for City, State, and Zip?
input #3 City $ State $ Zip $;
input #3 City & $11. State $ Zip $;
input #3 City $11. +2 State $2. + 2 Zip $5.;
all of the above
Which program does not read the values in the first record as a variable named Item and the values in the second record as two variables named Inventory and Type?
data perm.supplies; infile instock pad; input Item & $16. / Inventory 2. Type $8.; run;
data perm.supplies; infile instock pad; input Item & $16. / Inventory 2. Type $8.; run;
data perm.supplies; infile instock pad; input #1 Item & $16. Inventory 2. Type $8.; run;
data perm.supplies; infile instock pad; input Item & $16. #2 Inventory 2. Type $8.; run;
Which INPUT statement reads the values for Lname, Fname, Department and Salary (in that order)?
input #1 Lname $ Fname $ / Department $12. Salary comma10.;
input #1 Lname $ Fname $ / Department : $12. Salary : comma.;
input #1 Lname $ Fname $ #2 Department : $12. Salary : comma.;
both b and c
Which raw data file poses potential problems when you are reading multiple records for each observation?