Chapter 21. Creating Multiple Observations from a Single Record

Overview

Introduction

Sometimes raw data files contain data for several observations in one record. Data is stored in this manner to reduce the size of the entire data file.

Each record can contain

  • repeating blocks of data that represent separate observations

    Introduction
  • an ID field followed by an equal number of repeating fields that represent separate observations

    Introduction
  • an ID field followed by a varying number of repeating fields that represent separate observations.

    Introduction

This chapter shows you several ways of creating multiple observations from a single record.

Objectives

In this chapter, you learn to

  • create multiple observations from a single record that contains repeating blocks of data

  • create multiple observations from a single record that contains one ID field followed by the same number of repeating fields

  • create multiple observations from a single record that contains one ID field followed by a varying number of repeating fields.

Additionally, you learn to

  • hold the current record across iterations of the DATA step

  • hold the current record for the next INPUT statement

  • execute SAS statements based on a variable's value

  • explicitly write an observation to a data set

  • execute SAS statements while a condition is true.

Reading Repeating Blocks of Data

Each record in the file Tempdata contains three blocks of data. Each block contains a date followed by the day's high temperature in a small city located in the southern United States.

Reading Repeating Blocks of Data

You could write a DATA step that reads each record and creates three different Date and Temp variables.

Reading Repeating Blocks of Data

But if you create a separate observation for each block of data in a record, you can later use several statistical procedures to analyze the data for each day.

Reading Repeating Blocks of Data

Holding the Current Record with a Line-Hold Specifier

As you begin to write the INPUT statement, you need to hold the current record until each block of data has been read and written to the data set as an observation. This is easily accomplished by using a line-hold specifier in the INPUT statement.

SAS provides two line-hold specifiers.

  • The trailing at sign (@) holds the input record for the execution of the next INPUT statement.

  • The double trailing at sign (@@) holds the input record for the execution of the next INPUT statement, even across iterations of the DATA step.

The term trailing indicates that the @ or @@ must be the last item specified in the INPUT statement. For example,

input Name $20. @; or input Name $20. @@;

This chapter teaches you how the trailing @@ can be used to hold a record across multiple iterations of the DATA step.

Using the Double Trailing At Sign (@@) to Hold the Current Record

Normally, each time a DATA step executes, the INPUT statement reads a new record. But when you use the trailing @@, the INPUT statement holds the current record and reads the next value.

Using the Double Trailing At Sign (@@) to Hold the Current Record

The double trailing at sign (@@)

  • works like the trailing @ except it also holds the data line in the input buffer across multiple executions of the DATA step

  • typically is used to read multiple SAS observations from a single data line

  • should not be used with the @ pointer control, with column input, nor with the MISSOVER option.

A record that is being held by the double trailing at sign (@@) is not released until either of the following events occurs:

  • the input pointer moves past the end of the record. Then the input pointer moves down to the next record.

    Using the Double Trailing At Sign (@@) to Hold the Current Record
  • an INPUT statement that has no line-hold specifier executes.

    input ID $ @@;
    .
    .
    input Department 5.;

This example requires only one INPUT statement to read the values for Date and HighTemp, but the INPUT statement must execute three times for each record.

The INPUT statement reads a block of values for Date and HighTemp, and then holds the current record by using the trailing @@. The values in the program data vector are written to the data set as an observation, and control returns to the top of the DATA step.

Using the Double Trailing At Sign (@@) to Hold the Current Record

In the next iteration, the INPUT statement reads the next block of values for Date and HighTemp from the same record.

Using the Double Trailing At Sign (@@) to Hold the Current Record

Completing the DATA Step

You can add a FORMAT statement to the DATA step to display the date or time values with a specified format in the data set. In the FORMAT statement below, the DATEw. format is used to display the values for Date in the form ddmmmyyyy.

Completing the DATA Step

DATA Step Processing of Repeating Blocks of Data

Here is the complete DATA step.

data perm.april90;
      infile tempdata;
      input Date : date. HighTemp @@;
      format date date9.;
run;

Example

As the execution phase begins, the input pointer rests on column 1 of record 1.

Example

During the first iteration of the DATA step, the first block of values for Date and High Temp are read and stored in the program data vector.

Example

The first observation is written to the data set, control returns to the top of the DATA step, and the values are reset to missing.

Example

During the second iteration, the @@ prevents the input pointer from moving down to column 1 of the next record.

Example

The INPUT statement reads the second block of values for Date and HighTemp in the first record.

Example

The second observation is written to the data set, and control returns to the top of the DATA step.

Example

During the third iteration, the last block of values is read and written to the data set as the third observation.

Example

During the fourth iteration, the first block of values in the second record is read and written as the fourth observation.

Example

The execution phase continues until the last block of data is read.

Example

You can display the data set with the PRINT procedure.

Example

Reading the Same Number of Repeating Fields

So far, you have created multiple observations from a single record by executing the DATA step once for each block of data in a record.

Now, look at another file that is organized differently.

Each record in the file Data97 contains a sales representative's ID number, followed by four repeating fields that represent his or her quarterly sales totals for 1997.

You want to pair each employee ID number with one quarterly sales total to produce a single observation. That way, four observations can be derived from one record.

Reading the Same Number of Repeating Fields

To accomplish this, you must execute the DATA step once for each record, repetitively reading and writing values in one iteration.

This means that a DATA step must

  • read the value for ID and hold the current record

  • create a new variable named Quarter to identify the fiscal quarter for each sales figure

  • read a new value for Sales and write the values to the data set as an observation

  • continue reading a new value for Sales and writing values to the data set three more times.

Using the Single Trailing At Sign (@) to Hold the Current Record

First, you need to read the value for ID and hold the record so that subsequent values for Sales can be read.

Using the Single Trailing At Sign (@) to Hold the Current Record

You are already familiar with the trailing @@, which holds the current record across multiple iterations of the DATA step.

However, in this case, you want to hold the record with the trailing @ line hold specifier, so that a second INPUT statement can read the values for Sales within the same iteration of the DATA step. Like the trailing @@, the single trailing @

  • enables the next INPUT statement to read from the same record

  • releases the current record when a subsequent INPUT statement executes without a line-hold specifier.

It's easy to distinguish between the trailing @@ and the trailing @ by remembering that

  • the double trailing at sign (@@) holds a record across multiple iterations of the DATA step until the end of the record is reached.

  • the single trailing at sign (@) releases a record when control returns to the top of the DATA step.

In this example, the first INPUT statement reads the value for ID and uses the trailing @ to hold the current record for the next INPUT statement in the DATA step.

Using the Single Trailing At Sign (@) to Hold the Current Record

The second INPUT statement reads a value for Sales and holds the record. The COMMAw.d informat in the INPUT statement reads the numeric value for Sales and removes the embedded commas. An OUTPUT statement writes the observation to the SAS data set, and the DATA step continues processing.

Using the Single Trailing At Sign (@) to Hold the Current Record

When all of the repeating fields have been read and output, control returns to the top of the DATA step, and the record is released.

Using the Single Trailing At Sign (@) to Hold the Current Record

More Efficient Programming

Each record contains four different values for the variable Sales, so the INPUT statement must execute four times. Rather than writing four INPUT statements, you can execute one INPUT statement repeatedly in an iterative DO loop.

Each time the loop executes, you need to write the values for ID, Quarter, and Sales as an observation to the data set. This is easily accomplished by using the OUTPUT statement.

data perm.sales97;
      infile data97;
      input ID $ @;
      do Quarter=1 to 4;
      input Sales : comma. @;
      output;
      end;
run;

By default, every DATA step contains an implicit OUTPUT statement at the end of the step. Placing an explicit OUTPUT statement in a DATA step overrides the automatic output, and SAS adds an observation to a data set only when the explicit OUTPUT statement is executed.

Processing a DATA Step That Contains an Iterative DO Loop

Now that the program is complete, let's see how SAS processes a DATA step that contains an iterative DO loop.

data perm.sales97;
   infile data97;
   input ID $ @;
   do Quarter=1 to 4;
      input Sales : comma. @;
      output;
   end;
run;

During the first iteration, the value for ID is read and Quarter is initial to 1, so the loop begins to execute.

Processing a DATA Step That Contains an Iterative DO Loop

The INPUT statement reads the first repeating field and assigns the value to Sales in the program data vector. The @ holds the current record.

Processing a DATA Step That Contains an Iterative DO Loop

The OUTPUT statement writes the values in the program data vector to the data set as the first observation.

Processing a DATA Step That Contains an Iterative DO Loop

The END statement indicates the bottom of the loop, but control returns to the DO statement, not to the top of the DATA step. Now the value of Quarter is incremented to 2.

Processing a DATA Step That Contains an Iterative DO Loop

The INPUT statement executes again, reading the second repeating field and storing the value for Sales in the program data vector.

Processing a DATA Step That Contains an Iterative DO Loop

The OUTPUT statement writes the values in the program data vector as the second observation.

Processing a DATA Step That Contains an Iterative DO Loop

The loop continues executing while the value for Quarter is 3, then 4. In the process, the third and fourth observations are created.

Processing a DATA Step That Contains an Iterative DO Loop

After the fourth observation is created, Quarter is incremented to 5 at the bottom of the DO loop and control returns to the top of the loop. The loop does not execute again because the value of Quarter is now greater than 4.

Processing a DATA Step That Contains an Iterative DO Loop

Now the RUN statement executes. Control returns to the top of the DATA step, and the input pointer moves to column 1 of the next record. The variable values in the program data vector are reset to missing.

Processing a DATA Step That Contains an Iterative DO Loop

When the execution phase is complete, you can display the data set with the PRINT procedure.

Processing a DATA Step That Contains an Iterative DO Loop

Reading a Varying Number of Repeating Fields

So far, each record in the file Data97 has contained the same number of repeating fields.

Reading a Varying Number of Repeating Fields

But suppose that some of the employees quit after the first quarter. Records that contain information for those employees might not contain sales totals for the second, third, or fourth quarter. These records contain a variable number of repeating fields.

Reading a Varying Number of Repeating Fields

The DATA step that you just wrote won't work with a variable number of repeating fields because now the value of Quarter is not constant for every record.

data perm.sales97;
   infile data97;
   input ID $ @;
   do Quarter=1 to 4;
      input Sales : comma. @;
      output;
   end;
run;

Using the MISSOVER Option

You can adapt the DATA step to accommodate a varying number of values for Sales.

Like the previous example with the same number of repeating fields, your DATA step must read the same record more than once. However, you need to prevent the input pointer from moving to the next record when there are missing values for Sales.

You can use the MISSOVER option in an INFILE statement to prevent SAS from reading the next record when missing values are encountered at the end of a record. Essentially, records that have a varying number of repeating fields are records that contain missing values, so you need to specify the MISSOVER option here as well.

Because there is at least one value for the repeating field, Sales, in each record, the first INPUT statement reads both the value for ID and the first value for Sales in the first record. The trailing @ holds the record so that any subsequent repeating fields can be read.

Using the MISSOVER Option
Using the MISSOVER Option

Other related options include FLOWOVER (the default), STOPOVER, and SCANOVER. For more information about TRUNCOVER and related options, see the SAS documentation.

Executing SAS Statements While a Condition Is True

Now consider how many times to read each record. Earlier, you created an index variable named Quarter whose value ranged from 1 to 4 because there were four repeating fields.

Now you want to read the record only while a value for Sales exists. Use a DO WHILE statement instead of the iterative DO statement, enclosing the expression in parentheses. In the example below, the DO WHILE statement executes while the value of Sales is not equal to a missing value (which is represented by a period).

data perm.sales97;
   infile data97 missover;
   input ID $ Sales : comma. @;
   do while (sales ne .);

Creating a Counter Variable

Because the DO WHILE statement does not create an index variable, you can create your own "counter" variable. You can then use a sum statement to increment the value of the counter variable each time the DO WHILE loop executes.

In the example below, the assignment statement that precedes the loop creates the counter variable Quarter and assigns it an initial value of zero. Each time the DO WHILE loop executes, the sum statement increments the value of Quarter by one.

data perm.sales97;
   infile data97 missover;
   input ID $ Sales : comma. @;
   Quarter=0;
   do while (sales ne .);
   quarter+1;

Completing the DO WHILE Loop

Now look at the other statements that should be executed in the DO WHILE loop. First, you need an OUTPUT statement to write the current observation to the data set. Then, another INPUT statement reads the next value for Sales and holds the record. You complete the DO WHILE loop with an END statement.

data perm.sales97;
   infile data97 missover;
   input ID $ Sales : comma. @;
   Quarter=0;
   do while (sales ne .);
      quarter+1;
      output; 
      input sales : comma. @; 
      end;
run;

Processing a DATA Step That Has a Varying Number of Repeating Fields

Here is the new version of the DATA step.

data perm.sales97;
   infile data97 missover;
   input ID $ Sales : comma. @;
   Quarter=0;
   do while (sales ne .);
      quarter+1;
      output;
      input sales : comma. @;
   end;
run;

During the first iteration of the DATA step, values for ID and Sales are read. Quarter is initialized to 0.

Processing a DATA Step That Has a Varying Number of Repeating Fields

The DO WHILE statement checks to see if Sales has a value, which it does, so the other statements in the DO loop execute. The Value of Quarter is incremented by 1 and the current observation is written to the data set.

Processing a DATA Step That Has a Varying Number of Repeating Fields

The INPUT statement reads the next value for Sales, the end of the loop is reached, and control returns to the DO WHILE statement.

Processing a DATA Step That Has a Varying Number of Repeating Fields

The condition is checked and Sales still has a value, so the loop executes again.

Processing a DATA Step That Has a Varying Number of Repeating Fields

Quarter is incremented to 2, and the values in the program data vector are written as the second observation.

Processing a DATA Step That Has a Varying Number of Repeating Fields

The MISSOVER option prevents the input pointer from moving to the next record in search of another value for Sales. At this point, Sales has no value.

Processing a DATA Step That Has a Varying Number of Repeating Fields

Because the condition is now false, the statements in the loop are not executed.

Processing a DATA Step That Has a Varying Number of Repeating Fields

Instead, control returns to the top of the DATA step, the values in the program data vector are reset to missing, and the input pointer moves to column 1 of the next record. The DATA step continues executing until all of the values for Sales are read.

Processing a DATA Step That Has a Varying Number of Repeating Fields

PROC PRINT output for the data set shows a varying number of observations for each employee.

Processing a DATA Step That Has a Varying Number of Repeating Fields

Chapter Summary

Text Summary

File Formats

One raw data record can contain enough information to produce several observations. Data is stored in this manner in order to reduce the size of the entire file. The data can be organized into

  • repeating blocks of data

  • an ID field followed by the same number of repeating fields

  • an ID field followed by a varying number of repeating fields.

Reading Repeating Blocks of Data

To create multiple observations from a record that contains repeating blocks of data, the DATA step needs to hold the current record until each block of data has been read and written to the data set as an observation. The DATA step should include statements that

  1. read the first block of values and hold the current record with the double trailing at sign (@@) line-hold specifier

  2. optionally add a FORMAT statement to display date or time values with a specified format

  3. write the first block of values as an observation

  4. execute the DATA step until all repeating blocks have been read.

Reading the Same Number of Repeating Fields

To create multiple observations from a record that contains an ID field and the same number of repeating fields, you must execute the DATA step once for each record, repetitively reading and writing values in one iteration. The DATA step should include statements that

  1. read the ID field and hold the current record with the single trailing at sign (@) line-hold specifier

  2. execute SAS statements using an iterative DO loop. The iterative DO loop repetitively processes statements that

    • read the next value of the repeating field and hold the record with the @ line-hold specifier

    • explicitly write an observation to the data set by using an OUTPUT statement.

  3. complete the iterative DO loop with an END statement.

Reading a Varying Number of Repeating Fields

To create multiple observations from a record that contains an ID field and a varying number of repeating fields, you must execute the DATA step once for each record, repetitively reading and writing values in one iteration while the value of the repeating field exists. The DATA step should include statements that

  1. prevent SAS from reading the next record if missing values were encountered in the current record by using the MISSOVER option

  2. read the ID field and the first repeating field, and then hold the record with the single trailing at sign (@) line-hold specifier

  3. optionally create a counter variable

  4. execute SAS statements while a condition is true, using a DO WHILE loop. A DO WHILE loop repetitively processes statements that

    • optionally increment the value of the counter variable by using a sum statement

    • explicitly add an observation to the data set by using an OUTPUT statement

    • read the next value of the repeating field and hold the record with the single trailing at sign (@) line-hold specifier.

  5. complete the DO WHILE loop with an END statement.

Syntax

Repeating Blocks of Data

LIBNAME libref 'SAS-data-library';

FILENAME fileref 'filename';

DATA SAS-data-set;

    INFILE file-specification;

    INPUT variables @@;

    FORMAT date/time-variable format;

RUN;

An ID Field Followed by the Same Number of Repeating Fields

LIBNAME libref 'SAS-data-library';

FILENAME fileref 'filename';

DATA SAS-data-set;

    INFILE file-specification;

    INPUT id-variable @;

    DO index-variable specification;

        INPUT repeating-variable @;

        OUTPUT;

    END;

RUN;

An ID Field Followed by a Varying Number of Repeating Fields

LIBNAME libref 'SAS-data-library';

FILENAME fileref 'filename';

DATA SAS-data-set;

    INFILE file-specification MISSOVER;

    INPUT id-variable repeating-variable @;

    counter-variable=0;

    DO WHILE (expression);

        counter-variable+1;

        OUTPUT;

        INPUT repeating-variable @;

    END;

RUN;

Sample Programs

Repeating Blocks of Data

libname perm 'c:
ecordsweather';
filename tempdata 'c:
ecordsweather	empdata';
data perm.april90;
   infile tempdata;
   input Date : date. HighTemp @@;
   format date date9.;
run;

An ID Field Followed by the Same Number of Repeating Fields

libname perm 'c:
ecordssales';
filename data97 'c:
ecordssales1997.dat';
data perm.sales97;
   infile data97;
   input ID $ @;
   do Quarter=1 to 4;
      input Sales : comma. @;
      output;
   end;
run;

An ID Field Followed by a Varying Number of Repeating Fields

libname perm 'c:
ecordssales';
filename data97 'c:
ecordssales1997.dat';
data perm.sales97;
   infile data97 missover;
   input ID $ Sales : comma. @;
   Quarter=0;
   do while (sales ne .);
      quarter+1;
      output;
      input sales : comma. @;
   end;
run;

Points to Remember

  • The double trailing at sign (@@) holds a record across multiple iterations of the DATA step until the end of the record is reached.

  • The single trailing at sign (@) releases a record when control returns to the top of the DATA step.

  • Use an END statement to complete DO loops and DO WHILE loops.

Chapter Quiz

Select the best answer for each question. After completing the quiz, check your answers using the answer key in the appendix.

  1. Which is true for the double trailing at sign (@@)?

    1. It enables the next INPUT statement to read from the current record across multiple iterations of the DATA step.

    2. It must be the last item specified in the INPUT statement.

    3. It is released when the input pointer moves past the end of the record.

    4. all of the above

  2. A record that is being held by a single trailing at sign (@) is automatically released when

    1. the input pointer moves past the end of the record.

    2. the next iteration of the DATA step begins.

    3. another INPUT statement that has an @ executes.

    4. another value is read from the observation.

  3. Which SAS program correctly creates a separate observation for each block of data?

    Chapter Quiz
    1. data perm.produce;
         infile fruit;
         input Item $ Variety : $10.;
      run;
    2. data perm.produce;
         infile fruit;
         input Item $ Variety : $10. @;
      run;
    3. data perm.produce;
         infile fruit;
         input Item $ Variety : $10. @@;
      run;
    4. data perm.produce;
         infile fruit @@;
         input Item $ Variety : $10.;
      run;
  4. Which SAS program reads the values for ID and holds the record for each value of Quantity, so that three observations are created for each record?

    Chapter Quiz
    1. data work.sales;
         infile unitsold;
         input ID $;
         do week=1 to 3;
            input Quantity : comma.;
            output;
         end;
      run;
    2. data work.sales;
         infile unitsold;
         input ID $ @@;
         do week=1 to 3;
            input Quantity : comma.;
            output;
         end;
      run;
    3. data work.sales;
         infile unitsold;
         input ID $ @;
         do week=1 to 3;
            input Quantity : comma.;
            output;
         end;
      run;
    4. data work.sales;
         infile unitsold;
         input ID $ @;
         do week=1 to 3;
            input Quantity : comma. @;
            output;
         end;
      run;
  5. Which SAS statement repetitively executes several statements when the value of an index variable named count ranges from 1 to 50, incremented by 5?

    1. do count=1 to 50 by 5;
    2. do while count=1 to 50 by 5;
    3. do count=1 to 50 + 5;
    4. do while (count=1 to 50 + 5);
  6. Which option below, when used in a DATA step, writes an observation to the data set after each value for Activity has been read?

    1. do choice=1 to 3;
         input Activity : $10. @;
         output;
      end;
      run;
    2. do choice=1 to 3;
         input Activity : $10. @;
      end;
      output;
      run;
    3. do choice=1 to 3;
         input Activity : $10. @;
      end;
      run;
    4. both a and b

  7. Which SAS statement repetitively executes several statements while the value of Cholesterol is greater than 200?

    1. do cholesterol > 200;
    2. do cholesterol gt 200;
    3. do while (cholesterol > 200);
    4. do while cholesterol > 200;
  8. Which choice below is an example of a sum statement?

    1. totalpay=1;
    2. totalpay+1;
    3. totalpay*1;
    4. totalpay by 1;
  9. Which program creates the SAS data set Perm.Topstore from the raw data file shown below?

    Chapter Quiz
    Chapter Quiz
    1. data perm.topstores;
         infile sales98 missover;
         input Store Sales : comma. @;
         do while (sales ne .);
            month + 1;
            output;
            input sales : comma. @;
         end;
      run;
    2. data perm.topstores;
         infile sales98 missover;
         input Store Sales : comma. @;
         do while (sales ne .);
            Month=0;
            month + 1;
            output;
            input sales : comma. @;
         end;
      run;
    3. data perm.topstores;
         infile sales98 missover;
         input Store Sales : comma.
         Month @;
         do while (sales ne .);
            month + 1;
            input sales : comma. @;
         end;
         output;
      run;
    4. data perm.topstores;
         infile sales98 missover;
         input Store Sales : comma. @;
         Month=0;
         do while (sales ne .);
            month + 1;
            output;
           input sales : comma. @;
            end;
      run;
  10. How many observations are produced by the DATA step that reads this external file?

    Chapter Quiz
    data perm.choices;
       infile icecream missover;
       input Day $ Flavor : $10. @;
       do while (flavor ne ’ ’);
          output; input flavor : $10. @;
       end;
    run;
    1. 3

    2. 5

    3. 12

    4. 15

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset