Chapter 22. Reading Hierarchical Files

Overview

Introduction

Raw data files can be hierarchical in structure, consisting of a header record and one or more detail records. Typically, each record contains a field that identifies the record type.

Here, the P indicates a header record that contains a patient's ID number. The C indicates a detail record that contains the date of the patient's appointment and the charges that the patient has incurred.

Introduction

You can build a SAS data set from a hierarchical file by creating one observation per detail record and storing each header record as part of the observation.

Introduction

You can also build a SAS data set from a hierarchical file by creating one observation per header record and combining the information from detail records into summary variables.

Introduction

In this chapter, you learn how to read from a hierarchical file and create a SAS data set that contains either one observation for each detail record or one observation for each header record.

Objectives

In this chapter, you learn to

  • retain the value of a variable

  • conditionally execute a SAS statement

  • determine when the last observation is being processed

  • conditionally execute multiple SAS statements.

You can also review how to

  • use a line-hold specifier to hold the current record

  • explicitly write an observation to a data set.

Creating One Observation per Detail Record

In order to create one observation per detail record, it is necessary to distinguish between header and detail records. Having a field that identifies the type of the record makes this task easier.

In the raw data file Census, shown below, H indicates a header record that contains a street address, and P indicates a detail record that contains information about a person who lives at that address.

Creating One Observation per Detail Record

Let's see how you can create a data set that contains one observation for each person who lives at a specific address.

Creating One Observation per Detail Record

Retaining the Values of Variables

As you write the DATA step to read this file, remember that you want to keep the header record as a part of each observation until the next header record is encountered. To do this, you need to use a RETAIN statement to retain the values for Address across iterations of the DATA step.

Retaining the Values of Variables
Retaining the Values of Variables

Next, you need to read the first field in each record, which identifies the record's type. You also need to use the @ line-hold specifier to hold the current record so that the other values in the record can be read.

Retaining the Values of Variables

Conditionally Executing SAS Statements

You can use the value of type to identify each record. If type is H, you need to execute an INPUT statement to read the values for Address. However, if type is P, then execute an INPUT statement to read the values for Name, Age, and Gender.

You can tell SAS to perform a given task based on a specific condition by using an IF-THEN statement.

Conditionally Executing SAS Statements

Expressions in conditional statements usually involve some kind of comparison. In the example shown above, type is being compared to a value that is found in the raw data. When the condition is met, the expression is evaluated as true, and the statement that follows the keyword THEN is executed.

The expression defines a condition so that when the value of type is H, the INPUT statement reads the values for Address. However, when the value of type is not H, the expression is evaluated as false, and the INPUT statement is not executed. Notice that the value is enclosed in quotation marks because it is a character value.

Conditionally Executing SAS Statements
if type='h' then ... ;

Reading a Detail Record

Now think about what needs to happen when a detail record is read. Remember, you want to write an observation to the data set only when the value of type is P.

Reading a Detail Record

You can use a subsetting IF statement to check for the condition that type is P. The remaining DATA step statements execute only when the condition is true. If type is not P, then the values for Name, Age, and Gender are not read, the values in the program data vector are not written to the data set as an observation, and control returns to the top of the DATA step.

The values for Address are retained only as a part of the observation. Remember, you do not want to create an observation for each header record.

data perm.people;
   infile census;
   retain Address;
   input type $1. @;
   if type='H' then input @3 address $15.;
   if type='P'; 
   input @3 Name $10. @13 Age 3. @16 Gender $1.;
run;

Dropping Variables

Because type is useful only for identifying a record's type, drop the variable from the data set. The DROP= option in the DATA statement shown here prevents the values of type from being written to the data set.

data perm.people (drop=type);
   infile census;
   retain Address;
   input type $1. @;
   if type='H' then input @3 address $15.;
   if type='P';
   input @3 Name $10. @13 Age 3. @16 Gender $1.;
run;

Processing a DATA Step That Creates One Observation per Detail Record

Let's take a look at how this DATA step is processed.

data perm.people (drop=type);
   infile census;
   retain Address;
   input type $1. @;
   if type='H' then input @3 address $15.;
   if type='P';
   input @3 Name $10. @13 Age 3. @16 Gender $1.;
run;

At compile time, the variable type is flagged so that its values are not written to the data set. Address is flagged so that its value is retained across iterations of the DATA step.

Processing a DATA Step That Creates One Observation per Detail Record

As the DATA step begins to execute, the INPUT statement reads the value for type and holds the first record.

Processing a DATA Step That Creates One Observation per Detail Record

The condition type='H' is checked and found to be true, so the INPUT statement reads the value for Address in the first record.

Processing a DATA Step That Creates One Observation per Detail Record

Next, the subsetting IF statement checks for the condition type='P'. Because the condition is not true, the remaining statements are not executed and control returns to the top of the DATA step.

Processing a DATA Step That Creates One Observation per Detail Record

As the second iteration begins, the input pointer moves to the next record and a new value for type is read. The condition expressed in the IF-THEN statement is not true, so the statement following the THEN keyword is not executed.

Processing a DATA Step That Creates One Observation per Detail Record

Now the subsetting IF statement checks for the condition type='P'. In this iteration, the condition is true, so the remaining INPUT statement is executed.

Processing a DATA Step That Creates One Observation per Detail Record

The INPUT statement reads the values for Name, Age, and Gender.

Processing a DATA Step That Creates One Observation per Detail Record

Then the values in the program data vector are written as the first observation, and control returns to the top of the DATA step. Notice that the values for type are not included.

Processing a DATA Step That Creates One Observation per Detail Record

As the execution phase continues, observations are produced from the third and fourth records. However, notice that the fifth record is a header record. During the fifth iteration, the condition type='H' is true, so the new value for Address is read into the program data vector.

Processing a DATA Step That Creates One Observation per Detail Record

Displaying Your Results

When the execution phase is complete, you can display the data set by using the PRINT procedure.

Displaying Your Results

Creating One Observation per Header Record

In the previous example, you learned how to create one observation per detail record. But suppose you only want to know how many people reside at each address. You can create a data set that reads each detail record, counts the number of people, and stores this value in a summary variable.

In the example below, this summary variable, Total, is combined with the header record to form an observation in the data set. As you can see, creating one observation per header record condenses a large amount of information into a concise data set.

Creating One Observation per Header Record

As you write the DATA step to read this file, you need to think about performing several tasks. First, the value of Address must be retained as detail records are read and summarized.

Creating One Observation per Header Record

Next, the value of type must be read in order to determine whether the current record is a header record or a detail record. Add an @ to hold the record so that another INPUT statement can read the remaining values.

Creating One Observation per Header Record

When the value of type indicates a header record, several statements need to be executed. When the value of type indicates a detail record, you need to define an alternative set of actions. Let's look at executing different sets of statements for each value of type.

DO Group Actions for Header Records

To execute multiple SAS statements based on the value of a variable, you can use a simple DO group with an IF-THEN statement. When the condition type='H' is true, several statements need to be executed.

data perm.residnts;
   infile census;
   retain Address;
   input type $1. @;
   if type='H' then do;
  • First, you need to determine whether this is the first header record in the external file. You do not want the first header record to be written as an observation until the related detail records have been read and summarized.

    _N_ is an automatic variable whose value is the number of times the DATA step has begun to execute. The expression _n_ > 1 defines a condition where the DATA step has executed more than once. Use this expression in conjunction with the previous IF-THEN statement to check for these two conditions:

    1. The current record is a header record.

    2. The DATA step has executed more than once.

      data perm.residnts;
         infile census;
         retain Address;
         input type $1. @;
         if type='H' then do;
            if _n_ > 1
  • When the conditions type='H' and _n_ > 1 are true, an OUTPUT statement is executed. Thus, each header record except for the first one causes an observation to be written to the data set.

    data perm.residnts;
       infile census;
       retain Address;
       input type $1. @;
       if type='H' then do;
          if _n_ > 1 then output;
  • An assignment statement creates the summary variable Total and sets its value to 0.

    data perm.residnts;
       infile census;
       retain Address;
       input type $1. @;
       if type='H' then do;
          if _n_ > 1 then output;
          Total=0;
  • An INPUT statement reads the values for Address.

    data perm.residnts;
       infile census;
       retain Address;
       input type $1. @;
       if type='H' then do;
          if _n_ > 1 then output;
          Total=0;
          input address $ 3-17;
  • An END statement closes the DO group.

    data perm.residnts;
       infile census;
       retain Address;
       input type $1. @;
       if type='H' then do;
          if _n_ > 1 then output;
          Total=0;
          input address $ 3-17;
    end;

Reading Detail Records

When the value of type is not H, you need to define an alternative action. You can do this by adding an ELSE statement to the IF-THEN statement.

Remember that the IF-THEN statement executes a SAS statement when the condition specified in the IF clause is true. By adding an ELSE statement after the IF-THEN statement, you define an alternative action to be performed when the THEN clause is not executed.

data perm.residnts;
   infile census;
   retain Address;
   input type $1. @;
   if type='H' then do;
      if _n_ > 1 then output;
      Total=0;
      input address $ 3-17;
end;
else

The only other type of record is a detail record, represented by a P. You want to count each person who is represented by a detail record and store the accumulated value in the summary variable Total. You do not need to read the values for Name, Age, and Gender.

Reading Detail Records

You have already initialized the value of Total to 0 each time a header record is read. Now, as each detail record is read, you can increment the value of Total by using a sum statement. In this example you're counting the number of detail records for each header record, so you increment the value of Total by 1 when the value of type is P.

data perm.residnts;
   infile census;
   retain Address;
   input type $1. @;
   if type='H' then do;
      if _n_ > 1 then output;
      Total=0;
      input address $ 3-17;
end;
else if type='P' then total+1;
Reading Detail Records
else if type='B' then total+cost;
Reading Detail Records

Determining the End of the External File

Your program writes an observation to the data set only when another header record is read and the DATA step has executed more than once. But after the last detail record is read, there are no more header records to cause the last observation to be written to the data set.

Determining the End of the External File

You need to determine when the last record in the file is read so that you can then execute another explicit OUTPUT statement. You can determine when the current record is the last record in an external file by specifying the END= option in the INFILE statement. (You learned how to use the END= option with a SET statement if you completed Chapter 12, "Reading SAS Data Sets," on page 345.)

In the following example, the END= variable is defined in the INFILE statement as last. When last has a value other than 0, the OUTPUT statement writes the last observation to the data set.

data perm.residnts;
   infile census end=last;
   retain Address;
   input type $1. @;
   if type='H' then do;
    if _n_ > 1 then output;
       Total=0;
      input address $ 3-17;
   end;
   else if type='P' then total+1;
   if last then output;

A DROP= option in the DATA statement drops the variable type from the data set, and a RUN statement completes the DATA step.

data perm.residnts (drop=type);
   infile census end=last;
   retain Address;
   input type $1. @;
   if type='H' then do;
      if _n_ > 1 then output;
         Total=0;
      input address $ 3-17;
   end;
   else if type='P' then total+1;
   if last then output;
run;

Processing a DATA Step That Creates One Observation per Header Record

During the compile phase, the variable type is flagged so that later it can be dropped. The value for Address is retained.

Processing a DATA Step That Creates One Observation per Header Record

As the execution begins, _N_ is 1 and last is 0.

Processing a DATA Step That Creates One Observation per Header Record

Now the value for type is read, the condition type='H' is true, so the statements in the DO group execute.

Processing a DATA Step That Creates One Observation per Header Record

The condition N>1 is not true, so the OUTPUT statement is not executed. However, Total is assigned the value of 0 and the value for Address is read.

Processing a DATA Step That Creates One Observation per Header Record

The END statement closes the DO group. The alternataive condition expressed in the ELSE statement is not checked because the first condition, type='H', was true.

Processing a DATA Step That Creates One Observation per Header Record

The value of last is still 0, so the OUTPUT statement is not executed. The RUN statement passes control back to the top of the DATA step.

Processing a DATA Step That Creates One Observation per Header Record

During the second iteration, the value of type is 'P' and Total is incremented by 1. Again, the value of last is 0, so control returns to the top of the DATA step.

Processing a DATA Step That Creates One Observation per Header Record

During the fifth iteration, the value of type is 'H' and _N_ is greater than 1, so the values for Address and Total are written to the data set as the first observation.

Processing a DATA Step That Creates One Observation per Header Record

As the last record in the file is read, the variable last is set to 1. Now that the condition for last is true, the values in the program data vector are written to the data set as the final observation.

Processing a DATA Step That Creates One Observation per Header Record

Chapter Summary

Text Summary

Hierarchical Raw Data Files

Raw data files can be hierarchical in structure, consisting of a header record and one or more detail records. You can build a SAS data set from a hierarchical file by creating one observation

  • per detail record and storing each header record as part of the observation

  • per header record and combining the information from detail records into summary variables.

Creating One Observation per Detail Record

In order to create one observation per detail record, it is necessary to distinguish between header and detail records. Having a field that identifies the type of the record makes this task easier.

As you write the DATA step, use a RETAIN statement to keep the header record as a part of each observation until the next header record is encountered.

Next, you need to read the field in each record that identifies the record's type. Remember to use the @ line-hold specifier to hold the current value of each record type so that the other values in the record can be read.

Use an IF-THEN statement to check for the condition that the record is a header record. If the record is a header record, you need to execute an INPUT statement to read the variable values for that record.

You can use a subsetting IF statement to check for the condition that the record is a detail record. If the record is a detail record, use another INPUT statement to read the variable values in that record.

You can use the DROP= option to prevent the variable that identifies each record's type from being included in the data set.

Creating One Observation per Header Record

Creating one observation per header record condenses a large amount of information into a concise data set. As you write the DATA step, you need to think about performing several tasks.

As with creating one observation per detail record, use a RETAIN statement to keep the header record as a part of each observation until the next header record is encountered. Then read the field in each record that identifies the record's type. Remember to use the @ line-hold specifier to hold the current record so that the other values in the record can be read.

When the record is a header record, multiple statements need to be executed. You can do this by adding a simple DO group to an IF-THEN statement. Within the DO group, you need to

  1. determine whether this is the first header record in the external file by using the automatic variable _N_

  2. use an OUTPUT statement to write each header record except for the first one to the data set

  3. use an assignment statement to create a summary variable, and set its value to 0

  4. add an INPUT statement to read the variable values in the header record

  5. close the loop with an END statement.

When the record is a detail record, you need to define an alternative set of actions. You can do this by adding an ELSE statement to an IF-THEN statement. As each detail record is read, you can increment the value of the summary variable by using a sum statement.

After the last detail record is read, there are no more header records to cause the last observation to be written to the data set. You can determine when the current record is the last record in an external file by specifying the END= option in the INFILE statement. Again, you can use the DROP= option to prevent the variable that identifies each record's type from being included in the data set.

Syntax

Syntax to Create One Observation for Each Detail Record

LIBNAME libref 'SAS-data-library';

FILENAME fileref 'filename';

DATA=SAS-data-set(DROP=variable);

          INFILE file-specification;

          RETAIN variable;

          INPUT variable;

          IF variable='condition'THEN SAS statement;

          IF variable='condition';

              SAS-statement;

RUN;

Syntax to Create One Observation for Each Header Record

LIBNAME libref 'SAS-data-library';

FILENAME fileref 'filename';

DATA= SAS-data-set(DROP= variable);

          INFILE file-specification END=variable;

          RETAIN variable;

          INPUT variable;

          IF variable='condition'THEN DO;

              IF _N_ > 1THEN OUTPUT;

              summary-variable=0;

              INPUT variable;

    END;

    ELSE IF variable='condition' THEN

             summary-variable+expression;

    IF variableTHEN OUTPUT;

RUN;

Sample Programs

Program to Create One Observation for Each Detail Record

libname perm 'c:
ecordscensus2k';
filename census 'c:
ecordscensus2ksurvey.dat';
data perm.people(drop=type);
   infile census;
   retain Address; input type $1. @;
   if type='H' then input @3 address $15.;
   if type='P';
   input @3 Name $10. @13 Age 3. @16 Gender $1.;
run;

Program to Create One Observation for Each Header Record

libname perm 'c:
ecordscensus2k';
filename census 'c:
ecordscensus2ksurvey.dat';
data perm.residnts (drop=type);
   infile census end=last;
   retain Address;
   input type $1. @;
   if type='H' then do;
      if _n_ > 1 then output;
      Total=0;
      input address $ 3-17;
   end;
   else if type='P' then total+1;
   if last then output;
run;

Points to Remember

  • As with automatic variables, the END= variable is not written to the data set.

  • Values are automatically retained when using a sum statement. Therefore, it may be necessary to set the value of the counter variable back to 0 when a new header is encountered.

Chapter Quiz

Select the best answer for each question. After completing the quiz, check your answers using the answer key in the appendix.

  1. When you write a DATA step to create one observation per detail record you need to

    1. distinguish between header and detail records.

    2. keep the header record as a part of each observation until the next header record is encountered.

    3. hold the current value of each record type so that the other values in the record can be read.

    4. all of the above

  2. Which SAS statement reads the value for code (in the first field), and then holds the value until an INPUT statement reads the remaining value in each observation in the same iteration of the DATA step?

    Chapter Quiz
    1. input code $2. @;
    2. input code $2. @@;
    3. retain code;
    4. none of the above

  3. Which SAS statement checks for the condition that Record equals C and executes a single statement to read the values for Amount?

    1. if record=c then input @3 Amount comma7.;
    2. if record='C' then input @3 Amount comma7.;
    3. if record='C' then do input @3 Amount comma7.;
    4. if record=C then do input @3 Amount comma7.;
  4. After the value for code is read in the sixth iteration, which illustration of the program data vector is correct?

    Chapter Quiz
    data perm.produce (drop=code);
       infile orders;
       retain Vegetable;
       input code $1. @;
       if code='H' then input @3 vegetable $6.;
       if code='P';
       input @3 Variety : $10. @15 Supplier : $15.;
    run;
    proc print data=perm.produce;
    run;
    1. Chapter Quiz
    2. Chapter Quiz
    3. Chapter Quiz
    4. Chapter Quiz
  5. What happens when the fourth iteration of the DATA step is complete?

    Chapter Quiz
    data perm.orders (drop=type);
       infile produce;
       retain Fruit;
       input type $1. @;
       if type='F' then input @3 fruit $7.;
       if type='V';
       input @3 Variety : $16. @20 Price comma5.;
    run;
    1. All of the values in the program data vector are written to the data set as the third observation.

    2. All of the values in the program data vector are written to the data set as the fourth observation.

    3. The values for Fruit, Variety, and Price are written to the data set as the third observation.

    4. The values for Fruit, Variety, and Price are written to the data set as the fourth observation.

  6. Which SAS statement indicates that several other statements should be executed when Record has a value of A?

    Chapter Quiz
    1. if record='A' then do;
    2. if record=A then do;
    3. if record='A' then;
    4. if record=A then;
  7. Which is true for the following statements (X indicates a header record)?

    if code='X' then do;
       if _n_ > 1 then output;
       Total=0;
       input Name $ 3-20;
    end;
    1. _N_ equals the number of times the DATA step has begun to execute.

    2. When code='X' and _n_ > 1 are true, an OUTPUT statement is executed.

    3. Each header record causes an observation to be written to the data set.

    4. a and b

  8. What happens when the condition type='P' is false?

    if type='P' then input @3 ID $5. @9 Address $20.;
    else if type='V' then input @3 Charge 6.;
    1. The values for ID and Address are read.

    2. The values for Charge are read.

    3. type is assigned the value of V.

    4. The ELSE statement is executed.

  9. What happens when last has a value other than zero?

    data perm.househld (drop=code);
       infile citydata end=last;
       retain Address;
       input type $1. @;
       if code='A' then do;
          if _n_ > 1 then output;
          Total=0;
          input address $ 3-17;
       end;
       else if code='N' then total+1;
       if last then output;
    run;
    1. last has a value of 1.

    2. The OUTPUT statement writes the last observation to the data set.

    3. The current value of last is written to the DATA set.

    4. a and b

  10. Based on the values in the program data vector, what happens next?

    Chapter Quiz
    data work.supplies (drop=type amount);
       infile orders end=last;
       retain Department Extension;
       input type $1. @;
       if type='D' then do;
          if _n_ > 1 then output;
          Total=0;
          input @3 department $10. @16 extension $5.;
       end;
       else if type='S' then do;
          input @16 Amount comma5.;
          total+amount;
          if last then output; end;
    run;
    1. All the values in the program data vector are written to the data set as the first observation.

    2. The values for Department, Total, and Extension are written to the data set as the first observation.

    3. The values for Department, Total, and Extension are written to the data set as the fourth observation.

    4. The value of last changes to 1.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset