Chapter 11. Creating and Managing Variables

Overview

Introduction

You've learned how to create a SAS data set from raw data that is stored in an external file. You've also learned how to subset observations and how to assign values to variables.

This chapter shows you additional techniques for creating and managing variables. In this chapter, you learn how to create sum variables, assign variable values conditionally, select variables, and assign permanent labels and formats to variables.

Introduction

Objectives

In this chapter, you learn how to

  • create variables that accumulate variable values

  • initialize values of sum variables

  • assign values to variables conditionally

  • specify an alternative action when a condition is false

  • specify lengths for variables

  • delete unwanted observations

  • select variables

  • assign permanent labels and formats.

Creating and Modifying Variables

Accumulating Totals

It is often useful to create a variable that accumulates the values of another variable. Suppose you want to create the data set Clinic.Stress and to add a new variable, SumSec, to accumulate the total number of elapsed seconds in treadmill stress tests.

Accumulating Totals

To add the result of an expression to an accumulator variable, you can use a sum statement in your DATA step.

The sum statement adds the result of the expression that is on the right side of the plus sign (+) to the numeric variable that is on the left side of the plus sign. At the top of the DATA step, the value of the numeric variable is not set to missing as it usually is when reading raw data. Instead, the variable retains the new value in the program data vector for use in processing the next observation.

Example

To find the total number of elapsed seconds in treadmill stress tests, you need a variable (in this example, SumSec) whose value begins at 0 and increases by the amount of the total seconds in each observation. To calculate the total number of elapsed seconds in treadmill stress tests, you use the sum statement shown below:

data clinic.stress;
   infile tests;
   input ID $ 1-4 Name $ 6-25 RestHR 27-29 MaxHR 31-33
            RecHR 35-37 TimeMin 39-40 TimeSec 42-43
            Tolerance $ 45;
   TotalTime=(timemin*60)+timesec;
   SumSec+totaltime;
run;

The value of the variable on the left side of the plus sign (here, SumSec) begins at 0 and increases by the value of TotalTime with each observation.

SumSec

=

TotalTime

+

Previous total

0

    

758

=

758

+

0

1363

=

605

+

758

2036

=

673

+

1363

2618

=

582

+

2036

3324

=

706

+

2618

Initializing Sum Variables

In a previous example, the sum variable SumSec was initialized to 0 by default before the first observation was read. But what if you want to initialize SumSec to a different number, such as the total seconds from previous treadmill stress tests?

You can use the RETAIN statement to assign an initial value other than the default value of 0 to a variable whose value is assigned by a sum statement.

The RETAIN statement

  • assigns an initial value to a retained variable

  • prevents variables from being initialized each time the DATA step executes.

Example

Suppose you want to add 5400 seconds (the accumulated total seconds from a previous treadmill stress test) to the variable SumSec in the Clinic.Stress data set when you create the data set. To initialize SumSec with the value 5400, you use the RETAIN statement shown below.

data clinic.stress;
   infile tests;
   input ID $ 1-4 Name $ 6-25 RestHR 27-29 MaxHR 31-33
         RecHR 35-37 TimeMin 39-40 TimeSec 42-43
         Tolerance $ 45;
    TotalTime=(timemin*60)+timesec;
    retain SumSec 5400;
    sumsec+totaltime;
run;

Now the value of SumSec begins at 5400 and increases by the value of TotalTime with each observation.

SumSec

=

TotalTime

+

Previous total

5400

    

6158

=

758

+

5400

6763

=

605

+

6158

7436

=

673

+

6763

8018

=

582

+

7436

8724

=

706

+

8018

Assigning Values Conditionally

In the previous section, you created the variable SumSec by using a sum statement to add total seconds from treadmill stress. This time, let's create a variable that categorizes the length of time that a subject spends on the treadmill during a stress test. This new variable, TestLength, will be based on the value of the existing variable TotalTime. The value of TestLength will be assigned conditionally:

If TotalTime is . . .

then TestLength is . . .

greater than 800

Long

750 - 800

Normal

less than 750

Short

To perform an action conditionally, use an IF-THEN statement. The IF-THEN statement executes a SAS statement when the condition in the IF clause is true.

Example

To assign the value Long to the variable TestLength when the value of TotalTime is greater than 800, add the following IF-THEN statement to your DATA step:

data clinic.stress;
   infile tests;
   input ID $ 1-4 Name $ 6-25 RestHR 27-29 MaxHR 31-33
         RecHR 35-37 TimeMin 39-40 TimeSec 42-43
         Tolerance $ 45;
   TotalTime=(timemin*60)+timesec;
   retain SumSec 5400;
   sumsec+totaltime;
   if totaltime>800 then TestLength='Long';
run;

SAS executes the assignment statement only when the condition (TotalTime>800) is true. If the condition is false, the value of TestLength will be missing.

Comparison and Logical Operators

When writing IF-THEN statements, you can use any of the following comparison operators:

Operator

Comparison Operation

=or eq

equal to

^= or ne

not equal to

>or gt

greater than

< or lt

less than

>= or ge

greater than or equal to

<= or le

less than or equal to

in

equal to one of a list

Examples:

if test<85 and time<=20
   then Status='RETEST';
if region in ('NE','NW','SW')
   then Rate=fee-25;
if target gt 300 or sales ge 50000
   then Bonus=salary*.05;

You can also use these logical operators:

Operator

Logical Operation

&

and

|

or

^ or -

not

Use the AND operator to execute the THEN statement if both expressions that are linked by AND are true.

if status='OK' and type=3
   then Count+1;
if (age^=agecheck | time^=3)
   & error=1 then Test=1;

Use the OR operator to execute the THEN statement if either expression that is linked by OR is true.

if (age^=agecheck | time^=3)
    & error=1 then Test=1;
if status='S' or cond='E'
   then Control='Stop';

Use the NOT operator with other operators to reverse the logic of a comparison.

if not(loghours<7500)
    then Schedule='Quarterly';
if region not in ('NE','SE')
    then Bonus=200;

Character values must be specified in the same case in which they appear in the data set and must be enclosed in quotation marks.

if status= 'OK' and type=3
   then Count+1;
if status= 'S' or cond= 'E'
   then Control= 'Stop';
if not(loghours<7500)
   then Schedule= 'Quarterly';
if region not in ('NE','SE')
   then Bonus=200;

Logical comparisons that are enclosed in parentheses are evaluated as true or false before they are compared to other expressions. In the example below, the OR comparison in parentheses is evaluated before the first expression and the AND operator are evaluated.

Examples:
Examples:

0 = False

. = False

1 = True

As a result, you need to be careful when using the OR operator with a series of comparisons. Remember that only one comparison in a series of OR comparisons must be true to make a condition true, and any nonzero, nonmissing constant is always evaluated as true. Therefore, the following subsetting IF statement is always true:

if x=1 or 2;

SAS first evaluates x=1, and the result can be either true or false; however, since the 2 is evaluated as nonzero and nonmissing (true), the entire expression is true. In this statement, however, the condition is not necessarily true because either comparison can evaluate as true or false:

if x=1 or x=2;

Providing an Alternative Action

Now suppose you want to assign a value to TestLength based on the other possible values of TotalTime. One way to do this is to add IF-THEN statements for the other two conditions, as shown below.

if totaltime>800 then TestLength='Long';
if 750<=totaltime<=800 then TestLength='Normal';
if totaltime<750 then TestLength='Short';

However, when the DATA step executes, each IF statement is evaluated in order, even if the first condition is true. This wastes system resources and slows the processing of your program.

Instead of using a series of IF-THEN statements, you can use the ELSE statement to specify an alternative action to be performed when the condition in an IF-THEN statement is false. As shown below, you can write multiple ELSE statements to specify a series of mutually exclusive conditions.

if totaltime>800 then TestLength='Long';
else if 750<=totaltime<=800 then TestLength='Normal';
else if totaltime<750 then TestLength='Short';

The ELSE statement must immediately follow the IF-THEN statement in your program. An ELSE statement executes only if the previous IF-THEN/ELSE statement is false.

So, to assign a value to TestLength when the condition in your IF-THEN statement is false, you can add the ELSE statement to your DATA step, as shown below:

data clinic.stress;
     infile tests;
     input ID $ 1-4 Name $ 6-25 RestHR 27-29 MaxHR 31-33
           RecHR 35-37 TimeMin 39-40 TimeSec 42-43
           Tolerance $ 45;
     TotalTime=(timemin*60)+timesec;
     retain SumSec 5400;
     sumsec+totaltime;
     if totaltime>800 then TestLeng h='Long';
     else if 750<=totaltime<=800 then TestLength='Normal'; 
     else if totaltime<750 then TestLength='Short';
run;
Providing an Alternative Action
  • Using IF-THEN statements without the ELSE statement causes SAS to evaluate all IF-THEN statements.

  • Using IF-THEN statements with the ELSE statement causes SAS to execute IF-THEN statements until it encounters the first true statement. Subsequent IF-THEN statements are not evaluated.

For greater efficiency, construct your IF-THEN/ELSE statements with conditions of decreasing probability.

Providing an Alternative Action
data clinic.stress;
   infile tests;
   input ID $ 1-4 Name $ 6-25 RestHR 27-29 MaxHR 31-33
         RecHR 35-37 TimeMin 39-40 TimeSec 42-43
         Tolerance $ 45;
   TotalTime=(timemin*60)+timesec;
   retain SumSec 5400;
   sumsec+totaltime;
   if totaltime>800 then TestLength='Long';
   else if 750<=totaltime<=800 then TestLength='Normal';
   else put 'NOTE: Check this Length: 'totaltime=;
run;

Specifying Lengths for Variables

Previously, you added IF-THEN and ELSE statements to a DATA step in order to create the variable TestLength. Values for TestLength were assigned conditionally, based on the value for TotalTime.

data clinic.stress;
   infile tests;
   input ID $ 1-4 Name $ 6-25 RestHR 27-29 MaxHR 31-33
         RecHR 35-37 TimeMin 39-40 TimeSec 42-43
         Tolerance $ 45;
   TotalTime=(timemin*60)+timesec;
   retain SumSec 5400;
   sumsec+totaltime;
   if totaltime>800 then TestLength='Long';
   else if 750<=totaltime<=800 then TestLength='Normal';
   else if totaltime<750 then TestLength='Short';
run;

But look what happens when you submit this program.

During compilation, when creating a new character variable in an assignment statement, SAS allocates as many bytes of storage space as there are characters in the first value that it encounters for that variable. In this case, the first value for TestLength occurs in the IF-THEN statement, which specifies a four-character value (Long). So TestLength is assigned a length of 4, and any longer values ( Normal and Short) are truncated.

Specifying Lengths for Variables
Specifying Lengths for Variables

You can use a LENGTH statement to specify a length (the number of bytes) for TestLength before the first value is referenced elsewhere in the DATA step.

Examples:

length Type $ 8;
length Address1 Address2 Address3    $200;
length  FirstName $12 LastName  $16;

Within your program, you include a LENGTH statement to assign a length to accommodate the longest value of the variable TestLength. The longest value is Normal, which has six characters. Because TestLength is a character variable, you must follow the variable name with a dollar sign ($).

data clinic.stress;
   infile tests;
   input ID $ 1-4 Name $ 6-25 RestHR 27-29 MaxHR 31-33
         RecHR 35-37 TimeMin 39-40 TimeSec 42-43
         Tolerance $ 45;
   TotalTime=(timemin*60)+timesec;
   retain SumSec 5400;
   sumsec+totaltime;
   length TestLength $ 6;
   if totaltime>800 then testlength='Long';
   else if 750<=totaltime<=800 then testlength='Normal';
   else if totaltime<750 then TestLength='Short';
run;
Examples:

Now that you have added the LENGTH statement to your program, the values of TestLength are no longer truncated.

Examples:

Subsetting Data

Deleting Unwanted Observations

So far in this chapter, you've learned to use IF-THEN statements to execute assignment statements conditionally. But you can specify any executable SAS statement in an IF-THEN statement. For example, you can use an IF-THEN statement with a DELETE statement to determine which observations to omit as you read data.

  • The IF-THEN statement executes a SAS statement when the condition in the IF clause is true.

  • The DELETE statement stops the processing of the current observation.

Example

The IF-THEN and DELETE statements below omit any observations whose values for RestHR are lower than 70.

data clinic.stress;
   infile tests;
    input ID $ 1-4 Name $ 6-25 RestHR 27-29 MaxHR 31-33
          RecHR 35-37 TimeMin 39-40 TimeSec 42-43
          Tolerance $ 45;
    if resthr<70 then delete;
    TotalTime=(timemin*60)+timesec;
    retain SumSec 5400;
    sumsec+totaltime;
    length TestLength $ 6;
    if totaltime>800 then testlength='Long';
    else if 750<=totaltime<=800 then testlength='Normal';
    else if totaltime<750 then TestLength='Short';
run;

Selecting Variables

Sometimes you may need to read and process fields that you don't want to keep in your data set. In this case, you can use the DROP= and KEEP= data set options to specify the variables that you want to drop or keep.

Use the KEEP= option instead of the DROP= option if more variables are dropped than kept. You specify data set options in parentheses after a SAS data set name.

Example

Suppose you are interested in keeping only the new variable TotalTime and not the original variables TimeMin and TimeSec. You can drop TimeMin and TimeSec when you create the Stress data set.

data clinic.stress(drop=timemin timesec);
   infile tests;
   input ID $ 1-4 Name $ 6-25 RestHR 27-29 MaxHR 31-33
         RecHR 35-37 TimeMin 39-40 TimeSec 42-43
         Tolerance $ 45;
   if tolerance='D';
   TotalTime=(timemin*60)+timesec;
   retain SumSec 5400;
   sumsec+totaltime;
   length TestLength $ 6;
   if totaltime>800 then testlength='Long';
   else if 750<=totaltime<=800 then testlength='Normal';
   else if totaltime<750 then TestLength='Short';
run;
Example

Another way to exclude variables from your data set is to use the DROP statement or the KEEP statement. Like the DROP= and KEEP= data set options, these statements drop or keep variables. However, the DROP statement differs from the DROP= data set option in the following ways:

  • You cannot use the DROP statement in SAS procedure steps.

  • The DROP statement applies to all output data sets that are named in the DATA statement. To exclude variables from some data sets but not from others, use the DROP= data set option in the DATA statement.

The KEEP statement is similar to the DROP statement, except that the KEEP statement specifies a list of variables to write to output data sets. Use the KEEP statement instead of the DROP statement if the number of variables to keep is significantly smaller than the number to drop.

Example

The two programs below produce the same results. The first example uses the DROP= data set option; the second example uses the DROP statement.

data clinic.stress(drop=timemin timesec);
   infile tests;
   input ID $ 1-4 Name $ 6-25 RestHR 27-29 MaxHR 31-33
         RecHR 35-37 TimeMin 39-40 TimeSec 42-43
         Tolerance $ 45;
   if tolerance='D';
   TotalTime=(timemin*60)+timesec;
   retain SumSec 5400;
   sumsec+totaltime;
   length TestLength $ 6;
   if totaltime>800 then testlength='Long';
   else if 750<=totaltime<=800 then testlength='Normal';
   else if totaltime<750 then TestLength='Short';
run;

data clinic.stress;
   infile tests;
   input ID $ 1-4 Name $ 6-25 RestHR 27-29 MaxHR 31-33
         RecHR 35-37 TimeMin 39-40 TimeSec 42-43
         Tolerance $ 45;
   if tolerance='D';
   drop timemin timesec;
   TotalTime=(timemin*60)+timesec;
   retain SumSec 5400;
   sumsec+totaltime;
   length TestLength $ 6;
   if totaltime>800 then testlength='Long';
   else if 750<=totaltime<=800 then testlength='Normal';
   else if totaltime<750 then TestLength='Short';
run;

Assigning Permanent Labels and Formats

At this point, you've read and manipulated your raw data to obtain the observations, variables, and variable values that you want. Your final task in this chapter is to permanently assign labels and formats to variables.

If you completed Chapter 4, "Creating List Reports," on page 99, you practiced temporarily assigning labels and formats within a PROC step. These temporary labels and formats are applicable only for the duration of the step. To permanently assign labels and formats, you use LABEL and FORMAT statements in DATA steps.

Assigning Permanent Labels and Formats

To do this...

Use this type of statement...

Reference a SAS data library

libname clinic 'c:usersmaydata';

Reference an external file

filename tests 'c:users mill.dat';

Name a SAS data set

data clinic.stress;

Identify an external file

    infile tests obs=10;

Describe raw data

    input ID $ 1-4 Name $ 6-25 ...;

Subset data

if resthr<70 then delete;

if tolerance='D';

Drop unwanted variables

drop timemin timesec;

Create or modify a variable

TotalTime=(timemin*60)+timesec;

Initialize a sum variable

retain SumSec 5400;

Sum accumulated values

sumsec+totaltime;

Specify a variable's length

length TestLength $ 6;

Execute statements conditionally

if totaltime>800 then TestLength='Long';

else if 750<=totaltime<=800

     then TestLength='Normal';

else if totaltime<750

     then TestLength='Short';

Label a variable

LABEL statement

Format a variable

FORMAT statement

Execute the DATA step

run

List the data

proc print data=clinic.stress label;

Execute the final program step

run;

Example

To specify the label Cumulative Total Seconds (+5,400) and the format COMMA6. for the variable SumSec, you can submit the following program:

data clinic.stress;
   infile tests;
   input ID $ 1-4 Name $ 6-25 RestHR 27-29 MaxHR 31-33
         RecHR 35-37 TimeMin 39-40 TimeSec 42-43
         Tolerance $ 45;
   if resthr<70 then delete;
   if tolerance='D';
   drop timemin timesec;
   TotalTime=(timemin*60)+timesec;
   retain SumSec 5400;
   sumsec+totaltime;
   length TestLength $ 6;
   if totaltime>800 then testlength='Long';
   else if 750<=totaltime<=800 then testlength='Normal';
   else if totaltime<750 then TestLength='Short';
   label sumsec='Cumulative Total Seconds (+5,400)'; 
   format sumsec comma6.;
run;

You're done! When you print the new data set, SumSec is labeled and formatted as specified. (Don't forget the LABEL option in the PROC PRINT statement.)

proc print data=clinic.stress label;
 run;
Example
Example
Example

Assigning Values Conditionally Using SELECT Groups

Earlier in this chapter, you learned to assign values conditionally using IF-THEN/ELSE statements. You can also use SELECT groups in DATA steps to perform conditional processing. A SELECT group contains these statements:

This statement...

Performs this action...

SELECT

begins a SELECT group.

WHEN

identifies SAS statements that are executed when a particular condition is true.

OTHERWISE (optional)

specifies a statement to be executed if no WHEN condition is met.

END

ends a SELECT group.

You can decide whether to use IF-THEN/ELSE statements or SELECT groups based on the following criteria:

  • When you have a long series of mutually exclusive conditions and the comparison is numeric, using a SELECT group is slightly more efficient than using a series of IF-THEN or IF-THEN/ELSE statements because CPU time is reduced. SELECT groups also make the program easier to read and debug.

  • For programs with few conditions, use IF-THEN/ELSE statements.

Example: Basic SELECT Group

The following code is a simple example of a SELECT group. Notice that the variable a is specified in the SELECT statement, and various values to compare to a are specified in the WHEN statements. When the value of the variable a is

  • 1, x is multiplied by 10

  • 3, 4,or 5, x is multiplied by 100

  • 2 or any other value, nothing happens.

select (a);
   when (1) x=x*10;
   when (3,4,5) x=x*100;
   otherwise;
end

Example: SELECT Group in a DATA Step

Now let's look at a SELECT group in context. In the DATA step below, the SELECT group assign values to the variable Group based on values of the variable JobCode.

Most of the assignments are one-to-one correspondences, but ticket agents (the JobCode values TA1, TA2, and TA3) are grouped together, as are values in the Other category.

data emps(keep=salary group);
   set sasuser.payrollmaster;
   length Group $ 20;
   select(jobcode);
      when ("FA1") group="Flight Attendant I";
      when ("FA2") group="Flight Attendant II";
      when ("FA3") group="Flight Attendant III";
      when ("ME1") group="Mechanic I";
      when ("ME2") group="Mechanic II";
      when ("ME3") group="Mechanic III";
      when ("NA1") group="Navigator I";
      when ("NA2") group="Navigator II";
      when ("NA3") group="Navigator III";
      when ("NA1") group="Navigator I";
      when ("NA2") group="Navigator II";
      when ("NA3") group="Navigator III";
      when ("PT1") group="Pilot I";
      when ("PT2") group="Pilot II";
      when ("PT3") group="Pilot III";
      when ("TA1","TA2","TA3") group="Ticket Agents";
      otherwise group="Other";
    end;
run;

Notice that in this case the SELECT statement does contain a select-expression. You are checking values of a single variable, so here using select(jobcode) and only the JobCode value in each WHEN statement is more concise than eliminating the select-expression and repeating the variable in each when-expression, as in when(jobcode="FA1").

Example: SELECT Group in a DATA Step
Example: SELECT Group in a DATA Step
when ("fa1") group="Flight Attendant I";

In this case, given the SELECT group above, Group would be assigned the value Other.

As you saw in the general form for SELECT groups, you can optionally specify a select-expression in the SELECT statement. The way SAS evaluates a when-expression depends on whether you specify a select-expression.

Specifying SELECT Statements with Expressions

If you do specify a select-expression in the SELECT statement, SAS compares the value of the select-expression with the value of each when-expression. That is, SAS evaluates the select-expression and when-expression, compares the two for equality, and returns a value of true or false.

  • If the comparison is true, SAS executes the statement in the WHEN statement.

  • If the comparison is false, SAS proceeds either to the next when-expression in the current WHEN statement, or to the next WHEN statement if no more expressions are present. If no WHEN statements remain, execution proceeds to the OTHERWISE statement, if one is present.

Specifying SELECT Statements with Expressions

In the following SELECT group, SAS determines the value of toy and compares it to values in each WHEN statement in turn. If a WHEN statement is true compared to the toy value, SAS assigns the related price and continues processing the rest of the DATA step. If none of the comparisons is true, SAS executes the OTHERWISE statement and writes a debugging message to the SAS log.

select (toy);
   when ("Bear") price=35.00;
   when ("Violin") price=139.00;
   when ("Top","Whistle","Duck") price=7.99;
   otherwise put "Check unknown toy: " toy=;
end;

Specifying SELECT Statements without Expressions

If you don't specify a select-expression, SAS evaluates each when-expression to produce a result of true or false.

  • If the result is true, SAS executes the statement in the WHEN statement.

  • If the result is false, SAS proceeds either to the next when-expression in the current WHEN statement, or to the next WHEN statement if no more expressions are present, or to the OTHERWISE statement if one is present. (That is, SAS performs the action that is indicated in the first true WHEN statement.)

If more than one WHEN statement has a true when-expression, only the first WHEN statement is used; once a when-expression is true, no other when-expressions are evaluated.

Specifying SELECT Statements without Expressions

In the example below, the SELECT statement does not specify a select-expression. The WHEN statements are evaluated in order, and only one is used. For example, if the value of toy is Bear and the value of month is FEB, only the second WHEN statement is used, even though the condition in the third WHEN statement is also met. In this case, the variable price is assigned the value 25.00:

select;
   when (toy="Bear" and month in ('OCT', 'NOV', 'DEC')) price=45.00;
   when (toy="Bear" and month in ('JAN', 'FEB')) price=25.00;
   when (toy="Bear") price=35.00;
   otherwise;
end;

Grouping Statements Using DO Groups

So far in this chapter, you've seen examples of conditional processing (IF-THEN/ ELSE statements and SELECT groups) that execute only a single SAS statement when a condition is true. However, you can also execute a group of statements as a unit by using DO groups.

To construct a DO group, you use the DO and END statements along with other SAS statements.

You can use DO groups in IF-THEN/ELSE statements and SELECT groups to perform many statements as part of the conditional action.

Examples

In this simple DO group, the statements between DO and END are performed only when TotalTime is greater than 800. If TotalTime is less than or equal to 800, statements in the DO group do not execute, and the program continues with the assignment statement that follows the appropriate ELSE statement.

data clinic.stress;
   infile tests;
   input ID $ 1-4 Name $ 6-25 RestHR 27-29 MaxHR 31-33
         RecHR 35-37 TimeMin 39-40 TimeSec 42-43
         Tolerance $ 45;
   TotalTime=(timemin*60)+timesec;
   retain SumSec 5400;
   sumsec+totaltime;
   length TestLength $ 6 Message $ 20;
   if totaltime>800 then
      do; 
         testlength='Long'; 
         message='Run blood panel'; 
      end;
    else if 750<=totaltime<=800 then testlength='Normal';
    else if totaltime<750 then TestLength='Short';
run;

In the SELECT group below, the statements between DO and END are performed only when the value of Payclass is hourly. Notice that an IF-THEN statement appears in the DO group; the PUT statement executes only when Hours is greater than 40. The second END statement in the program closes the SELECT group.

In the SELECT group below, the statements between DO and END are performed only when the value of Payclass is hourly. Notice that an IF-THEN statement appears in the DO group; the PUT statement executes only when Hours is greater than 40. The second END statement in the program closes the SELECT group.

data payroll;
   set salaries;
   select(payclass);
   when ('monthly') amt=salary;
   when ('hourly')
      do; 
         amt=hrlywage*min(hrs,40); 
         if hrs>40 then put 'CHECK TIMECARD'; 
      end;
   otherwise put 'PROBLEM OBSERVATION';
   end;
run;

Indenting and Nesting DO Groups

You can nest DO groups to any level, just like you nest IF-THEN/ELSE statements. (The memory capabilities of your system may limit the number of nested DO statements you can use. For details, see the SAS documentation about how many levels of nested DO statements your system's memory can support.)

The following is an example of nested DO groups:

do;
    statements;
       do;
                statements ;
                   do;
                           statements;
                   end;
       end;
end;

It is good practice to indent the statements in DO groups, as shown in the preceding statements, so that their position indicates the levels of nesting.

Indenting and Nesting DO Groups
  • The iterative DO statement executes statements between DO and END statements repetitively based on the value of an index variable. The iterative DO statement can contain a WHILE or UNTIL clause.

  • The DO UNTIL statement executes statements in a DO loop repetitively until a condition is true, checking the condition after each iteration of the DO loop.

  • The DO WHILE statement executes statements in a DO loop repetitively while a condition is true, checking the condition before each iteration of the DO loop.

You can learn about these forms of the DO statement in Chapter 15, "Generating Data with DO Loops," on page 473.

Chapter Summary

Text Summary

Accumulating Totals

Use a sum statement to add the result of an expression to an accumulator variable.

Initializing Sum Variables

You can use the RETAIN statement to assign an initial value to a variable whose value is assigned by a sum statement.

Assigning Values Conditionally

To perform an action conditionally, use an IF-THEN statement. The IF-THEN statement executes a SAS statement when the condition in the IF clause is true. You can include comparison and logical operators; logical comparisons that are enclosed in parentheses are evaluated as true or false before other expressions are evaluated. Use the ELSE statement to specify an alternative action when the condition in an IF-THEN statement is false.

Specifying Lengths for Variables

When creating a new variable, SAS allocates as many bytes of storage space as there are characters in the first value that it encounters for that variable. This can result in truncated values. You can use the LENGTH statement to specify a length before the variable's first value is referenced in the DATA step.

Subsetting Data

To omit observations as you read raw data, include the DELETE statement in an IF-THEN statement. If you need to read and process variables that you don't want to keep in the data set, use the DROP= and KEEP= data set options or the DROP and KEEP statements.

Assigning Permanent Labels and Formats

You can use LABEL and FORMAT statements in DATA steps to permanently assign labels and formats. These do not affect how data is stored in the data set, only how it appears in output.

Assigning Values Conditionally Using SELECT Groups

As an alternative to IF-THEN/ELSE statements, you can use SELECT groups in DATA steps to perform conditional processing. SELECT groups are more efficient that IF-THEN/ELSE statements when you have a long series of mutually exclusive conditions.

Grouping Statements Using DO Groups

You can execute a group of statements as a unit by DO groups in DATA steps. You can use DO groups in IF-THEN/ELSE statements and SELECT groups to perform many statements as part of the conditional action.

Syntax

LIBNAME libref 'SAS-data-library';

FILENAME fileref 'filename';

DATA SAS-data-set(DROP=variable(s)|KEEP=variable(s));

        INFILE file-specification <OBS=n>;

        INPUT variable < $> startcol-endcol...;

        DROP variable(s);

        KEEP variable(s);

        RETAIN variable initial-value;

        variable+expression;

        LENGTH variable(s) <$> length;

        IF expressionTHEN statement;

        ELSE statement;

        IF expressionTHEN DELETE;

        LABEL variable1='label1' variable2='label2' ...;

        FORMAT variable(s) format-name;

        SELECT <(select-expression)>;

            WHEN-1 (when-expression-1 <..., when-expression-n>) statement;

            WHEN-n (when-expression-1 <..., when-expression-n) statement; >;

            <OTHERWISE statement;>

        END;

RUN;

PROC PRINT DATA=SAS-data setLABEL;

RUN;

Sample Program

data clinic.stress;
   infile tests;
   input ID $ 1-4 Name $ 6-25 RestHR 27-29 MaxHR 31-33
         RecHR 35-37 TimeMin 39-40 TimeSec 42-43
         Tolerance $ 45;
   if tolerance='D'and resthr ge 70 then delete;
   drop timemin timesec;
   TotalTime=(timemin*60)+timesec;
   retain SumSec 5400;
   sumsec+totaltime;
   length TestLength $ 6;
   if totaltime>800 then testlength='Long';
   else if 750<=totaltime<=800 then testlength='Normal';
   else if totaltime<750 then TestLength='Short';
   label sumsec='Cumulative Total Seconds (+5,400)';
   format sumsec comma6.;
run;

Points to Remember

  • Like the assignment statement, the sum statement does not contain a keyword.

  • If the expression in a sum statement produces a missing value, the sum statement ignores it. (Remember, however, that assignment statements assign a missing value if the expression produces a missing value.)

  • Using ELSE statements with IF-THEN statements can save resources. For greater efficiency, construct your IF-THEN/ELSE statements with conditions of decreasing probability.

  • Make sure the LENGTH statement appears before any other reference to the variable in the DATA step. If the variable has been created by another statement, a later use of the LENGTH statement will not change its size.

  • Labels and formats do not affect how data is stored in the data set, only how it appears in output. You assign labels and formats temporarily in PROC steps and permanently in DATA steps.

Chapter Quiz

Select the best answer for each question. After completing the quiz, you can check your answers using the answer key in the appendix.

  1. Which program creates the output shown below?

    Chapter Quiz
    1. data test2;
         infile furnture;
         input StockNum $ 1-3 Finish $ 5-9 Style $ 11-18
               Item $ 20-24 Price 26-31;
         if finish='oak' then delete;
         retain TotPrice 100;
         totalprice+price;
         drop price;
      run;
      proc print data=test2 noobs;
      run;
    2. data test2;
         infile furnture;
         input StockNum $ 1-3 Finish $ 5-9 Style $ 11-18
               Item $ 20-24 Price 26-31;
         if finish='oak' and price<200 then delete;
         TotalPrice+price;
      run;
      proc print data=test2 noobs;
      run;
    3. data test2(drop=price);
         infile furnture;
         input StockNum $ 1-3 Finish $ 5-9 Style $ 11-18
               Item $ 20-24 Price 26-31;
         if finish='oak' and price<200 then delete;
         TotalPrice+price;
      run;
      proc print data=test2 noobs;
      run;
    4. data test2;
         infile furnture;
         input StockNum $ 1-3 Finish $ 5-9 Style $ 11-18
               Item $ 20-24 Price 26-31;
         if finish=oak and price<200 then delete price;
         TotalPrice+price;
      run;
      proc print data=test2 noobs;
      run;
  2. How is the variable Amount labeled and formatted in the PROC PRINT output?

    data credit;
       infile creddata;
       input Account $ 1-5 Name $ 7-25 Type $ 27
                 Transact $ 29-35 Amount 37-50;
       label amount='Amount of Loan';
       format amount dollar12.2;
    run;
    proc print data=credit label;
       label amount='Total Amount Loaned';
       format amount comma10.;
    run;
    1. label Amount of Loan, format DOLLAR12.2

    2. label Total Amount Loaned, format COMMA10.

    3. label Amount, default format

    4. The PROC PRINT step does not execute because two labels and two formats are assigned to the same variable.

  3. Consider the IF-THEN statement shown below. When the statement is executed, which expression is evaluated first?

    if finlexam>=95
       and (research='A' or
           (project='A' and present='A'))
       then Grade='A+';
    1. finlexam>=95
    2. research='A'
    3. project='A' and present='A'
    4. research='A' or
      (project='A' and present='A')
  4. Consider the small raw data file and program shown below. What is the value of Count after the fourth record is read?

    Chapter Quiz
    1. missing

    2. 0

    3. 30

    4. 70

  5. Now consider the revised program below. What is the value of Count after the third observation is read?

    Chapter Quiz
    1. missing

    2. 0

    3. 100

    4. 130

  6. For the observation shown below, what is the result of the IF-THEN statements?

    Status

    Type

    Count

    Action

    Control

    Ok

    3

    12

    E

    Go

    if status='OK' and type=3
       then Count+1;
    if status='S' or action='E'
       then Control='Stop';
    1. Count = 12  Control = Go
    2. Count = 13  Control =Stop
    3. Count = 12  Control =Stop
    4. Count = 13  Control = Go
  7. Which of the following can determine the length of a new variable?

    1. the length of the variable's first value

    2. the assignment statement

    3. the LENGTH statement

    4. all of the above

  8. Which set of statements is equivalent to the code shown below?

    if code='1' then Type='Fixed';
    if code='2' then Type='Variable';
    if code^='1' and code^='2' then Type='Unknown';
    1. if code='1' then Type='Fixed';
      else if code='2' then Type='Variable';
      else Type='Unknown';
    2. if code='1' then Type='Fixed';
      if code='2' then Type='Variable';
      else Type='Unknown';
    3. if code='1' then type='Fixed';
      else code='2' and type='Variable';
      else type='Unknown';
    4. if code='1' and type='Fixed';
      then code='2' and type='Variable';
      else type='Unknown';
  9. What is the length of the variable Type, as created in the DATA step below?

    data finance.newloan;
       set finance.records;
       TotLoan+payment;
       if code='1' then Type='Fixed';
       else Type='Variable';
       length type $ 10;
    run;
    1. 5

    2. 8

    3. 10

    4. it depends on the first value of Type

  10. Which program contains an error?

    1. data clinic.stress(drop=timemin timesec);
         infile tests;
         input ID $ 1-4 Name $ 6-25 RestHR 27-29 MaxHR 31-33
               RecHR 35-37 TimeMin 39-40 TimeSec 42-43
               Tolerance $ 45;
         TotalTime=(timemin*60)+timesec;
         SumSec+totaltime;
      run;
    2. proc print data=clinic.stress;
         label totaltime='Total Duration of Test';
         format timemin 5.2;
         drop sumsec;
      run;
    3. proc print data=clinic.stress(keep=totaltime timemin);
         label totaltime='Total Duration of Test';
         format timemin 5.2;
      run;
    4. data clinic.stress;
         infile tests;
         input ID $ 1-4 Name $ 6-25 RestHR 27-29 MaxHR 31-33
               RecHR 35-37 TimeMin 39-40 TimeSec 42-43
               Tolerance $ 45;
         TotalTime=(timemin*60)+timesec;
         keep id totaltime tolerance;
      run;
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset