Defining and Referencing One-Dimensional Arrays

A Brief Overview

Suppose you have a table that contains patient names and five health indicators for each patient. You want to keep a running total of the count of high values for each patient. You can do this by using five IF-THEN statements, in which the only syntax that differs is the name of the health indicator column. However, this creates a program that contains repetitive code and is harder to write and maintain. Suppose your data set had fifty health indicators instead of five. With fifty indicators, you would need fifty IF-THEN statements. Instead of writing repetitive code, you can use an array.
A SAS array provides a way to reference a group of columns for processing in the DATA step. By grouping columns into an array, you can process the variables in a DO loop. Each column that is grouped together in an array is referred to as an element. You can reference an element in the array by using the array-name and a numeric subscript as shown in the figure below.
Figure 11.1 Referencing a One-Dimensional Array
Arrays are often referenced in DO loops because more than one element in an array must be processed. By using fewer statements in your program, the DATA step program can be more easily modified or corrected. The array name distinguishes it from any other arrays in the same DATA step. The array name is not a variable.
Arrays are used when you want to perform the same task on multiple columns. For example, you can use arrays to process repetitive code, rotate data, and perform table lookups. The array is created at compile time, and it is referenced during execution.
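To make the motivation concrete, here is a minimal sketch of the repetitive approach described at the start of this overview. The table Work.PatDemo and its values are hypothetical stand-ins, assuming the five indicators are character columns that hold values such as High.

```sas
/* Hypothetical sample data standing in for a patient table */
data work.patdemo;
   input Name $ Weight $ Temp $ Pulse $ Resp $ BP $;
   datalines;
Ann High Low High Normal High
Bob Low Low Normal Normal Low
;
run;

/* Repetitive approach: one IF-THEN statement per indicator column. */
/* Only the column name changes from one statement to the next.     */
data work.highcount_repeat;
   set work.patdemo;
   if Weight='High' then HighCount+1;
   if Temp='High'   then HighCount+1;
   if Pulse='High'  then HighCount+1;
   if Resp='High'   then HighCount+1;
   if BP='High'     then HighCount+1;
run;
```

With fifty indicators this pattern would grow to fifty nearly identical statements, which is the situation an array is meant to avoid.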

ARRAY Statement Syntax

Using arrays is a two-part process. First, you define the array with the ARRAY statement. Second, you reference the array, specifying the element that you want. The ARRAY statement includes the ARRAY keyword, the array name, and the number of elements. Typically, you also list the variables (columns) that make up the array. A semicolon ends the statement.
Syntax, ARRAY statement:
ARRAY array-name [number-of-elements] <array-elements>;
  • array-name specifies the name of the array. The name must be a SAS name that is not the name of a SAS variable in the same DATA step.
  • number-of-elements specifies the number of elements included in the array.
  • array-elements specifies the variables to be included in the array, which must be either all numeric or all character.
Note: The number of elements must be enclosed in either parentheses ( ), braces { }, or brackets [ ].
CAUTION:
Avoid using the name of a SAS function for an array.
The array will work, but you will not be able to use the function in the same DATA step, and a warning message is written to the SAS log.

Defining the Number of Elements

The Certadv.Patdata table contains the patient data with health indicators. Suppose you need to define an array named Health and specify the correct number of elements.
Figure 11.2 Referencing One-Dimensional Array
The following example illustrates a one-dimensional array named Health. It contains five elements. The columns are defined as Weight, Temp, Pulse, Resp, and BP.
array health[5] Weight Temp Pulse Resp BP;
The number 5 indicates a one-dimensional array with 5 elements and an implied subscript range of 1 to 5. The elements 1 through 5 in the Health array can be referenced by the array name and subscript.
There are several ways to define the number of array elements:
  • You can specify a range of values for the array elements when you define an array. In the following example, the lower bound is 2 and upper bound is 4. Explicitly specifying lower and upper bounds is beneficial if you want to start the lower bound at a value other than 1.
    array health[2:4] Temp Pulse Resp;
  • You can use an asterisk [*] in place of the number of elements. SAS then determines the size of the array by counting the variables listed in the ARRAY statement, so when you specify the asterisk, you must include the elements in the statement.
    array health[*] Weight Temp Pulse Resp BP;
  • The number of elements can be enclosed in parentheses, braces, or brackets. The following three statements are equivalent:
    array health(5) Weight Temp Pulse Resp BP;
    array health{5} Weight Temp Pulse Resp BP;
    array health[5] Weight Temp Pulse Resp BP;
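As a small illustration of explicit bounds, the sketch below defines the array with a subscript range of 2 to 4 and loops over it with the LBOUND and HBOUND functions, which return the lower and upper bounds of an array. The numeric values and the helper variable Value are hypothetical and exist only to make the step runnable on its own.

```sas
data _null_;
   /* Hypothetical values; in practice they would come from a SET statement */
   Temp=98.6; Pulse=72; Resp=16;

   /* Subscripts run from 2 to 4 instead of the default 1 to 3 */
   array health[2:4] Temp Pulse Resp;

   /* LBOUND and HBOUND avoid hard-coding the bounds in the loop */
   do i = lbound(health) to hbound(health);
      Value = health[i];
      put i= Value=;
   end;
run;
```

Looping from LBOUND to HBOUND keeps the DO statement correct even if the bounds in the ARRAY statement are changed later.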

Specifying the Array Elements

Elements in the ARRAY statement are optional unless you are using an asterisk [*] to determine the size of the array. However, if the elements are not specified and they do not exist in the PDV, then variables are created with default names. Each default name is formed by concatenating the array name and the subscript, for example Health1, Health2, and so on.
You can specify the five columns—Weight, Temp, Pulse, Resp, and BP—as your array elements. The array elements can be specified in any order and do not have to be positioned consecutively in the PDV. Weight corresponds with the first element, Temp is the second element, and so on.
array health[5] Weight Temp Pulse Resp BP;
Array elements can be specified using column lists. The double hyphen specifies that all columns will be ordered as they are in the PDV. Since these columns are located consecutively in the PDV, you can refer to the first column, followed by a double hyphen, and then the last column.
array health[5] Weight--BP;
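The default naming described at the start of this section can be checked with a short sketch. The array name Score is hypothetical. Because no elements are listed and no variables named Score1 through Score3 exist yet, SAS creates them as numeric variables.

```sas
data _null_;
   /* No elements listed: SAS creates Score1, Score2, and Score3 */
   array score[3];

   do i = 1 to 3;
      score[i] = i * 10;
   end;

   /* The generated variables can be used by name like any other variable */
   put Score1= Score2= Score3=;
run;
```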

Using Column Lists as Array Elements

You can specify column lists in the forms shown below.
Column Lists as Array Elements
| Column | Description | Form |
|---|---|---|
| Numbered range list | Specifies all columns from x1 to xn inclusive. You can begin with any number and end with any number as long as you do not violate the rules for user-supplied column names and the numbers are consecutive. | x1-xn |
| Name range list | Specifies all columns ordered as they are in the program data vector, from x to b, inclusive. | x--b |
| | Specifies all numeric columns from x to b, inclusive. | x-numeric-b |
| | Specifies all character columns from x to b, inclusive. | x-character-b |
| Name prefix list | Specifies all the columns that begin with REV, such as REVJAN, REVFEB, and REVMAR. | REV: |
| Special SAS name lists | Specifies all numeric columns that are already defined in the current DATA step. | _NUMERIC_ |
| | Specifies all character columns that are already defined in the current DATA step. | _CHARACTER_ |
| | Specifies all columns that are already defined in the current DATA step. Note: Variables must be either all numeric or all character. | _ALL_ |
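The following sketch exercises several of these list forms against a small hypothetical table, Work.RevDemo. The table, its column names, and the counter variables NRevs, NQtrs, NSpan, and NNums are assumptions made for illustration. The DIM function, which returns the number of elements in an array, is used only to show how many columns each list picked up.

```sas
/* Hypothetical table: one character column followed by five numeric columns */
data work.revdemo;
   input Region $ RevJan RevFeb RevMar Qtr1 Qtr2;
   datalines;
East 100 200 300 1 2
;
run;

data _null_;
   set work.revdemo;
   array revs[*] Rev:;          /* name prefix list: RevJan RevFeb RevMar    */
   array qtrs[*] Qtr1-Qtr2;     /* numbered range list: Qtr1 Qtr2            */
   array span[*] RevFeb--Qtr1;  /* name range list in PDV order              */
   array nums[*] _numeric_;     /* numeric columns defined up to this point  */

   NRevs = dim(revs);
   NQtrs = dim(qtrs);
   NSpan = dim(span);
   NNums = dim(nums);
   put NRevs= NQtrs= NSpan= NNums=;
run;
```

The counter variables are assigned after the last ARRAY statement, so they are not picked up by _NUMERIC_, which covers only the variables defined before that statement is compiled.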

Referencing a One-Dimensional Array

Once an array has been defined, elements within the array can be referenced. To reference an element, specify the name of the array followed by the element number, enclosed in brackets. The array reference is in the following form:
array-name[element-number]
An array reference enables you to reference a column in a DATA step. For example, if you wanted to reference the third element in the Health array, you would be referencing the Pulse column. The following example references the third element in the Health array.
health[3]
What gives arrays their power is their ability to reference the elements of an array by subscripts. Typically, arrays are used with DO loops to process multiple variables and to perform repetitive calculations.
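As a minimal sketch of a single array reference, the step below reads and then updates the third element of the Health array. The numeric values and the variable Third are hypothetical; the point is only that health[3] and Pulse name the same location in the PDV.

```sas
data _null_;
   /* Hypothetical values for the five health columns */
   Weight=150; Temp=98.6; Pulse=72; Resp=16; BP=120;
   array health[5] Weight Temp Pulse Resp BP;

   Third = health[3];   /* reads the current value of Pulse */
   health[3] = 80;      /* assigns a new value to Pulse     */

   put Third= Pulse=;   /* writes Third=72 Pulse=80 to the log */
run;
```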

DO Statement Syntax

An array is typically referenced within a DO loop. You can use the index column to specify which element to reference. The index column changes for each iteration of the DO loop from a start value to a stop value, which can be the number of array elements.
Syntax, DO statement:
DO index-column = 1 TO number-of-elements;
   . . . array-name[index-column] . . .
END;
  • index-column is used to reference the element number.
  • number-of-elements refers to the number of elements included in the array.
  • array-name specifies the name of the array. The name must be a SAS name that is not the name of a SAS variable in the same DATA step.

Example: Processing Repetitive Code

Suppose you want to keep a running total of the count of high values for each patient. You can use the Health array in your IF-THEN statement within a DO loop.
data work.highcount;
   set certadv.patdata;
   array health[5] Weight--BP;
   do i = 1 to 5;
      if health[i]='High' then HighCount+1;
   end;
run;
When i is equal to 1, SAS looks at the value of the first element, which is the Weight column. When i is equal to 2, it looks at the second element, which is the Temp column, and so on. SAS iterates over the DO loop five times to reference the five health indicator columns to calculate the HighCount column.
Output 11.1 Work.HighCount Data Set
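Note that the sum statement HighCount+1 retains its value from one observation to the next, so the program above accumulates a total across all patients. If a separate count per patient is wanted instead, one possible variant is sketched below. The table Work.PatDemo, its values, and the variable PatientHigh are hypothetical.

```sas
/* Hypothetical sample data standing in for a patient table */
data work.patdemo;
   input Name $ Weight $ Temp $ Pulse $ Resp $ BP $;
   datalines;
Ann High Low High Normal High
Bob Low Low Normal Normal Low
;
run;

data work.highcount_each (drop=i);
   set work.patdemo;
   array health[5] Weight--BP;

   /* Reset the counter for every observation so that it is per patient */
   PatientHigh = 0;
   do i = 1 to 5;
      if health[i]='High' then PatientHigh = PatientHigh + 1;
   end;
run;
```

Using an ordinary assignment statement here, rather than a sum statement, means that PatientHigh is not retained across observations.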

Handling an Unknown Number of Array Elements

Suppose that the number of columns that you need to process is not known in advance, for example because it varies from one version of the original data set to the next. When you have an unknown number of array elements, use an asterisk (*) within your brackets when defining an array. SAS determines the number of elements by counting the number of columns referenced in the ARRAY statement.
The following example specifies that all columns starting with Ordt will be in the Ordt array, all columns starting with Deldt will be in the Deldt array, and all columns starting with Q will be in the Q array. The colon after Ordt, Deldt, and Q specifies all columns whose names start with that string.
array Ordt[*] Ordt:;
array Deldt[*] Deldt:;
array Q[*] Q:;

Using the DIM Function

DIM Function Syntax

If the number of array elements is unknown, the DIM function can be used to return the number of elements in the array. When using DO loops to process arrays, you can also use the DIM function to specify the value for the TO clause of the iterative DO statement. For an array, specify the array name as the argument for the DIM function.
Syntax, DIM function:
DIM(array-name)
  • array-name specifies the name of the array.

Example: Using the DIM Function in an Iterative DO Statement

When you specify the array name as the single argument for the DIM function, the function returns the number of elements in the array.
data work.sysbp2 (drop=i);
   set work.sysbp;
   array sbparray[*] sbp:;
   do i=1 to dim(sbparray);
      if sbparray[i]=999 then sbparray[i]=.;
   end;
run;
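To try the program above without access to Work.Sysbp, a hypothetical stand-in table can be created first. The table Work.Sysbp_Demo, its values, and the variable NReadings are assumptions for illustration, with 999 used as a placeholder code for a missing reading.

```sas
/* Hypothetical stand-in for Work.Sysbp: 999 marks a missing reading */
data work.sysbp_demo;
   input ID sbp1 sbp2 sbp3;
   datalines;
1 120 999 118
2 999 130 125
;
run;

data work.sysbp_clean (drop=i);
   set work.sysbp_demo;
   array sbparray[*] sbp:;

   /* DIM returns the number of columns that the sbp: prefix list matched */
   NReadings = dim(sbparray);

   do i=1 to dim(sbparray);
      if sbparray[i]=999 then sbparray[i]=.;
   end;
run;
```

Because both the ARRAY statement and the DO loop adapt to the number of matching columns, adding an sbp4 column to the input table would not require any change to the second DATA step.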

Compilation and Execution Phases for Array Processing

Suppose you have survey data Certadv.Salary from Silicon Valley. The data set contains four salary variables identifying the four different people who took the survey at one time. You are asked to create a running total of the number of people whose salaries are less than or equal to the average salary of 51,000.

Compilation Phase

The following program is submitted.
data work.survsalary (drop=i);
   set certadv.salary;
   array BelowAvgS[4] Salary1-Salary4;
   do i=1 to 4;
      if BelowAvgS[i] <=51000 then BelowAvg+1;
   end;
run;
During the compilation phase, the PDV is created. The ARRAY statement is a compile-time statement only. At compile time, SAS reads the ARRAY statement and associates the four salary variables, Salary1 through Salary4, with the BelowAvgS array. The variables Salary1 through Salary4 are already in the PDV because they are existing variables in the input file being read by the SET statement, Certadv.Salary. If the variables were not already in the PDV, then SAS would add the variables to the PDV as new variables.
Program Data Vector
The array name BelowAvgS and array references are not included in the PDV. Syntax errors in the ARRAY statement are detected during the compilation phase.
Note: The array name is not included in the PDV because it is not a variable. It is a name that is used to reference a collection of variables. The array exists only for the duration of the DATA step.
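One way to check what is in the PDV is to write it to the log with PUT _ALL_. The sketch below uses a hypothetical stand-in table, Work.Salary_Demo, with made-up salary values. The log line lists Salary1 through Salary4, i, BelowAvg, and the automatic variables _ERROR_ and _N_, but no entry for BelowAvgS.

```sas
/* Hypothetical stand-in for Certadv.Salary */
data work.salary_demo;
   input Salary1-Salary4;
   datalines;
60000 72000 55000 58000
65000 70000 53000 48000
;
run;

data _null_;
   set work.salary_demo;
   array BelowAvgS[4] Salary1-Salary4;
   do i=1 to 4;
      if BelowAvgS[i] <= 51000 then BelowAvg+1;
   end;

   /* Write every variable in the PDV to the log for the first observation */
   if _n_=1 then put _all_;
run;
```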

Execution Phase

  1. At the beginning of the execution phase, _N_ is set to 1 in the PDV. BelowAvg is initialized to 0 because it is created using the SUM statement.
    data work.survsalary (drop=i);
       set certadv.salary;
       array BelowAvgS[4] Salary1-Salary4;
       do i=1 to 4;
          if BelowAvgS[i] <=51000 then BelowAvg+1;
       end;
    run;
    Program Data Vector
    The remaining variables are set to missing.
  2. The SET statement copies the first observation from Certadv.Salary to the PDV.
    data work.survsalary (drop=i);
       set certadv.salary;
       array BelowAvgS[4] Salary1-Salary4;
       do i=1 to 4;
          if BelowAvgS[i] <=51000 then BelowAvg+1;
       end;
    run;
    Program Data Vector
  3. The ARRAY statement is a compile-time statement. Therefore, it is ignored in the execution phase.
    data work.survsalary (drop=i);
       set certadv.salary;
       array BelowAvgS[4] Salary1-Salary4;
       do i=1 to 4;
          if BelowAvgS[i] <=51000 then BelowAvg+1;
       end;
    run;
  4. In the first iteration of the DO loop, the index variable i is set to 1.
    data work.survsalary (drop=i);
       set certadv.salary;
       array BelowAvgS[4] Salary1-Salary4;
       do i=1 to 4;
          if BelowAvgS[i] <=51000 then BelowAvg+1;
       end;
    run;
    Program Data Vector
    The array reference BelowAvgS[i] becomes BelowAvgS[1]. BelowAvgS[1] refers to the first array element, Salary1. The loop then checks Salary2, Salary3, and Salary4 in the same way. Since all four Salary values for this observation are above 51,000, the condition is never true and the BelowAvg column remains 0.
    data work.survsalary (drop=i);
       set certadv.salary;
       array BelowAvgS[4] Salary1-Salary4;
       do i=1 to 4;
          if BelowAvgS[i] <=51000 then BelowAvg+1;
       end;
    run;
    SAS reaches the end of the DO loop. SAS reaches the end of the first iteration of the DATA step, and the implicit OUTPUT statement writes the contents from the PDV to the data set Work.SurvSalary. SAS returns to the beginning of the DATA step.
  5. At the beginning of the second iteration, _N_ increments to 2, and the variables Salary1 through Salary4 retain their values because they are being read from an existing SAS data set. BelowAvg remains 0 because its value is automatically retained.
    Program Data Vector
  6. The SET statement reads the second observation from Certadv.Salary into the PDV.
    data work.survsalary (drop=i);
       set certadv.salary;
       array BelowAvgS[4] Salary1-Salary4;
       do i=1 to 4;
          if BelowAvgS[i] <=51000 then BelowAvg+1;
       end;
    run;
    Program Data Vector
  7. On the first iteration of the DO loop, the index variable i is set to 1. The array reference BelowAvgS[i] becomes BelowAvgS[1]. BelowAvgS[1] refers to the first array element, Salary1. Since Salary1 is not less than or equal to 51,000, the BelowAvg column remains 0.
    data work.survsalary (drop=i);
       set certadv.salary;
       array BelowAvgS[4] Salary1-Salary4;
       do i=1 to 4;
          if BelowAvgS[i] <=51000 then BelowAvg+1;
       end;
    run;
    Program Data Vector
  8. The DO loop iterates through to the fourth iteration when the index variable i is set to 4. The array reference BelowAvgS[4] refers to the fourth array element, Salary4. Because Salary4 is less than 51,000, the BelowAvg column increments to 1.
    data work.survsalary (drop=i);
       set certadv.salary;
       array BelowAvgS[4] Salary1-Salary4;
       do i=1 to 4;
          if BelowAvgS[i] <=51000 then BelowAvg+1;
       end;
    run;
    Program Data Vector
    SAS reaches the end of the second iteration of the DATA step and the implicit OUTPUT statement writes the contents from the PDV to the data set Work.SurvSalary. SAS returns to the beginning of the DATA step.
The rest of the iterations of the DATA step are processed the same as above.
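The execution steps above can be followed in the log by adding a PUT statement inside the DO loop. This is a sketch against a hypothetical stand-in table, Work.Salary_Demo, whose made-up values mirror the walkthrough: every salary in the first observation is above 51,000, and only Salary4 in the second observation is not.

```sas
/* Hypothetical stand-in for Certadv.Salary */
data work.salary_demo;
   input Salary1-Salary4;
   datalines;
60000 72000 55000 58000
65000 70000 53000 48000
;
run;

data work.survsalary_demo (drop=i);
   set work.salary_demo;
   array BelowAvgS[4] Salary1-Salary4;
   do i=1 to 4;
      if BelowAvgS[i] <= 51000 then BelowAvg+1;
      /* Trace: DATA step iteration, loop index, and running count */
      put _n_= i= BelowAvg=;
   end;
run;
```

With this sample data, the log shows BelowAvg=0 on every line until _N_=2 and i=4, where it becomes 1.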

Graphical Display of Array Processing

As the DATA step continues processing, new observations are loaded into the PDV. The DO loop iterates over the four salary variables for each observation, checking whether each value is less than or equal to $51,000. When such a value is encountered, the BelowAvg variable is incremented by 1.
data work.survsalary (drop=i);
   set certadv.salary;
   array BelowAvgS[4] Salary1-Salary4;
   do i=1 to 4;
      if BelowAvgS[i] <=51000 then BelowAvg+1;
   end;
run;
Output 11.2 PROC PRINT Output of Work.SurvSalary
Last updated: October 16, 2019