Transposing Variables into Observations

The TRANSPOSE Procedure

The Basics of PROC TRANSPOSE

The TRANSPOSE procedure creates an output data set by restructuring the values in a SAS data set, transposing selected variables into observations. By using PROC TRANSPOSE, you can often avoid writing a lengthy DATA step to achieve the same result. Further, the output data set can be used in subsequent DATA or PROC steps for analysis, reporting, or further data manipulation.
PROC TRANSPOSE does not produce printed output. To print the output data set from the PROC TRANSPOSE step, use PROC PRINT.

PROC TRANSPOSE Syntax

To create a transposed variable, the procedure transposes the values of an observation in the input data set into values of a variable in the output data set.
Syntax, PROC TRANSPOSE statement:
PROC TRANSPOSE<DATA=input-data-set> <OUT=output-data-set> <PREFIX=prefix>;
BY <DESCENDING> variable-1
<NOTSORTED>;
ID variable(s);
VAR variable(s);
  • input-data-set names the SAS data set to transpose.
  • output-data-set names the output data set. If output-data-set does not exist, then PROC TRANSPOSE creates it by using the DATAn naming convention. The default is DATAn.
  • prefix specifies a prefix to use in constructing names for transposed variables in the output data set. For example, if PREFIX=VAR, then the names of the variables are VAR1, VAR2, ..., VARn. The default variable name is COLn.
Note: When you use PREFIX= with an ID statement, the variable name begins with the prefix value followed by the ID value.
Tip:You can use name literals (n-literals) for the value of PREFIX. Name literals are helpful when specifying typographical or foreign characters, especially when VALIDVARNAME=ANY. To recall how VALIDVARNAME=ANY interacts with name literals, see VALIDVARNAME=System Option.

PROC TRANSPOSE Results

Output Data Set Variables

The TRANSPOSE procedure always produces an output data set, regardless of whether you specify the OUT= option in the PROC TRANSPOSE statement.
The output data set contains the following variables:
  • variables that result from transposing the values of each variable into an observation.
  • a variable whose values identify the source of the values in each observation in the output data set. This variable is a character variable whose values are the names of the variables that are transposed from the input data set. By default, PROC TRANSPOSE names this variable _NAME_. To override the default name, use the NAME= option. The label for the _NAME_ variable is NAME OF FORMER VARIABLE.
  • a character variable whose values are the variable labels of the variables that are being transposed (if any of the variables that the procedure is transposing have labels). Specify the name of the variable by using the LABEL= option. The default is _LABEL_.
  • variables that PROC TRANSPOSE copies from the input data set when you use either the BY or COPY statement. These variables have the same names and values as they do in the input data set. These variables also have the same attributes (for example: type, length, label, informat, and format).
Note: If the value of the LABEL= option or the NAME= option is the same as a variable that appears in a BY or COPY statement, then the output data set does not contain a variable whose values are the names or labels of the transposed variables. See Student Test Scores in Variables for a sample transposed data set.

Attributes of Transposed Variables

Transposed variables contain the following attributes:
  • All transposed variables are the same type and length.
  • If all variables that the procedure is transposing are numeric, then the transposed variables are numeric. Thus, if the numeric variable has a character string as a formatted value, then its unformatted numeric value is transposed.
  • If any variable that the procedure is transposing is character, then all transposed variables are character. If you are transposing a numeric variable that has a character string as a formatted value, then the formatted value is transposed.
  • The length of the transposed variables is equal to the length of the longest variable that is being transposed.

Example: Performing a Simple Transposition

Here is the Cert.Class data set before any transposition.
Figure 9.4 Cert.Class Data Set
Cert.Class Data Set
This example performs a default transposition and uses no subordinate statements.
proc transpose data=cert.class out=score_transposed;    /*#1*/
run;
proc print data=score_transposed noobs;                 /*#2*/
  title 'Scores for the Year';
run;
1 PROC TRANSPOSE transposes only the numeric variables Score1, Score2, Score3, and Homework. OUT= puts the result of the transposition in the data set Score_Transposed.
2 PROC PRINT prints the Score_Transposed data set. The NOOBS option suppresses the printing of observation numbers.
In the output data set Score_Transposed, the variables COL1 through COL4 contain the values of Score 1, Score 2, Score 3, and Homework for the year for each student. The variable _NAME_ contains the names of the variables from the input data set that were transposed.
Output 9.7 Student Test Scores in Variables
Student Test Scores in Variables

Transposing Specific Variables

Use the VAR statement to list the variables to transpose. If you omit the VAR statement, then the TRANSPOSE procedure transposes all numeric variables in the input data set that are not listed in another statement. You must list character variables in a VAR statement if you want to transpose them.
Note: If the procedure is transposing any character variable, then all transposed variables are character variables.
Syntax, VAR statement:
VAR variable(s);
  • variable(s) names one or more variables to transpose.
Here is the Cert.Trials data set:
Figure 9.5 Cert.Trials Data Set
Cert.Trials data set
proc transpose data=cert.trials out=transtrials1;   /*#1*/
   var Cholesterol Triglyc Uric;                    /*#2*/
run;
proc print data=transtrials1;                       /*#3*/
run;
1 Transpose the data set Cert.Trials and put the results of the transposition in the Transtrials1 data set.
2 The VAR statement specifies the Cholesterol, Triglyc, and Uric variables as the only variables to be transposed.
3 Use the PROC PRINT statement to print the Transtrials1 data set.
In the following output, the variables in the Transtrials1 data set, Cholesterol, Triglyc, and Uric, are the only variables that are transposed. The procedure uses the default variable name COLn.
Output 9.8 PROC PRINT Results: Transtrials1 Data Set
PROC PRINT Results: Transtrials1 Data Set

Naming Transposed Variables

Use the ID statement to create variable names in the output data set that are based on one or more variables from the input data set.
When a variable name is formed in the transposed data set, the formatted values of all listed ID variables are concatenated in the same order that the variables are listed in the ID statement. The PREFIX= option specifies a common character or character string to appear at the beginning of the formed variable names.
Syntax, ID statement:
ID variable(s);
  • variable(s) names one or more variables whose formatted values are used to form the names of the variables in the output data set.
Tip
If the value of any ID variable is missing, then PROC TRANSPOSE writes a warning message to the log. The procedure does not transpose observations that have a missing value for any ID variable.
The following example uses the values of a variable and a user-supplied value to name transposed variables.
proc transpose data=cert.trials out=transtrials2;  /*#1*/
   var cholesterol triglyc uric;                   /*#2*/
   id name testdate;                               /*#3*/
run;
proc print data=transtrials2;                      /*#4*/
run;
1 Transpose the data set Cert.Trials and put the results of the transposition in the Transtrials2 data set.
2 The VAR statement specifies the Cholesterol, Triglyc, and Uric variables as the only variables to be transposed.
3 The ID statement specifies Name and TestDate as the variables whose nonmissing formatted values name the transposed variables in the output data set, Transtrials2. Because the ID statement specifies two variables, the values of those variables are concatenated to create the new variable names.
4 Use the PROC PRINT statement to print the Transtrials2 data set.
Output 9.9 PROC PRINT Results: Transtrials2 Data Set
Transtrials2 Data Set

Transposing BY Groups

Use the BY statement in the PROC TRANSPOSE step to define and transpose BY groups.
Syntax, BY statement:
BY <DESCENDING> variable-1
<NOTSORTED>;
  • variable specifies the variable that PROC TRANSPOSE uses to form BY groups. You can specify more than one variable. If you do not use the NOTSORTED option in the BY statement, then either the observations must be sorted by all the variables that you specify, or they must be indexed appropriately. Variables in a BY statement are called BY variables.
  • DESCENDING specifies that the input data set is sorted in descending order by the variable that immediately follows the word DESCENDING in the BY statement.
  • NOTSORTED specifies that observations are not necessarily sorted in alphabetic or numeric order.
    The requirement for ordering or indexing observations according to the values of BY variables is suspended for BY-group processing when you use the NOTSORTED option. The procedure does not use an index if you specify NOTSORTED. The procedure defines a BY group as a set of contiguous observations that have the same values for all BY variables. If observations with the same values for the BY variables are not contiguous, then the procedure treats each contiguous set as a separate BY group.
proc transpose data=cert.trials out=transtrials3;    /*#1*/
   var cholesterol triglyc uric;                     /*#2*/
   id name;                                          /*#3*/
   by testdate;                                      /*#4*/
run;
proc print data=transtrials3;                          /*#5*/
run;
1 Transpose the data set. The OUT= option puts the results of the transposition in the Transtrials3 data set.
2 The VAR statement specifies the Cholesterol, Triglyc, and Uric variables as the only variables to be transposed.
3 The ID statement specifies Name as the variable whose nonmissing formatted values name the transposed variables in the output data set, Transtrials3.
4 The BY statement creates BY groups for each unique TestDate. The procedure does not transpose the BY variables.
5 Use the PROC PRINT statement to print the Transtrials3 data set.
The following data set is the output data set, Transtrials3. For each BY group in the original data set, PROC TRANSPOSE creates three observations, one for each variable that it is transposing.
Output 9.10 PROC PRINT Results: Transtrials3 Data
Transtrials3 Data
Last updated: February 14, 2019
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset