Chapter 5: Creating Labels and Formats

5.1  Adding Labels to Your Variables

5.2  Using Formats to Enhance Your Output

5.3  Regrouping Values Using Formats

5.4  More on Format Ranges

5.5  Storing Your Formats in a Format Library

5.6  Permanent Data Set Attributes

5.7  Accessing a Permanent SAS Data Set with User-Defined Formats

5.8  Displaying Your Format Definitions

5.9  Problems

 

5.1  Adding Labels to Your Variables

If you are using SAS to produce listings and reports for others, you will want to make the output more readable and attractive. SAS formats and labels help you do this. They also help you to remember what each variable represents.

Many SAS procedures use variable labels to improve readability. You can create labels either in a DATA or PROC step. As an example, you can add labels to the variables in the Test_Scores data set like this:

Program 5.1: Adding Labels to Variables in a SAS Data Set

  libname Learn 'C:ookslearning';

  data Learn.Test_Scores;

     length ID $ 3 Name $ 15;

     input ID $ Score1-Score3 Name $;

     label ID = 'Student ID'

           Score1 = 'Math Score'

           Score2 = 'Science Score'

           Score3 = 'English Score';

  datalines;

  1 90 95 98 Jan

  2 78 77 99 Preston

  3 88 91 92 Russell

  ;

  title "Descriptive Statistics for Student Scores";

  proc means data=Learn.Test_Scores;

  run;

Labels are created with a LABEL statement. Following the keyword LABEL, you enter a variable name, followed by an equal sign, followed by your label, placed in single or double quotes. Labels can be up to 256 characters long (255 on UNIX platforms). You may continue with variable names and labels for as many variables as you want. Just make sure that you complete the LABEL statement with a semicolon.

When you run certain SAS procedures, these labels are printed along with the variable names.

Here is output from the program above:

Figure 5.1: Output from Program 5.1


Notice how the labels improve the readability of this output.

If you include your LABEL statement in the DATA step, the labels remain associated with the respective variables; if you include your LABEL statement in a PROC step, the labels are used only for that procedure. This is because the label created in a DATA step is stored in the descriptor portion of the SAS data set.
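As a minimal sketch of this difference, the following program places a LABEL statement in a PROC step rather than in the DATA step. The data set name Scores_Demo and its values are made up for this illustration. The labels should appear in the PROC MEANS output, but because they are not stored in the descriptor portion of the data set, PROC CONTENTS should list no labels for these variables.

```sas
* Hypothetical temporary data set, used only for this illustration;
data Scores_Demo;
   input ID $ Score1-Score3;
datalines;
1 90 95 98
2 78 77 99
3 88 91 92
;

title "Labels Supplied in a PROC Step";
proc means data=Scores_Demo;
   var Score1-Score3;
   * These labels are used for this procedure only;
   label Score1 = 'Math Score'
         Score2 = 'Science Score'
         Score3 = 'English Score';
run;

* The labels were not stored, so none should be listed here;
title "Contents of Scores_Demo";
proc contents data=Scores_Demo;
run;
```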

5.2  Using Formats to Enhance Your Output

SAS provides built-in formats to improve the appearance of printed output. For example, you can print financial data with dollar signs or add commas to large numbers. You saw an example of this in Program 3.8.

You can also create your own user-defined formats. For example, if you have a variable called Gender with values of F and M, you can format these values so that they print as Female and Male. If you have a variable representing age, you can use formats to display the values as age groups instead of actual ages. You can have one format for each variable or use one format for a group of variables.

You create user-defined formats with PROC FORMAT; you associate your formats (or SAS built-in formats) with one or more variables in a FORMAT statement. A SAS data set called Survey shows how formats can be used. Here is a listing of this data set, without any formats:

Figure 5.2: Listing of Data Set Survey


Let’s see how formats can improve the readability of this listing:

Program 5.2: Using PROC FORMAT to Create User-Defined Formats

  proc format;

     value $Gender 'M' = 'Male'

                   'F' = 'Female'

                   ' ' = 'Not entered'

                 other = 'Miscoded';

     value Age low-29  = 'Less than 30'

               30-50   = '30 to 50'

               51-high = '51+';

     value $Likert '1' = 'Str Disagree'

                   '2' = 'Disagree'

                   '3' = 'No Opinion'

                   '4' = 'Agree'

                   '5' = 'Str Agree';

  run;

You should notice several things about this procedure. First, you use a VALUE statement to create each user-defined format. Notice that when you are creating the format in a VALUE statement, you do not include a period following the format name.

Next, formats used with character variables start with a dollar sign. Following the format name are unique values, ranges of values, or both, then an equal sign, and then the text you want to associate with each value or range of values. Rules concerning format names are similar to those for SAS variable names, with the exception that format names cannot end in a digit and the maximum length of a format name is 31 characters. For those curious readers, the length of 31 instead of 32 comes from the fact that you add a period at the end of a format name when you use it in a FORMAT statement in a DATA or PROC step. (SAS versions prior to SAS 9 allowed only 8-character format names.)

The first format to be defined is $GENDER. Format names do not need to be related to a variable name—calling this format $GENDER makes it easier to remember that you will use it later to alter how the Gender values will be printed in SAS output.

Values for Gender are stored as M and F. Associating the $GENDER format with the variable Gender results in M displaying as Male, F displaying as Female, and missing values displayed as Not entered. The keyword other in the VALUE statement causes the text Miscoded to be printed for any characters besides M, F, or a missing value.

The format AGE is used to group ages into three categories. Notice that it is OK to use the same name for a format and a variable. (SAS knows that a name containing a period is a format.) If you apply this format to the variable Age, the age groups are printed instead of the actual ages. Remember that the internal values of SAS variables are not changed because they have been associated with a format. The format only affects how values print or, in some cases, how SAS procedures process a variable. (For example, PROC FREQ computes frequencies of formatted values rather than raw values; PROC MEANS uses formatted values for variables listed in the CLASS statement, and so forth.)
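One way to convince yourself that a format does not change the stored value is to look at the raw and formatted values side by side. The short sketch below uses the PUT function, which returns the formatted value as text. The format name AgeGrp is a hypothetical name chosen so that this example does not replace the AGE format created in Program 5.2.

```sas
* AgeGrp is a hypothetical format name used only for this sketch;
proc format;
   value AgeGrp low-29  = 'Less than 30'
                30-50   = '30 to 50'
                51-high = '51+';
run;

data _null_;
   Age = 45;
   * The PUT function returns the formatted text, Age itself is unchanged;
   Group = put(Age, AgeGrp.);
   put Age= Group=;
run;
```

The SAS log should show the stored value of 45 for Age next to the text 30 to 50 for Group.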

In the AGE format, the keywords LOW and HIGH refer to the lowest nonmissing value and the highest value, respectively.

Note: The keyword LOW when used with character formats includes missing values.

The last format, $LIKERT, is used to substitute the appropriate text for the digits 1 (strongly disagree) to 5 (strongly agree).

Note: The name Likert was chosen because that is the name that psychometricians use for responses to questions that range from strongly disagree to strongly agree.

Let’s first see what happens if you place a format statement in PROC PRINT, as follows:

Program 5.3: Adding a FORMAT Statement in PROC PRINT

  title "Data Set SURVEY with Formatted Values";

  proc print data=Learn.Survey;

     id ID;

     var Gender Age Salary Ques1-Ques5;

     format Gender      $Gender.

            Age         Age.

            Ques1-Ques5 $Likert.

            Salary      Dollar11.2;

  run;

Here the formats $GENDER and AGE are used to format the variables Gender and Age, respectively. The format $LIKERT formats the five variables Ques1 through Ques5. Notice that each format is followed by a period, just the same as built-in SAS formats.

The format for Salary, DOLLAR11.2, is a SAS format. The name dollar indicates that you want to use the dollar format (which adds a dollar sign and commas to the value); the number 11 tells SAS to print a value using 11 columns; the 2 following the decimal point tells SAS that you want to print two digits to the right of the decimal point. Take note that the 11 columns include the dollar sign, the commas, and the decimal point, in addition to the digits. The largest value for Salary using the DOLLAR11.2 format would be:

$999,999.99

It is a good idea to make the total width a bit larger than you think you need, just in case your data contains a larger number than you expect. SAS has a set of rules that will allow numbers to print when you have not allocated enough columns. However, it is better to ensure you have enough columns and not be concerned with what happens when the format is too small.
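To see how the total width works, you can write a few values to the SAS log with different widths. This is only a sketch. The variable names are made up, and the result of the too-narrow case is left for you to inspect in the log rather than stated here.

```sas
data _null_;
   * Exactly fills the 11 columns: dollar sign, comma, decimal point, and digits;
   Fits = put(999999.99, dollar11.2);

   * One more digit to the left needs 13 columns (an extra digit and an extra comma);
   Wider = put(1234567.89, dollar13.2);

   * The same value with only 11 columns: SAS adjusts the output to make it fit;
   Narrow = put(1234567.89, dollar11.2);

   put Fits= / Wider= / Narrow=;
run;
```

The first two values should print as $999,999.99 and $1,234,567.89. Compare the third value with the second to see what SAS gives up when the width is too small.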

Before we show you the output, notice the ID statement in Program 5.3. When you include an ID statement in PROC PRINT, the variable (or variables) you list show up in the first column (or columns) of your report, replacing the Obs column that SAS usually displays in the first column. If you list a variable in an ID statement, don’t also list it in the VAR statement. If you do, it appears twice on the listing. If you have an ID variable such as Subject or ID, it is recommended that you use an ID statement.

Here is the listing:

Figure 5.3: Output from Program 5.3


5.3  Regrouping Values Using Formats

You can use formats to group various values together. For example, suppose you want to see the survey results, but instead of looking at the five possible responses for Questions 1 through 5, you want to group the values 1 and 2 (strongly disagree and disagree) together and the values 4 and 5 (agree and strongly agree) to make three categories for each question. You can accomplish this by creating a new format, as shown in Program 5.4:

Program 5.4: Regrouping Values Using a Format

  proc format;

     value $Three '1','2' = 'Disagreement'

                  '3'     = 'No opinion'

                  '4','5' = 'Agreement';

  run;

 

You can then apply this to the Question variables in a procedure, as follows:

Program 5.5: Applying the New Format to Several Variables with PROC FREQ

  title "Question Frequencies Using the Three Format";

  proc freq data=Learn.Survey;

     tables Ques1-Ques5;

     format Ques1-Ques5 $Three.;

  run;

PROC FREQ, as you saw in Chapter 2, is used to count frequencies for the variables listed in the TABLES statement (Ques1–Ques5 in this case). Because of the FORMAT statement in this procedure, the tables have only three categories rather than the original five. Here is a partial listing of the output:

Figure 5.4: Partial Listing from Program 5.5


If you look back at Figure 5.2, you can see that the two values of 1 and one value of 2 for Ques1 were combined to give you a frequency of 3 for the category of Disagreement. Likewise, the two values of 4 and one value of 5 were combined to give you a frequency of 3 for the category of Agreement.

 

5.4  More on Format Ranges

When you define a format, you can specify individual values or ranges to the left of the equal sign in your VALUE statement. As an example of how flexible this approach is, consider that you have a variable called Grade with values of A, B, C, D, F, I, and W. The following VALUE statement creates a format that places these grades into six categories:

value $Gradefmt 'A' - 'C' = 'Passing'

                'D'       = 'Borderline'

                'F'       = 'Failing'

                'I','W'   = 'Incomplete or withdrew'

                ' '       = 'Not recorded'

                other     = 'Miscoded';

 

Here you see that grades A, B, or C will be formatted as Passing, D as Borderline, F as Failing, I or W as Incomplete or withdrew, missing values as Not recorded, and any other value as Miscoded. You may leave the quotes off the character ranges and the labels if you want. However, as a matter of style, we recommend that you use single or double quotes here.
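Here is a small sketch of this format in use. The data set name Grades_Demo and its eight grades are invented for this illustration, and X is included as a deliberately miscoded value.

```sas
proc format;
   value $Gradefmt 'A'-'C'   = 'Passing'
                   'D'       = 'Borderline'
                   'F'       = 'Failing'
                   'I','W'   = 'Incomplete or withdrew'
                   ' '       = 'Not recorded'
                   other     = 'Miscoded';
run;

* Hypothetical data: one line of grades read with a trailing @@;
data Grades_Demo;
   input Grade : $1. @@;
datalines;
A B C D F I W X
;

title "Grade Categories Using the Gradefmt Format";
proc freq data=Grades_Demo;
   tables Grade;
   format Grade $Gradefmt.;
run;
```

With these eight values, you should see frequencies of 3 for Passing, 1 for Borderline, 1 for Failing, 2 for Incomplete or withdrew, and 1 for Miscoded.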

In Program 5.2, the ranges for the AGE format were defined like this:

   value Age low-29  = 'Less than 30'

             30-50   = '30 to 50'

             51-high = '51+';

This is fine if this format is used with integer values. However, suppose you used this format with a variable that could take on values such as 29.5? This value falls between the two ranges low-29 and 30-50. You can make sure there are no gaps in your ranges like this:

   value Age low-<30  = 'Less than 30'

             30-<51   = '30 to less than 51'

             51-high  = '51+';

The first range includes all values less than 30 (which would include 29.5). The second range includes values from 30 to less than 51, and the last range includes values of 51 and higher.

You can also place a less-than sign (<) after the first value of a range to exclude that value. In the example below, the range 30<-51 does not include 30.

   value Age low-30   = 'Less than or equal to 30'

             30<-51   = 'Greater than 30 to 51'

             51<-high = 'Greater than 51';

Note: A good way to remember how this works: if you want to exclude the first value, put the < sign after that value; if you want to exclude the last value, put the < sign before that value.

If you know that your format may be used with non-integer values, be sure that there are no gaps in your ranges.
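You can test a format for gaps by formatting a value that falls between two ranges. The sketch below compares two versions of the age format using the value 29.5. The format names AgeGap and AgeNoGap are hypothetical names chosen for this comparison.

```sas
proc format;
   * Ranges with gaps between 29 and 30 and between 50 and 51;
   value AgeGap   low-29   = 'Less than 30'
                  30-50    = '30 to 50'
                  51-high  = '51+';

   * Ranges with no gaps;
   value AgeNoGap low-<30  = 'Less than 30'
                  30-<51   = '30 to less than 51'
                  51-high  = '51+';
run;

data _null_;
   Age = 29.5;
   With_Gap = put(Age, AgeGap.);
   No_Gap   = put(Age, AgeNoGap.);
   put With_Gap= / No_Gap=;
run;
```

Because 29.5 does not match any range in the first format, With_Gap should show the number itself rather than a label, while No_Gap should show Less than 30.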

5.5  Storing Your Formats in a Format Library

As we mentioned earlier, if you place LABEL and FORMAT statements in the DATA step, the labels and formats become permanently associated with their respective variables. If you have user-defined formats with permanent SAS data sets, it is important to make your formats permanent also. Here are the steps to do this:

1.       Create a library reference (libref) to indicate where you want to store your SAS formats. This can be the same library where you store your data sets.

2.       Use the option LIBRARY=libref when you run PROC FORMAT. (Remember, you only have to run this procedure once.)

As an example, suppose you want to make the formats created in Program 5.2 permanent and save them in the C:ookslearning folder.

Program 5.6 creates a permanent format library for you.

Program 5.6: Creating a Permanent Format Library

  libname Myfmts 'C:ookslearning';

  proc format library=Myfmts;

     value $Gender 'M' = 'Male'

                   'F' = 'Female'

                   ' ' = 'Not entered'

                 other = 'Miscoded';

     value Age low-29  = 'Less than 30'

               30-50   = '30 to 50'

               51-high = '51+';

     value $Likert '1' = 'Strongly disagree'

                   '2' = 'Disagree'

                   '3' = 'No opinion'

                   '4' = 'Agree'

                   '5' = 'Strongly agree';

  run;

If you run this program on a Windows system, a file called formats.sas7bcat will be created in the folder specified by the libref.

5.6  Permanent Data Set Attributes

If you add your LABEL and FORMAT statements in the DATA step, the labels and formats become permanently associated with their respective variables. This makes for a very convenient way to document a data set. Another user could use PROC CONTENTS or the SAS Explorer to list the labels and formats used with each variable.

Anytime you want to use a SAS data set with associated user-defined formats, you need to tell SAS where to look for these formats. By default, SAS will only look for its own formats, formats in a Work library (i.e., temporary formats), or formats in a library with the special name Library.

If you want SAS to also look in one of your own libraries, you need to issue a FMTSEARCH= system option. You can list one or more libraries for SAS to search using this option. For example, if you want to use the formats you placed in the Myfmts library, you would need to submit the following code:

options fmtsearch=(Myfmts);

If you do this, SAS first looks in the Work library, then the library called Library, and then the Myfmts library. If you want SAS to look in the Myfmts library before it looks in either of the other two libraries, you can name all three in the FMTSEARCH= option like this:

options fmtsearch=(myfmts Work Library);

Now, SAS searches the Myfmts library first and then the Work and Library libraries.
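If you are not sure which libraries are currently listed in the FMTSEARCH= option, one way to check is to ask SAS to display the setting. This short sketch uses PROC OPTIONS for that purpose and writes the current value to the SAS log.

```sas
* Display the current FMTSEARCH= setting in the SAS log;
proc options option=fmtsearch;
run;
```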

Program 5.7 demonstrates how to make a permanent SAS data set with user-defined formats. (For this example, assume you have already created a permanent SAS library in your C:ookslearning folder.)

Program 5.7: Adding LABEL and FORMAT Statements in the DATA Step

  libname Learn 'C:ookslearning';

  libname Myfmts 'C:ookslearning';

  options fmtsearch=(Myfmts);

  data Learn.Survey;

     infile 'C:ookslearningSurvey.txt';

     input ID : $3.

           Gender : $1.

           Age

           Salary

           (Ques1-Ques5)($1.);

     format Gender      $Gender.

            Age         Age.

            Ques1-Ques5 $Likert.

            Salary      Dollar10.0;

     label ID     = 'Subject ID'

           Gender = 'Gender'

           Age    = 'Age as of 1/1/2006'

           Salary = 'Yearly Salary'

           Ques1  = 'The governor is doing a good job?'

           Ques2  = 'The property tax should be lowered'

           Ques3  = 'Guns should be banned'

           Ques4  = 'Expand the Green Acre program'

           Ques5  = 'The school needs to be expanded';

  run;

Now, run PROC CONTENTS on this data set, as follows:

Program 5.8:  Running PROC CONTENTS on a Data Set with Labels and Formats

  title "Data set Survey";

  proc contents data=Learn.Survey varnum;

  run;

You obtain a listing that helps document the data set, like this (partial listing):

Figure 5.5: Output from Program 5.8


You now see the formats and labels associated with each variable.

5.7  Accessing a Permanent SAS Data Set with User-Defined Formats

If you want to use a permanent SAS data set that has user-defined formats, the only requirement is to remember to tell SAS where to find the formats. If you forget the FMTSEARCH= system option, you will get an error message telling you that SAS cannot find the formats.

Note: If you give a copy of a SAS data set with user-defined formats to another user, be sure to give them a copy of the format library as well. (On a PC platform, you need to give them a copy of the file formats.sas7bcat.)

Here is an example of a program to compute frequencies on the variables Ques1–Ques5 in the permanent SAS data set Survey:

Program 5.9: Using a User-defined Format

  libname Learn 'C:ookslearning';

  libname Myfmts 'C:ookslearning';

  options fmtsearch=(Myfmts);

 

  title "Using User-defined Formats";

  proc freq data=Learn.Survey;

     tables Ques1-Ques5;

  run;

Once you submit the FMTSEARCH= option, you can use your own formats just as if they were built-in SAS formats.

5.8  Displaying Your Format Definitions

A useful PROC FORMAT option is FMTLIB. This option creates a listing of each format in the specified library with the ranges and labels. As an example, if you want to display the definitions of all the formats in your Myfmts library, you would submit the following code:

Program 5.10: Displaying Format Definitions in a User-created Library

  title "Format Definitions in the Myfmts Library";

  proc format library=Myfmts fmtlib;

  run;

You obtain a table like this:

Figure 5.6: Output from Program 5.10


If you only want to see specific formats in a format library, you can add a SELECT statement to your PROC FORMAT. You list the formats you want displayed following the keyword SELECT. When you use a SELECT statement, you do not also have to include the FMTLIB option. For example, to display only the AGE and $LIKERT formats, you could use the following program:

Program 5.11: Demonstrating a SELECT Statement with PROC FORMAT

  proc format library=Myfmts;

     select Age $Likert;

  run;

Note: This program assumes that you have previously submitted a LIBNAME statement defining the Myfmts library.

There is also an EXCLUDE statement that enables you to name the formats you do not want to see displayed.
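As a sketch of the EXCLUDE statement, the program below creates two temporary formats and then displays every format in the Work library except one. The format names $Size and Onoff are made up for this illustration. The FMTLIB option is included explicitly here, although it may not be required when you use an EXCLUDE statement.

```sas
* Two hypothetical temporary formats, stored in the Work library;
proc format;
   value $Size 'S' = 'Small'
               'L' = 'Large';
   value Onoff 1 = 'On'
               0 = 'Off';
run;

title "Formats in the Work Library, Excluding $Size";
proc format library=Work fmtlib;
   * List every format in the library except $Size;
   exclude $Size;
run;
```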

Please refer to Chapter 22 for more advanced uses of both SAS and user-defined formats.

5.9  Problems

Solutions to odd-numbered problems are located at the back of this book. Solutions to all problems are available to professors. If you are a professor, visit the book’s companion website at support.sas.com/cody for information about how to obtain the solutions to all problems.

1.       Run the program here to create a temporary SAS data set called Voter:

   data Voter;

      input Age Party : $1. (Ques1-Ques4)($1. + 1);

   datalines;

   23 D 1 1 2 2

   45 R 5 5 4 1

   67 D 2 4 3 3

   39 R 4 4 4 4

   19 D 2 1 2 1

   75 D 3 3 2 3

   57 R 4 3 4 4

   ;

Add formats for Age (0–30, 31–50, 51–70, 71+), Party (D = Democrat, R = Republican), and Ques1–Ques4 (1=Strongly Disagree, 2=Disagree, 3=No Opinion, 4=Agree, 5=Strongly Agree). In addition, label Ques1–Ques4 as follows:

Variable   Label
Ques1      The president is doing a good job
Ques2      Congress is doing a good job
Ques3      Taxes are too high
Ques4      Government should cut spending

 

Note: Use PROC PRINT to list the observations in this data set and PROC FREQ to list frequencies for the four questions. (The default action of PROC PRINT is to head each column with a variable name, not the label. To use labels as column headings, use the LABEL option with PROC PRINT.)

2.       You want to see frequencies for Questions 1 to 4 from the previous question. However, you want only three categories: Generally Disagree (combine Strongly Disagree and Disagree), No Opinion, and Generally Agree (combine Agree and Strongly Agree). Accomplish this using a new format for Ques1–Ques4.

3.        Run the following program to create a SAS data set called Colors (see Chapter 21 for a discussion of the double at signs [@@] in the INPUT statement):

   data Colors;

      input Color : $1. @@;

   datalines;

   R R B G Y Y . . B G R B G Y P O O V V B

   ;

Use a format to group the colors as follows:

      R, B, G = Group 1      Y, O = Group 2      Missing = Not Given      All others = Group 3

Use PROC FREQ to list the frequencies of the color groups.

4.       Make a permanent SAS data set from data set Voter in Problem 1. Place this data set in a folder of your choice. Make the labels and formats permanent attributes in this data set and make your formats permanent as well (place them in the same library as the data set). Use the FMTLIB option with PROC FORMAT when you run this procedure.

5.       Write the necessary statements to make four permanent formats in a library of your choice. Use the FMTLIB option to list each of these formats. The formats are defined as follows:

YesNo     1 = Yes, 0 = No
$YesNo    Y = Yes, N = No
$Gender   M = Male, F = Female
Age20yr   low-20 = 1, 21-40 = 2, 41-60 = 3, 61-80 = 4,
                       81-high = 5
