Chapter 2 Creating SAS Data Sets

Your first step in using SAS to summarize and analyze data is to create a SAS data set. If your data is in a spreadsheet or other type of file, see Chapter 3, “Importing Data.” This chapter discusses the following topics:

  • understanding features of SAS data sets
  • creating a SAS data set
  • printing a data set
  • sorting a data set

The “Special Topics” section discusses the following tasks:

  • adding labels to variable names
  • adding labels to values of variables

What Is a SAS Data Set?

Understanding the SAS DATA Step

Summarizing the SAS DATA Step

Assigning Names

Task 1: The DATA Statement

Task 2: The INPUT Statement

Task 3: The DATALINES Statement

Task 4: The Data Lines

Task 5: The Null Statement

Task 6: The RUN Statement

Creating the Speeding Ticket Data Set

Printing a Data Set

Printing Only Some of the Variables

Suppressing the Observation Number

Adding Blank Lines between Observations

Summarizing PROC PRINT

Sorting a Data Set

Summary

Key Ideas

Syntax

Example

Special Topics

Labeling Variables

Formatting Values of Variables

Combining Labeling and Formatting

Syntax

What Is a SAS Data Set?

Most of us work with data every day—in lists or tables or spreadsheets—on paper or on a computer. A data set is how SAS stores data in rows and columns.

Table 2.1 shows the speeding ticket fine amounts for the first speeding offense on a highway, up to 24 miles per hour over the posted speed limit.[1] The next section shows how to create a SAS data set using this data as an example.

For convenience, the speeding ticket data is available in the tickets data set in the sample data for this book.

Understanding the SAS DATA Step

A SAS data set has some specific characteristics and rules that must be followed so that the data can be analyzed using SAS software. Fortunately, there are only a few simple rules to remember. Recall that the SAS language is divided into DATA and PROC steps. The process of creating a SAS data set is the DATA step. Once you create a SAS data set, it can be used with any SAS procedure.

The speeding ticket data, like most data, consists of pieces of information that are recorded for each of several units. In this case, the pieces of information include the name of the state and the amount of the speeding ticket fine. You could think of a spreadsheet with two columns for the names of the states and the amounts of the speeding ticket fines. Each row contains all of the information for a state. When you use SAS, the columns are called variables, and the rows are called observations. In general, variables are the specific pieces of information you have, and observations are the units you have measured.

Table 2.1 contains 51 observations (one for each state, plus Washington, D.C.) and two variables: the name of the state and the speeding ticket fine amount for the state. For example, the first observation has the value “Alabama” for the variable “state” and the value “100” for the variable “amount.”

Table 2.1 Speeding Ticket Fines

Summarizing the SAS DATA Step

Table 2.2 uses an example to summarize the parts of a SAS DATA step. Before you begin these tasks, think about names for your data set and variables. See the next section, “Assigning Names,” for details.

Table 2.2 Summarizing Parts of a SAS DATA Step

The only output produced by this DATA step is a note in the log. This note tells you that the data set has been created. It gives the name of the data set, the number of variables, and the number of observations.
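As a minimal sketch of what this looks like in practice, the short DATA step below uses three rows of the speeding ticket data. The data set name tickets_demo is a hypothetical name chosen for this illustration.

```sas
/* A small DATA step: the only output is a note in the log */
data tickets_demo;           /* tickets_demo is a hypothetical name */
   input state $ amount;     /* one character and one numeric variable */
   datalines;
AL 100
HI 200
DE 20
;
run;
```

After you submit these statements, the log should contain a note similar to "NOTE: The data set WORK.TICKETS_DEMO has 3 observations and 2 variables."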

The next seven sections give details on the DATA step. The rest of the chapters in the book assume that you understand how to create data sets in SAS.

Assigning Names

Before you can name the data set or name variables, you need to choose names that are acceptable to SAS. Depending on the version of SAS, different rules might apply to names. Table 2.3 summarizes the rules for SAS names for data sets or variables (for all SAS versions and releases after SAS 7).

Table 2.3 Rules for SAS Data Set or Variable Names

This book assumes the automatic option of VALIDVARNAME=V7, which presumes you are using SAS 7 or later. For your own data, you can use mixed case text, uppercase text, or lowercase text.

You can change the automatic option in the OPTIONS statement. However, you might want to use caution. If you need to share your SAS programs with others at your site, and you change the option, they might not be able to run your programs.

If your site uses VALIDVARNAME=UPCASE, then the same naming rules apply, with one exception. Variable names must be in uppercase with this setting. This setting is compatible with earlier versions of SAS.

If your site uses VALIDVARNAME=ANY, then some of the naming rules are not needed. With this setting, names can contain embedded blanks. Also, with this setting, names can begin with any character, including blanks. If you use the ampersand (&) or percent (%) sign, then review the SAS documentation because special coding is needed for SAS to understand names with these characters.
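The sketch below shows one way to set the option explicitly in an OPTIONS statement and then use mixed case names that follow the V7 rules. The data set name names_demo and the variable names State and Fine_Amount are hypothetical and appear only in this illustration.

```sas
options validvarname=v7;     /* the setting that this book assumes */

data names_demo;             /* hypothetical data set name */
   /* Names use letters, digits, and underscores, and mixed case is allowed */
   input State $ Fine_Amount;
   datalines;
AL 100
HI 200
;
run;

proc print data=names_demo;
run;
```

With this setting, the printed column headings should appear in the same mixed case that the INPUT statement uses.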

Task 1: The DATA Statement

The DATA statement starts the DATA step and assigns a name to the data set. The simplest form for the DATA statement is shown below:

DATA data-set-name;

data-set-name is the name of the data set.

If you do not provide a data-set-name, SAS chooses a name for you. SAS chooses the automatic names DATA1, DATA2, and so on. This book always uses names for data sets.
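To see the automatic naming for yourself, you could try a sketch like the one below, which omits the data-set-name. The two data lines come from the speeding ticket data.

```sas
data;                        /* no data-set-name, so SAS assigns one */
   input state $ amount;
   datalines;
AL 100
HI 200
;
run;

/* With no DATA= option, PROC PRINT uses the most recently created data set */
proc print;
run;
```

If this is the first unnamed data set in your SAS session, the log should report that WORK.DATA1 was created. Because automatic names are easy to lose track of, naming every data set is usually the safer habit.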

Task 2: The INPUT Statement

The next statement in the DATA step is the INPUT statement. The statement provides the following information to SAS:

  • what to use as variable names
  • whether the variables contain numbers or characters
  • the columns where each variable is located in the lines of data

SAS can create data sets for virtually any data, and the complete SAS documentation on the INPUT statement discusses many features and options. This chapter discusses the basic features. If your data is more complex, then check the documentation for more features. Table 2.4 shows the basics of writing an INPUT statement, using the data in Table 2.1 as an example.

Table 2.4 Writing an INPUT Statement


Identifying Missing Values

If you do not have the value of a variable for an observation, you can simply leave the data blank. However, this approach often leads to errors in DATA steps. For numeric variables, enter a period for missing values instead. For example, the speeding ticket data has a missing value for Washington, D.C. When providing the data, use a period for this value. For character variables, when you specify the column location, you can continue to leave the data blank. As discussed below, when you omit the column location, or place several observations on a single line, enter a period for a missing character value.
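The sketch below illustrates an INPUT statement that specifies column locations and a data line that uses a period for a missing numeric value. The data set name tickets_col and the specific column ranges are assumptions made for this illustration; choose ranges that match where the values sit in your own data lines.

```sas
data tickets_col;                    /* hypothetical data set name */
   /* state is in columns 1-2, and amount is in columns 4-10 */
   input state $ 1-2 amount 4-10;
   datalines;
AL 100
PA 46.50
DC .
;
run;

proc print data=tickets_col;
run;
```

In the printed results, the amount for DC should appear as a period, which is how SAS displays a missing numeric value.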

Omitting the Column Location for Variables

In some cases, you can use a simple form of the INPUT statement, and skip the activity of specifying columns for variables. Using the speeding ticket data as an example, here is the simple form:

input state $ amount;

To use this simple form, the data must satisfy the following conditions:

  • Each value on the data line is separated from the next value by at least one blank.
  • Any missing values are represented by periods instead of blanks. This is true for both character and numeric variables.
  • Values for all character variables are eight characters or fewer, and they don’t contain any embedded blanks.

When these conditions exist, you can do the following:

  • List variable names in the INPUT statement in the order in which they appear in the data lines.
  • Follow character variable names with a dollar sign.
  • End the INPUT statement with a semicolon.

Putting Several Short Observations on One Line

You might have data similar to the speeding ticket data, with only a few variables for each observation. In this situation, you might want to put several observations on a single data line. To use this approach, consider the following:

  • Your data must satisfy the three conditions for omitting column locations for variables.
  • You can put several observations on a data line and different numbers of observations on different data lines.
  • You can use the INPUT statement without column locations and type two at signs (@@) just before the semicolon.

For the speeding ticket data, here is this form of the INPUT statement:

input state $ amount @@;

SAS documentation refers to @@ as a “double trailing at sign.” The @@ tells SAS to continue reading data from the same line, instead of moving to the next line. Without @@, SAS assumes that one observation appears on each data line.
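To see the difference that @@ makes, you could read the same two data lines both ways, as in the sketch below. The data set names one_per_line and several_per_line are hypothetical.

```sas
/* Without @@, SAS reads one observation from each data line */
data one_per_line;
   input state $ amount;
   datalines;
AL 100 HI 200 DE 20
KS 90 AZ 250 IN 500
;
run;

/* With @@, SAS keeps reading observations from the same line */
data several_per_line;
   input state $ amount @@;
   datalines;
AL 100 HI 200 DE 20
KS 90 AZ 250 IN 500
;
run;

proc print data=one_per_line;
   title 'Without @@';
run;

proc print data=several_per_line;
   title 'With @@';
run;
```

The first data set should contain only two observations (AL and KS), because SAS ignores the rest of each line. The second data set should contain all six observations.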

Task 3: The DATALINES Statement

The DATALINES statement is placed immediately before the lines of data. This simple statement has no options.

DATALINES;

In some SAS programs, you might see a CARDS statement instead of the DATALINES statement. The two statements perform the same task.

Existing Data Lines: Using the INFILE Statement

Suppose your data already exists in a text file. You don’t have to re-enter the data lines. Instead, you can access the data lines by using the INFILE statement instead of the DATALINES statement. For more information, see Chapter 3.
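As a rough preview, the sketch below reads data lines from a text file. So that the example runs on its own, the first DATA step writes two lines of the speeding ticket data to a temporary file; with your own data, the file would already exist, and you would point the INFILE statement at it instead. The fileref tixfile and the data set name tickets_file are hypothetical.

```sas
filename tixfile temp;       /* hypothetical fileref for a temporary file */

/* Scaffolding for this example only: create the text file */
data _null_;
   file tixfile;
   put 'AL 100 HI 200 DE 20';
   put 'KS 90 AZ 250 IN 500';
run;

/* INFILE takes the place of DATALINES, the data lines, and the null statement */
data tickets_file;
   infile tixfile;
   input state $ amount @@;
run;

proc print data=tickets_file;
run;
```

The INPUT statement is written exactly as it would be for data lines typed in the program.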

Task 4: The Data Lines

After the DATALINES statement, enter the data lines according to the INPUT statement. Do not put a blank line between the DATALINES statement and the data. If you specify certain columns for variables, enter the values of those variables in the correct columns. If you don’t specify columns in the INPUT statement, put a period in the place where a missing value occurs.

Task 5: The Null Statement

After the last data line, enter a semicolon, alone, on a new line. This is called a null statement, and it ends the data lines. The null statement must be alone and on a different line from the data.

Task 6: The RUN Statement

Chapter 1 introduced the RUN statement and recommended using it at the end of each DATA or PROC step. This approach makes it clear where one step ends and the next step begins, and it can help you find errors in programs. This book ends each DATA step with a RUN statement.

Creating the Speeding Ticket Data Set

This section uses the DATA, INPUT, and DATALINES statements to create a data set for the speeding ticket data. This example uses the simplest form of the INPUT statement, omitting column locations and putting several observations on each data line. The values of state are the two-letter codes used by the U.S. Postal Service.

data tickets;
input state $ amount @@;
datalines;
AL 100 HI 200 DE 20 IL 1000 AK 300 CT 50 AR 100 IA 60 FL 250
KS 90 AZ 250 IN 500 CA 100 LA 175 GA 150 MT 70 ID 100 KY 55
CO 100 ME 50 NE 200 MA 85 MD 500 NV 1000 MO 500 MI 40 NM 100
NJ 200 MN 300 NY 300 NC 1000 MS 100 ND 55 OH 100 NH 1000 OR 600
OK 75 SC 75 RI 210 PA 46.50 TN 50 SD 200 TX 200 VT 1000 UT 750
WV 100 VA 200 WY 200 WA 77 WI 300 DC .
;
run;

The statements above use the name tickets to identify the data set, the name state to identify the state, and the name amount to identify the dollar amount of the speeding ticket fine.

Look at the last data line of the program above. The missing value for Washington, D.C., is identified using a period.

Printing a Data Set

Chapter 1 describes SAS as consisting of DATA steps and PROC steps. This section shows how to use PROC PRINT to print the data. To print the speeding ticket data, add the following statements to the end of the program that creates the data set:

proc print data=tickets;

title 'Speeding Ticket Data';

run;

Figure 2.1 shows the results.

Figure 2.1 shows the title at the top that was requested in the TITLE statement. The first column labeled Obs shows the observation number, which you can think of as the order of the observation in the data set. For example, Alabama is the first observation in the speeding ticket data set, and the Obs for AL is 1. The second column is labeled state. This column gives the values for the variable state. The third column is labeled amount, and it gives the values for the variable amount.

As discussed earlier in “Assigning Names,” SAS allows mixed case variable names. When you use all lowercase, SAS displays lowercase in the output.

In general, when you print a data set, the first column shows the observation number and is labeled Obs. Other columns are labeled with the variable names provided in the INPUT statement. PROC PRINT automatically prints the variables in the order in which they were listed in the INPUT statement.

Figure 2.1 PROC PRINT Results

Printing Only Some of the Variables

If you have a data set with many variables, you might not want to print all of them. Or, you might want to print the variables in a different order than they appear in the data lines. In these situations, use a VAR statement with PROC PRINT.

Suppose you conducted a survey that asked opinions about nuclear power plants and collected some demographic information (age, sex, race, and income). Suppose the data set is named NUCPOWER and you want to print only the variables AGE and INCOME.

proc print data=NUCPOWER;

var AGE INCOME;

run;

PROC PRINT results show only three columns: Obs, AGE, and INCOME.
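The VAR statement also controls the order of the columns. The sketch below uses a few rows of the speeding ticket data and lists amount before state. The data set name tickets_small is hypothetical.

```sas
data tickets_small;          /* hypothetical data set name */
   input state $ amount @@;
   datalines;
AL 100 HI 200 DE 20 IL 1000
;
run;

/* The columns print in the order listed in the VAR statement */
proc print data=tickets_small;
   var amount state;
   title 'Amount Listed before State';
run;
```

The results should show Obs, then amount, then state, even though state comes first in the data lines.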

Suppressing the Observation Number

Although the observation number is useful in checking your data, you might want to suppress it when printing data in reports. Use the NOOBS option in the PROC PRINT statement as shown below:

proc print data=tickets noobs;

title 'Speeding Ticket Variables';

run;

Figure 2.2 shows the first few lines of the output.

Figure 2.2 Suppressing the Observation Number

Figure 2.2 contains only two columns, one for each variable in the data set.

Adding Blank Lines between Observations

For easier reading, you might want to add a blank line between each observation in the output. Use the DOUBLE option in the PROC PRINT statement as shown below:

proc print data=tickets double;

title 'Double-spacing for Speeding Ticket Data';

run;

Figure 2.3 shows the first few lines of the output.

Figure 2.3 Adding Double-Spacing to Output

Figure 2.3 shows a blank line between each observation.

Summarizing PROC PRINT

The general form of the statements to print a data set is shown below:

PROC PRINT DATA=data-set-name options;
VAR variables;

data-set-name is the name of a SAS data set, and variables lists one or more variables in the data set.

The PROC PRINT statement options can be one or more of the following:

NOOBS

suppresses the observation numbers.

DOUBLE

adds a blank line between each observation.

The VAR statement is optional. If you omit it, SAS automatically prints all the variables in the data set.
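Because the options can be used together, you can combine them with a VAR statement in a single step. The sketch below is one such combination, using a few rows of the speeding ticket data and the hypothetical data set name tickets_opts.

```sas
data tickets_opts;           /* hypothetical data set name */
   input state $ amount @@;
   datalines;
AL 100 HI 200 DE 20 IL 1000
;
run;

/* NOOBS drops the Obs column, and DOUBLE adds a blank line between observations */
proc print data=tickets_opts noobs double;
   var state amount;
   title 'Combining NOOBS, DOUBLE, and VAR';
run;
```

Separate the options with blanks, and place them before the semicolon that ends the PROC PRINT statement.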

Sorting a Data Set

PROC PRINT prints the data in the order in which the values were entered. This might be what you need or it might not. If you want to reorder the data, use PROC SORT. This procedure requires a BY statement that specifies the variables to use to sort the data.

Suppose you want to see the speeding ticket data sorted by amount. In other words, you want to see the observation with the smallest amount first, and the observation with the largest amount last.

proc sort data=tickets;

by amount;

run;

proc print data=tickets;

title 'Speeding Ticket Data: Sorted by Amount';

run;

Figure 2.4 shows the first few lines of the output.

Figure 2.4 shows the observation for DC first. When sorting, missing values are the “lowest” values and appear first. The next several observations show the data sorted by amount.

Figure 2.4 Sorting in Ascending Order

SAS automatically sorts data in English. If you need your data sorted in another language, check the SAS documentation for possible options.

SAS automatically sorts data in ascending order (lowest values first). Numeric variables are ordered from low to high, and character variables are ordered alphabetically.

In many cases, sorting the data in descending order makes more sense. Use the DESCENDING option in the BY statement.

proc sort data=tickets;

by descending amount;

run;

proc print data=tickets;

title 'Speeding Ticket Data by Descending Amount';

run;

Figure 2.5 shows the first few lines of the output.

Figure 2.5 shows the observation for Illinois (IL) first. The observation for DC, which has a missing value for amount, appears last in this output.

Figure 2.5 Sorting in Descending Order

You can sort by multiple variables, sort some in ascending order, and sort others in descending order. Returning to the example of the survey on nuclear power plants, consider the following:

proc sort data=NUCPOWER;

by descending AGE INCOME;

run;

proc print data=NUCPOWER;

var AGE INCOME Q1;

run;

The statements above show results for the first question in the survey. The results are sorted in descending order by AGE, and then in ascending order by INCOME.

The general form of the statements to sort a data set is shown below:

PROC SORT DATA=data-set-name;
BY DESCENDING variables;

data-set-name is the name of a SAS data set, and variables lists one or more variables in the data set.

The DESCENDING option is not required. If it is used, place the option before the variable that you want to sort in descending order.
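Because the NUCPOWER data is not shown in this chapter, the sketch below applies the same idea to five rows of the speeding ticket data. It sorts by descending amount and then by ascending state. The data set name tickets_sub is hypothetical.

```sas
data tickets_sub;            /* hypothetical data set name */
   input state $ amount @@;
   datalines;
AL 100 DE 20 IL 1000 CA 100 NV 1000
;
run;

/* DESCENDING applies only to amount, the variable that follows it */
proc sort data=tickets_sub;
   by descending amount state;
run;

proc print data=tickets_sub;
   title 'Descending Amount, Then Ascending State';
run;
```

The results should list IL and NV first (both 1000, in alphabetical order), then AL and CA (both 100), and then DE.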

Summary

Key Ideas

  • SAS stores data in a data set, which is created by a DATA step. The rows of the data set are called observations, and the columns are called variables.
  • SAS names must follow a few simple rules. Think about appropriate names before you create a data set.
  • In a SAS DATA step, use the DATA statement to give the data set a name. Use the INPUT statement to assign variable names and describe the data. Then, use the DATALINES statement to include the data lines, and end with a semicolon on a line by itself.
  • Use PROC PRINT to print data sets. Using options, you can suppress observation numbers and add a blank line between each observation.
  • Use PROC SORT to sort data sets by one or more variables. Using the DESCENDING option, you can change the automatic behavior of sorting in ascending order.

Syntax

To create a SAS data set using a DATA step

DATA data-set-name;

data-set-name is the name of a SAS data set.

Use one of these INPUT statements:

INPUT variable $ location . . . ;

INPUT variable $ . . . ;

INPUT variable $ . . . @@ ;

variable

is a variable name.

$

is used for character variables and omitted for numeric variables.

location

gives the starting and ending columns for the variable.

@@

looks for multiple observations on each line of data.

DATALINES;

data lines

contain the data.

;

ends the lines of data.

To print a SAS data set

PROC PRINT DATA= data-set-name options;

VAR variables;

data-set-name

is the name of a SAS data set.

variables

lists one or more variables in the data set.

The PROC PRINT statement options can be one or more of the following:

NOOBS

suppresses the observation numbers.

DOUBLE

adds a blank line between each observation.

The VAR statement is optional. If you omit it, SAS automatically prints all the variables in the data set.

To sort a SAS data set

PROC SORT DATA=data-set-name;

BY DESCENDING variables;

data-set-name

is the name of a SAS data set.

variables

lists one or more variables in the data set.

The DESCENDING option is not required. If it is used, place the option before the variable that you want to sort in descending order.

Example

The program below produces all of the output shown in this chapter:

options nodate nonumber ps=60 ls=80;

data tickets;
input state $ amount @@;
datalines;
AL 100 HI 200 DE 20 IL 1000 AK 300 CT 50 AR 100 IA 60 FL 250
KS 90 AZ 250 IN 500 CA 100 LA 175 GA 150 MT 70 ID 100 KY 55
CO 100 ME 50 NE 200 MA 85 MD 500 NV 1000 MO 500 MI 40 NM 100
NJ 200 MN 300 NY 300 NC 1000 MS 100 ND 55 OH 100 NH 1000 OR 600
OK 75 SC 75 RI 210 PA 46.50 TN 50 SD 200 TX 200 VT 1000 UT 750
WV 100 VA 200 WY 200 WA 77 WI 300 DC .
;
run;

proc print data=tickets;
title 'Speeding Ticket Data';
run;

proc print data=tickets noobs;
title 'Speeding Ticket Variables';
run;

proc print data=tickets double;
title 'Double-spacing for Speeding Ticket Data';
run;

proc sort data=tickets;
by amount;
run;

proc print data=tickets;
title 'Speeding Ticket Data: Sorted by Amount';
run;

proc sort data=tickets;
by descending amount;
run;

proc print data=tickets;
title 'Speeding Ticket Data by Descending Amount';
run;

Special Topics

This section shows how to add labels to variables and to values of variables. SAS refers to adding labels to values of variables as formatting the variable or adding formats.

SAS provides ways to add labels and formats in the PROC step. However, the simplest approach is to add labels and formats in the DATA step. Then, SAS automatically uses the labels and formats in most procedures.

Labeling Variables

Many programmers prefer short variable names. However, these short names might not be descriptive enough for printed results. Use a LABEL statement in the DATA step to add labels. Place the LABEL statement between the INPUT and DATALINES statements.

data ticketsl;

input state $ amount @@;

label state='State Where Ticket Received'

amount='Cost of Ticket';

datalines;

Follow these statements with the lines of data, a null statement, and a RUN statement.

The LABEL statement is similar to the TITLE and FOOTNOTE statements, which Chapter 1 discusses. Enclose the text for the label in quotation marks. If the label itself contains a single quotation mark, enclose the text for the label in double quotation marks. For example, the statements below are valid:

label cost09 = 'Cost from Producer in 2009';
label month1 = "Current Month's Results";

To see the effect of adding the LABEL statement, use PROC PRINT. Many SAS procedures automatically display labels. PROC PRINT displays them only when you use the LABEL option in the PROC PRINT statement. Here is what you would type for the speeding ticket data:

proc print data=ticketsl label;

title 'Speeding Ticket Data with Labels';

run;

Figure ST2.1 shows the first few lines of the output.

Figure ST2.1 Adding Labels to Variables

Figure ST2.1 shows the labels instead of the variable names.

Formatting Values of Variables

SAS provides multiple ways to add formats to variables. Here are three approaches:

  • Assign an existing SAS format to the variable. SAS includes dozens of formats, so you can avoid the effort of creating your own format in many cases.
  • Use a SAS function to assign a format. This approach creates a new variable that contains the values you want. SAS includes hundreds of functions, so you can save time and effort in many cases.
  • Create your own SAS format, and assign it to the variable.

The next three topics discuss these approaches.

Using an Existing SAS Format

SAS includes many existing formats. If you want to add a format to a variable, check first to see whether an existing format meets your needs. For more information about existing formats, see the SAS documentation or review the online Help.

For the speeding ticket data, suppose you want to format amount to show dollars and cents. With the DOLLAR format, you specify the total width of the formatted value and the number of places after the decimal point. For amount, the FORMAT statement below specifies a width of 8 with 2 digits after the decimal point:

format amount dollar8.2;

You can assign existing SAS formats in a DATA step, which formats the values for all procedures. Or, you can assign existing SAS formats in a PROC step, which formats the values for only that procedure. The statements below format amount for all procedures:

data ticketsf1;

input state $ amount @@;

format amount dollar8.2;

datalines;

Follow these statements with the lines of data, a null statement, and a RUN statement to complete the DATA step.

You can assign formats to several variables in one FORMAT statement. For example, the statement below assigns the DOLLAR10.2 format to three variables:

format salary dollar10.2 bonus dollar10.2 sales dollar10.2;

The statement can be used either in a DATA step or a PROC step.

The statements below format amount for PROC PRINT only, and use the data set created earlier in this chapter:

proc print data=tickets;

format amount dollar8.2;

title 'Speeding Ticket Data Using DOLLAR Format';

run;

Figure ST2.2 shows the first few lines of the output from the PROC PRINT step.

Figure ST2.2 Using Existing SAS Formats

Figure ST2.2 shows the formatted values for the speeding ticket fines.

Table ST2.1 summarizes a few existing SAS formats for numeric variables.

Table ST2.1 Commonly Used SAS Formats

Using a SAS Function

SAS includes many functions, which you can use to create new variables that contain the formatted values you want. As with existing SAS formats, you might want to check the list of SAS functions before creating your own format.

For the speeding ticket data, suppose you want to use the full state names instead of the two-letter codes. After checking SAS Help, you find that the STNAMEL function meets your needs.

data ticketsf2;

input state $ amount @@;

statetext = stnamel(state);

datalines;

Follow these statements with the lines of data, a null statement, and a RUN statement to complete the DATA step.

The statements below print the resulting data:

proc print data=ticketsf2;

title 'Speeding Ticket Data with STNAMEL Function';

run;

Figure ST2.3 shows the first few lines of the output from the PROC PRINT step.

Figure ST2.3 Using Existing SAS Functions

Figure ST2.3 shows all three variables. The INPUT statement creates the state and amount variables. The STNAMEL function creates the statetext variable.
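For reference, here is a sketch of a complete, shortened version of this DATA step that uses only three rows of the speeding ticket data. The data set name ticketsf2_small is hypothetical.

```sas
data ticketsf2_small;        /* hypothetical data set name */
   input state $ amount @@;
   /* STNAMEL converts a two-letter postal code to the state name */
   statetext = stnamel(state);
   datalines;
AL 100 HI 200 DE 20
;
run;

proc print data=ticketsf2_small;
   title 'Three Rows with the STNAMEL Function';
run;
```

The assignment statement runs once for each observation, so every row gets its own value of statetext.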

Table ST2.2 summarizes a few SAS functions.

Table ST2.2 Commonly Used SAS Functions

Creating Your Own Formats

Sometimes, neither an existing SAS format nor SAS function meets your needs. Chapter 1 introduced the body fat data, where gender has the values of m and f for males and females. Suppose you want to use full text values for this variable. To do so, you can use PROC FORMAT to create your own format. After creating the format, you can assign it to a variable in the DATA step. Here is what you would type for the body fat data:

proc format;
value $sex 'm'='Male' 'f'='Female';
run;

data bodyfat2;
input gender $ fatpct @@;
format gender $sex.;
datalines;
m 13.3 f 22 m 19 f 26 m 20 f 16 m 8 f 12 m 18 f 21.7
m 22 f 23.2 m 20 f 21 m 31 f 28 m 21 f 30 m 12 f 23
m 16 m 12 m 24
;
run;

proc print data=bodyfat2;
title 'Body Fat Data for Fitness Program';
run;

PROC FORMAT creates a new format.

The VALUE statement gives a name to the format. For the example above, the new format is named $SEX. Formats for character values must begin with a dollar sign.

The VALUE statement also specifies the format you want for each data value. For character values, enclose both the format and the value in quotation marks. Both the format and the value are case sensitive. In the example above, suppose the specified value was M. When SAS applied the $SEX format to the data, no matches would be found.

When you use the format in the DATA step, follow the name of the format with a period. For the example above, the FORMAT statement in the DATA step specifies $SEX. for the GENDER variable.

Figure ST2.4 shows the results. Compare Figure ST2.4 and Figure 1.2. The values for GENDER use the new format.

Figure ST2.4 Creating a New Format

The steps for creating your own format are summarized below:

1. Confirm that you cannot use an existing SAS format or SAS function.

2. Use PROC FORMAT to create a new format. The VALUE statement gives a name to the format and specifies formats for values. You can create multiple formats in one PROC FORMAT step. Use a VALUE statement to define each new format.

3. After using PROC FORMAT, assign the format to a variable in the DATA step by using the FORMAT statement. All SAS procedures use the format whenever possible. Or, you can specify the format with each procedure.
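As a sketch of the second choice in step 3, the statements below leave the FORMAT statement out of the DATA step and specify the format in one PROC PRINT step only. The data values are the first six observations of the body fat data, and the data set name bodyfat_raw is hypothetical.

```sas
proc format;
   value $sex 'm'='Male' 'f'='Female';
run;

data bodyfat_raw;            /* hypothetical name, with no FORMAT statement */
   input gender $ fatpct @@;
   datalines;
m 13.3 f 22 m 19 f 26 m 20 f 16
;
run;

/* The format applies only to this PROC PRINT step */
proc print data=bodyfat_raw;
   format gender $sex.;
   title 'Format Applied in PROC PRINT Only';
run;

/* With no FORMAT statement, the stored values m and f are printed */
proc print data=bodyfat_raw;
   title 'Same Data without the Format';
run;
```

Comparing the two sets of results should show Male and Female in the first listing and m and f in the second, because the format was never stored with the data set.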

Combining Labeling and Formatting

You can combine labeling and formatting. You can use the LABEL and FORMAT statements in any order, as long as they both appear before the DATALINES statement. Similarly, you can create a new variable with a SAS function as long as this statement appears before the DATALINES statement. The list below summarizes possible approaches:

  • Assign labels in the DATA step with the LABEL statement. Many SAS procedures automatically use the labels, but PROC PRINT requires the LABEL option to use the labels.
  • If you use an existing SAS format in a DATA step, then SAS procedures automatically use the format.
  • If you use an existing SAS format for a specific procedure, then only that procedure uses the format.
  • If you use a SAS function in a DATA step, then SAS procedures display the values of the new variable. You can assign a label to the new variable in the LABEL statement.
  • If you create your own format, and assign the format to a variable in a DATA step, then SAS procedures automatically use the format.
  • If you create your own format, you can assign it for a specific procedure, and then only that procedure uses the format.

For the speeding ticket data, the statements below combine labeling, using a SAS function, and using an existing SAS format:

data tickets2;
input state $ amount @@;
format amount dollar8.2;
label state='State Code'
statetext='State Where Ticket Received'
amount='Cost of Ticket';
statetext = stnamel(state);
datalines;
AL 100 HI 200 DE 20 IL 1000 AK 300 CT 50 AR 100 IA 60 FL 250
KS 90 AZ 250 IN 500 CA 100 LA 175 GA 150 MT 70 ID 100 KY 55
CO 100 ME 50 NE 200 MA 85 MD 500 NV 1000 MO 500 MI 40 NM 100
NJ 200 MN 300 NY 300 NC 1000 MS 100 ND 55 OH 100 NH 1000 OR 600
OK 75 SC 75 RI 210 PA 46.50 TN 50 SD 200 TX 200 VT 1000 UT 750
WV 100 VA 200 WY 200 WA 77 WI 300 DC .
;
run;

proc print data=tickets2 label;
title 'Speeding Ticket Data with Labels and Formats';
run;

Figure ST2.5 shows the first few lines of the output.

Figure ST2.5 Combining Labeling and Formatting

Syntax

To label variables in a SAS data set

DATA statement that specifies a data set name

INPUT statement appropriate for your data

LABEL variable='label';

DATALINES;

data lines

;

RUN;

PROC PRINT DATA=data-set-name LABEL;

data-set-name

is the name of a SAS data set.

variable

is the variable you want to label.

label

is the label you want for the variable. The label must be enclosed in quotation marks and can be up to 256 characters long. (Blanks count as characters.) Use double quotation marks if the label itself contains a single quotation mark. You can associate labels with several variables in one LABEL statement.

For your data, use DATA and INPUT statements that describe the data lines, and end each statement with a semicolon.

To use existing formats in a SAS data set

DATA statement that specifies a data set name

INPUT statement appropriate for your data

FORMAT variable format-name;

DATALINES;

data lines

;

RUN;

variable

is the variable you want to format.

format-name

is an existing SAS format. You can associate formats with several variables in one FORMAT statement.

To use existing formats in a SAS procedure

Many SAS procedures can use a FORMAT statement and apply a format to a variable for only that procedure. As an example, the statements below use an existing SAS format when printing the data set:

PROC PRINT DATA=data-set-name;

FORMAT variable format-name;

To create new variables with SAS functions in a SAS data set

Use functions in a DATA step to create new variables that contain the values you want.

DATA statement that specifies a data set name

INPUT statement appropriate for your data

new-variable=function(existing-variable);

DATALINES;

data lines

;

RUN;

new-variable

is the new variable.

function

is a SAS function.

existing-variable

is an appropriate variable in the INPUT statement.

The parentheses and the semicolon are required.

To create a new format and apply it in a SAS data set

PROC FORMAT;

VALUE format-name value=format

value=format

.

.

.

value=format;

RUN;

DATA

INPUT

FORMAT variable format-name. ;

DATALINES;

data lines

;

RUN;

format-name

is a SAS name you assign to the list of formats.

If the variable has character values, format-name must begin with a dollar sign ($). Also, if the variable has character values, the format-name can be up to 31 characters long, and cannot end in a number. (Blanks count as characters.)

If the variable has numeric values, format-name can be up to 32 characters long, and it also cannot end in a number.

The format-name cannot be the name of an existing SAS format.

value

is the value of the variable you want to format. If the value contains letters or blanks, enclose it in single quotation marks.

format

is the format you want to attach to the value of the variable. If the format contains letters or blanks, enclose it in single quotation marks.

Some SAS procedures display only the first 8 or the first 16 characters of the format. The format can be up to 32,767 characters long, but much shorter text is more practical. (Blanks count as characters.)

variable

is the name of the variable that you want to format.

In the FORMAT statement in the DATA step, the period immediately following the format-name is required.

You can create multiple formats in one PROC FORMAT step. Use a VALUE statement to define each new format.

To create a new format and apply it in a SAS procedure

PROC FORMAT;

VALUE format-name value=format

value=format

.

.

.

value=format;

DATA

INPUT

FORMAT variable format-name.;

DATALINES;

data lines

;

RUN;

PROC PRINT DATA=data-set-name;

FORMAT variable format-name. ;

In the FORMAT statement in the PROC step, the period immediately following the format-name is required.

ENDNOTES

[1] Data is adapted from Summary of State Speed Laws, August 2007, produced by the National Highway Traffic Safety Administration and available at www.nhtsa.dot.gov. The fines are for situations with no “special circumstances,” such as construction, school zones, driving while under the influence, and so on. Some states have varying fines; in these situations, the table shows the maximum fine.
