Chapter 2: Writing Your First SAS Program

2.1  A Simple Program to Read Raw Data and Produce a Report

2.2  Enhancing the Program

2.3  More on Comment Statements

2.4  How SAS Works (a Look inside the “Black Box”)

2.5  Problems

 

2.1  A Simple Program to Read Raw Data and Produce a Report

Let’s start out with a simple program to read data from a text file and produce some basic summaries. Then we’ll go on to enhance the program.

The task: you have data values in a text file. These values represent Gender (M or F), Age, Height, and Weight. Each data value is separated from the next by one or more blanks. You want to produce two reports: one showing the frequencies for Gender (how many Ms and Fs); the other showing the average age, height, and weight for all the subjects.

Here is a listing of the raw data file Mydata.txt that you want to analyze:

 M 50 68 155

 F 23 60 101

 M 65 72 220

 F 35 65 133

 M 15 71 166

Here is the program:

Program 2.1: Your First SAS Program

data Demographic;

  infile "C:\books\learning\Mydata.txt";

  input Gender $ Age Height Weight;

run;

title "Gender Frequencies";

proc freq data=Demographic;

   tables Gender;

run;

 

title "Summary Statistics";

proc means data=Demographic;

   var Age Height Weight;

run;

Notice that this program consists of one DATA step followed by two PROC steps. As we mentioned in Chapter 1, the DATA step begins with the word DATA. In this program, the name of the SAS data set being created is Demographic. The next line (the INFILE statement) tells SAS where the data values are coming from. In this example, the text file Mydata.txt is in the folder C:\books\learning on a Windows system.

If you decide to run some of the programs in this book, you can download all the programs and data files from the author's website (support.sas.com/cody) and place them in a folder of your choice. For example, if you placed the text file Mydata.txt in the folder C:\SASdata, your INFILE statement would read:

infile "C:\SASdata\Mydata.txt";

If you are using the SAS University Edition, you may want to place all the data files in the folder C:\SASUniversityEdition\Myfolders, which is the default location you set up when you configured your virtual machine.

The INPUT statement shown here is one of four different methods that SAS has for reading raw data. This program uses the list input method, appropriate for data values separated by delimiters. The default data delimiter for SAS is a blank. SAS can also read data separated by any other delimiter (for example, commas or tabs) with a minor change to the INFILE statement. When you use the list input method for reading data, you need to list only the names you want to give each data value. SAS calls these variable names. As mentioned in Chapter 1, these names must conform to the SAS naming convention.
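As a sketch of that minor change, the step below reads the five records from Mydata.txt written with commas instead of blanks. So that it runs without an external file, the data are placed in the program with a DATALINES statement (described in the next chapter), and the data set name Demographic_CSV is an illustrative choice, not one of the book's programs. When reading an external file instead, the same DLM= option would follow the quoted file name on the INFILE statement.

```sas
data Demographic_CSV;
   /* DLM=',' tells list input that commas, not blanks, separate values */
   infile datalines dlm=',';
   input Gender $ Age Height Weight;
datalines;
M,50,68,155
F,23,60,101
M,65,72,220
F,35,65,133
M,15,71,166
;
```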

Notice the dollar sign ($) following the variable name Gender. The dollar sign following any variable name tells SAS that values for those variables are stored as character values. Without a dollar sign, SAS assumes values are numbers and should be stored as SAS numeric values.

Finally, the DATA step ends with a RUN statement. You will see later that, depending on the platform on which you are running your SAS program, RUN statements are not always necessary.

In Program 2.1 we placed a blank line between each step to make the program easier to read. Feel free to include blank lines whenever you wish to make the program more readable.

There are two TITLE statements in this program. You will see this statement in many of the SAS programs in this book. As you may have guessed, the text following the keyword TITLE is printed at the top of each page of SAS output. The text can be placed in single or double quotes, or even in no quotes at all, as long as the title doesn't contain any single quotes (apostrophes). Statements such as the TITLE statement are called global statements. The term global refers to the fact that the operations these statements perform are not tied to one single DATA or PROC step; they affect the entire SAS environment. In addition, the operations performed by these global statements remain in effect until they are changed. For example, if you have a single TITLE statement at the beginning of your program, that title will head every page of output from that point on until you write a new TITLE statement. It is good practice to place a TITLE statement before every procedure that produces output, so that anyone reading the page can easily understand the information on it. If you exit your SAS session, your titles are all reset, and you need to submit new TITLE statements if you want them to appear.
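The persistence of global statements is easiest to see in a small sketch. It uses SASHELP.CLASS, a small sample data set (19 students, with variables Name, Sex, Age, Height, and Weight) that is supplied with SAS, so no data file is needed; the title text is made up for illustration. The first title stays in effect for the second procedure because no new TITLE statement intervenes, and a TITLE statement with no text clears all titles.

```sas
title "Class Roster";                  /* becomes the current title      */
proc print data=sashelp.class;
run;

proc means data=sashelp.class;         /* no new TITLE statement, so     */
   var Height Weight;                  /* "Class Roster" still heads     */
run;                                   /* this output as well            */

title "Height and Weight Summary";     /* replaces the earlier title     */
proc means data=sashelp.class;
   var Height Weight;
run;

title;                                 /* TITLE with no text clears all  */
                                       /* titles from this point on      */
```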

In all the output displayed in this book, the NOPROCTITLE option was in effect. Without this option, the output from every procedure would contain text such as "The MEANS Procedure" in addition to your own titles. To set this option, submit the line:

ODS NoProcTitle;

PROC FREQ is one of the many built-in SAS procedures. As the name implies, this procedure counts frequencies of data values.

To tell this procedure which variables you want to include in your frequency counts, you add an additional statement—the TABLES (or TABLE) statement. Following the word TABLES, you list those variables for which you want frequency counts. You could actually omit a TABLES statement but, if you did, PROC FREQ would compute frequencies for every variable in your data set (including all the numeric variables).
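To see what difference the TABLES statement makes, here is a sketch using SASHELP.CLASS, a sample data set supplied with SAS (variables Name, Sex, Age, Height, and Weight), so it runs without any external file. The first step requests tables for Sex and Age only; the second omits TABLES, so PROC FREQ produces a table for every variable, including a not very useful one that lists each of the 19 names once and numeric tables with one row per distinct value.

```sas
title "Frequencies for Sex and Age Only";
proc freq data=sashelp.class;
   tables Sex Age;       /* one frequency table per listed variable */
run;

title "Frequencies for Every Variable";
proc freq data=sashelp.class;   /* no TABLES statement: every variable, */
run;                            /* character and numeric, gets a table  */
```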

PROC MEANS is another built-in SAS procedure that computes means (averages) as well as some other statistics such as the minimum and maximum value of each variable. A VAR (short for variables) statement supplies PROC MEANS with a list of analysis variables (which must be numeric) for which you want to compute these statistics. Without a VAR statement, PROC MEANS computes statistics on every numeric variable in your data set.
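You can also choose which statistics PROC MEANS reports by listing statistic keywords on the PROC MEANS statement. The sketch below, again using the SASHELP.CLASS sample data set so that it runs on its own, asks only for the number of non-missing values, the mean, and the median, and uses the MAXDEC= option to limit the displayed decimal places. The particular keywords chosen here are just one illustrative combination.

```sas
title "Selected Statistics for Height and Weight";
proc means data=sashelp.class n mean median maxdec=1;
   /* N, MEAN, and MEDIAN select the statistics; MAXDEC=1 limits */
   /* the printed results to one decimal place                    */
   var Height Weight;
run;
```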

Depending on whether you are using the SAS Display Manager on a Windows operating system, SAS Enterprise Guide, or SAS Studio (either on a standard version of SAS or the SAS University Edition, or even a mainframe computer), the actual mechanics of submitting your program may differ slightly. You can see screen shots for three different environments below:

Figure 2.1 shows SAS running in the windowing environment on a Windows operating system. This is the environment you will see for most of the examples in this book. The programs that run under the other environments are very similar, so you should not have any problems, regardless of which environment you are using.

Figure 2.1: View of the Enhanced Editor Window Using the SAS Windowing Environment

When you use the SAS windowing environment, you write your program in the Enhanced EDITOR window (shown in Figure 2.1). Other windows that you will see later are the LOG window (where you see a listing of your program, possible error messages, and information about data files that were read or written) and the OUTPUT window where you see your results.

To run this program, click the SUBMIT icon (see Figure 2.2).

Figure 2.2: SUBMIT Icon

Before we show you the LOG and OUTPUT windows, here are screen shots (see Figure 2.3 and Figure 2.4) of the same program using SAS Enterprise Guide and SAS Studio (from the University Edition):

Figure 2.3: Running Your Program in Enterprise Guide

Figure 2.4: Running Your Program in SAS Studio (University Edition)

The programs are almost identical regardless of which SAS environment you are using. You might have noticed that the INFILE statement in the SAS Studio version is different from the other two programs. The "short answer" to this is that the SAS University Edition runs in a virtual environment and you need to direct your programs to find data on your disk in a slightly different manner. Please refer to An Introduction to SAS University Edition by this author for more information on how this works, or view the online information (PDFs and videos) supplied by SAS.

It's time to see what happens when you click the SUBMIT icon in the windowing environment example. Here is what you will see on your screen (see Figure 2.5):

Figure 2.5: Output from Program 2.1

What you see here is the Output window. (The exact appearance of these windows will vary, depending on how you have set up SAS.) The top part of the output (produced by PROC FREQ) shows that there were two females and three males in the data set (the numbers listed under Frequency). The column labeled Percent shows each frequency as a percentage of all the non-missing data values in the data set. The last two columns display the Cumulative Frequency and the Cumulative Percent. The cumulative frequency for each row is the running total of the frequencies up to and including that row: 2 for F (the two females, representing 40% of the subjects) and 2 + 3 = 5 for M (males plus females, representing 100% of the subjects). The Cumulative Percent column shows these cumulative counts as percentages. You will see later how to eliminate these last two columns, because they are seldom used.

Below the frequency display you see Summary Statistics for the three numeric variables (produced by PROC MEANS). N is the number of non-missing values, Mean is the arithmetic mean, Std Dev is the standard deviation, and Minimum and Maximum are the smallest and largest non-missing values, respectively.
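As a quick cross-check of the Mean, Minimum, and Maximum columns, the sketch below applies the SAS MEAN, MIN, and MAX functions directly to the values listed in Mydata.txt. Working the arithmetic by hand gives mean values of 37.6 (188/5) for Age, 67.2 (336/5) for Height, and 155 (775/5) for Weight, with an Age range of 15 to 65, and the PUT statement writes these results to the log.

```sas
data _null_;
   /* MEAN, MIN, and MAX operate on the values listed in parentheses; */
   /* these are the values typed from Mydata.txt                      */
   MeanAge    = mean(50, 23, 65, 35, 15);
   MeanHeight = mean(68, 60, 72, 65, 71);
   MeanWeight = mean(155, 101, 220, 133, 166);
   MinAge     = min(50, 23, 65, 35, 15);
   MaxAge     = max(50, 23, 65, 35, 15);
   put MeanAge= MeanHeight= MeanWeight= MinAge= MaxAge=;
run;
```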

Notice that the two titles correspond to the text you placed in the TITLE statements.

Note: By default, SAS centers all output. For most of the output in this book, a system option called NOCENTER was used so that the output is left-justified. The statement OPTIONS NOCENTER; (not shown here) was included at the beginning of every program.
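Putting the two settings mentioned in this section together, a program could begin with the sketch below. The last two statements are an assumption about how you might restore the defaults afterward: OPTIONS CENTER and ODS PROCTITLE switch centering and the procedure titles back on.

```sas
options nocenter;     /* left-justify output instead of centering it     */
ods noproctitle;      /* suppress headings such as "The MEANS Procedure" */

/* ... your DATA and PROC steps go here ... */

options center;       /* restore the default centered output             */
ods proctitle;        /* restore the procedure titles                    */
```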

You can switch among the three windows by clicking on the appropriate tab at the bottom of the screen. These tabs will be located in other places if you are using Enterprise Guide or SAS Studio, but you will have no trouble finding them. The tabs for the windowing environment are shown in Figure 2.6:

Figure 2.6: Tabs for Selecting the Editor, Log, or Output Windows in the Windowing Environment

Figure 2.7 shows a complete listing of the Log window:

Figure 2.7: Inspecting the LOG Window

Note: The Log window is very important. It is here that you see any error messages if you have made any mistakes in writing your program. In this example, there were no mistakes (a rarity for this author), so you see only the original program along with some information about the data file that was read and some timing information.

Let’s spend a moment looking over the log. First, you see that the data came from the Mydata.txt file located in the C:\books\learning folder. Next, you see a note showing that five records (lines) of data were read and that the shortest line was 11 characters long and the longest was 13. The next note indicates that SAS created a data set called Work.Demographic. The Demographic part makes sense because that is the name you used in the DATA statement. The Work part is the way SAS tells you that this is a temporary data set: when you end the SAS session, this data set will self-destruct (and the secretary will disavow all knowledge of your actions). You will see later how to make SAS data sets permanent.

Also, as part of this note, you see that the Work.Demographic data set has five observations and four variables. The SAS term observations is analogous to rows in a table. The SAS term variables is analogous to columns in a table. In this example, each observation corresponds to the data collected on each subject and each variable corresponds to each item of information you collected on each subject.
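If you want to confirm the number of observations and variables without reading the log, PROC CONTENTS displays the descriptor information for a data set, and PROC PRINT lists its observations. The sketch below rebuilds Demographic from in-program data (a DATALINES statement, described in the next chapter) so that it runs on its own. It also shows that the two-level name Work.Demographic and the one-level name Demographic refer to the same temporary data set.

```sas
data Demographic;
   input Gender $ Age Height Weight;
datalines;
M 50 68 155
F 23 60 101
M 65 72 220
F 35 65 133
M 15 71 166
;

title "Descriptor Information for Work.Demographic";
proc contents data=Work.Demographic;  /* lists the variables and the */
run;                                  /* number of observations      */

title "Listing of Demographic";
proc print data=Demographic;          /* same data set: Work is the  */
run;                                  /* default library             */
```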

The remaining notes show the real and CPU time used by SAS to process each procedure.

2.2  Enhancing the Program

At this point, it would be a good idea to access SAS somewhere, enter this program (you will probably want to change the name of the folder where you are storing your data file), and submit it.

Now, let’s enhance the program so you can learn some more about how SAS works. For this version of the program, you will add a comment statement and compute a new variable based on the height and weight data. Here is the program:

Program 2.2: Enhancing the Program

*Program name: Demog.sas stored in the C:\books\learning folder.

 Purpose: The program reads in data on height and weight   

 (in inches and pounds, respectively) and computes a body

 mass index (BMI) for each subject.

 Programmer: Ron Cody

 Date Written: October 5, 2017;

data Demographic;

     infile "C:\books\learning\Mydata.txt";

     input Gender $ Age Height Weight;

     *Compute a body mass index (BMI);

     BMI = (Weight / 2.2) / (Height*.0254)**2;

run;

 

The statements beginning with an asterisk (*) are called comment statements. They enable you to include comments for yourself or others reading your program later. One way of writing a SAS comment is to start with an asterisk, write as many comment lines as you like, and end the statement (as you do all SAS statements) with a semicolon. Comments are not only useful for others trying to read and understand your program—they are useful to you as well. Just imagine trying to understand a section of a long program that you wrote a year ago and now need to correct or modify. Trust me—you will be glad you commented your program. You should usually include information about the file name used to store the program, the purpose of the program, and the date you wrote the program as well as the date and purpose of any changes you made to the program.

The statement that starts with BMI= is called an assignment statement. It is an instruction to perform the computation on the right-hand side of the equal sign and assign the resulting value to the variable named on the left. In this example, you are creating a new variable named BMI that is defined as a person’s weight (in kilograms) divided by the square of the person’s height (in meters). BMI (body mass index) is a useful index of obesity. Medical researchers often use BMI when estimating the health risks of various diseases (such as heart attacks).
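To see the unit conversions at work, the sketch below computes BMI for the first subject in Mydata.txt (68 inches, 155 pounds) one step at a time; the intermediate variables Kg and Meters are illustrative names, not part of Program 2.2. Because exponentiation is performed before division, Meters**2 needs no parentheses. The result should be about 23.62, the same value Program 2.2 computes for this subject.

```sas
data _null_;
   Height = 68;                  /* inches (first subject)       */
   Weight = 155;                 /* pounds                       */
   Kg     = Weight / 2.2;        /* convert pounds to kilograms  */
   Meters = Height * .0254;      /* convert inches to meters     */
   BMI    = Kg / Meters**2;      /* ** is done before /          */
   put Kg= Meters= BMI=;
run;
```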

This assignment statement uses three of the basic arithmetic operators used by SAS: the forward slash (/) for division, the asterisk (*) for multiplication, and the double asterisk (**) for exponentiation. This is a good time to mention the full set of arithmetic operators. They are as follows:

Operator    Description       Priority
+           Addition          Lowest
-           Subtraction       Lowest
*           Multiplication    Next Highest
/           Division          Next Highest
**          Exponentiation    Highest
-           Negation          Highest

The same rules you learned about the order of algebraic operations in school apply to SAS arithmetic operators. That is, multiplication and division occur before addition and subtraction. In the previous table, the highest-priority operations (exponentiation and negation) are performed before all others, and the next-highest operations (multiplication and division) are performed before the lowest (addition and subtraction). For example, the value of x in the following assignment statement is 14:

x = 2 + 3 * 4;

If you want to multiply the sum of 2 + 3 by 4, you need to use parentheses like this:

x = (2 + 3) * 4;

When you include parentheses in your expression, all operations within the parentheses are performed first. In this example, because parentheses surround the addition operation, the 2 and 3 are added together first and then multiplied by 4, yielding a value of 20.

As a further example of how the priority of arithmetic operators works, take a look at the expression here that uses each of the different operators:

x = 2**3 + 4 * -5;

 

Because exponentiation and negation occur first, the statement is equivalent to:

x = 8 + 4 * -5;

This gives you:

8 + (-20) = -12
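You can confirm all three results by letting SAS do the arithmetic. In the sketch below, x1, x2, and x3 are illustrative variable names; the log should show the values 14, 20, and -12.

```sas
data _null_;
   x1 = 2 + 3 * 4;        /* multiplication first: 2 + 12          */
   x2 = (2 + 3) * 4;      /* parentheses first: 5 * 4              */
   x3 = 2**3 + 4 * -5;    /* ** and negation first: 8 + (-20)      */
   put x1= x2= x3=;
run;
```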

2.3  More on Comment Statements

Another way to add a comment to a SAS program is to start it with a slash star (/*) and end it with a star slash (*/). You can even embed this type of comment within a SAS statement. For example, you could write:

input Gender $ Age /* age is in years */ Height Weight;

If you are using a mainframe computer, you may want to avoid starting a /* comment in column one, because the operating system will interpret it as a job control language (JCL) statement and terminate your SAS job.

Be sure that you do not nest /* */ style comments. For example, you would get an error if you submitted Program 2.3. The first /* would be matched with the first */ (the one following the word style), leaving invalid SAS code to be processed.

Program 2.3:  Incorrect Nesting of /* */ Style Comments

 /* This comment contains a /* style */ comment embedded

    within another comment. Notice that the first star

    slash ends the comment and the remaining portion of

    the comment will cause a syntax error */
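One practical consequence: when you comment out a block of code with /* */, the block must not contain another /* */ comment, but statement-style (*) comments inside it are harmless, because SAS ignores everything up to the next */. The sketch below temporarily disables an entire DATA step; the data set Test and the variable x are hypothetical.

```sas
/* The DATA step below is temporarily disabled.
data Test;
   *Assign a constant to x;
   x = 1;
run;
*/

*This statement-style comment is outside the block and ends here;
```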

2.4  How SAS Works (a Look inside the “Black Box”)

This is a good time to explain some of the inner workings of SAS as it processes a DATA step. Looking again at Program 2.2, let’s “play computer.” SAS processes DATA steps in two stages—a compile stage and an execution stage.

Here’s how it works. SAS recognizes the keyword DATA and understands that it needs to process a DATA step. In the compile stage, it does some important housekeeping tasks. First, it prepares an area to store the SAS data set (Demographic). It checks the input file (described by the INFILE statement) and determines various attributes of this file (such as the length of each record). Next, it sets aside a place in memory called the input buffer, where it will place each record (line) of data as it is read from the input file. It then reads each line of the program, checks for invalid syntax, and determines the names of all the variables that are in the data set. Depending on your INPUT statement (or other SAS statements), SAS determines whether each variable is character or numeric and the storage length of each variable. This information is called the descriptor portion of the data set. In this compile stage, no data is read from the input file and no logical statements are evaluated. Each line is processed in order from top to bottom and left to right.

In this example, SAS sees the four variables listed in the INPUT statement, decides that Gender is character (because of the dollar sign ($) following the name), and sets the storage length of each of these variables. Because no lengths are specified by the program, each variable is given a default length (8 bytes for both the character and numeric variables). Eight bytes for a character variable means you can store values with up to eight characters. Eight bytes for numeric variables means that SAS can store numbers with approximately 14 or 15 significant figures (depending on the operating system). It is important to realize that the 8 bytes used to store numeric values do not limit you to numbers with eight digits. The information about each of the variables is stored in a reserved area of memory called the Program Data Vector (PDV for short). Think of the PDV as a set of post office boxes, with one box per variable, and information affixed to each box showing the variable name, type (character or numeric), and storage length. Some additional pieces of information are also stored for each variable. We’ll discuss these later when we discuss more advanced programming techniques.
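A quick way to see the default length in action is to read a character value longer than eight characters with list input. In the sketch below, the data set names, the variable names City and Population, and the values are made up for illustration. In the first step, City gets the default length of 8, so only Washingt is stored; the second step uses a LENGTH statement, placed before the INPUT statement, to give City 15 bytes. The 11-digit numeric value is stored exactly in 8 bytes in both cases.

```sas
data Default_Length;
   input City $ Population;   /* City gets the default length of 8 */
datalines;
Washington 12345678901
;

data Longer_Length;
   length City $ 15;          /* set the length before INPUT sees City */
   input City $ Population;
datalines;
Washington 12345678901
;

data _null_;
   set Default_Length;
   put City= Population=;     /* City shows only its first 8 characters */
run;
```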

It helps to picture the PDV like this:

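+-----------+---------+---------+---------+
| Gender    | Age     | Height  | Weight  |
| Character | Numeric | Numeric | Numeric |
| 8 bytes   | 8 bytes | 8 bytes | 8 bytes |
+-----------+---------+---------+---------+
|           |         |         |         |
+-----------+---------+---------+---------+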

This shows that each variable has a name, a type, and a storage length. The second row of boxes is used to store the value for each of these variables.

Next, SAS sees the assignment statement defining a new variable called BMI. Because BMI is defined by an arithmetic operation, SAS decides that this variable is numeric, uses the default storage length for numerics (8 bytes), and adds it to the PDV.

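+-----------+---------+---------+---------+--------------+
| Gender    | Age     | Height  | Weight  | BMI          |
| Character | Numeric | Numeric | Numeric | Numeric      |
| 8 bytes   | 8 bytes | 8 bytes | 8 bytes | 8 bytes      |
+-----------+---------+---------+---------+--------------+
|           |         |         |         |              |
+-----------+---------+---------+---------+--------------+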

SAS has reached the bottom of the DATA step and the compile stage is complete. Now it begins the execution stage.

When you are reading raw data from a file, SAS sets the values of the variables read by the INPUT statement, as well as the variables created by assignment statements, to missing. This happens before SAS reads each new line of data, to ensure that there is a clean slate and that no values are left over from a previous iteration. SAS uses blanks to represent missing character values and periods to represent missing numeric values. Therefore, you can now picture the PDV like this:
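You can watch this reset happen by writing the contents of the PDV to the log. The sketch below is Program 2.2 with two PUT statements added and with the data supplied by a DATALINES statement (described in the next chapter), so that it runs without the external file; Demographic_Trace is an illustrative data set name. PUT _ALL_ writes every variable in the PDV, including the automatic variables _ERROR_ and _N_ (the iteration count), to the log. Each line written before the INPUT statement should show a blank or a period for all five variables, and you should also see one extra "Top of step" line with _N_=6: that iteration ends when the INPUT statement finds no more data.

```sas
data Demographic_Trace;
   put "Top of step: " _all_;     /* values here have been reset to missing */
   input Gender $ Age Height Weight;
   BMI = (Weight / 2.2) / (Height*.0254)**2;
   put "Bottom of step: " _all_;  /* values just before the implied OUTPUT  */
datalines;
M 50 68 155
F 23 60 101
M 65 72 220
F 35 65 133
M 15 71 166
;
```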

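+-----------+---------+---------+---------+--------------+
| Gender    | Age     | Height  | Weight  | BMI          |
| Character | Numeric | Numeric | Numeric | Numeric      |
| 8 bytes   | 8 bytes | 8 bytes | 8 bytes | 8 bytes      |
+-----------+---------+---------+---------+--------------+
|           | .       | .       | .       | .            |
+-----------+---------+---------+---------+--------------+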

The first line of data from the input file is copied to the input buffer.

+-------------+
| M 50 68 155 |
+-------------+

An internal pointer that keeps track of the current record in the input file now moves to the next line.

In this example, the values in the text file are separated by one or more blanks. This arrangement of data values is called delimited data and the method that SAS uses to read this type of data is called list input. SAS expects blanks as the default delimiter but, as you will see later, you can tell SAS if your file contains other delimiters (such as commas) between the data values.
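Because list input treats any run of blanks as a single separator, the values do not have to line up in columns. In this sketch (the data set name Spacing is hypothetical, and the data are supplied with a DATALINES statement, described in the next chapter), the first two records of Mydata.txt are typed with uneven spacing and are still read correctly.

```sas
data Spacing;
   input Gender $ Age Height Weight;   /* list input: blanks separate values */
datalines;
M   50 68     155
F 23    60 101
;
```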

SAS reads each value until it reaches a delimiter and then moves along until it finds the next value. The values in the input buffer are now copied to the PDV as follows:

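+-----------+---------+---------+---------+--------------+
| Gender    | Age     | Height  | Weight  | BMI          |
| Character | Numeric | Numeric | Numeric | Numeric      |
| 8 bytes   | 8 bytes | 8 bytes | 8 bytes | 8 bytes      |
+-----------+---------+---------+---------+--------------+
| M         | 50      | 68      | 155     | .            |
+-----------+---------+---------+---------+--------------+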

Next, BMI is computed by substituting the values of Height and Weight from the PDV into the expression. The result is then stored in the PDV:

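+-----------+---------+---------+---------+--------------+
| Gender    | Age     | Height  | Weight  | BMI          |
| Character | Numeric | Numeric | Numeric | Numeric      |
| 8 bytes   | 8 bytes | 8 bytes | 8 bytes | 8 bytes      |
+-----------+---------+---------+---------+--------------+
| M         | 50      | 68      | 155     | 23.616947202 |
+-----------+---------+---------+---------+--------------+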

SAS has reached the bottom of the DATA step (because it sees the RUN statement—an explicit step boundary).

Note that SAS would sense the end of the DATA step without a RUN statement if the next line were a DATA or PROC statement (an implicit step boundary). As a matter of style, it is preferable to end each DATA or PROC step with a RUN statement.

At this point the values in the PDV are written to the SAS data set (Demographic), forming the first observation. By default, there is an implied OUTPUT statement at the bottom of each DATA step. SAS then returns to the top of the DATA step (the line following the DATA statement) and, when it executes the INPUT statement, finds that there are more lines of data to read. It repeats the process of setting values in the PDV to missing, reading new data values, computing the BMI, and writing observations to the SAS data set. This continues until the INPUT statement tries to read past the last line of the input file (the end-of-file marker). You can think of a DATA step as a loop that continues until all data values have been read.
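To make the implied OUTPUT visible, the sketch below writes it explicitly. It should produce the same five observations as Program 2.2, because an OUTPUT statement at the bottom of the step does what SAS would otherwise do automatically; once a DATA step contains any OUTPUT statement, SAS no longer adds the implied one. The data are supplied with a DATALINES statement (described in the next chapter) so the example runs without the external file, and Demographic_Explicit is an illustrative name.

```sas
data Demographic_Explicit;
   input Gender $ Age Height Weight;
   BMI = (Weight / 2.2) / (Height*.0254)**2;
   output;     /* write the PDV to the data set; this replaces the */
               /* implied OUTPUT at the bottom of the step         */
datalines;
M 50 68 155
F 23 60 101
M 65 72 220
F 35 65 133
M 15 71 166
;
```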

At this time, you may find this discussion somewhat tedious. However, as you learn more advanced programming techniques, you should review this discussion—it can really help you understand the more advanced and subtle features of SAS programming.

 

2.5  Problems

Solutions to odd-numbered problems are located at the back of this book. Solutions to all problems are available to professors or by permission of SAS Press. If you are a professor, visit the book’s companion website at support.sas.com/cody for information about how to obtain the solutions to all problems.

1.       You have a text file called Stocks.txt containing a stock symbol, a price, and the number of shares. Here are some sample lines of data:    

 AMGN 67.66 100

 DELL 24.60 200

 GE 34.50 100

 HPQ 32.32 120

 IBM 82.25 50

 MOT 30.24 100

a.      Using this raw data file, create a temporary SAS data set (Portfolio). Choose your own variable names for the stock symbol, price, and number of shares. In addition, create a new variable (call it Value) equal to the stock price times the number of shares. Include a comment in your program describing the purpose of the program, your name, and the date the program was written.

b.      Write the appropriate statements to compute the average price and the average number of shares of your stocks.

2.       Given the program here, add the necessary statements to compute four new variables:

a.      Weight in kilograms (1 kg = 2.2 pounds). Name this variable WtKg.

b.      Height in centimeters (1 inch = 2.54 cm). Name this variable HtCm.

c.      Average blood pressure (call it AveBP) equal to the diastolic blood pressure plus one-third the difference of the systolic blood pressure minus the diastolic blood pressure.

d.      A variable (call it HtPolynomial) equal to 2 times the height squared plus 1.5 times the height cubed.

Here is the program for you to modify:

data Prob2;

   input ID $

         Height /* in inches */

         Weight /* in pounds */

         SBP    /* systolic BP  */

         DBP    /* diastolic BP */;

< place your statements here >

datalines;

001 68 150 110 70

002 73 240 150 90

003 62 101 120 80

;

title "Listing of Prob2";

proc print data=Prob2;

run;

Note: This program uses a DATALINES statement, which enables you to include the input data directly in the program. You can read more about this statement in the next chapter.

3.       You are given an equation to predict electromagnetic field (EMF) strength, as follows:

      $\mathrm{EMF} = 1.45 \times V + (R/E) \times V^{3} - 125$

If your SAS data set contains variables called V, R, and E, write a SAS assignment statement to compute the EMF strength.

4.       What is wrong with this program?

  001  data New-Data;

  002     infile C:\books\learning\Prob4data.txt;

  003     input x1 x2

  004     y1 = 3(x1) + 2(x2);

  005     y2 = x1 / x2;

  006     New_Variable_from_X1_and_X2 = X1 + X2 – 37;

  007  run;

Note: Line numbers are for reference only; they are not part of the program.

5.       What is wrong with this program?

001 data XYZ;

002    infile "C:\books\learning\DataXYZ.txt";

003    input Gender X Y Z;

004    Sum = X + y + Z;

005 run;

The file C:\books\learning\DataXYZ.txt looks as follows:

Male 1 2 3

Female 4 5 6

Male 7 8 9
