CHAPTER  11

Debugging Your SAS Programs

11.1    Writing SAS Programs That Work

11.2    Fixing Programs That Don’t Work

11.3    Searching for the Missing Semicolon

11.4    Note: INPUT Statement Reached Past the End of a Line

11.5    Note: Lost Card

11.6    Note: Invalid Data

11.7    Note: Missing Values Were Generated

11.8    Note: Numeric Values Have Been Converted to Character (or Vice Versa)

11.9    DATA Step Produces Wrong Results but No Error Message

11.10  Error: Invalid Option, Error: The Option Is Not Recognized, or Error: Statement Is Not Valid

11.11  Note: Variable Is Uninitialized or Error: Variable Not Found

11.12  SAS Truncates a Character Variable

11.13 Saving Memory or Disk Space

11.1   Writing SAS Programs That Work

It’s not always easy to write a program that works the first time you run it. Even experienced SAS programmers will tell you it’s a delightful surprise when their programs run on the first try. The longer and more complicated the program, the more likely it is to have syntax or logic errors. But don’t despair; there are a few guidelines you can follow that can make your programs run correctly sooner and help you discover errors more easily.

Make programs easy to read  One simple thing that you can do is develop the habit of writing programs in a neat and consistent manner. Programs that are easy to read are easier to debug and will save you time in the long run. For easier programming, follow these guidelines:

      Put only one SAS statement on a line. SAS allows you to put as many statements on a line as you want, which may save some space in your program, but the saved space is rarely worth the sacrifice in readability.

      Use indention to show the different parts of the program. Indent all statements within the DATA and PROC steps. This way you can tell at a glance how many DATA and PROC steps there are in a program and which statement belongs to which step. It’s also helpful to further indent any statements between a DO statement and its END statement.

      Use comment statements generously to document your programs. This takes some discipline but is important, especially if anyone else is likely to read or use your program. Everyone has a different programming style, and it is often impossible to figure out what someone else’s program is doing and why. Comment statements take the mystery out of the program.
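As a rough illustration of these three guidelines, here is a minimal sketch. The data values and the variable names (StudentNumber, Jump1, Jump2, Improved, Gain) are made up for this example and are not taken from any particular program.

```sas
* Read made-up jump data, one observation per data line;
DATA jumps;
   INPUT StudentNumber Jump1 Jump2;
   * Flag students whose second jump beat the first;
   IF Jump2 > Jump1 THEN DO;
      Improved = 'Yes';
      Gain = Jump2 - Jump1;
   END;
   DATALINES;
10 3.5 3.7
12 4.3 4.0
;
RUN;

* Print the data set to confirm it was read correctly;
PROC PRINT DATA = jumps;
RUN;
```

Each statement sits on its own line, the statements inside each step are indented, and the statements between DO and END are indented one level further, so the structure is visible at a glance.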

Syntax sensitive editors  SAS program editors color code your programs as you write them. SAS keywords appear in one color, variables in another. All text within quotation marks appears in the same color, so it is immediately obvious when you forget to close your quotation marks. Similarly, missing semicolons are much easier to discover because the colors in your program are not right. Catching errors as you type them can be a real time saver.

Test each part of the program  You can increase your programming efficiency tremendously by making sure each part of your program is working before moving on to the next part. If you were building a house, you would make sure the foundation was level and square before putting up the walls. You would test the plumbing before finishing the bathroom. You are required to have each stage of the house inspected before moving on to the next. The same should be done for your SAS program. But you don’t have to wait for the inspector to come out; you can do it yourself.

If you are reading data from a file, make sure that it has been read correctly before moving on. Sometimes, even though there are no errors or even suspicious notes in your SAS log, the SAS data set is not correct. This could happen because SAS did not read the data the way you imagined (after all, it does what you say, not what you’re thinking) or because the data had some peculiarities you did not realize. For example, a researcher who received two data files from Taiwan wanted to merge them together by date. She could not figure out why they refused to merge correctly until she examined both data sets and realized one of the files used Taiwanese dates, which are offset by 11 years.

It’s a good habit to look at all the SAS data sets you create in a program at least once to make sure they are correct. As with reading raw data files, sometimes merging and setting data sets can produce the wrong result even though there were no error messages.

Test programs with small data sets  Sometimes it’s not practical to test your program with your entire data set. If your data files are very large, it may take a long time for your programs to run. In these cases, you can test your program with a subset of your data.

If you are reading data from a file, you can use the OBS= option in the INFILE statement to tell SAS to stop reading when it gets to that line in the file. This way you can read only the first 50 or 100 lines of data, or however many it takes to get a good representation of your data. The following statement will read only the first 100 lines of the raw data file Mydata.dat:

INFILE 'Mydata.dat' OBS = 100;

You can also use the FIRSTOBS= option to start reading from the middle of the data file. So, if the first 100 data lines are not a good representation of your data but 101 through 200 are, you can use the following statement to read just those lines:

INFILE 'Mydata.dat' FIRSTOBS = 101 OBS = 200;

Here, FIRSTOBS= and OBS= relate to the records of raw data in the file. These do not necessarily correspond to the observations in the SAS data set created. If, for example, you are reading two records for each observation, then you would need to read 200 records to get 100 observations.
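The two-records-per-observation case might look something like the following sketch. It assumes a hypothetical layout for Mydata.dat in which a name appears on the first record and two scores on the second; the variable names are invented for illustration.

```sas
* Assumed layout: each observation spans two records in the file;
DATA firsthundred;
   INFILE 'Mydata.dat' OBS = 200;   * 200 records give 100 observations;
   INPUT Name $ / Score1 Score2;    * the slash moves to the next record;
RUN;
```

After running a step like this, the notes in the log should report 200 records read and 100 observations in the data set, which is a quick way to confirm the arithmetic.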

If you are reading a SAS data set instead of a raw data file, you can use the OBS= and FIRSTOBS= data set options in the SET, MERGE, or UPDATE statements (discussed in Section 6.10). This controls which observations are processed in the DATA step. For example, the following DATA step will read the first 50 observations in the CATS data set. Note that when reading SAS data sets, OBS= and FIRSTOBS= truly do correspond to the observations and not to lines of raw data:

DATA sampleofcats;

   SET cats (OBS = 50);

RUN;

Test with representative data  Using OBS= and FIRSTOBS= is an easy way to test your programs, but sometimes it is difficult to get a good representation of your data this way. You may need to create a small test data set by extracting representative parts of the larger data set. Or, you may want to make up representative data for testing purposes. Making up data has the advantage that you can simplify the data and make sure you have every possible combination of values to test.

Sometimes you may want to make up data and write a small program just to test one aspect of your larger program. This can be extremely useful for narrowing down possible sources of errors in a large, complicated program.
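For instance, suppose the larger program assigns a grade based on a score. A small test program along these lines could exercise just that logic. The cutoff of 60, the variable names, and the data values are all made up for this sketch; the point is that the test data include a missing value and values on both sides of the cutoff.

```sas
* Made-up values covering each branch, including a missing value;
DATA testcases;
   INPUT Score;
   LENGTH Grade $ 7;
   IF Score = . THEN Grade = 'Unknown';
   ELSE IF Score >= 60 THEN Grade = 'Pass';
   ELSE Grade = 'Fail';
   DATALINES;
.
0
59
60
100
;
RUN;

* Look at every observation to confirm each branch behaves as intended;
PROC PRINT DATA = testcases;
RUN;
```

Because the data set is tiny, you can check every row by eye before trusting the same logic on the full data.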

11.2   Fixing Programs That Don’t Work


In spite of your best efforts, sometimes programs just don’t work. More often than not, programs don’t run the first time. Even with simple programs it is easy to forget a semicolon or misspell a keyword; everyone does sometimes. If your program doesn’t work, the source of the problem may be obvious, like an error message with the offending part of your program underlined, or not so obvious, as when you have no errors but still don’t have the expected results. Whatever the problem, here are a few guidelines you can follow to help fix your program.

Read the SAS log  The SAS log has a wealth of information about your program. In addition to listing the program statements, it tells you things like how many lines were read from your raw data file and what the minimum and maximum line lengths were. It states the number of observations and variables in each SAS data set you create. Information like this may seem inconsequential at first but can be very helpful in finding the source of your errors.

The SAS log has three types of messages about your program: errors, warnings, and notes.

Errors  These are hard to ignore. Not only do they come up in red on your screen, but your program will not run with errors. Usually, errors are some kind of syntax or spelling mistake. The following log shows the error messages when you accidentally add a slash between the PROC PRINT and DATA= keywords. SAS underlines the problem (the slash) and tells you there is a syntax error. Sometimes SAS tells you what it expected and this can be very revealing.

1    PROC PRINT / DATA=one;

                -

                22

                200

ERROR 22-322: Syntax error, expecting one of the following: ;, BLANKLINE, CONTENTS,

              DATA, DOUBLE, GRANDTOTAL_LABEL, GRANDTOT_LABEL, GRAND_LABEL,

              GTOTAL_LABEL, GTOT_LABEL, HEADING, LABEL, N, NOOBS, NOSUMLABEL,

              OBS, ROUND, ROWS, SPLIT, STYLE, SUMLABEL, UNIFORM, WIDTH.

ERROR 200-322: The symbol is not recognized and will be ignored.

The location of the error is usually easy to find because it is underlined, but the source of the error can be tricky. Sometimes what is wrong is not what is underlined but something else earlier in the program.

Warnings  These are less serious than errors because your program will run with warnings. But beware, a warning may mean that SAS has done something you have not intended. For example, SAS will attempt to correct the spelling of certain keywords. If you misspell INPUT as IMPUT, you will get the following message in your log:

WARNING 1-322: Assuming the symbol INPUT was misspelled as IMPUT.

Usually, you would think, “SAS is so smart; it knows what I meant to say,” but occasionally that may not be what you meant at all. Make sure that you know what all the warnings are about and that you agree with them.

Notes  These are less straightforward than either warnings or errors. Sometimes, notes just give you information, like telling you the execution time of each step in your program. But sometimes notes can indicate a problem. Suppose, for example, that you find the following note in your SAS log:

NOTE: SAS went to a new line when INPUT statement reached past the end of a line.

This could mean that SAS did exactly what you wanted, or it could indicate a problem with your program or your data. Make sure that you know what each note means and why it is there.

Start at the beginning  Whenever you read the SAS log, start at the beginning. This might seem like a ridiculous statement. Why wouldn’t you start at the beginning? Well, often when you run a SAS program, the SAS log rolls by in the Log window. So when the program is finished, you are left looking at the end of the log. If you happen to see an error at the end of the log, it is natural to try to fix that error first, since it is the first one you see. Avoid this temptation. Often errors at the end of the log are caused by earlier ones. If you fix the first error, often most or all of the other errors will disappear. If your lawnmower is out of gas and won’t start, it’s probably better to add gas before trying to figure out why it won’t start. The same logic applies to debugging SAS programs; fixing one problem will often fix others.


Look for common mistakes first  More often than not there is a simple reason why your program doesn’t work. Look for simple reasons before trying to find something more complicated. The remainder of this chapter consists of sections discussing common errors encountered in SAS programming. When you see the little bug icon in the upper right corner of a section, you’ll know that the material deals with how to debug your program.

Sometimes error messages just don’t make any sense. For example, you may get an error message saying the INPUT statement is not valid. This doesn’t make much sense because you know INPUT is a valid SAS statement. In cases like these, look for missing semicolons in the statements before the error. If SAS has underlined an item, be sure to look not only at the underlined item but also at the previous few statements.

Finally, if you just can’t figure out why you are not getting the results you expect, make sure to take a close look at any new SAS data sets you create. This can really help you discover errors in your logic, and sometimes uncover surprising details about your data.
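One way to take that close look is sketched below. It assumes a data set named mydata already exists in your WORK library; that name is a placeholder, so substitute your own.

```sas
* List the variable names, types, and lengths in the data set;
PROC CONTENTS DATA = mydata;
RUN;

* Print only the first 10 observations for a quick visual check;
PROC PRINT DATA = mydata (OBS = 10);
RUN;
```

PROC CONTENTS is handy for spotting a variable that came in as character when you expected numeric, while PROC PRINT shows whether the values themselves look sensible.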

Check your syntax  If you have large data sets, you may want to check for syntax errors in your program before processing your data. To do this, add the following line to your program and submit it in the usual way:

OPTIONS OBS=0 NOREPLACE;

The OBS=0 option tells SAS not to process any data, while the NOREPLACE option tells SAS not to replace existing SAS data sets with empty ones. Once you know your syntax is correct, you can resubmit your program without the OPTIONS statement in batch mode, or submit the following if you are using SAS interactively:

OPTIONS OBS=MAX REPLACE;

Note that this syntax check will not uncover any errors related to your data or logic.

 


11.3   Searching for the Missing Semicolon


Missing semicolons are the most common source of errors in SAS programs. For whatever reason, we humans can’t seem to remember to put a semicolon at the end of all our statements. (Maybe we all have rebellious right pinkies; who knows.) This is unfortunate because, while it is easy to forget the semicolon, it is not always easy to find the missing semicolon. The error messages produced are often misleading, making it difficult to find the error.

SAS reads statements from one semicolon to the next without regard to the layout of the program. If you leave off a semicolon, you in effect concatenate two SAS statements. Then SAS gets confused because it seems as though you are missing statements, or it tries to interpret entire statements as options in the previous statement. This can produce some very puzzling messages. So, if you get an error message that just doesn’t make sense, look for missing semicolons.

Example  The following program is missing a semicolon on the comment statement before the DATA statement:

* Read the data file ToadJump.dat using list input

DATA toads;

   INFILE 'c:\MyRawData\ToadJump.dat';

   INPUT ToadName $ Weight Jump1 Jump2 Jump3;

RUN;

Here is the SAS log after the program has run:

1    * Read the data file ToadJump.dat using list input

2    DATA toads;

3       INFILE 'c:\MyRawData\ToadJump.dat';

        ------

        180

ERROR 180-322: Statement is not valid or it is used out of proper order.

4       INPUT ToadName $ Weight Jump1 Jump2 Jump3;

        -----

        180

ERROR 180-322: Statement is not valid or it is used out of proper order.

5    RUN;

In this case, DATA toads becomes part of the comment statement. Because there is now no DATA statement, SAS underlines the INFILE and INPUT keywords and says, “Hey, these statements are in the wrong place; they have to be part of a DATA step.” This doesn’t make much sense to you because you know INFILE and INPUT are valid statements, and you did put them in a DATA step (or so you thought). That’s when you should suspect a missing semicolon.

 

Example  The next example shows the same program, but now the semicolon is missing from the DATA statement. The INFILE statement becomes part of the DATA statement, and SAS tries to create a SAS data set named INFILE. SAS also tries to interpret the filename, 'c:\MyRawData\ToadJump.dat', as a SAS data set name, but the .dat extension is not valid for SAS data sets. It also gives you an error saying that there is no DATALINES or INFILE statement. In addition, you get some warnings about data sets being incomplete. This is a good example of how one simple mistake can produce a lot of confusing messages:

30   * Read the data file ToadJump.dat using list input;

31   DATA toads

32     INFILE 'c:\MyRawData\ToadJump.dat';

33     INPUT ToadName $ Weight Jump1 Jump2 Jump3;

34   RUN;

ERROR: No DATALINES or INFILE statement.

ERROR: Extension for physical file name 'C:\MyRawData\ToadJump.dat' does

       not correspond to a valid member type.

NOTE: The SAS System stopped processing this step because of errors.

WARNING: The data set WORK.TOADS may be incomplete.  When this step was

         stopped there were 0 observations and 5 variables.

WARNING: Data set WORK.TOADS was not replaced because this step was stopped.

WARNING: The data set WORK.INFILE may be incomplete.  When this step was

         stopped there were 0 observations and 5 variables.

Missing semicolons can produce a variety of error messages. Usually, the messages say that either a statement is not valid, or an option or parameter is not valid or recognized. Sometimes you don’t get an error message, but the results are still not right.

The DATASTMTCHK system option  Some missing semicolons, such as the one in the last example, are easier to find if you use the DATASTMTCHK system option. This option controls which names you can use for SAS data sets in a DATA statement. By default, it is set so that you cannot use the words MERGE, RETAIN, SET, or UPDATE as SAS data set names. This prevents you from accidentally overwriting an existing data set just because you forget a semicolon at the end of a DATA statement. You can make all SAS keywords invalid SAS data set names by setting the DATASTMTCHK option to ALLKEYWORDS. The partial log below again shows a missing semicolon at the end of the DATA statement, but this time DATASTMTCHK is set to ALLKEYWORDS:

35   OPTIONS DATASTMTCHK=ALLKEYWORDS;

36   * Read the data file ToadJump.dat using list input;

37   DATA toads

38     INFILE 'C:\MyRawData\ToadJump.dat';

       ------

       57

ERROR 57-185: INFILE is not allowed in the DATA statement when option

              DATASTMTCHK=ALLKEYWORDS.  Check for a missing semicolon in

              the DATA statement, or use DATASTMTCHK=NONE.

39     INPUT ToadName $ Weight Jump1 Jump2 Jump3;

40   RUN;


11.4   Note: INPUT Statement Reached Past the End of a Line  


The note “SAS went to a new line when INPUT statement reached past the end of a line” is rather innocent looking, but its presence can indicate a problem. This note often goes unnoticed. It doesn’t come up in red lettering. It doesn’t cause your program to stop. But look for it in your SAS log because it is a common note that usually means there is a problem.

This note means that as SAS was reading raw data, it got to the end of the data line before it read values for all the variables in your INPUT statement. When this happens, SAS goes by default to the next line of data to get values for the remaining variables. Sometimes this is exactly what you want SAS to do, but if it’s not, take a good look at your SAS log and output to be sure you know why this is happening.

Check the note in your SAS log that tells you the number of lines SAS read from the data file and the number of observations in the SAS data set. If you have fewer observations than lines read, and you planned to have one observation per line, then you know you have a problem. Taking a close look at your data set can be very helpful in determining the source of the problem.

Example  This example shows what can happen if you are using list input, and don’t have periods for missing values. The following data come from a middle school long jump competition, where the student’s number is followed by the distances for each of three jumps. When a student was disqualified for a jump, no entry was made for that jump:

10 3.5 3.7

12 4.3 4.0 3.9

23 3.8 3.9 4.0

15 2.1 2.3

16 4.0 2.2

28 3.5 3.6 4.2

Here is the SAS log from a program that reads the raw data using list input:

1   DATA jumps;

2      INFILE 'c:\MyRawData\LongJump.dat';

3      INPUT StudentNumber Jump1 Jump2 Jump3;

4   RUN;

NOTE: The infile 'c:\MyRawData\LongJump.dat' is:

      File Name=c:\MyRawData\LongJump.dat,

      RECFM=V,LRECL=256

NOTE: 6 records were read from the infile 'c:\MyRawData\LongJump.dat'.

        The minimum record length was 10.

        The maximum record length was 14.

NOTE:  SAS went to a new line when INPUT statement reached past the end of a line.

NOTE: The data set WORK.JUMPS has 4 observations and 4 variables.

  NOTE: DATA statement used (Total process time):

        real time           0.37 seconds

Notice that six records were read from the raw data file.

But there are only four observations in the SAS data set.

The note, “…INPUT statement reached past…,” should alert you that there may be a problem.

If you look at the data set, you can see that there is a problem. The numbers don’t look correct. (Can a person jump 16 meters?)

 

     StudentNumber    Jump1    Jump2    Jump3
1               10      3.5      3.7     12.0
2               23      3.8      3.9      4.0
3               15      2.1      2.3     16.0
4               28      3.5      3.6      4.2

Here, SAS went to a new line when you didn’t want it to. To fix this problem, the simplest thing to do is use the MISSOVER option in the INFILE statement. MISSOVER instructs SAS to assign missing values instead of going to the next line when it runs out of data. The INFILE statement would look like this:

INFILE 'c:\MyRawData\LongJump.dat' MISSOVER;
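To try MISSOVER without an external file, here is a minimal sketch that reads the same six data lines in-stream. Using DATALINES in place of the raw data file is an assumption made only so the example is self-contained.

```sas
* Same long jump data read in-stream, so no external file is needed;
DATA jumps;
   INFILE DATALINES MISSOVER;
   INPUT StudentNumber Jump1 Jump2 Jump3;
   DATALINES;
10 3.5 3.7
12 4.3 4.0 3.9
23 3.8 3.9 4.0
15 2.1 2.3
16 4.0 2.2
28 3.5 3.6 4.2
;
RUN;

* Check that each student now has exactly one observation;
PROC PRINT DATA = jumps;
RUN;
```

With MISSOVER in place, the data set should have six observations, one per data line, with Jump3 set to missing for students 10, 15, and 16.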

Possible causes  Other reasons for receiving a note informing you that the INPUT statement reached past the end of the line include:

      You planned for SAS to go to the next data line when it ran out of data.

      Blank lines in your data file, usually at the beginning or end, can cause this note. Look at the minimum line length in the SAS log. If it is zero, then you have blank lines. Edit out the blank lines and rerun your program.

      If you are using list input and you do not have a space between every data value, you can get this note. For example, if you try to read the following data using list input, SAS will run out of data for the Gilroy Garlics because there is no space between the 15 and the 1035. SAS will read it as one number, then read the 12 where it should have been reading the 1035, and so on. To correct this problem, either add a space between the two numbers, or use column or formatted input.

   Columbia Peaches      35  67  1 10  2  1

   Gilroy Garlics        151035 12 11  7  6

   Sacramento Tomatoes  124  85 15  4  9  1

      If you have some data lines that are shorter than the rest, and you are using column or formatted input, this can cause a problem. If you try to read a name, for example, in columns 60 through 70 when some of the names extend only to column 68, and you didn’t add spaces at the end of the line to fill it out to column 70, then SAS will go to the next line to read the name. To avoid this problem, use the TRUNCOVER option in the INFILE statement (discussed in Section 2.17). For example:

   INFILE 'c:\MyRawData\Addresses.dat' TRUNCOVER;


11.5   Note: Lost Card


Lost card? You thought you were writing SAS programs, not playing a card game. This note makes more sense if you remember that computer programs and data used to be punched out on computer cards. A lost card means that SAS was expecting another line (or card) of data and didn’t find it.

If you are reading multiple lines of raw data for each observation, then a lost card could mean you have missing or duplicate lines of data. If you are reading two data lines for each observation, then SAS will expect an even number of lines in the data file. If you have an odd number, then you will get the lost-card message. It can often be difficult to locate the missing or duplicate lines, especially with large data files. Printing or viewing the SAS data set as well as careful proofreading of the data file can be helpful in identifying problem areas.

Example  The following example shows what can happen if you have a missing line of data. The data values are the normal high and low temperatures and the record high and low for the month of July for each city, but the last city is missing a data line:

Nome AK

55 44

88 29

Miami FL

90 75

97 65

Raleigh NC

88 68

Here is the SAS log from a program which reads the data, three lines per observation:

1   DATA highlow;

2      INFILE 'c:\MyRawData\Temps1.dat';

3      INPUT City $ State $ / NormalHigh NormalLow / RecordHigh RecordLow;

NOTE: The infile 'c:\MyRawData\Temps1.dat' is:

      File Name=c:\MyRawData\Temps1.dat,

      RECFM=V,LRECL=256

NOTE: LOST CARD.

City=Raleigh State=NC NormalHigh=88 NormalLow=68 RecordHigh=. RecordLow=.

_ERROR_=1 _N_=3

NOTE: 8 records were read from the infile 'c:\MyRawData\Temps1.dat'.

      The minimum record length was 5.

      The maximum record length was 10.

NOTE: The data set WORK.HIGHLOW has 2 observations and 6 variables.

NOTE: DATA statement used (Total process time):

      real time           0.03 seconds

      cpu time            0.03 seconds

 

In this case, you get the lost-card note, and SAS prints the data values that it read for the observation with the missing data. You can see from the log that SAS read eight records from the file but the SAS data set has only two observations. The incomplete observation was not included.

Example  Often you get other messages along with the lost-card note. The invalid-data note is a common by-product of the lost card. If the second line were missing from the temperature data, then you would get invalid data because SAS would try to read Miami FL as the record high and low for Nome AK.

Nome AK

88 29

Miami FL

90 75

97 65

Raleigh NC

88 68

105 50

Here is the SAS log showing the invalid-data note:

NOTE: Invalid data for RecordHigh in line 3 1-5.

NOTE: Invalid data for RecordLow in line 3 7-8.

RULE:    ----+----1----+----2----+----3----+----4----+----5----+----6----+

3         Miami FL

City=Nome State=AK NormalHigh=88 NormalLow=29 RecordHigh=. RecordLow=.

_ERROR_=1 _N_=1

NOTE: LOST CARD.

Example  Along with the lost-card note, it is common to get a note indicating that the INPUT statement reached past the end of a line. If you forgot the last number in the file, as in the following example, then you would get these two notes together:

Nome AK

55 44

88 29

Miami FL

90 75

97 65

Raleigh NC

88 68

105

When a program uses list input, SAS will try to go to the next line to get the data for the last variable. Since there isn’t another line of data, you get the lost-card note.

NOTE: LOST CARD.

City=Raleigh State=NC NormalHigh=88 NormalLow=68 RecordHigh=105 RecordLow=. _ERROR_=1 _N_=3

NOTE: 9 records were read from the infile

      'c:\MyRawData\Temps3.dat'.

      The minimum record length was 3.

      The maximum record length was 10.

NOTE: SAS went to a new line when INPUT statement reached past the end of

      a line.

NOTE: The data set WORK.HIGHLOW has 2 observations and 6 variables.


11.6   Note: Invalid Data

The typical new SAS user, upon seeing the invalid-data note, will ignore it, hoping perhaps that it will simply go away by itself. That’s rather ironic considering that the message is explicit and easy to interpret once you know how to read it.

Interpreting the message  The invalid-data note appears when SAS is unable to read from a raw data file because the data are inconsistent with the INPUT statement. This note almost always indicates a problem. For example, one common mistake is typing in the letter O instead of the number 0. If the variable is numeric, then SAS is unable to interpret the letter O. In response, SAS does two things. It sets the value of this variable to missing, and it prints a message like this for the problematic observation:

NOTE: Invalid data for IDNumber in line 8 1-5.

  RULE:----+----1----+----2----+----3----+----4----+----5----+----6----+

   8    0O7  James Bond    SA341

IDNumber=. Name=James Bond Class=SA Q1=3 Q2=4 Q3=1 _ERROR_=1 _N_=8

     The first line tells you where the problem occurred. Specifically, it states the name of the variable SAS got stuck on and the line number and columns of the raw data file that SAS was trying to read. In this example, the error occurred while SAS was trying to read a variable named IDNumber from columns 1 through 5 in line 8 of the input file.

   The next line is a ruler with columns as the increments. The numeral 1 marks the 10th column, 2 marks the 20th column, and so on. Below the ruler, SAS dumps the actual line of raw data so you can see the little troublemaker for yourself. Using the ruler as a guide, you can count over to the column in question. At this point you can compare the actual raw data to your INPUT statement, and the error is usually obvious. The value of IDNumber should be zero-zero-seven, but looking at the line of actual data you can see that a careless typist has typed zero-letter O-seven. Such an error may seem minor to you, but you’ll soon learn that computers are hopelessly persnickety.

    As if this weren’t enough, SAS prints more information: the value of each variable for that observation as SAS read it. In this case, you can see that IDNumber equals missing, Name equals James Bond, and so on. Two automatic variables appear at the end of the line: _ERROR_ and _N_. The _ERROR_ variable has a value of 1 if there is a data error for that observation, and 0 if there is not. In an invalid-data note, _ERROR_ always equals 1. The automatic variable _N_ is the number of times SAS has looped through the DATA step.

Unprintable characters  Occasionally, invalid data contain unprintable characters. In these cases, SAS shows you the raw data in hexadecimal format.

NOTE: Invalid data for IDNumber in line 10 1-5.

   RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+

CHAR  ..    Indiana Jones PI83.

ZONE  20222466666624666725433222222222222222222222222222222222222222222

NUMR  E90009E491E10AFE5300983E00000000000000000000000000000000000000000

IdNumber=. Name=Indiana Jones Class=PI Q1=8 Q2=3 Q3=. _ERROR_=1 _N_=10

    As before, SAS prints the line of raw data that contains the invalid data.

    Directly below the line of raw data, SAS prints two lines containing the hexadecimal equivalent of the data. You needn’t understand hexadecimal values to be able to read this. SAS prints the data this way because the normal 10 numerals and 26 letters don’t provide enough values to represent all computer symbols uniquely. Hexadecimal uses two characters to represent each symbol. To read hexadecimal, take a digit from the first line (labeled ZONE) together with the corresponding digit from the second line (labeled NUMR). In this case, a tab slipped into column 2 and appears as a harmless-looking period in the line of data. In hexadecimal, however, the tab appears as 09, while a real period in column 1 is 2E in hexadecimal. (In z/OS the hexadecimal representation of a tab is 05.)
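If you want to confirm these codes for yourself, a small sketch like the following writes the hexadecimal value of a tab and a period to the log. It assumes an ASCII system, and the variable names Tab and Period are made up for this example.

```sas
* Write the hexadecimal codes of a tab and a period to the log;
DATA _NULL_;
   Tab = '09'x;      * hex literal for a tab on ASCII systems;
   Period = '.';
   PUT Tab= $HEX2. Period= $HEX2.;
RUN;
```

On an ASCII system the log should show 09 for the tab and 2E for the period, which are the same digit pairs you get by reading the ZONE line and the NUMR line together in columns 2 and 1 of the example above.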

Possible causes  Common reasons for receiving the invalid-data note include the following:

      character values in a field that should be numeric (including using the letter O instead of the numeral zero)

      forgetting to specify that a variable is character (SAS assumes it is numeric)

      incorrect column specifications producing embedded spaces in numeric data

      list-style input with two periods in a row and no space in between

      missing data not marked with a period for list-style input causing SAS to read the next data value

      special characters such as tab, carriage-return-line-feed, or form-feed in numeric data

      using the wrong informat such as MMDDYY. instead of DDMMYY.

      invalid dates (such as September 31) read with a date informat

Double question mark informat modifier  Sometimes you have invalid data, and there is nothing you can do about it. You know the data are bad, and you just want SAS to go ahead and set those values to missing without filling your log with notes. At those times, you can use the ?? informat modifier. The ?? informat modifier suppresses the invalid-data note, and prevents the automatic variable _ERROR_ from being set to 1. Just insert the two question marks after the name of the problematic variable and before any informat or column specifications. For example, to prevent the preceding invalid-data notes for the variable IdNumber, you would add ?? to the INPUT statement, like this:

INPUT IdNumber ?? 1-5 Name $ 6-18 Class $ 20-21 Q1 22 Q2 23 Q3 24;
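The ?? modifier can also be placed in front of the informat in the INPUT function, which is handy when you convert a character value that may not be a valid number. The following is a minimal sketch, not part of the original example; the data set name cleanscores, the variable RawScore, and the data values are hypothetical.

```sas
* A minimal sketch with hypothetical names and data;
DATA cleanscores;
   INPUT RawScore $;
   * ?? suppresses the invalid-data note and keeps _ERROR_ at 0;
   Score = INPUT(RawScore, ?? 8.);
   DATALINES;
95
8O
72
;
RUN;
```

The second value contains the letter O instead of a zero, so Score is set to missing for that observation, and no invalid-data note is written to the log.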

 

11.7   Note: Missing Values Were Generated

The missing-values note appears when SAS is unable to compute the value of a variable because of preexisting missing values in your data. This is not necessarily a problem. It is possible that your data contain legitimate missing values and that setting a new variable to missing is a desirable response. But it is also possible that the missing values result from an error and that you need to fix your program or your data. A good rule is to think of the missing-values note as a flag telling you to check for an error.

Example  Here are data from a toad-jumping contest including the toad’s name, weight, and the distance jumped in each of three trials:

Lucky 2.3 1.9 . 3.0

Spot 4.6 2.5 3.1 .5

Toadzilla 7.1 . . 3.8

Hop 4.5 3.2 1.9 2.6

Noisy 3.8 1.3 1.8 1.5

Winner 5.7 . . .

Notice that several of the toads have missing values for one or more jumps. To compute the average distance jumped, the program in the following SAS log reads the raw data, adds together the values for the three jumps, and divides by three:

1    DATA toads;

2       LENGTH ToadName $ 9;

3       INFILE 'c:\MyRawData\ToadJump.dat';

4       INPUT ToadName Weight Jump1 Jump2 Jump3;

5       AverageJump = (Jump1 + Jump2 + Jump3) / 3;

6    RUN;

NOTE: The infile 'c:\MyRawData\ToadJump.dat' is:

      Filename=c:\MyRawData\ToadJump.dat,

      RECFM=V,LRECL=32767, File Size (bytes)=125

NOTE: 6 records were read from the infile 'c:\MyRawData\ToadJump.dat'.

      The minimum record length was 16.

      The maximum record length was 21.

NOTE: Missing values were generated as a result of performing an

         operation on missing values.

        Each place is given by: (Number of times) at (Line):(Column)

          3 at 5:25

NOTE: The data set WORK.TOADS has 6 observations and 6 variables.

Because of missing values in the data, SAS was unable to compute AverageJump for some of the toads. In response, SAS printed the missing-values note, which has two parts:

       The first part of the note says that SAS was forced to set some values to missing.

   The second part is a bit more cryptic. SAS lists the number of times values were set to missing. This generally corresponds to the number of observations that generated missing values, unless the problem occurs within a DO loop. Next, SAS states where in the program it encountered the problem. In the preceding example, SAS set three values to missing: at line 5, column 25. Looking at the program, you can see that line 5 is the line that calculates AverageJump, and column 25 contains the first plus sign. Looking at the raw data, you can see that three observations have missing values for Jump1, Jump2, or Jump3. Those observations are the three times mentioned in the missing-values note.

Finding the missing values  In this case, it was easy to find the observations with missing values. But if you had a data set with hundreds, or millions, of observations, then you couldn’t just glance at the data. In that case, you could subset the problematic observations with a subsetting IF statement, like this:

DATA missing;

   SET toads;

   IF AverageJump = .;

RUN;

Once you have selected just the observations that have missing values, you can examine them more closely. Here are the observations with missing values for AverageJump:

 

     ToadName     Weight    Jump1    Jump2    Jump3    AverageJump
1    Lucky           2.3      1.9       .       3.0         .
2    Toadzilla       7.1       .        .       3.8         .
3    Winner          5.7       .        .        .          .
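If you also want to know how many values are missing for each observation, the NMISS function counts the missing numeric arguments. This is a minimal sketch that assumes the TOADS data set created above; the data set name missingjumps and the variable NumMissing are hypothetical.

```sas
* A minimal sketch with hypothetical names missingjumps and NumMissing;
DATA missingjumps;
   SET toads;
   * Count how many of the three jumps are missing;
   NumMissing = NMISS(Jump1, Jump2, Jump3);
   * Keep only toads with at least one missing jump;
   IF NumMissing > 0;
RUN;
```

With the toad data shown earlier, this would keep Lucky, Toadzilla, and Winner, with NumMissing equal to 1, 2, and 3.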

Using the SUM and MEAN functions  You may be able to circumvent this problem when you are computing a sum or mean by using the SUM or MEAN function instead of an arithmetic expression. In the preceding program, you could remove this line:

AverageJump = (Jump1 + Jump2 + Jump3) / 3;

and substitute this line in its place:

AverageJump = MEAN(Jump1, Jump2, Jump3);

The SUM and MEAN functions ignore missing values by using only nonmissing values in the computation. In this example, you would still get the missing-values note for one toad, Winner, because it had missing values for all three jumps.
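Here is a self-contained sketch of the revised program. It reads the same toad data with a DATALINES statement instead of an external file, and it uses the OF keyword as a shorthand for the list of jump variables. The variable NumJumps, which counts the nonmissing jumps with the N function, is an addition made for illustration only.

```sas
DATA toads;
   LENGTH ToadName $ 9;
   INPUT ToadName Weight Jump1 Jump2 Jump3;
   * MEAN uses only the nonmissing jumps;
   AverageJump = MEAN(OF Jump1-Jump3);
   * N counts how many jumps went into the mean (hypothetical variable);
   NumJumps = N(OF Jump1-Jump3);
   DATALINES;
Lucky 2.3 1.9 . 3.0
Spot 4.6 2.5 3.1 .5
Toadzilla 7.1 . . 3.8
Hop 4.5 3.2 1.9 2.6
Noisy 3.8 1.3 1.8 1.5
Winner 5.7 . . .
;
RUN;
```

For Lucky, the mean is computed from two jumps, (1.9 + 3.0) / 2 = 2.45, rather than being set to missing. Keeping a count like NumJumps makes it easy to see which averages are based on fewer than three jumps.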

11.8   Note: Numeric Values Have Been Converted to Character (or Vice Versa)

Even with only two data types, numeric and character, SAS programmers sometimes get their variables mixed up. When you accidentally mix numeric and character variables, SAS tries to fix your program by converting variables from numeric to character or vice versa, as needed. Programmers sometimes ignore this problem, but that is not a good idea. If you ignore this message, it may come back to haunt you as you find new incompatibilities resulting from the fix. If, indeed, a variable needs to be converted, you should do it yourself, explicitly, so you know what your variables are doing.

Example  To show how SAS handles this kind of incompatibility, here are data about a class. Each line of data contains a student’s ID number, name, and scores on two tests.

110 Linda   53 60

203 Derek   72 64

105 Kathy   98 82

224 Michael 80 55

The instructor runs the following program to read the data and create a permanent SAS data set named SCORES.

LIBNAME students 'c:\MySASLib';

DATA students.scores;

   INFILE 'c:\MyRawData\TestScores.dat';

   INPUT StudentID Name $ Score1 Score2 $;

RUN;

After creating the permanent SAS data set, the instructor runs a program to compute the total score and substring the first digit of StudentID. (Students in section 1 of the class have IDs starting with 1 while students in section 2 have IDs starting with 2.) Here is the log from the program:

2    DATA grades;

3       SET students.scores;

4       TotalScore = Score1 + Score2;

5       Class = SUBSTR(StudentID,1,1);

6    RUN;

NOTE: Character values have been converted to numeric values at the places

      given by:(Line):(Column).

      4:26

NOTE: Numeric values have been converted to character values at the places

      given by:(Line):(Column).

      5:19

NOTE: There were 4 observations read from the data set STUDENTS.SCORES.

NOTE: The data set WORK.GRADES has 4 observations and 6 variables.

NOTE: DATA statement used (Total process time):

      real time           0.04 seconds

      cpu time            0.04 seconds

This program produces two values-have-been-converted notes. The first conversion occurred in line 4, column 26. Looking at line 4 of the log, you can see that the variable name Score2 appears in column 26. Score2 was accidentally input as a character variable, so SAS had to convert it to numeric before adding it to Score1 to compute TotalScore.

The second conversion occurred in line 5, column 19. Looking at line 5 of the log, you can see that the variable StudentID appears in column 19. StudentID was input as a numeric variable, but the SUBSTR function requires character variables, so SAS was forced to convert StudentID to character.

Converting variables  You could go back and input the raw data with the correct types, but sometimes that’s just not practical. Instead, you can convert the variables from one type to another. To convert variables from character to numeric, you use the INPUT function. To convert from numeric to character, you use the PUT function. Most often, you would use these functions in an assignment statement with the following syntax:

Character to Numeric                  Numeric to Character
newvar = INPUT(oldvar, informat);     newvar = PUT(oldvar, format);

These two slightly eccentric functions are first cousins of the PUT and INPUT statements. Just as an INPUT statement uses informats, the INPUT function uses informats; and just as PUT statements use formats, the PUT function uses formats. These functions can be confusing because they are similar but different. In the case of the INPUT function, the informat must be the type you are converting to, which is numeric. In contrast, the format for the PUT function must be the type you are converting from, which is also numeric. To convert the troublesome variables in the preceding program, you would use these statements:

Character to Numeric                  Numeric to Character
NewScore2 = INPUT(Score2, 2.);        NewID = PUT(StudentID, 3.);

Here is a log showing the program with the statements to convert Score2 and StudentID. This version of the program runs without any suspicious messages:

7    DATA grades;

8       SET students.scores;

9       NewScore2 = INPUT(Score2, 2.);

10      TotalScore = Score1 + NewScore2;

11      NewID = PUT(StudentID,3.);

12      Class = SUBSTR(NewID,1,1);

13   RUN;

NOTE: There were 4 observations read from the data set STUDENTS.SCORES.

NOTE: The data set WORK.GRADES has 4 observations and 8 variables.

NOTE: DATA statement used (Total process time):

      real time           0.03 seconds

      cpu time            0.03 seconds

Note that this section is about converting variables from numeric to character or vice versa, but you can also use the PUT function to change one character value to another character value. When you do that, both oldvar and newvar would be character variables, and the format would be a character format. See Section 4.14 for more about using the PUT function.
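If you would rather keep the original variable name after converting, one common approach is to rename the old variable as it is read and then drop it. The following is a minimal sketch using the class data from this section, read with DATALINES so that it stands alone; the names CharScore2 and the WORK data set scores are hypothetical.

```sas
* Read the class data, with Score2 accidentally read as character;
DATA scores;
   INPUT StudentID Name $ Score1 Score2 $;
   DATALINES;
110 Linda   53 60
203 Derek   72 64
105 Kathy   98 82
224 Michael 80 55
;
RUN;

DATA grades;
   * Rename the character version so the name Score2 is free;
   SET scores (RENAME = (Score2 = CharScore2));
   * Create a numeric Score2 from the character value;
   Score2 = INPUT(CharScore2, 2.);
   TotalScore = Score1 + Score2;
   DROP CharScore2;
RUN;
```

Because the conversion is explicit, this DATA step should run without a values-have-been-converted note, and later steps can refer to Score2 as a numeric variable.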

11.9   DATA Step Produces Wrong Results but No Error Message

Some of the hardest errors to debug aren’t errors at all, at least not to SAS. If you do complex programming, you may write a DATA step that runs just fine, without any errors or suspicious notes, but the DATA step produces the wrong results. The more complex your programs are, the more likely you are to get this kind of error. Sometimes it seems like a DATA step is a black box. You know what goes in, and you know what comes out, but what happens in the middle is a mystery. This problem is actually a logic error; somewhere along the way, SAS got the wrong instruction.

Example  Here is a program that illustrates this problem and how to debug it. The raw data below contain information from a class. For each student there are three scores from tests, and one score from homework:

Linda   53 60  66 42

Derek   72 64  56 32

Kathy   98 82 100 48

Michael 80 55  95 50

This program is supposed to select students whose average score is below 70, but it doesn’t work. Here is the log from the wayward program:

1    * Keep only students with mean below 70;

2    DATA lowscore;

3       INFILE 'c:\MyRawData\Class.dat';

4       INPUT Name $ Score1 Score2 Score3 Homework;

5       Homework = Homework * 2;

6       AverageScore = MEAN(Score1 + Score2 + Score3 + Homework);

7       IF AverageScore < 70;

8    RUN;

NOTE: The infile 'c:\MyRawData\Class.dat' is:

      Filename=c:\MyRawData\Class.dat,

      RECFM=V,LRECL=256

NOTE: 4 records were read from the infile 'c:\MyRawData\Class.dat'.

      The minimum record length was 20.

      The maximum record length was 20.

NOTE: The data set WORK.LOWSCORE has 0 observations and 6 variables.

First, the DATA step reads the raw data from a file named Class.dat. The highest possible score on homework is 50. To make the homework count the same as a test, the program doubles the value of Homework. Then the program computes the mean of the three test scores and Homework, and subsets the data by selecting only observations with a mean score below 70. Unfortunately, something went wrong. The LOWSCORE data set contains no observations. A glance at the raw data confirms that there should be students whose mean scores are below 70.

Using the PUT and PUTLOG statements to debug  To debug a problem like this, you have to figure out exactly what is happening inside the DATA step. A good way to do this, especially if your DATA step is long and complex, is with PUT or PUTLOG statements. Elsewhere in this book, PUT statements are used along with FILE statements to write raw data files and custom reports. If you use a PUT statement without a FILE statement, then SAS writes in the SAS log. PUTLOG statements are the same except that they always write to the log even when you have a FILE statement. PUT and PUTLOG statements can take many forms, but for debugging, a handy style is:

PUTLOG _ALL_;

SAS will print all the variables in your data set. If you have a lot of variables, you can print just the relevant ones this way:

PUTLOG variable-1=   variable-2=   . . .   variable-n=;

The DATA step below is identical to the one shown earlier except that a PUTLOG statement was added. In a longer DATA step, you might choose to have PUTLOG statements at several points. In this case, one will suffice. This PUTLOG statement is placed before the subsetting IF, since in this particular program the subsetting IF eliminates all observations:

9    * Keep only students with mean below 70;

10   DATA lowscore;

11      INFILE 'c:\MyRawData\Class.dat';

12      INPUT Name $ Score1 Score2 Score3 Homework;

13      Homework = Homework * 2;

14      AverageScore = MEAN(Score1 + Score2 + Score3 + Homework);

15      PUTLOG Name= Score1= Score2= Score3= Homework= AverageScore=;

16      IF AverageScore < 70;

17   RUN;

NOTE: The infile 'c:\MyRawData\Class.dat' is:

      Filename=c:\MyRawData\Class.dat,

      RECFM=V,LRECL=256

Name=Linda Score1=53 Score2=60 Score3=66 Homework=84 AverageScore=263

Name=Derek Score1=72 Score2=64 Score3=56 Homework=64 AverageScore=256

Name=Kathy Score1=98 Score2=82 Score3=100 Homework=96 AverageScore=376

Name=Michael Score1=80 Score2=55 Score3=95 Homework=100 AverageScore=330

NOTE: 4 records were read from the infile 'c:\MyRawData\Class.dat'.

      The minimum record length was 20.

      The maximum record length was 20.

NOTE: The data set WORK.LOWSCORE has 0 observations and 6 variables.

Looking at the log, you can see the result of the PUTLOG statement. The data listed in the middle of the log show that the variables are being input properly, and the variable Homework is being adjusted properly. However, something is wrong with the values of AverageScore; they are much too high. The mistake is in the line that computes AverageScore. Instead of commas separating the four arguments of the MEAN function, there are plus signs. Since function arguments can be arithmetic expressions, SAS simply added the four variables together, as instructed, and computed the mean of a single number. That’s why the values of AverageScore are all above 70.
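A corrected version of the DATA step might look like the following sketch. It reads the same class data with a DATALINES statement so that it stands alone, separates the arguments of the MEAN function with commas, and keeps a PUTLOG statement to confirm the result.

```sas
* Keep only students with mean below 70;
DATA lowscore;
   INPUT Name $ Score1 Score2 Score3 Homework;
   Homework = Homework * 2;
   * Commas, not plus signs, separate the arguments;
   AverageScore = MEAN(Score1, Score2, Score3, Homework);
   PUTLOG Name= AverageScore=;
   IF AverageScore < 70;
   DATALINES;
Linda   53 60  66 42
Derek   72 64  56 32
Kathy   98 82 100 48
Michael 80 55  95 50
;
RUN;
```

With the commas in place, the averages are 65.75 for Linda, 64 for Derek, 94 for Kathy, and 82.5 for Michael, so the LOWSCORE data set should contain two observations. Once the program works, you would normally remove the PUTLOG statement.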

Depending on the interface you use, you may have another option for finding logic errors: an interactive DATA step debugger. DATA step debuggers allow you to step through your code examining data values at various points inside a DATA step. For more information, check the SAS Documentation for your interface.

11.10  Error: Invalid Option, Error: The Option Is Not Recognized, or Error: Statement Is Not Valid

If SAS cannot make sense out of one of your statements, it stops executing the current DATA or PROC step and prints one of these messages:

ERROR 22-7: Invalid option name.

ERROR 202-322: The option or parameter is not recognized and will be ignored.

ERROR 180-322: Statement is not valid or it is used out of proper order.

The invalid-option message and its cousin, the option-is-not-recognized message, tell you that you have a valid statement, but SAS can’t make sense out of an apparent option. The statement-is-not-valid message, on the other hand, means that SAS can’t understand the statement at all. Thankfully, with all three messages SAS underlines the point at which it got confused so you know where to look for the problem.

Example  The SAS log below contains an invalid option:

1    DATA scores (ROP = Score1);

                  ---

                  22

ERROR 22-7: Invalid option name ROP.

2       INFILE 'c:\MyRawData\Class.dat';

3       INPUT  Name $ Score1 Score2 Score3 Homework;

4    RUN;

NOTE: The SAS System stopped processing this step because of errors.

NOTE: DATA statement used (Total process time):

      real time           0.03 seconds

      cpu time            0.00 seconds

In this DATA step, the word DROP was misspelled as ROP. Since SAS cannot interpret this, it underlines the word ROP, prints the invalid-option message, and stops processing the DATA step.

Example  The following log contains an option-is-not-recognized message:

5    PROC PRINT

6       VAR Score2;

        ---

        22  

        202

ERROR 22-322:  Syntax error, expecting one of the following: ;, BLANKLINE, CONTENTS,

               DATA, DOUBLE, GRANDTOTAL_LABEL, GRANDTOT_LABEL, GRAND_LABEL,

               GTOTAL_LABEL, GTOT_LABEL, HEADING, LABEL, N, NOOBS, NOSUMLABEL,

               OBS, ROUND, ROWS, SPLIT, STYLE, SUMLABEL, UNIFORM, WIDTH.

ERROR 202-322: The option or parameter is not recognized and will be ignored.

7    RUN;

NOTE: The SAS System stopped processing this step because of errors.

NOTE: PROCEDURE PRINT used (Total process time):

      real time           0.25 seconds

      cpu time            0.09 seconds

 

SAS underlined the VAR statement. This message may seem puzzling since VAR is not an option, but a statement, and a valid statement at that. But if you look at the previous statement, you will see that the PROC statement is missing one of those pesky semicolons. As a result, SAS tried to interpret the words VAR and Score2 as options in the PROC statement. Since no options exist with those names, SAS stopped processing the step and printed the option-is-not-recognized message. SAS also printed the syntax-error message listing all the valid options for a PROC PRINT statement.

Example  Here is a log with the statement-is-not-valid message:

8    PROC PRINT;

9       SET class;

        ---

        180

ERROR 180-322: Statement is not valid or it is used out of proper order.

10   RUN;

NOTE: The SAS System stopped processing this step because of errors.

NOTE: PROCEDURE PRINT used (Total process time):

      real time           0.01 seconds

      cpu time            0.01 seconds

In this case, a SET statement was used in a PROC step. Since SET statements can be used only in DATA steps, SAS underlines the word SET and prints the statement-is-not-valid message.

Possible causes  Generally, with these error messages, the cause of the problem is easy to detect. You should check the underlined item and the previous statement for possible errors. Possible causes include the following:

      a misspelled keyword

      a missing semicolon

      a DATA step statement in a PROC step (or vice versa)

      a RUN statement in the middle of a DATA or PROC step (this does not cause errors for some procedures)

      the correct option with the wrong statement

      an unmatched quotation mark

      an unmatched comment
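For comparison, here is a sketch of how the three examples above might look once corrected. The data are read with a DATALINES statement, using two lines from the class data, so that the sketch stands alone; which data set the last PROC PRINT should print is an assumption.

```sas
* DROP is now spelled correctly in the data set option;
DATA scores (DROP = Score1);
   INPUT Name $ Score1 Score2 Score3 Homework;
   DATALINES;
Linda   53 60  66 42
Derek   72 64  56 32
;
RUN;

* The PROC statement now ends with a semicolon;
PROC PRINT DATA = scores;
   VAR Score2;
RUN;

* The DATA= option, not a SET statement, names the data set in a PROC step;
PROC PRINT DATA = scores;
RUN;
```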

11.11  Note: Variable Is Uninitialized or Error: Variable Not Found

If you find one of these messages in your SAS log, then SAS is telling you that the variable named in the message (Temp in this case) does not exist:

NOTE: Variable Temp is uninitialized.

WARNING: Variable Temp not found.

ERROR: Variable Temp not found.

Generally, the first time that you get one of these messages, it is quite a shock. You may be sure that the variable does exist. After all, you remember creating it. Fortunately, the problem is usually easy to fix once you understand what SAS is telling you.

If the problem happens in a DATA step, then SAS prints the variable-is-uninitialized note, initializes the variable, and continues to execute your program. Normally, variables are initialized when they are read (via an INPUT, SET, MERGE, or UPDATE statement) or when they are created via an assignment statement. If you use a variable for the first time in a way that does not assign a value to the variable (such as on the right side of an assignment statement, in the condition of an IF statement, or in a DROP or KEEP option), then SAS tries to fix the problem by assigning a value of missing to the variable for all observations. This is very generous of SAS, but it almost never fixes the problem, since you probably don’t want the variable to have missing values for all observations.

When the problem happens in a PROC step, the results are more grave. If the error occurs in a critical statement, such as a VAR statement, then SAS prints the variable-not-found error and does not execute the step. If the error occurs in a less critical statement, such as a LABEL statement, then SAS prints the variable-not-found warning message and attempts to run the step.

Example  Here is the log from a program with missing-variable problems in both a DATA and a PROC step:

1    DATA highscores (KEEP = Name Total);

2       INFILE 'c:\MyRawData\TestScores.dat';

3       INPUT StudentID Name $ Score1 Score2;

4       IF Scor1 > 90;

5       Total = Score1 + Score2;

6    RUN;

NOTE: Variable Scor1 is uninitialized.

NOTE: The data set WORK.HIGHSCORES has 0 observations and 2 variables.

NOTE: DATA statement used (Total process time):

      real time           0.04 seconds

      cpu time            0.03 seconds

7

 

8    PROC PRINT DATA = highscores;

9       VAR Name Score2 Total;

ERROR: Variable SCORE2 not found.

10   RUN;

NOTE: The SAS System stopped processing this step because of errors.

NOTE: PROCEDURE PRINT used (Total process time):

      real time           0.03 seconds

      cpu time            0.01 seconds

In this DATA step, the INPUT statement reads four variables: StudentID, Name, Score1, and Score2. But a misspelling in the subsetting IF statement causes SAS to initialize a new variable named Scor1. Because Scor1 has missing values, none of the observations satisfies the subsetting IF, and the data set HIGHSCORES is left with zero observations.

In the PROC PRINT, the VAR statement requests three variables: Name, Score2, and Total. Score2 did exist but was dropped from the data set by the KEEP= option in the DATA statement. That KEEP= option kept only two variables, Name and Total. As a result, SAS prints the variable-not-found error message, and does not execute the PROC PRINT.

Possible causes  Common ways to “lose” variables include the following:

      misspelling a variable name

      using a variable that was dropped at some earlier time

      using the wrong data set

      committing a logic error, such as using a variable before it is created

If the source of the problem is not immediately obvious, a look at the properties of the data set can often help you figure out what is going on. You can examine the properties of a data set using the Properties window or PROC CONTENTS. Both of these give you information about what is in a SAS data set, including variable names. To open a Properties window, right-click the icon for a data set and select Properties from the pop-up menu. PROC CONTENTS is covered in Section 2.3.
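The following sketch shows one way the preceding program could be repaired and checked. It assumes that the intent was to select on Score1 and to print Score2, so the misspelling is corrected and Score2 is added to the KEEP= option. The data are read with DATALINES so that the sketch stands alone.

```sas
DATA highscores (KEEP = Name Score2 Total);
   INPUT StudentID Name $ Score1 Score2;
   * Score1 is now spelled correctly;
   IF Score1 > 90;
   Total = Score1 + Score2;
   DATALINES;
110 Linda   53 60
203 Derek   72 64
105 Kathy   98 82
224 Michael 80 55
;
RUN;

* List the variables that actually exist in the data set;
PROC CONTENTS DATA = highscores;
RUN;

PROC PRINT DATA = highscores;
   VAR Name Score2 Total;
RUN;
```

Running PROC CONTENTS before the PROC PRINT is a quick way to confirm that every variable named in the VAR statement is really in the data set.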

 

11.12  SAS Truncates a Character Variable

Sometimes you may notice that some, or all, of the values of a character variable are truncated. You may be expecting “peanut butter” and get “peanut b” or “chocolate ice cream” and get “chocolate ice.” This usually happens when you use IF statements to create a new character variable, or when you are using list-style input and you have values longer than eight characters. All character variables have a fixed length determined by one of the following methods.

INPUT statement  If you are using an INPUT statement with list-style input, then the length defaults to 8. If you are using column or formatted input, then the length is determined by the number of columns, or the informat. Here are examples of INPUT statements that read a variable named Food, and the resulting lengths:

INPUT statement           Length of Food
INPUT Food $;             8
INPUT Food $ 1-10;        10
INPUT Food $15.;          15

 

Assignment statement  If you are creating the variable in an assignment statement, then the length is determined by the first occurrence of the new variable name. For example, the following program creates a variable, Status, whose values depend on the value of the variable Temperature:

DATA summer;

   SET temps;

   IF Temperature > 100 THEN Status = 'Hot';

      ELSE Status = 'Cold';

RUN;

Because the word Hot has three characters and this is the first time the variable Status is used, SAS gives Status a length of 3. Any other values for this variable would be truncated to three characters (Col instead of Cold, for example).
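If you want to see the length that SAS has assigned, one option is the VLENGTH function, which returns the length of a variable as determined when the DATA step is compiled. This is a minimal sketch with hypothetical temperature values read from DATALINES; the variable StatusLength is also hypothetical.

```sas
DATA summer;
   INPUT Temperature;
   IF Temperature > 100 THEN Status = 'Hot';
      ELSE Status = 'Cold';
   * Length assigned to Status when the step was compiled;
   StatusLength = VLENGTH(Status);
   PUTLOG Temperature= Status= StatusLength=;
   DATALINES;
105
72
;
RUN;
```

In the log, the second observation should show Status=Col with StatusLength=3, confirming that the first assignment fixed the length.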

LENGTH statement  The LENGTH statement in a DATA step defines variable lengths and, if it comes before an INPUT or assignment statement, will override either of the previous two methods of determining length. The following LENGTH statement sets the length of the variable Status to 4 and the variable Food to 15:

LENGTH Status $4 Food $15;

ATTRIB statement  You can also assign variable lengths in an ATTRIB statement in a DATA step where you can associate formats, informats, labels, and lengths with variables in a single statement. Always place the LENGTH option before a FORMAT option in an ATTRIB statement to ensure that the variables are assigned proper lengths. For example, the following statement creates a character variable named Status with a length of 4 and the label Hot or Cold:

ATTRIB Status LENGTH = $4 LABEL = 'Hot or Cold';

Example  This example shows what can happen if you let SAS determine the length of a character variable (in this case, using an assignment statement). Here are data for a consumer survey of car color preferences. Age is followed by sex (coded as 1 for male and 2 for female), annual income, and preferred car color (yellow, gray, blue, or white):

Age,Sex,Income,Color

19,1,28000,Y

45,1,130000,G

72,2,70000,B

31,1,88000,Y

58,2,166000,W

In the following program, a series of IF-THEN/ELSE statements create a variable named AgeGroup based on the value of Age.

DATA carsurvey;

   INFILE 'c:\MyRawData\Cars.csv' DLM = ',' FIRSTOBS = 2;

   INPUT Age Sex Income Color $;

   IF Age < 20 THEN AgeGroup = 'Teen';

      ELSE IF Age < 65 THEN AgeGroup = 'Adult';

      ELSE AgeGroup = 'Senior';

RUN;

Here is the CARSURVEY data set. Notice that the values of AgeGroup are truncated to four characters—the number of characters in Teen.

 

     Age    Sex    Income    Color    AgeGroup
1     19      1     28000    Y        Teen
2     45      1    130000    G        Adul
3     72      2     70000    B        Seni
4     31      1     88000    Y        Adul
5     58      2    166000    W        Adul

Adding a LENGTH statement to the DATA step will eliminate the truncation problem:

DATA carsurvey;

   INFILE 'c:\MyRawData\Cars.csv' DLM = ',' FIRSTOBS = 2;

   INPUT Age Sex Income Color $;

   LENGTH AgeGroup $6;

   IF Age < 20 THEN AgeGroup = 'Teen';

      ELSE IF Age < 65 THEN AgeGroup = 'Adult';

      ELSE AgeGroup = 'Senior';

RUN;

Here is the new data set with untruncated values for AgeGroup:

 

     Age    Sex    Income    Color    AgeGroup
1     19      1     28000    Y        Teen
2     45      1    130000    G        Adult
3     72      2     70000    B        Senior
4     31      1     88000    Y        Adult
5     58      2    166000    W        Adult

11.13  Saving Memory or Disk Space

What do you do when you finally get your program working, and it seems to take forever to run, or—even worse—you get a message that your computer is out of memory or disk space? Well, you could petition to buy a more powerful computer, which isn’t really such a bad idea, but there are a few things you can try before resorting to spending money. Because this issue depends on your operating environment, it is not possible to cover everything you might be able to do in this section. However, this section describes a few universal actions you can take to remedy the situation.

It is helpful, in trying to solve the problem, to know why it happens. Usually, when you run out of memory, it’s when you are doing some pretty intensive computations or sorting data sets with lots of variables. The GLM procedure (General Linear Models), for example, can use lots of memory when your model is complicated and there are many levels for each classification variable. You run out of disk space because SAS uses disk space to store all its temporary working files, including temporary SAS data sets, and the SAS log and output. If you are creating many large temporary SAS data sets during the course of a SAS session, this can quickly fill up your disk space.

Deleting unneeded data sets  SAS automatically deletes data sets in the WORK library when your job or session ends. But if you have large temporary data sets, you can free up disk space by deleting them as soon as you are finished with them. To do this, use PROC DELETE with this general form:

PROC DELETE DATA = data-set-name;
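As an illustration, the following sketch deletes two temporary data sets. The names missing and toads are borrowed from an earlier section and stand in for whatever data sets you no longer need. The second step shows PROC DATASETS with a DELETE statement, which is another way to do the same job.

```sas
* Delete one temporary data set as soon as it is no longer needed;
PROC DELETE DATA = missing;
RUN;

* An alternative that uses PROC DATASETS;
PROC DATASETS LIBRARY = work NOLIST;
   DELETE toads;
QUIT;
```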

Reducing the storage size of variables  If your files are too big, one thing you can do is decrease the number of bytes needed to store individual variables. This can also help memory problems that arise when sorting data sets with character data. Since all numbers are expanded to the fullest precision (eight bytes) while SAS is processing data, changing storage requirements for numeric data will not help memory problems. If memory or disk space is at a premium, you can usually find some variables that require fewer bytes.

For character data, each character requires one byte of storage. The length of a character variable is determined when it is created. If you are using list input, then by default, variables are given a length of eight. If your data are only one character long, Y or N for example, then you are using eight times the storage space you actually need. You can use a LENGTH or ATTRIB statement in a DATA step to change the variable’s length. For example, the following gives the character variable Answer a length of one byte:

LENGTH Answer $1;

Be sure to put the LENGTH or ATTRIB statement before any other statements that refer to the variable.

If you are running out of disk space, in addition to shortening the lengths of character variables, you may also be able to decrease the lengths of numeric variables. Numeric data are a little trickier than character when it comes to length. All numbers can be safely stored in eight bytes, and that’s why eight is the default. Some numbers can be safely stored in fewer bytes, but which numbers depends on your operating environment. Check the SAS Documentation for your operating environment to determine the length and precision of numeric variables. Under Windows and UNIX, for example, you can safely store integers up to 8,192 in three bytes. In general, if your numbers contain decimal values, then you must use eight bytes. If you have small integer values, then you can use four bytes (in some operating environments two or three bytes). Use the LENGTH statement to change the lengths of data:

LENGTH Tigers 4;

This statement changes the length of the numeric variable Tigers to four bytes. If your numbers are categorical, like 1 for male and 2 for female, then you can read them as character data with a length of 1 and save even more space.
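To check that the lengths took effect, you can follow the DATA step with PROC CONTENTS, which lists the length of each variable. This is a minimal sketch; the data set name zoocounts and the data values are hypothetical.

```sas
DATA zoocounts;
   * Set lengths before the variables are first used;
   LENGTH Answer $1 Tigers 4;
   INPUT Answer $ Tigers;
   DATALINES;
Y 12
N 3
;
RUN;

* The Len column shows the length of each variable;
PROC CONTENTS DATA = zoocounts;
RUN;
```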

Reducing the number of observations  If you are going to use only a fraction of your data in a DATA step, then subset as soon as possible using a subsetting IF statement, WHERE statement, or WHERE= data set option. If you are using a procedure, you may be able to skip a DATA step entirely by subsetting in the procedure using a WHERE statement or WHERE= data set option. This PROC PRINT, for example, uses a WHERE statement to subset:

PROC PRINT DATA = survey;

   WHERE Sex = 'female';

RUN;
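The same subset can be written with the WHERE= data set option. The sketch below assumes the SURVEY data set from the preceding example; the data set name females is hypothetical.

```sas
* WHERE= data set option in a procedure;
PROC PRINT DATA = survey (WHERE = (Sex = 'female'));
RUN;

* WHERE= data set option on a SET statement in a DATA step;
DATA females;
   SET survey (WHERE = (Sex = 'female'));
RUN;
```

In the DATA step, the WHERE= option on the SET statement selects observations as they are read, so the rest of the step processes only the observations you need.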

Reducing the number of variables  If you need only a few of the variables in your data set, then use the KEEP= (or DROP=) data set option (Section 6.10). For example, if you had a data set containing information about all the zoo animals, but you wanted to look at only the lions and tigers, then you could use the following statements:

DATA partial;

   SET zooanimals (KEEP = Lions Tigers);

RUN;
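When it is easier to name the variables you do not want, the DROP= option works the same way. In this sketch, the variable Bears and the data set name nobears are hypothetical.

```sas
* Read every variable except Bears;
DATA nobears;
   SET zooanimals (DROP = Bears);
RUN;
```

Placing KEEP= or DROP= on the SET statement controls which variables are read into the DATA step, while placing the option on the DATA statement controls which variables are written to the new data set.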

Compressing data sets  It is also possible to compress SAS data sets. Compressing may save space if your data have many repeated values. But beware, compressing can in some cases actually increase the size of your data set. Fortunately, SAS prints a message in your log telling you the change in size of your data sets. You can turn on compression by using either the COMPRESS=YES system option, or the COMPRESS=YES data set option. Use the system option if you want all the SAS data sets that you create to be compressed. Use the data set option when you want to control which SAS data sets to compress. For example:

DATA compressedzooanimals (COMPRESS = YES);

  SET zooanimals;

RUN;
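To compress every data set created from that point on, you could set the system option instead. This is a minimal sketch; the data set name zoocopy is hypothetical.

```sas
* Turn on compression for all data sets created after this statement;
OPTIONS COMPRESS = YES;

DATA zoocopy;
   SET zooanimals;
RUN;

* Turn compression back off when it is no longer wanted;
OPTIONS COMPRESS = NO;
```

Either way, check the log for the note reporting the change in size, since compression does not always make a data set smaller.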

Memory  If memory is your problem, then do what you can to eliminate other programs that are using your computer’s memory. If you are using an interactive environment to run your SAS programs, try running in batch mode instead. Also, see the SAS Documentation for your operating environment for potential ways to make more memory available on your system.

If you have tried all of the above, and you are still running out of memory or disk space, then you can always try finding a more powerful computer. One of the nice things about SAS is that the language is the same for all operating environments. To move your program to another operating environment, you would need to change only a few statements like INFILE or LIBNAME, which deal directly with the operating environment.
