Chapter 4: Creating Permanent SAS Data Sets

4.1  Introduction

4.2  SAS Libraries—The LIBNAME Statement

4.3  Why Create Permanent SAS Data Sets?

4.4  Examining the Descriptor Portion of a SAS Data Set Using PROC CONTENTS

4.5  Listing All the SAS Data Sets in a SAS Library Using PROC CONTENTS

4.6  Viewing the Descriptor Portion of a SAS Data Set Using a Point-and-Click Approach

4.7  Viewing the Data Portion of a SAS Data Set Using PROC PRINT

4.8  Using a SAS Data Set as Input to a DATA Step

4.9  DATA _NULL_: A Data Set That Isn’t

4.10  Problems

 

4.1  Introduction

SAS procedures cannot read raw data files or spreadsheets directly. One way or another, they need the data in SAS data sets. Remember that SAS DATA steps can create SAS data sets. You can also have SAS convert data from other sources, such as Microsoft Office Excel or Access, Oracle, or DB2. This conversion process can be automated by using automated Import facilities in the Display Manager, SAS Enterprise Guide, or SAS Studio (in the University Edition), or by using data access engines, which automatically convert the data into a form SAS can process.

This chapter describes how to make your SAS data set permanent and how to determine the contents of a SAS data set.

4.2  SAS Libraries—The LIBNAME Statement

When you write a DATA statement such as

data Test;

 

SAS creates a temporary SAS data set called Test. When you close your SAS session, this data set disappears. SAS data set names actually have two-part names in the form:

libref.data-set-name

The part of the name before the period is called a libref (short for library reference), and this tells SAS where to store (or retrieve) the data set. The part of the name after the period identifies the name you want to give the data set.

Up until now, all the programming examples in this book used a data set name without a period. When you use a name like Test in the DATA statement, SAS uses a default libref called Work that SAS creates automatically every time you open a SAS session. For example, if you write a DATA statement such as

data Test;

SAS adds the default libref Work, so this DATA statement is equivalent to

data Work.Test;

All that is required to make your SAS data sets permanent is to create your own libref using a LIBNAME statement and use that libref in the two-level SAS data set name.

Suppose you want to create a permanent SAS data set called Test_Scores in your C:ookslearning folder. You could use following program.

Program 4.1: Creating a Permanent SAS Data Set

  libname Mozart 'C:ookslearning';

  data Mozart.Test_Scores;

     length ID $ 3 Name $ 15;

     input ID $ Score1-Score3 Name $;

  datalines;

  1 90 95 98 Jan

  2 78 77 99 Preston

  3 88 91 92 Russell

  ;

The LIBNAME statement starts with the keyword LIBNAME and then specifies the name of the library (called a libref), followed by the directory or folder where you want to store your permanent SAS data sets. The libref you use must not be more than 8 characters in length and must be a valid SAS name.

When you run this program, data set Test_Scores becomes a permanent SAS data set in the C:ookslearning folder. It is important to remember that any libref that you create exists only for your current SAS session. If you open a new SAS session, you need to reissue that LIBNAME statement. A good way to think of a libref is as an alias for the name of the folder (on Windows or UNIX platforms). On mainframe computers, a SAS library is actually a single file that can hold multiple SAS data sets.

If you run Program 4.1 on a Windows platform, the SAS data set will be stored as the SAS data set called Test_Scores, and it will be stored as the file Test_Scores.sas7bdat in the C:ookslearning folder. The file extension stands for SAS binary data version 7. You may wonder why there is a 7 rather than a 9 in the file extension when this data set was created using SAS9. Since the structure of SAS data sets has not changed since SAS 7, SAS has maintained the same file extension it used in SAS 7.

4.3  Why Create Permanent SAS Data Sets?

If your data sets are small, you may choose to create them each time you start a SAS session. However, it takes computer resources to create SAS data sets and it makes more sense to make your data sets permanent if you plan to use them more than once, especially if they are large.

4.4  Examining the Descriptor Portion of a SAS Data Set Using PROC CONTENTS

A SAS data set consists of two parts: a descriptor portion and a data portion. One way to examine the descriptor portion of a SAS data set is by using PROC CONTENTS. If you want to see the contents of the descriptor portion of the Test_Scores data set, submit the following program:

Program 4.2: Using PROC CONTENTS to Examine the Descriptor Portion of a SAS Data Set

title "The Descriptor Portion of Data Set TEST_SCORES";

proc contents data=Mozart.Test_Scores;

run;

The resulting output is shown next:

Figure 4.1: Output from Program 4.2

Figure 4.1: Output from Program 4.2

image shown here

image shown here

This output displays information about the data set, such as the number of variables, the number of observations, and the creation and modification dates. It also displays information about the SAS version used to create the data set. In addition, it displays information on each of the variables in the data set—the variable name, type, and storage length.

A quick look at this output shows that the data set Test_Scores has 3 observations and 5 variables. The list of variables is in alphabetical order. It shows that ID and Name are character variables stored in 3 and 15 bytes, respectively; the three Score variables are numeric and are stored in 8 bytes each.

The title at the top of the page is created by using a TITLE statement as shown in Program 4.2. You may use either single or double quotes to enclose your title. If the title contains any single quotation marks (or apostrophes), you should use double quotation marks.

Note: If your title does not contain any apostrophes or macro variable names (see Chapter 25), you can actually omit the quotation marks altogether. However, as a matter of style, you may want to use quotation marks on all your TITLE statements.

A more useful way to list variable information is to list them in the order the variables are stored in the SAS data set, rather than alphabetically. To create such a list, use the VARNUM option of PROC CONTENTS, like this:

Program 4.3: Demonstrating the VARNUM option of PROC CONTENTS

  title "The Descriptor Portion of Data Set Test_Scores";

  proc contents data=Mozart.Test_Scores varnum;

  run;

Output from this program is identical to the previous output except that the variable list is now in the same order as the variables in the data set. This portion of the output is shown below to demonstrate the effect of this option:

Figure 4.2: Demonstrating the VARNUM Option of PROC CONTENTS

Figure 4.2: Demonstrating the VARNUM Option of PROC CONTENTS

It is important to remember that if you have just opened a new SAS session, you must reissue a LIBNAME statement if you want to access a previously created SAS data set or to create a new one. You may use any library name (libref) you want each time you open a SAS session, although in practice you usually use the same library reference each time.

For example, if you open up a new SAS session, you can submit the following statements to obtain information on the Test_Scores data set:

Program 4.4: Using a LIBNAME in a New SAS Session

  libname Proj99 'C:ookslearning';

  title "Descriptor Portion of Data Set Test_Scores";

  proc contents data=Proj99.Test_Scores varnum;

  run;

Note: One other useful PROC CONTENTS option is POSITION. This option produces both an alphabetical and ordered list of the variables in your SAS data set, which is this author’s favorite option.

4.5  Listing All the SAS Data Sets in a SAS Library Using PROC CONTENTS

You can use PROC CONTENTS to list the names of all the SAS data sets in a SAS library (folder). To do this, use the following program:

Program 4.5: Using PROC CONTENTS to List the Names of all the SAS Data Sets in a SAS Library

title "Listing All the SAS Data Sets in a Library";

proc contents data=Mozart._all_ nods;

run;

The keyword _ALL_ is used in place of a data set name. The NODS option gives you the name of the SAS data sets only, omitting the detail listing for each data set. A partial listing is shown next:

Figure 4.3: Partial Listing from Program 4.5

Figure 4.3: Partial Listing from Program 4.5

4.6  Viewing the Descriptor Portion of a SAS Data Set Using a Point-and-Click Approach

If you are running SAS using any of the three windowing environments, you can use simple point-and-click methods to view either the descriptor or data portion of a SAS data set. The screen shots that are displayed  below were made using the SAS windowing environment on a Windows platform. Similar techniques can be used with Enterprise Guide or SAS Studio.

 

First, click the Explorer tab on the bottom left of your editor window:

Figure 4.4: Using a Point-and-Click Approach to Investigate Your SAS Data Set

Figure 4.4: Using a Point-and-Click Approach to Investigate Your SAS Data Set

This brings up the following window:

Figure 4.5: The LIBRARY Icon

Figure 4.5: The LIBRARY Icon

The Libraries icon shows the built-in libraries plus any libraries you have created using LIBNAME statements. Double-click the Libraries icon.

Figure 4.6: Select the Library You Want to Inspect

Figure 4.6: Select the Library You Want to Inspect

Selecting a library enables you to see all the SAS data sets stored there.

Figure 4.7: Partial List of SAS Data Sets in the Mozart Library

Figure 4.7: Partial List of SAS Data Sets in the Mozart Library

A right-click on the data set icon brings up a menu that includes a choice to see the variables (columns) in the data set and their attributes. A left-click opens the SAS Viewer, showing the data portion of your data set. If you right-click and select View Columns, you will see the following:

Figure 4.8: Selecting View Columns to See Your Variables and Attributes

Figure 4.8: Selecting View Columns to See Your Variables and Attributes

This shows the same information that you can obtain by running PROC CONTENTS. The order of the variables in the list is the same as the order you will see when using the VARNUM option.

If you left-click the data set icon, you will see your data set in the SAS Viewer. It looks like this:

Figure 4.9: Using the SAS Viewer to List Your SAS Data Set

Figure 4.9: Using the SAS Viewer to List Your SAS Data Set

Note: Be sure to close this window if you plan to make changes to the data set. If the viewer is still open, you will be prevented from making any changes to the data set.

Similar resources are available in Enterprise Guide and SAS Studio.

4.7  Viewing the Data Portion of a SAS Data Set Using PROC PRINT

As you have seen in several programs so far in this book, PROC PRINT can be used to list the data in a SAS data set. Although there are a number of options to control how this listing appears, you can use it with all the defaults to get a quick listing of your data set. Here is the code to list the data portion of data set Test_Scores:

Program 4.6: Using PROC PRINT to List the Data Portion of a SAS Data Set

  title "Listing of Test_Scores";

  proc print data=Mozart.Test_Scores;

  run;

 

This code generates the following output:

Figure 4.10: Output from Program 4.6

Figure 4.10: Output from Program 4.6

This listing displays all the variables and all the observations in the Test_Scores data set.

Program 4.6 is an example of a procedure that uses all the default actions. That is, you did not specify any details such as which variables to print or other controllable aspects of this procedure. Chapter 14 describes how to add options and statements to PROC PRINT to customize your report.

4.8  Using a SAS Data Set as Input to a DATA Step

Besides raw data files, SAS data sets can also be used as input to a DATA step. As an example, you might want to use the information in an existing SAS data set to compute new variables.

Consider the data set Test_Scores (stored in the C:ookslearning folder). This data set contains the variables ID, Name, and Score1–Score3 (three test scores). Suppose you want to compute an average score for each subject in this data set. Program 4.7 performs this task:

Program 4.7: Using Observations from a SAS Data Set as Input to a New SAS Data Set

  libname Learn 'C:ookslearning';

  data New;

     set Learn.Test_Scores;

     AveScore = mean(of Score1-Score3);

  run;

  title "Listing of Data Set New";

  proc print data=New;

     var ID Score1-Score3 AveScore;

  run;

The key to this program is the SET statement. You can think of a SET statement as an INPUT statement except you are reading observations from a SAS data set instead of lines from a raw data file. There is a difference, however. Each time you read a line of data from a raw data file, the variables being read from the raw data file or created by assignment statements in the DATA step are initialized to a missing value during each iteration of the DATA step. Variables that are read from SAS data sets are not set to missing values during each iteration of the DATA step—they are said to be retained. In Program 4.7, the variables ID, Name, and Score1–Score3 are retained; the variable AveScore is not. This fact is not a concern to us here, but it can be used to advantage in more advanced programs.

The assignment statement that creates the AveScore variable uses the MEAN function to compute the mean of the three Score variables. You can read more about the MEAN function in Chapter 11. For now, you should notice that the variable list Score1–Score3 is preceded by the word of. This is typical of many SAS statistical functions that can take a variable list as an argument. Without the word of, the MEAN function would return the difference of Score1 and Score3 (that is, the dash would be interpreted as a minus sign).

Here is a listing of data set New:

Figure 4.11: Output from Program 4.7

Figure 4.11: Output from Program 4.7

4.9  DATA _NULL_: A Data Set That Isn’t

There are many applications where you want to process observations in a SAS data set, perhaps to print out data errors or to produce a report, and you don’t need to create a new data set.

You can use the data set name _NULL_ for these applications. The reserved data set name _NULL_ tells SAS not to create a data set. It enables you to process observations from raw data or an existing data set without the overhead of creating a new data set. Here is an example.

You have a permanent SAS data set (Test_Scores) and you want to create a list of all the IDs of students who achieved a score of 95 or higher on any of the tests. You could create a new SAS data set and use PROC PRINT to list these students or you could do it more efficiently with a DATA _NULL_ step, like this:

Program 4.8: Demonstrating a DATA _NULL_ Step

  title "Scores Greater Than or Equal to 95";

  data _null_;

     set Learn.Test_Scores;

     if Score1 ge 95 or Score2 ge 95 or Score3 ge 95 then

        put ID= Score1= Score2= Score3=;

  run;

The IF statement checks if any of the three test scores is greater than or equal to 95. If so, the PUT statement writes out the values of ID and the three test scores. A PUT statement writes text to a location of your choice: an external text file, the SAS log, or the OUTPUT window. In Program 4.8, an output location is not specified so the default location, the SAS log, is used. Here is a listing of the SAS log after running this program:

Figure 4.12: Listing of the SAS Log After Running Program 4.8

Figure 4.12: Listing of the SAS Log After Running Program 4.8

Placing PUT statements in a DATA step is an excellent way to help debug SAS programs. You can examine the values of your variables at any place in the DATA step. (You can also use the SAS debugger, available on the PC platform for this purpose.)

If you want to send the output to a file called C:ookslearningHighScores.txt, you would need to place a FILE statement before the PUT statement, as follows:

file 'C:ookslearningHighScores.txt';

A FILE statement is somewhat like an INFILE statement—that is, it works in concert with a PUT statement, telling SAS the destination of the text you are outputting.

If you want the results of the PUT statement to be written to the output device (on a PC, this would be the OUTPUT window), you can use the reserved file reference PRINT, like this:

file print;

DATA _NULL_ steps are sometimes used to create custom reports. As a matter of fact, this type of report is referred to as DATA _NULL_ reporting. To control how SAS writes this output, you can use pointers and formats to specify exactly what columns to write to and how the values are to be formatted. With more powerful procedures such as PROC REPORT, writing your own custom report using a DATA _NULL_ DATA step is rarely needed.

 

4.10  Problems

Solutions to odd-numbered problems are located at the back of this book. Solutions to all problems are available to professors. If you are a professor, visit the book’s companion website at support.sas.com/cody for information about how to obtain the solutions to all problems.

1.        Run the program here to create a permanent SAS data set called Perm. You will need to modify the program to specify a folder where you want to place this data set. Run PROC CONTENTS on this data set.

libname learn 'C:your-folder-name';

   data learn.Perm;

      input ID : $3. Gender : $1. DOB : mmddyy10.

            Height Weight;

      label DOB = 'Date of Birth'

            Height = 'Height in inches'

            Weight = 'Weight in pounds';

      format DOB date9.;

   datalines;

   001 M 10/21/1946 68 150

   002 F 5/26/1950 63 122

   003 M 5/11/1981 72 175

   004 M 7/4/1983 70 128

   005 F 12/25/2005 30 40

   ;

2.       Run PROC PRINT on the data set you created in Problem 1. Use the SAS VIEWTABLE window (if available on your system) to open this data set and compare the headings in the window to the column headings from your PROC PRINT. What is the difference?

3.       Run this program to create a permanent SAS data set called Survey2018. Close your SAS session, open up a new session, and write the statements necessary to compute the mean of age.

 *Write your LIBNAME statement here;

 data <fill in your data set name here> ;

    input Age Gender $ (Ques1-Ques5)($1.);

    /* See Chapter 21, Section 14 for a discussion

       of variable lists and format lists used above */

 datalines;

 23 M 15243

 30 F 11123

 42 M 23555

 48 F 55541

 55 F 42232

 62 F 33333

 68 M 44122

;

*Write this code after you closed and reopened your SAS session;

*Write your libname statement here;

proc means data= - insert the correct data set name -;

   var Age;

run;

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset