Chapter 1. An Introduction to SAS/IML Software

Contents

  • 1.1 Overview of the SAS/IML Language 3

  • 1.2 Comparing the SAS/IML Language and the DATA Step 5

  • 1.3 Overview of SAS/IML Software 6

    • 1.3.1 Overview of the IML Procedure 6

    • 1.3.2 Running a PROC IML Program 7

    • 1.3.3 Overview of SAS/IML Studio 8

    • 1.3.4 Installing and Invoking SAS/IML Studio 10

    • 1.3.5 Running a Program in SAS/IML Studio 11

    • 1.3.6 Using SAS/IML Studio for Exploratory Data Analysis 12

  • 1.4 Who Should Read This Book? 12

  • 1.5 Overview of This Book 13

  • 1.6 Possible Roadmaps through This Book 14

  • 1.7 How to Read the Programs in This Book 14

  • 1.8 Data and Programs Used in This Book 15

    • 1.8.1 Installing the Example Data on a Local SAS Server 16

    • 1.8.2 Installing the Example Data on a Remote SAS Server 16

1.1 Overview of the SAS/IML Language

The acronym IML stands for "interactive matrix language." The SAS/IML language enables you to read data into vectors and matrices and to manipulate these quantities by using high-level matrix-vector computations. The language enables you to formulate and solve mathematical and statistical problems by using functions and expressions that are similar to those found in textbooks and in research journals. You can write programs that analyze and visualize data or that implement custom algorithms that are not built into any SAS procedure.

The SAS/IML language contains over 300 built-in functions and subroutines. There are also hundreds of functions in Base SAS software that you can call. These functions provide the building blocks for writing statistical analyses. You can write SAS/IML programs by using either of two SAS products: the IML procedure (also called PROC IML) and the SAS/IML Studio application. These two products are discussed in Section 1.3.

As implied by the IML acronym, matrices are a fundamental part of the SAS/IML language. A matrix is a rectangular array of numbers or character strings. In the IML procedure, all variables are matrices. Matrices are used to store many kinds of information. For example, each row in a data matrix represents an observation, and each column represents a variable. In a variance-covariance matrix, the ijth entry represents the sample covariance between the ith and jth variables in a set of data.
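To make the covariance statement concrete, here is a minimal sketch that builds a sample variance-covariance matrix directly from a small data matrix. The 4 x 2 matrix x contains made-up illustrative values, and the sketch relies only on standard SAS/IML features: the subscript reduction x[:,] (column means), the REPEAT function, and the T (transpose) function.

```sas
proc iml;
/* hypothetical data: 4 observations (rows) of 2 variables (columns) */
x = {1 2,
     2 4,
     3 5,
     4 9};
n  = nrow(x);
xc = x - repeat(x[:,], n, 1);    /* center: subtract each column mean   */
S  = t(xc) * xc / (n-1);         /* S[i,j] = covariance of vars i and j */
print S;
```

For these data, S equals {5 11, 11 26}/3, so S[1,2] (about 3.67) is the sample covariance between the first and second columns, and the diagonal holds the two sample variances.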

As an example of the power and convenience of the SAS/IML language, the following PROC IML statements read all numeric variables from a data set into a matrix, x. The program then computes robust estimates of location and scale for each variable. (The location parameter identifies the value of the data's center; the scale parameter tells you about the spread of the data.) Each computation requires only a single statement. The SAS/IML statements are described in Chapter 2, "Getting Started with the SAS/IML Matrix Programming Language." The data are from a sample data set in the Sashelp library that contains age, height, and weight information for a class of students.

/* standardize data by using robust estimates of center and scale   */
proc iml;
use Sashelp.Class;              /* open data set for reading        */
read all var _NUM_ into x[colname=VarNames];  /* read variables     */
close Sashelp.Class;            /* close data set                   */

/* estimate centers and scales of each variable */
c = median(x);                  /* centers = medians of each column */
s = mad(x);                     /* scales = MAD of each column      */
stdX = (x - c) / s;             /* standardize the data             */

print c[colname=varNames];      /* print statistics for each column */
print s[colname=varNames];
Figure 1.1. Robust Estimates of Location and Scale

In the PROC IML program, the location of the center of each variable is estimated by calling the MEDIAN function. The scale of each variable is estimated by calling the MAD function, which computes the median absolute deviation (MAD) from the median. (The median is a robust alternative to the mean; the MAD is a robust alternative to the standard deviation.) The data are then standardized (that is, centered and scaled) by subtracting each center from the variable and dividing the result by the scale for that variable. See Section 2.12 for details on how the SAS/IML language interprets quantities such as (x-c)/s.
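The following minimal sketch checks the definition of the MAD by computing it "by hand" as the median of the absolute deviations from each column's median, and compares the result with the MAD function. It assumes a SAS/IML release that provides the MEDIAN and MAD functions (the program above already relies on them) and that MAD returns the unscaled MAD by default. The REPEAT call expands the row of medians to the size of x so that the subtraction does not depend on row-vector broadcasting.

```sas
proc iml;
use Sashelp.Class;
read all var _NUM_ into x;
close Sashelp.Class;

n      = nrow(x);
c      = median(x);                  /* median of each column             */
absDev = abs(x - repeat(c, n, 1));   /* |x[i,j] - median of column j|     */
s      = median(absDev);             /* MAD computed from its definition  */
s0     = mad(x);                     /* built-in MAD (unscaled default)   */
print s, s0;
```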

The previous program highlights a few features of the SAS/IML language:

  • You can read data from a SAS data set into a matrix.

  • You can pass matrices to functions.

  • Many functions act on the columns of a matrix by default.

  • You can perform mathematical operations on matrices and vectors by using a natural syntax.

  • You can analyze data and compute statistics without writing loops. Notice in the program that there is no explicit loop over observations, nor is there a loop over the variables.
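For contrast, the following sketch computes the column medians twice: once with an explicit loop over the variables, as you might write in a language without matrix operations, and once with a single vectorized call. It is an illustration only; the names cLoop and cVec are arbitrary.

```sas
proc iml;
use Sashelp.Class;
read all var _NUM_ into x;
close Sashelp.Class;

/* explicit loop over the variables */
p = ncol(x);
cLoop = j(1, p, .);                /* allocate a 1 x p row of missing values */
do k = 1 to p;
   cLoop[k] = median(x[, k]);      /* median of the kth column               */
end;

/* vectorized: MEDIAN acts on every column in one call */
cVec = median(x);
print cLoop, cVec;
```

Both approaches produce the same row vector, but the vectorized statement is shorter and states the intent directly.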

In general, the SAS/IML language enables you to create compact programs by using a syntax that is natural and convenient for statistical computations. The language is described in detail in Chapter 2, "Getting Started with the SAS/IML Matrix Programming Language."

1.2 Comparing the SAS/IML Language and the DATA Step

The statistical power of SAS procedures and the data manipulation capabilities of the DATA step are sufficient to serve the analytical needs of many data analysts. However, sometimes you need to implement a proprietary algorithm or an algorithm that has recently been published in a professional journal. Other times, you need to use matrix computations to combine and extend results from procedures. In these situations, you can write a program in the SAS/IML language.

The syntax of the SAS/IML language has much in common with the DATA step: neither language is case-sensitive, variable names can contain up to 32 characters, and statements must end with a semicolon. Furthermore, the syntax for control statements such as the IF-THEN/ELSE statement and the iterative DO statement is the same for both languages. The two languages use the same symbols to test quantities for equality (=) and inequality (^=) and to compare quantities (for example, <=). The SAS/IML language enables you to call the same mathematical functions provided in the DATA step, such as LOG, SQRT, ABS, SIN, COS, CEIL, and FLOOR.
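The following small sketch shows these shared elements inside PROC IML: an iterative DO loop, an IF-THEN/ELSE statement, a comparison operator, and the SQRT function, all written exactly as they would be in a DATA step. The input vector is made up for illustration.

```sas
proc iml;
x = {-2, 0, 3};
y = j(nrow(x), 1, .);              /* allocate a result vector                */
do i = 1 to nrow(x);               /* iterative DO: same syntax as DATA step  */
   if x[i] <= 0 then y[i] = 0;     /* IF-THEN/ELSE: same syntax as DATA step  */
   else y[i] = sqrt(x[i]);         /* SQRT: same function as in DATA step     */
end;
print x y;
```

The loop is used here only to highlight the familiar syntax. A matrix-oriented version computes the same result in one statement, y = sqrt(x <> 0);, where <> is the SAS/IML elementwise maximum operator.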

Conceptually, there are two main differences between the DATA step and a SAS/IML program. First, the DATA step implicitly loops over all observations, whereas a typical SAS/IML program does not. Second, the fundamental unit in the DATA step is an observation, whereas the fundamental unit in the SAS/IML language is a matrix.

In general, SAS/IML software is intended for statistical computing, whereas the DATA step is best for merging, extracting, and transforming data. This is a gross oversimplification, and the author has seen some impressive statistical computations accomplished with the DATA step! However, in many cases, those computations would have been shorter to read and simpler to execute if they had been written in SAS/IML software. The reason is simple: SAS/IML functions can use all of the data to compute statistics, whereas the DATA step functions process one observation at a time. For example, computing the sample standard deviation of a variable in the DATA step is more difficult than computing the same quantity in PROC IML.
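As a sketch of the standard deviation example, the following statements compute the sample standard deviation of the Height variable in Sashelp.Class from the formula s = sqrt( sum((x_i - mean)^2) / (n-1) ). The sketch uses only the subscript reduction operator h[:] for the mean and the SSQ function for the sum of squares; it does not assume any built-in standard deviation function. In the DATA step, by contrast, you would typically accumulate running sums across observations (for example, with a RETAIN statement) or call a procedure such as PROC MEANS.

```sas
proc iml;
use Sashelp.Class;
read all var {Height} into h;       /* one variable: a column vector     */
close Sashelp.Class;

n  = nrow(h);
m  = h[:];                          /* sample mean                        */
sd = sqrt( ssq(h - m) / (n-1) );    /* sample standard deviation          */
print m sd;
```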

1.3 Overview of SAS/IML Software

To run SAS/IML programs, you need to use SAS/IML software. The software consists of two components: the IML procedure and the SAS/IML Studio application.

The IML procedure has been a part of the SAS System since Version 6 in the early 1980s. Traditionally, SAS programmers have used PROC IML to implement computational algorithms that are not built into any SAS procedure.

SAS/IML Studio is newer. It is a programming environment for developing, running, and debugging SAS/IML programs. It includes the ability to call SAS procedures, DATA steps, and macro functions from within SAS/IML programs. It also includes dynamically linked statistical graphics. SAS/IML Studio is useful both for developing programs and for running programs that exploit interactive features of the software.

SAS/IML Studio has undergone several name changes. The product was initially released as a Web download in 2001 under the name "SAS/IML Workshop." In 2007, the product was renamed "SAS Stat Studio" and was distributed as part of the SAS/IML product in SAS 9.2. The product was renamed SAS/IML Studio in July 2009 to better emphasize its relationship to the SAS/IML programming language.

SAS/IML software continues to evolve. This book describes features of the IML procedure that are available in SAS 9.2 and mentions new features that are available in the 9.22 release of SAS/IML software. It also describes features of SAS/IML Studio 3.3.

1.3.1 Overview of the IML Procedure

The IML procedure implements the SAS/IML language. PROC IML is an interactive procedure in the sense that each statement is executed as it is submitted. You can submit several statements, examine the results, submit more statements, and so on. When you define a matrix, it persists until you quit the procedure, free the memory, or redefine it. This is different from many other SAS procedures, which do not execute any statements until a RUN statement is submitted. The RUN statement in the SAS/IML language is used to execute a module or subroutine, so do not attempt to use the RUN statement as the last statement in a PROC IML call! To exit from PROC IML, use the QUIT statement.
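The following minimal sketch illustrates the role of the RUN statement. It defines a module (the name SquareIt is hypothetical) with the START and FINISH statements, executes it with RUN, and then continues to use the results, because RUN does not end the procedure. The module returns its result through its first argument, which SAS/IML passes by reference.

```sas
proc iml;
/* define a module that stores the squares of v in its first argument */
start SquareIt(sq, v);
   sq = v ## 2;                    /* elementwise square                    */
finish;

x = {1, 2, 3};
run SquareIt(sq, x);               /* RUN executes the module ...           */
print x sq;                        /* ... and PROC IML is still active      */
/* when you are finished, submit a QUIT statement to exit PROC IML */
```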

You need a license for the SAS/IML product in order to run PROC IML. PROC IML executes on a SAS server.

1.3.2 Running a PROC IML Program

There are several ways to run a SAS/IML program. The traditional way to run PROC IML is to use the SAS Display Manager. You type a program into the Enhanced Editor and choose Run ▸ Submit from the main menu. You can also press F3 to run a program.

A second traditional way is to use SAS Enterprise Guide. You type a program into a Program window and click Run on the Program menu.

In both of these environments, you need to begin your program with the PROC IML statement, as shown in the following program:

/* convert temperatures from Celsius to Fahrenheit scale */
proc iml;
Celsius = {-40, 0, 20, 37, 100};   /* some interesting temperatures */
Fahrenheit = 9/5 * Celsius + 32;   /* convert to Fahrenheit scale   */
print Celsius Fahrenheit;
Figure 1.2. Some Temperatures in the Celsius and Fahrenheit Scales

The PRINT statement displays the value of variables to the current output destination (such as the SAS LISTING destination). You can submit additional statements that use the results of previous assignment statements, as shown in the following statements:

Kelvin = Celsius + 273.15;         /* convert to Kelvin scale    */
print Kelvin;
Figure 1.3. Some Temperatures in the Kelvin Scale

The variable Celsius, defined in the first set of statements, remains available until you quit the procedure or free the variable. When you use the IML procedure in this way, you are essentially using it as a massively powerful calculator.
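The following sketch shows the FREE statement releasing a matrix. It assumes, as is commonly done in SAS/IML programs, that NROW returns 0 for a matrix that no longer has a value. Notice that freeing Celsius does not affect Kelvin, because Kelvin stores its own values rather than a link to Celsius.

```sas
proc iml;
Celsius = {-40, 0, 20, 37, 100};
Kelvin  = Celsius + 273.15;        /* Kelvin is a separate matrix of values   */
free Celsius;                      /* release Celsius only                    */
nLeft   = nrow(Celsius);           /* assumed: 0 rows for a freed matrix      */
print Kelvin nLeft;
```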

1.3.3 Overview of SAS/IML Studio

SAS/IML Studio is a programming environment for developing, running, and debugging SAS/IML programs. It is available for no additional fee when you license SAS/IML and SAS/STAT software.

The programming language of SAS/IML Studio is called IMLPlus. IMLPlus is an extension of the SAS/IML language that contains additional programming features. IMLPlus combines the flexibility of programming in the SAS/IML language with the power to call SAS procedures and to create and modify dynamically linked statistical graphics. Consequently, IMLPlus contains all of the capabilities of PROC IML, but it can do much more. The IMLPlus language is described in Chapter 5, "IMLPlus: Programming in SAS/IML Studio."

SAS/IML Studio is designed to serve the needs of SAS programmers who want a rich programming environment in which to develop algorithms, to explore data, to investigate relationships between variables, to formulate and compare statistical models, and to detect and understand outliers in the data.

Figure 1.4 shows a SAS/IML Studio workspace with a program window, an output window, and two dynamically linked graph windows. The SAS/IML Studio environment contains many features that assist the programmer in developing programs. For example, the environment is ideal for the programmer who loves to multitask. The environment is multithreaded, which means that you can develop and run several programs simultaneously. Each program environment (called a workspace) has its own program window, output window, error log, and windows that display graphs and data tables. While you are working on one program, the windows that belong to other programs are hidden so that you do not get confused about which windows are associated with which analysis. Each workspace also has its own Work library for storing temporary data sets.

Figure 1.4. The SAS/IML Studio Application

The workspace bar, shown at the bottom of Figure 1.4, contains a button for each workspace. You can click a button on the workspace bar to switch between workspaces.

SAS/IML Studio is a client application that runs on a Windows PC. It can connect to one or more SAS Workspace Servers. Figure 1.5 shows two possible architectures. The top image represents the scenario in which SAS Foundation and SAS/IML Studio are both installed on the same Windows PC. In this case, SAS/IML Studio connects to the local version of the SAS Workspace Server when it runs statements in the SAS/IML language or when it calls SAS procedures or DATA steps.

Figure 1.5. Client-Server Architecture for SAS/IML Studio

The bottom image indicates the scenario in which SAS Foundation is installed on a remote computer. In this case, SAS/IML Studio connects to the remote SAS Workspace Server. In fact, each workspace in SAS/IML Studio can connect to a different server. The remote computer can be running any operating system that SAS software supports. You need to use the SAS Metadata Server Connection Wizard (found under the Tools menu on the SAS/IML Studio main menu) to choose the remote SAS servers to which SAS/IML Studio can connect.

The programs in this book run whether or not SAS Foundation is installed on the same PC as the SAS/IML Studio application. However, many SAS administrators configure remote SAS servers so that the Sasuser library is read-only, whereas local SAS servers enable users to read and write to the Sasuser library. This affects the installation of the example data sets that are distributed with this book. See Section 1.8.

1.3.4 Installing and Invoking SAS/IML Studio

SAS/IML Studio requires SAS 9.2 and is distributed as part of the SAS/IML product. If SAS/IML Studio was not installed on your PC when the SAS System was installed, you can download and install SAS/IML Studio from support.sas.com/apps/demosdownloads/setupintro.jsp.

After the software is installed, you can invoke SAS/IML Studio by selecting Start ▸ Programs ▸ SAS ▸ IML Studio 3.3.

You can create a new program window by clicking Create a New Program from the Welcome dialog box, or by selecting File ▸ New ▸ Workspace (or by pressing CTRL+N). When you create a new program window, you also create a new workspace.

1.3.5 Running a Program in SAS/IML Studio

The program in Section 1.3.2 also runs in SAS/IML Studio. The SAS/IML Studio application assumes that the program is an IMLPlus program, and therefore you do not need to use the PROC IML statement. In other words, you can run the following program in a SAS/IML Studio program window:

/* convert temps: no need for PROC IML statement in SAS/IML Studio  */
Celsius = {-40, 0, 20, 37, 100};   /* some interesting temperatures */
Fahrenheit = 9/5 * Celsius + 32;   /* convert to Fahrenheit scale   */
Kelvin = Celsius + 273.15;         /* convert to Kelvin scale       */

In SAS/IML Studio, you run a program by choosing Program ▸ Run from the main menu, as shown in Figure 1.6. The keyboard shortcut is to press F5. Equivalently, you can click the Run icon beneath the main menu; the icon is circled in Figure 1.6. To run a portion of a program, highlight several statements in the program editor and press F5 or click the Run icon; only the highlighted statements are executed.

Figure 1.6. Running a Program from the SAS/IML Studio Interface

This book uses the SAS/IML Studio syntax and omits the PROC IML statement. The first two chapters of this book describe basic syntax and features of the SAS/IML language. These statements are valid in both PROC IML and IMLPlus. Later chapters introduce graphs and other features that are not supported by PROC IML.

1.3.6 Using SAS/IML Studio for Exploratory Data Analysis

The SAS/IML Studio application provides menus and dialog boxes for many point-and-click analyses. By using the graphical user interface (GUI), you can explore data by using dynamically linked statistical graphics. You can use a mouse pointer to select observations in a graph or data table. You can choose menus or click in dialog boxes to change properties of graphs, such as the placement of axis ticks or the title of a graph. You can also change properties of markers, such as their shape and color.

You can also use the menu system to do the following:

  • compute descriptive statistics and model the distribution of univariate data

  • compute smoothers for bivariate scatter plots

  • fit various regression models, including linear regression, robust regression, logistic regression, and generalized linear regression

  • analyze multivariate data with principal component analysis, exploratory factor analysis, discriminant analysis, and correspondence analysis

All of these graphical and built-in analyses are available to you when you use SAS/IML Studio, and all of these GUI techniques are described in the SAS/IML Studio User's Guide.

Some of the exploratory techniques that are described in the SAS/IML Studio documentation are used in this book. They augment and enhance the programming techniques of this book. Often you will use a program to run an analysis and to create graphs, but then you will use interactive techniques to examine the results of the analysis. The regression diagnostics presented in Chapter 12, "Regression Diagnostics," are canonical examples of using interactive techniques to examine the fit of a statistical model.

If you are not familiar with the dynamically linked graphics in SAS/IML Studio, consider reading Chapters 8-11 of the SAS/IML Studio User's Guide.

1.4 Who Should Read This Book?

The goal of this book is to introduce SAS/IML to a wide range of statistical programmers. The examples in this book show how SAS/IML and IMLPlus programs enable you to analyze your data in new and innovative ways. In short, this book intends to show you how writing programs in SAS/IML software enables you to implement analytic techniques that would be difficult or impossible with other SAS software.

The audience for this book is analysts and statistical programmers who use SAS/STAT procedures and the DATA step to explore and model data. It is also intended for programmers who want an introduction to the SAS/IML and IMLPlus languages. The book does not presume prior knowledge of the SAS/IML language, but does presume some familiarity with DATA step concepts such as missing values, formats, and the length of a character variable. The book also presumes familiarity with basic statistical ideas such as you might encounter by using the UNIVARIATE, CORR, and GLM procedures. For example, this book discusses quantiles, distributions, density estimation, and regression. Only a few sections of the book presume familiarity with basic concepts of linear algebra such as matrix multiplication and solving a system of linear equations.

1.5 Overview of This Book

Depending upon your programming background, your knowledge of statistical programming, and your experience with SAS, you might be able to skip certain chapters of this book. The following list summarizes each chapter of the book:

  • Chapter 2: A basic introduction to the SAS/IML language. The chapter describes how to define matrices, compare quantities, and call functions and subroutines. It describes basic programming statements such as IF-THEN/ELSE and the iterative DO statement. If you are an experienced SAS/IML programmer, you can skip this chapter.

  • Chapter 3: An introduction to using the SAS/IML language in data analysis. Even experienced SAS/IML programmers should familiarize themselves with the technique presented in the section "Analyzing Observations by Categories" on page 68.

  • Chapter 4: How to call SAS procedures from a SAS/IML program.

  • Chapter 5: An overview of features in SAS/IML Studio that are not found in PROC IML.

  • Chapters 6-7: These chapters introduce IMLPlus classes and describe how to create graphs.

  • Chapter 8: This chapter describes how to manage data in IMLPlus.

  • Chapters 9-10: These chapters describe intermediate-level programming techniques for modifying graphs, including drawing on graphs and changing the color and shape of observation markers.

  • Chapter 11: How to call R functions and packages from a SAS/IML program.

  • Chapters 12-14: Applications of SAS/IML programming to selected modern statistical topics such as regression diagnostics, simulation and sampling, and bootstrap methods.

  • Chapter 15: How to measure the time required for an algorithm to run on typical data, and how that time changes with the size or characteristics of the data.

  • Chapter 16: How to write programs that use interactive features of IMLPlus, such as creating dialog boxes and attaching menus to graphs.

The sections titled "Case Study" describe programs that implement some statistical analysis and are longer or more complicated than the simpler examples found elsewhere in the book.

1.6 Possible Roadmaps through This Book

Depending upon your experience and interests, there are several paths through this book. Some readers might identify themselves with one or more of the following personas:

  • SAS/STAT Programmer: This reader is familiar with the DATA step and with calling SAS/STAT procedures, but has no experience with writing SAS/IML programs. This reader is interested in using SAS/IML software in conjunction with SAS/STAT procedures to fit statistical models.

  • Novice SAS/IML Programmer: This reader has little or no experience writing SAS/IML programs, but wants to learn the basics of PROC IML and SAS/IML Studio.

  • Intermediate SAS/IML Programmer: This reader is familiar with the basics of the SAS/IML language and wants to improve his programming skills and learn about the features in SAS/IML Studio.

  • Advanced SAS/IML Programmer: This reader is proficient in SAS/IML programming and is interested in developing new algorithms and writing modules that implement new computational methods in SAS/IML software.

For these readers, the following table suggests possible paths through this book. A checkmark (✓) indicates that the chapter should be read. Other chapters can be skimmed, since they presumably contain content that is already familiar. Chapters that can be skimmed are indicated in the table by a circled 'S' (Ⓢ). Even if you skim a chapter, it is recommended that you read the chapter's programming tips and techniques.

Table 1.1. Possible Paths through This Book

1.7 How to Read the Programs in This Book

In books about statistical programming, there are several approaches for presenting and describing a program. The approach used in this book is to briefly describe what a program is supposed to do, present the program in its entirety, and then discuss the details of the implementation. Comments in the program call attention to the main steps and to important statements that are explained later in the text. The advantage of this approach is that the reader can see the program in its entirety and read the description of the program after the statements are presented.

For example, the following simple SAS/IML program and subsequent explanation demonstrate how programs in this book are documented and described:

/* Present a simple example program with comments.                  */
/* The PROC IML statement is not required in SAS/IML Studio.        */
x = 3;        /* 1 */               /* NUMBERS indicate steps that  */
y = 2;                              /* are described in a list      */
z = x + y;    /* 2 */               /* AFTER the program.           */

print z;      /* display result */ /* Other statements are briefly  */
                                   /* described WITHIN the program. */

The program begins with a short comment that explains the purpose of the program. A number that appears in a comment means that the program statement is explained in an enumerated list that appears after the program listing. For example, the previous program consists of the following main steps:

  1. Assign the value 3 to the x variable. Notice that the statement ends with a semicolon, as do all SAS statements. The variable y is assigned similarly.

  2. Define the variable z as the sum of the x and y variables. The result of this assignment is shown in Figure 1.7.

Figure 1.7. The Output from a Simple Program

For simple or very short programs (such as the one above), the program is often described in paragraph form, rather than with an enumerated list.

1.8 Data and Programs Used in This Book

The data and programs used in this book are available from this book's companion Web site: http://support.sas.com/publishing/authors/wicklin.html. Click Example Code and Data to obtain the data and programs.

The example data sets used in this book are described in Appendix A, "Description of Data Sets."

1.8.1 Installing the Example Data on a Local SAS Server

When SAS Foundation and SAS/IML Studio are both installed on the same Windows PC, you can install the example data sets in the Sasuser library. The book's Web site includes instructions on how to download and install the data so that it is accessible from SAS/IML software.

All of the examples in this book are written for this configuration. After you install the data, you can run the examples without modification.

1.8.2 Installing the Example Data on a Remote SAS Server

Many SAS System administrators configure remote SAS servers so that the Sasuser library is read-only. In this case, you cannot install the example data sets in Sasuser.

If your copy of SAS/IML Studio is configured to connect to a SAS Workspace Server that runs on a remote computer, do the following:

  1. Ask your site administrator to use the SAS Management Console to set up a library named SPI in which you can store the book's data sets.

  2. When you download the data and programs from the book's Web site, install the data sets in the SPI library.

  3. When a program in the book refers to the Sasuser library, replace the library name with SPI.

For example, suppose that the book contains the following example that uses the Sasuser library:

/* read data installed on local SAS server */
proc iml;
use Sasuser.Vehicles;                  /* open data set for reading */
read all var _NUM_ into x;             /* read numerical data       */
close Sasuser.Vehicles;                /* close the data set        */

If you want to run the preceding example, you need to use the SPI library instead of the Sasuser library. This means that you will actually run the following program:

/* read data installed on remote SAS server */
proc iml;
use SPI.Vehicles;                      /* open data set for reading */
read all var _NUM_ into x;             /* read numerical data       */
close SPI.Vehicles;                    /* close the data set        */