Chapter 9. Drawing on Graphs

Contents

  • 9.1 Drawing on a Graph

    • 9.1.1 Example: Overlaying a Regression Curve on a Scatter Plot

    • 9.1.2 Graph Coordinate Systems and Drawing Regions

    • 9.1.3 Drawing in the Foreground and Background

    • 9.1.4 Case Study: Adding a Prediction Band to a Scatter Plot

    • 9.1.5 Practical Differences between the Coordinate Systems

  • 9.2 Drawing Legends and Insets

    • 9.2.1 Drawing a Legend

    • 9.2.2 Drawing an Inset

  • 9.3 Adjusting Graph Margins

  • 9.4 A Module to Add Lines to a Graph

  • 9.5 Case Study: A Module to Draw a Rug Plot on a Graph

  • 9.6 Case Study: Plotting a Density Estimate

  • 9.7 Case Study: Plotting a Loess Curve

  • 9.8 Changing Tick Positions for a Date Axis

  • 9.9 Case Study: Drawing Arbitrary Figures and Diagrams

  • 9.10 A Comparison between Drawing in IMLPlus and PROC IML

9.1 Drawing on a Graph

This chapter describes how to overlay curves in the coordinate system of the data and also how to draw objects in different regions of a graph.

The statistical graphics in SAS/IML Studio display data stored in a data object. When you use a statistical analysis to model that data, it is often convenient to add a line, curve, or some other feature to the graph that helps you to visualize, interpret, and evaluate the model. For example, you might want to add a curve that shows predicted values for a scatter plot of two variables. Or, you might want to add a confidence ellipse to a scatter plot of bivariate normal data. Or, you might want to add a density estimate to a histogram, as shown in Figure 4.5.

Drawing lines, curves, or shapes on an IMLPlus graph is accomplished with the IMLPlus drawing subsystem. This is a collection of methods implemented in the Plot class, which is the base class for the IMLPlus graphics classes. The methods, which all begin with the prefix "Draw," are documented in the online Help, in the chapter titled "IMLPlus Class Reference." Appendix C summarizes methods that are used in this chapter.

9.1.1 Example: Overlaying a Regression Curve on a Scatter Plot

Suppose that in analyzing the Vehicles data set, you suspect that the fuel efficiency of a vehicle on the highway is related to the power of the engine, as measured by the engine displacement. The following statements create a scatter plot of the Mpg_Hwy variable versus the Engine_Liters variable:

/* create a scatter plot */
declare DataObject dobj;
dobj = DataObject.CreateFromServerDataSet("Sasuser.Vehicles");

declare ScatterPlot p;
p = ScatterPlot.Create(dobj, "Engine_Liters", "Mpg_Hwy");

Figure 9.1. Scatter Plot of Vehicle Characteristics

The scatter plot is shown in Figure 9.1. Based on this graph, you decide to model the relationship between these variables with a quadratic polynomial. You continue the program with the following statements that call PROC GLM and produce the output that is shown in Figure 9.2:

/* call SAS procedure to model relationship between variables */
submit;
ods exclude ModelANOVA;
proc glm data=Sasuser.Vehicles;
   model Mpg_Hwy = Engine_Liters | Engine_Liters;
   output out=GLMOut P=Pred R=Resid;
quit;
endsubmit;

Figure 9.2. Output from PROC GLM for a Quadratic Regression Model

The GLM OUTPUT statement creates an output data set, GLMOut. The data set includes the Pred variable, which contains predicted values for the model. The following statements use the IMLPlus drawing subsystem to overlay these predicted values on the scatter plot shown in Figure 9.1. The scatter plot with the predicted values curve is shown in Figure 9.3.

/* read results and overlay curve on graph of data */
use GLMOut;
read all var {"Engine_Liters" "Pred"};                  /* 1 */
close GLMOut;

/* need to sort data by X variable before plotting */
m = Engine_Liters || Pred;                              /* 2 */
call sort(m, 1);                                        /* 3 */
p.DrawUseDataCoordinates();                             /* 4 */
p.DrawSetPenColor(BLUE);                                /* 5 */
p.DrawLine(m[,1], m[,2]);                               /* 6 */

Figure 9.3. Predicted Values Overlaid on a Scatter Plot

The previous statements consist of six steps:

  1. Read the explanatory variable into the Engine_Liters vector and the predicted values into the Pred vector.

  2. Horizontally concatenate the Engine_Liters and Pred vectors to form an n × 2 matrix, m.

  3. Call the SAS/IML SORT subroutine to sort the rows of m by the values of the first column.

  4. Set the coordinate system for drawing. The DrawUseDataCoordinates method specifies that future drawing commands will draw in a coordinate system that is consistent with the graph's data and axes.

  5. Set a color for the graphical "pen" used for drawing subsequent lines. This step is optional. The default color is black.

  6. Draw a series of line segments that connect the ordered pairs that are contained in the rows of the m matrix. The line segments begin at the point defined by the first row of m, continue to the point defined by the second row of m, and so on until the last row of m.

Notice that the DrawLine method draws a line that connects ordered pairs. The arguments to the DrawLine method define the order in which the points are connected. If the SORT subroutine were not called in the third step, the line segments would connect the points in the order that they appear in the GLMOut data set.
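To see the effect of the ordering without running the regression, the following sketch uses three made-up points (they are not values from the Vehicles data). It relies only on the concatenation operator and the SORT subroutine that appear in the program above. The statements are intended to run as written in SAS/IML Studio, or in PROC IML if you precede them with a PROC IML statement.

```sas
/* illustrative data only: three hypothetical (x, predicted) pairs */
x    = {3, 1, 2};
pred = {9, 1, 4};

m = x || pred;                 /* 3 x 2 matrix, rows in data set order */
print m[colname={"x" "pred"} label="Order in the data set"];

call sort(m, 1);               /* sort rows by the first column        */
print m[colname={"x" "pred"} label="Order after sorting by x"];
```

In the unsorted matrix, a call to DrawLine would connect (3, 9) to (1, 1) to (2, 4), so the curve would double back across the graph. After sorting, the segments proceed from left to right.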

Both Figure 9.3 and the GLM output in Figure 9.2 indicate that the quadratic model fits these data reasonably well. The careful analyst will also want to create a scatter plot of the residuals versus the Engine_Liters variable, as described in Section 8.8.2.

9.1.2 Graph Coordinate Systems and Drawing Regions

In the previous section, the example program calls the DrawUseDataCoordinates method prior to drawing a curve on the screen. The method specifies that future drawing commands will use coordinates consistent with the graph's data and axes. This section discusses the various coordinate systems that you can use with IMLPlus graphics.

9.1.2.1 Drawing in the Coordinate System of the Data

Suppose you want to draw a rectangle on a scatter plot in order to draw attention to certain observations. To be completely concrete, suppose you want to draw a rectangle around the vehicles in Figure 9.1 that get less than 20 miles to the gallon on the highway in order to emphasize that these vehicles are not fuel efficient.

The method in the Plot class that draws a rectangle is called DrawRectangle. The arguments to the DrawRectangle method specify the location of the lower left and upper right corners of the rectangle. Specifically, if you call the method as DrawRectangle(x1, y1, x2, y2), then the graph displays a rectangle with lower left corner (x1, y1) and upper right corner (x2, y2). Consequently, you might hastily write the following statements:

/* draw a rectangle on a scatter plot */
declare DataObject dobj;
dobj = DataObject.CreateFromServerDataSet("Sasuser.Vehicles");

declare ScatterPlot p;
p = ScatterPlot.Create(dobj, "Engine_Liters", "Mpg_Hwy");
p.DrawRectangle(1, 12, 7.2, 20);       /* wrong coordinate system! */

The resulting scatter plot is shown in Figure 9.4. The graph does contain a rectangle, but notice that the rectangle does not contain the vehicles that get less than 20 miles per gallon on the highway. What happened?

Figure 9.4. An Incorrect Attempt to Draw a Rectangle on a Scatter Plot

What happened is that the DrawRectangle method drew the rectangle correctly, but drew it in a different coordinate system than the data uses. The default coordinate system for IMLPlus graphics is a coordinate system with (0,0) near the lower left corner of the plot area, and (100,100) near the upper right corner. In this coordinate system, the coordinates represent a percentage of the data range. (The plot area also contains margins that surround the data, as shown in Figure 9.15.) Consequently, the horizontal dimension of the rectangle in Figure 9.4 starts at 1% of the range for the Engine_Liters variable and ends at 7.2% of the range. Similarly, the vertical dimension starts at 12% of the range of the Mpg_Hwy variable and ends at 20% of the range.

To correctly draw the rectangle, you need to specify a coordinate system that corresponds to the data ranges of the variables in the graph. The way to do that is to call the DrawUseDataCoordinates method prior to calling the DrawRectangle method. This is shown in the following statements:

/* draw a rectangle on a scatter plot (correct version) */
p.DrawUseDataCoordinates();             /* specify data coordinates */
p.DrawRectangle(1, 12, 7.2, 20);        /* draw rectangle           */

The resulting rectangle is now drawn as intended. The scatter plot and rectangle are shown in Figure 9.5.

Figure 9.5. A Rectangle on a Scatter Plot

You can call the DrawUseDataCoordinates method on a histogram, a scatter plot, a contour plot, a line plot, and a polygon plot, provided that these plots are not displaying any variables that contain categorical data.
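The following sketch illustrates why the first attempt in this section missed its target. It converts the four arguments of the DrawRectangle call from percentages into approximate data values. The axis ranges in the sketch are assumptions chosen only for illustration; the actual ranges depend on the data, and the plot area margins mean that the correspondence is only approximate.

```sas
/* hypothetical axis ranges; the actual values depend on the data */
xMin = 1;   xMax = 8;        /* assumed range of the Engine_Liters axis */
yMin = 10;  yMax = 50;       /* assumed range of the Mpg_Hwy axis       */

pct = {1 12 7.2 20};         /* arguments passed to DrawRectangle       */

/* data value that lies p percent of the way along each axis */
x1 = xMin + pct[1]/100 * (xMax - xMin);
y1 = yMin + pct[2]/100 * (yMax - yMin);
x2 = xMin + pct[3]/100 * (xMax - xMin);
y2 = yMin + pct[4]/100 * (yMax - yMin);
print x1 y1 x2 y2;
```

Under these assumed ranges, the rectangle spans only about 1.07 to 1.50 liters horizontally, which is a thin sliver near the lower left corner of the plot area rather than the full width of the data.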

9.1.2.2 Drawing on a Graph That Displays a Categorical Variable

Some graphs plot categorical data; others plot continuous data. Some graphs can handle both types of data. For example, a bar chart plots categorical data, although the vertical axis (which displays frequency or percentage) is continuous. A scatter plot can handle either type of data, although it is most often used to plot two continuous variables. The horizontal axis for a box plot is categorical, whereas the vertical axis is continuous.

If your graph contains a categorical variable, then there is no "natural" coordinate system defined by the data. For example, Figure 7.13 shows a box plot for the US_Gross and MPAARating variables in the Movies data set. The horizontal axis for this plot does not have a numerical minimum or maximum value. There is no intrinsic coordinate value that represents the center of the PG category. Nevertheless, you might want to draw on this box plot or on other graphs with a categorical axis. In this case, you would want to define your own convenient coordinate system for drawing.

For example, suppose you want to model the dependence of US_Gross on MPAARating. You could modify Figure 7.13 so that it shows the mean revenue under the model for each MPAA rating category. One way to display that information is to plot a marker for each category, with the vertical position of each marker representing the expected value of gross US revenues for that category. Such a graph is shown in Figure 9.6.

Figure 9.6. Overlaying Markers on a Box Plot

How can you create such a graph? This section breaks down the creation of Figure 9.6 into the following steps:

  1. Create the box plot.

  2. Compute the mean of US_Gross for each MPAA rating category by using the techniques that are described in Section 3.3.5.

  3. Set up a convenient coordinate system for drawing the markers.

  4. Compute the horizontal location of the center of each group and plot each marker at the appropriate horizontal and vertical position.

The following sections describe each step and the related program statements.

Create the box plot

The IMLPlus statements that create the box plot are shown below.

/* create a box plot for each MPAA rating */
declare DataObject dobj;
dobj = DataObject.CreateFromServerDataSet("Sasuser.Movies");
declare BoxPlot box;
box = BoxPlot.Create(dobj, "MPAARating", "US_Gross");

Compute the mean for each category

The second step is to compute the mean of the US_Gross variable for each MPAA rating category.

Section 3.3.5 describes how to write statements that compute quantities for each category. In this case, the quantity of interest is the mean of the US_Gross variable for each rating category. The following statements compute a vector GrossMean that contains the relevant values:

/* compute mean of revenue for each rating category */
use Sasuser.Movies;                /* or use dobj.GetVarData()      */
read all var {"MPAARating" "US_Gross"};
close Sasuser.Movies;

u = unique(MPAARating);            /* find the MPAA categories      */
NumGroups = ncol(u);               /* how many categories?          */
GrossMean = j(1, NumGroups);       /* allocate a vector for results */
do i = 1 to NumGroups;             /* for each group...             */
   idx = loc(MPAARating=u[i]);     /* find the movies in that group */
   m = US_Gross[idx];
   GrossMean[i] = m[:];            /* compute group mean            */
end;
print GrossMean[c=u];

Figure 9.7. Mean US Gross Revenues by MPAA Rating Category

The output from these statements is shown in Figure 9.7.

Define a coordinate system

The third step is to set up a convenient coordinate system.

The vertical axis represents a continuous quantity, so it is convenient to draw in the coordinates of the axis, which is the data range for the US_Gross variable. You can get the minimum and maximum values of an axis by calling the GetAxisViewRange method:

box.GetAxisViewRange(YAXIS, YMin, YMax);

The horizontal axis is categorical, so there is not a canonical coordinate system. Common choices for coordinates include [0, 1] or [-1, 1] or even [0, NumGroups]. For definiteness, choose [0, 1] for this example. The following statement calls the DrawUseNormalizedCoordinates method to set the coordinate system to [0, 1] × [YMin, YMax]:

box.DrawUseNormalizedCoordinates(0, 1, YMin, YMax);

Plot the mean markers

The fourth step is to compute the location of the markers and plot them.

The horizontal position of the markers should be at the center of the boxes. There are NumGroups boxes, and each has the same width. Because the horizontal coordinates were chosen to be in [0, 1], the distance between adjacent horizontal tick marks is 1/NumGroups. It follows that the center of each box is located at the values contained in the XCenters vector shown in Figure 9.8. The BinEdges and XCenters vectors are computed in the following statements:

/* compute coordinates of horizontal centers of boxes */
dX = 1/NumGroups;                  /* 1/5, in this example  */
BinEdges = (0:NumGroups-1)*dX;     /* 0, 1/5, ..., 4/5      */
XCenters = BinEdges + dX/2;        /* 1/10, 3/10, ..., 9/10 */
print BinEdges, XCenters;

Figure 9.8. Horizontal Coordinates for Each Category

Figure 9.8 shows the values of the edges and centers of the categories in the [0,1] coordinate system. Notice that the index creation operator (:) has lower precedence than arithmetic operators (+, -, *, /), so that 0:NumGroups-1 is equivalent to 0:4 for this example.
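The same computation carries over to the other coordinate choices mentioned earlier, such as [-1, 1] or [0, NumGroups]. The following sketch wraps it in a small function module. The module name BinCenters and its arguments are hypothetical and are not part of SAS/IML Studio. The statements are intended to run as written in SAS/IML Studio, or in PROC IML after a PROC IML statement.

```sas
/* hypothetical helper: centers of k equal-width bins on [a, b] */
start BinCenters(a, b, k);
   dX = (b - a) / k;                    /* width of each bin          */
   return( a + dX * ((1:k) - 0.5) );    /* a + dX/2, a + 3*dX/2, ...  */
finish;

/* values in comments are exact up to floating-point rounding */
c1 = BinCenters( 0, 1, 5);    /*  0.1   0.3   0.5   0.7   0.9 */
c2 = BinCenters(-1, 1, 5);    /* -0.8  -0.4   0     0.4   0.8 */
c3 = BinCenters( 0, 5, 5);    /*  0.5   1.5   2.5   3.5   4.5 */
print c1, c2, c3;
```

The parentheses around 1:k are needed because the index creation operator has lower precedence than subtraction. Whichever interval you choose, the same interval must be passed to the DrawUseNormalizedCoordinates method so that the centers line up with the boxes.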

The program is almost complete. The horizontal coordinates of the markers are contained in the XCenters vector and the vertical coordinates are contained in the GrossMean vector, so you can draw the markers and add the title shown in Figure 9.6 with the following statements:

box.DrawMarker(XCenters, GrossMean, MARKER_CIRCLE, 8);
box.SetTitleText("Markers Indicate Mean US Revenue", true);

Notice that the DrawMarker method accepts vectors for the horizontal and vertical coordinates. Calling the method once is more efficient than writing a loop that plots a marker at XCenters[i] and GrossMean[i] for each appropriate value of i.

You can use the SetTitleText method to add a title to a graph. The optional second argument (true) causes the SetTitleText method to immediately display the title.

9.1.3 Drawing in the Foreground and Background

The IMLPlus graphics enable you to draw in three distinct areas of a graph: the plot area foreground, the plot area background, and the graph area. Whenever possible, you should draw data-related curves and figures in the plot area in the data coordinate system, as shown in the previous sections.

You can change the drawing region by calling the DrawSetRegion method. The method takes a single argument; valid values are PLOTFOREGROUND, PLOTBACKGROUND, and GRAPHFOREGROUND. The three regions are illustrated in Figure 9.9.

Figure 9.9. Three Drawing Regions

If you specify the PLOTFOREGROUND parameter, anything you draw on the graph appears in the plot area in front of the observations, whether they are represented by markers, bars, or boxes. For example, if you draw a rectangle or polygon in this area, it might obscure observation markers that are behind it. In contrast, the PLOTBACKGROUND parameter specifies that drawn objects appear behind observation markers, bars, and boxes. The plot foreground and plot background share a common coordinate system. For example, if you set the coordinate system for the plot foreground by using the DrawUseDataCoordinates method, and you then use DrawSetRegion to set the drawing area to the plot background, you do not need to call DrawUseDataCoordinates a second time.

If you specify the GRAPHFOREGROUND parameter as an argument to DrawSetRegion, objects can be drawn anywhere in the graph. When you draw in this region, the objects will obscure axes, axis labels, and anything in the plot area. Consequently, it is best to draw in the margins of the graph area. (Margins are described in Section 9.3.) The graph area supports only the normalized coordinate system.

In general, adhere to the following recommendations:

  • Draw data-related lines and curves in the PLOTFOREGROUND (for example, regression curves or density estimates).

  • Draw filled polygons in the PLOTBACKGROUND (for example, confidence bands, bivariate confidence ellipses, and regions under a probability curve).

  • Display text, notes, and legends in the GRAPHFOREGROUND or PLOTFOREGROUND regions. If necessary, you can make room for drawing by calling the SetGraphAreaMargins or SetPlotAreaMargins methods in the Plot class, as described in Section 9.3.

9.1.4 Case Study: Adding a Prediction Band to a Scatter Plot

To illustrate the different drawing regions, this section describes a program that extends the example in Section 9.1. The program creates a scatter plot of the Mpg_Hwy variable versus the Engine_Liters variable in the Vehicles data set. It then adds a quadratic regression curve. The new features of the program are to add a band in the plot area background that indicates a region of 95% confidence for individual predictions, and to create a simple legend in the graph area foreground.

The following statements re-create the plot and overlay the curve, as previously shown in Figure 9.3:

/* create a scatter plot; overlay curve */
submit;
ods exclude ModelANOVA;
proc glm data=Sasuser.Vehicles;
   model Mpg_Hwy = Engine_Liters | Engine_Liters;
   output out=GLMOut P=Pred lcl=Lower ucl=Upper;         /* 1 */
quit;

/* need to sort data by explanatory (X) variable before plotting */
proc sort data=GLMOut;                                   /* 2 */
   by Engine_Liters;
run;
endsubmit;

use GLMOut;
read all var {"Engine_Liters" "Mpg_Hwy"
              "Pred" "Lower" "Upper"};                   /* 3 */

close GLMOut;
declare DataObject dobj;
dobj = DataObject.CreateFromServerDataSet("Work.GLMOut");

declare ScatterPlot p;                                   /* 4 */
p = ScatterPlot.Create(dobj, "Engine_Liters", "Mpg_Hwy");
p.DrawUseDataCoordinates();
p.DrawSetPenColor(BLUE);
p.DrawLine(Engine_Liters, Pred);

There are four differences between the statements that created Figure 9.3 and the present example:

  1. The GLM OUTPUT statement creates three variables: Pred, Lower, and Upper. The Lower variable contains the lower bound of a 95% confidence interval for an individual prediction. The Upper variable contains the upper bound.

  2. The example in Section 9.1.1 calls the SAS/IML SORT subroutine to sort the data so that it can be plotted with the DrawLine method. This example calls the SORT procedure instead.

  3. This example reads variables from the sorted GLMOut data set. In particular, the Engine_Liters and Mpg_Hwy vectors are sorted.

  4. The scatter plot is created from the sorted data.

In spite of these differences, the present example creates the same graph and overlays the same curve as shown in Figure 9.3.

The following statements add a light-gray band to the background of the plot area. The drawing is accomplished with the DrawPolygon method:

/* draw 95% prediction band */
p.DrawSetRegion(PLOTBACKGROUND);                        /* 5 */
p.DrawSetBrushColor(200, 200, 200);                     /* 6 */
p.DrawSetPenStyle(OFF);                                 /* 7 */

/* create a polygon defined by the upper/lower prediction limits */
n = nrow(Engine_Liters);
x = Engine_Liters // Engine_Liters[n:1];                /* 8 */
y = Lower // Upper[n:1];
p.DrawPolygon(x, y, true);                              /* 9 */

The numbered comments in the previous statements correspond to the following steps:

  5. The default drawing location is the plot area foreground. However, a filled prediction band in the foreground obscures observations, so set the drawing region to the background of the plot area. The foreground and background areas share the same coordinate system, so you do not have to repeat the call to DrawUseDataCoordinates.

  6. Set the color of the graphical "brush." This is the color that is used to fill the prediction band. (The default color is black.) You can specify either a keyword that specifies a predefined color such as GRAY or YELLOW, or you can specify the color by using an RGB specification, as is done for this example. The RGB triple (200, 200, 200) is a light gray color. Chapter 10 describes how to specify colors.

  7. When you draw a polygon, the current pen is used to draw the outline of the polygon. This example turns off the pen, so that there is no outline drawn on the prediction band.

  8. How can you draw a confidence band if you know the curves that make up its upper and lower boundary? The program shows one approach. The DrawPolygon method requires two arguments: a vector of X values and a vector of Y values. The polygon is drawn by connecting the pairs of points in the order they appear in the vectors. Let p1, p2, ..., pn name the ordered points on the lower curve and let q1, q2, ..., qn name the ordered points on the upper curve, as shown in Figure 9.10. You can draw the prediction band by specifying the values of the lower boundary (in increasing X order), followed by the values of the upper boundary in reverse X order.

  9. The DrawPolygon method draws the polygon. The third argument (true) specifies that the polygon is filled. The polygon connects the points p1, p2, ..., pn, qn, qn-1, ..., q1, as shown in Figure 9.10.

    Figure 9.10. Creating a Polygon from Upper and Lower Boundaries

The prediction band is drawn in the background of the plot area, as shown in Figure 9.11.
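The vertex ordering can be checked on a small example before it is applied to real data. The following sketch uses three made-up points on each boundary (the values are illustrative only) and builds the polygon vectors with the same statements as the program above. It is intended to run as written in SAS/IML Studio, or in PROC IML after a PROC IML statement.

```sas
/* illustrative values only: three points on each boundary curve */
x     = {1, 2, 3};         /* sorted X values                   */
Lower = {0, 1, 2};         /* p1, p2, p3 on the lower boundary  */
Upper = {2, 3, 4};         /* q1, q2, q3 on the upper boundary  */

n  = nrow(x);
px = x     // x[n:1];      /* 1 2 3 3 2 1 */
py = Lower // Upper[n:1];  /* 0 1 2 4 3 2 */
print px py;
```

Reading down the rows, the vertices trace the lower boundary from left to right and then the upper boundary from right to left, so the outline never crosses itself. If the upper boundary were appended in increasing order instead, the segment from p3 to q1 would cut diagonally across the band.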

So far, the example in this section has drawn in the plot area foreground and background. For the sake of completeness, the last few statements of this section show how you can draw in the graph area. The graph area is a good place to add a legend, so the following statements draw the text "95% Prediction Band" in the lower right corner of the graph area, and also display a light-gray rectangle:

/* draw simple legend */
p.DrawSetPenAttributes(BLACK, SOLID, 1);               /* 10 */
p.DrawSetRegion(GRAPHFOREGROUND);                      /* 11 */
p.DrawSetTextAlignment(ALIGN_LEFT, ALIGN_CENTER);      /* 12 */
p.DrawText(80, 5, "95% Prediction Band");              /* 13 */
p.DrawRectangle(75, 3, 79, 7, true);                   /* 14 */

The numbered comments in the previous statements correspond to the following steps:

  10. Recall that the graphical pen is currently turned off. Reset the pen to draw solid black lines one pixel wide.

  11. Set the drawing region to be the graph area. Recall that the default coordinate system is [0, 100] × [0, 100].

  12. By default, text is centered at a specified location. For this example, left-justify the text by calling the DrawSetTextAlignment method.

  13. If a graph is using default margins, the left and right edges of the plot area have horizontal coordinates 15 and 96, respectively; the bottom and top edges have vertical coordinates 20 and 90. Drawing text in the margin sometimes requires trial and error to place the text correctly. For this example, draw the text at the location (80, 5), which corresponds to the lower right corner of the graph area.

  14. Draw a filled rectangle preceding the text. The color of the graphical brush is still light gray. The outline of the rectangle is drawn with the characteristics of the current graphical pen.

The completed graph is shown in Figure 9.11. The graph shows a quadratic fit drawn in the plot area foreground. The curve is drawn in the data coordinate system. The graph shows a prediction band drawn in the plot area background, which shares the same coordinate system as the plot area foreground. The graph also shows a primitive legend that is drawn in the graph area. The legend is drawn in the default (normalized) coordinate system.

Figure 9.11. A Graph with Features Drawn in Three Drawing Regions

You do not need to draw legends by using low-level drawing commands: SAS/IML Studio distributes several modules that make it easy to create legends. The DrawInset and DrawLegend modules are discussed in Section 9.2 and are documented in the online Help. You can view the documentation in the chapter titled "IMLPlus Module Reference," in the section titled "Graphics."

9.1.5 Practical Differences between the Coordinate Systems

There are some important differences between the data coordinate system and a user-defined coordinate system. Objects drawn in the data coordinate system respond to zooming and panning in the plot area, whereas objects drawn in a user-defined coordinate system do not. (Try panning the box plot in Figure 9.6 and see what happens! To pan the plot area, right-click in the plot area and select Pan Tool from the pop-up menu.)

Consequently, the data coordinates are best for plotting regression lines, reference lines, confidence regions, and similar objects. For the same reason, a user-defined coordinate system is best for drawing legends, insets, and similar annotations because you typically do not want a legend to disappear when you zoom or pan in a graph. Unfortunately, there is no way (as of SAS/IML Studio 3.3) to draw in the coordinate system of the data for a graph that displays a categorical variable.

You should also be aware of the effect of resizing a window on objects drawn in user-defined coordinates. If you resize the window in Figure 9.11, you will observe that the simple legend does not rescale like the rest of the window. This is shown in Figure 9.12.

Figure 9.12. Resizing a Window That Has Text in the Graph Area

The size of the window has shrunk, but the left side of the text is still drawn at a position 80% along the width of the graph. Since the size of the window has changed, the entire text is no longer visible. The same behavior occurs if you change the graph margins. The text is not aware of the graph area margins; it is merely displayed at a certain percentage of the window's width.

The lesson to learn is this: when you draw objects such as legends and insets that are in the GRAPHFOREGROUND region, they do not rescale as the window resizes. Therefore, you should strive to set the size of a window prior to drawing in a user-defined coordinate system.

9.2 Drawing Legends and Insets

As mentioned previously, SAS/IML Studio distributes several modules that make it easy to create legends. This section provides an overview of the DrawLegend and DrawInset modules. For more information, see the SAS/IML Studio online Help.

Both of these modules use the IMLPlus drawing subsystem to place text on a graph in a user-defined coordinate system. Consequently, they are affected by the issues discussed in Section 9.1.5.

9.2.1 Drawing a Legend

You can use the DrawLegend module to associate markers or lines in a graph with descriptive text. For example, the following statements extend the ideas in Section 8.7 to create a scatter plot in which hybrid-electric vehicles are represented by markers of one shape and color, whereas traditional gasoline engines are represented by another shape and color:

/* set marker shape and color based on whether hybrid-electric */
declare DataObject dobj;
dobj = DataObject.CreateFromServerDataSet("Sasuser", "vehicles");

/* traditional = black circle; hybrid = blue star */
dobj.GetVarData("Hybrid", h);
dobj.SetMarkerShape(loc(h=0), MARKER_CIRCLE);
dobj.SetMarkerShape(loc(h=1), MARKER_STAR);
dobj.SetMarkerColor(loc(h=1), BLUE);

declare ScatterPlot p;
p = ScatterPlot.Create(dobj, "Mpg_City", "Mpg_Hwy");
p.SetMarkerSize(6);

You can use the Plot.DrawSetPenAttributes and Plot.DrawLine methods to add regression lines to the graph, as shown in Section 9.4. The final graph is shown in Figure 9.13. Although the person who created this graph would have no problem understanding it, the graph needs a legend if it is going to be understood by others who do not know which observations represent hybrid-electric vehicles.

Figure 9.13. A Scatter Plot with Multiple Marker Shapes and Line Styles

The DrawLegend module has several arguments. The syntax is:

  • DrawLegend(Plot plot, Labels, Size, Color, Style, Symbol, BGColor, Loc);

The first argument specifies the Plot object on which to draw the legend. The Labels argument specifies the text entries in the legend, and the Size argument determines the size of the text. The next three arguments determine the line color, line style, and symbol shape of the graphical elements that are associated with each text item. The BGColor argument determines the fill color of the rectangle that defines the legend area, or you can specify -1 to draw only the boundary of the rectangle.

The last argument specifies where the legend should be placed in the graph. The argument is a three-character string. Each character specifies a location according to the following rules:

  • First Character: As described in Section 9.1.3, a graph has two main areas: the plot area and the graph area. The first character specifies the area in which the legend is drawn. The valid options are:

    • I specifies that the legend is drawn inside the plot area.

    • O specifies that the legend is drawn outside the plot area (that is, in the graph area).

  • Second Character: The second character specifies the horizontal placement of the legend. The valid options are:

    • L specifies that the legend is drawn near the left side of the plot or graph area.

    • C specifies that the legend is drawn in the center of the plot or graph area.

    • R specifies that the legend is drawn near the right side of the plot or graph area.

  • Third Character: The third character specifies the vertical placement of the legend. The valid options are:

    • T specifies that the legend is drawn near the top of the plot or graph area.

    • C specifies that the legend is drawn in the center of the plot or graph area.

    • B specifies that the legend is drawn near the bottom of the plot or graph area.
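As a reading aid for these rules, the following sketch decodes a location string into a plain-language description. It is only an illustration of the three-character convention and is not how the DrawLegend module itself processes the argument. The variable names are hypothetical, and the sketch assumes only the SAS/IML SUBSTR and LOC functions.

```sas
/* illustration only: decode a three-character location string */
Location = "IRB";

c1 = substr(Location, 1, 1);      /* area:       I or O     */
c2 = substr(Location, 2, 1);      /* horizontal: L, C, or R */
c3 = substr(Location, 3, 1);      /* vertical:   T, C, or B */

areaCodes = {"I" "O"};
areaNames = {"inside plot area" "outside plot area"};
hCodes = {"L" "C" "R"};   hNames = {"left" "center" "right"};
vCodes = {"T" "C" "B"};   vNames = {"top" "center" "bottom"};

/* look up the description that matches each character */
area  = areaNames[ loc(areaCodes = c1) ];
horiz = hNames[ loc(hCodes = c2) ];
vert  = vNames[ loc(vCodes = c3) ];
print area horiz vert;    /* inside plot area, right, bottom */
```

The sketch does no error checking: if a character is not one of the valid options, the LOC function returns an empty matrix and the subscript fails.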

The following statements call the DrawLegend module to create a legend for Figure 9.13. The legend is shown in the lower right of Figure 9.14.

/* define arguments for DrawLegend module */
Labels = {"Traditional" "Hybrid-Electric"};
LabelSize = 12;                 /* size of font                  */
LineColor = BLACK || BLUE;
LineStyle = SOLID || DASHED;
Symbol    = MARKER_CIRCLE || MARKER_STAR;
BGColor = -1;               /* -1 means "transparent background" */
Location = "IRB";           /* Inside, Right, Bottom             */

run DrawLegend(p, Labels, LabelSize, LineColor, LineStyle,
               Symbol, BGColor, Location);
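
To make the three-character convention concrete, the following sketch splits a location string into its three codes and checks each code against the valid options listed earlier. The variable names (area, horiz, vert, isValid) are illustrative and are not part of the DrawLegend module; this section does not show how the module itself parses the string.

```sas
/* sketch: decode a three-character location string */
Location = "IRB";
area  = upcase(substr(Location, 1, 1));  /* "I" or "O"          */
horiz = upcase(substr(Location, 2, 1));  /* "L", "C", or "R"    */
vert  = upcase(substr(Location, 3, 1));  /* "T", "C", or "B"    */

/* each code must match one of the valid options */
isValid = any(area  = {"I" "O"}) &
          any(horiz = {"L" "C" "R"}) &
          any(vert  = {"T" "C" "B"});
print area horiz vert isValid;
```

For the string "IRB", the codes are I (inside), R (right), and B (bottom), which matches the comment in the previous program.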

9.2.2 Drawing an Inset

An inset is often used to display descriptive statistics: the name of a statistic appears in the left column and the value in the right column. Common statistics suitable for an inset include the number of observations, a correlation coefficient, parameter estimates, and so on.

You can use the DrawInset module to draw a rectangle that contains two columns of text. The syntax for the DrawInset module is similar to the syntax for the DrawLegend module. The syntax is:

  • DrawInset(Plot plot, Labels, Values, Properties, Typeface, BGColor, Loc);

The first argument to the module is the Plot object on which to draw the inset. The second and third arguments to the DrawInset module specify the labels and the values that appear in each column of the inset. The values can be numeric (in which case, an NLBESTw. format is applied to the values), or you can specify character strings.

The Properties argument is a vector of properties that describe the typeface used in the inset. For this argument, a single missing value means "use default values." If you want to override some default value, allocate a vector of five missing values and override the element that corresponds to the option that you want to change. For example, the following statements create the LabelProps vector as a vector of five missing values, but then override the default font size by assigning a value to the first element:

LabelProps = j(5, 1, .);   /* use default values for labels       */
LabelProps[1] = 12;        /* override size of font               */

The Typeface argument specifies the name of a typeface such as "Arial." You can specify the default typeface by specifying an empty matrix or a zero-length string for this argument, as shown in the following statements. The remaining arguments for the DrawInset module are identical to the final arguments for the DrawLegend module.

/* define arguments for DrawInset module */
Labels = {"Slope: Traditional" "Slope: Hybrid-Electric"};
Values = {1.2 0.74};       /* values associated with labels         */
LabelProps = j(5, 1, .);   /* use default values for labels         */
LabelProps[1] = 12;        /* override size of font                 */

Typeface = "";             /* empty string means "default typeface" */
BGColor = -1;              /* -1 means "transparent background"     */
Location = "ILT";          /* Inside, Left, Top                     */
run DrawInset(p, Labels, Values, LabelProps,
              Typeface, BGColor, Location);

The inset is shown in the upper left of Figure 9.14.

Figure 9.14. A Graph with a Legend and Inset

The DrawLegend and DrawInset modules share a common feature: you can pass in missing values or empty matrices in order to obtain default behavior. Some object-oriented programming languages such as Java or C++ enable the programmer to provide default values for unspecified arguments to a function. Unfortunately, the SAS/IML language does not enable you to define a module with optional arguments. However, when you write a module you can—and should—look at the type and value of an argument and use a default value if the argument is an empty matrix or a missing value.

For example, if arg is the name of a module argument, you can use the following statements to determine whether to use a default value for the argument:

/* determine whether to use default value for a module argument */
DefaultArgValue = 12345;       /* define default value for argument */
if type(arg)='U' then          /* arg is empty matrix, use default  */
   argValue = DefaultArgValue;
else if arg=. then             /* arg is missing value, use default */
   argValue = DefaultArgValue;
else
   argValue = arg;             /* use value passed into module      */
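
The same logic can be packaged as a small function module so that each argument is resolved with one call. The following is a minimal sketch; the module name ArgOrDefault is hypothetical and is not part of SAS/IML or IMLPlus. The sketch compares the argument to a missing value only when the argument is numeric, so that a character argument is returned unchanged.

```sas
/* sketch: hypothetical helper that returns arg or a default value */
start ArgOrDefault(arg, DefaultArgValue);
   if type(arg)='U' then              /* empty matrix: use default    */
      return( DefaultArgValue );
   if type(arg)='N' then do;          /* compare to . only if numeric */
      if all(arg=.) then              /* all missing: use default     */
         return( DefaultArgValue );
   end;
   return( arg );                     /* use value passed into module */
finish;

/* z has not been assigned a value, so it is an empty matrix */
size1 = ArgOrDefault(z, 12);          /* default value is used        */
size2 = ArgOrDefault(., 12);          /* default value is used        */
size3 = ArgOrDefault(10, 12);         /* passed value is used         */
print size1 size2 size3;
```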

9.3 Adjusting Graph Margins

As discussed previously, there are two main areas in an IMLPlus graph. The plot area displays the data; the graph area displays tick marks, axis labels, and titles. In Figure 9.15, the plot area is the white region and the graph area is the gray area. If you draw a legend that is positioned outside of the plot area, the DrawLegend module automatically increases the graph area margins in order to fit the legend. If you are drawing your own annotations outside of the plot area, you might need to increase the graph margins to make room for the annotations. You can also decrease the graph area margins to suit your preferences. This section describes how to set margins in a graph.

Both the plot area and the graph area contain margins. Plot area margins add space around the data display so that observations are separated from the edge of the plot area. Margins in the graph area add space outside of the plot area so that there is room for axes and titles. Figure 9.15 shows the default margins for a scatter plot. The margins are specified as a proportion of the height or width of the graph. For example, in Figure 9.15 the left margin of the graph area occupies 15% of the width of the graph and the top margin of the plot area occupies 5% of the height of the plot area.

Figure 9.15. Margins in the Plot Area and Graph Area

You can increase or decrease the graph margins by using the SetGraphAreaMargins method in the Plot class. The method takes four arguments: the fraction of the graph that the margin should occupy on the left, right, top, and bottom of the graph, respectively. You can pass in the value -1 for an argument to indicate that the method should not change the corresponding margin. You can use the GetGraphAreaMargins method to retrieve the current settings for the margins. For example, the following statements create a scatter plot, print the default margins, and change three of the margins:

/* get and set margins in the graph area */
declare DataObject dobj;
dobj = DataObject.CreateFromServerDataSet("Sasuser", "vehicles");

declare ScatterPlot p;
p = ScatterPlot.Create(dobj, "Mpg_City", "Mpg_Hwy");

p.GetGraphAreaMargins(left, right, top, bottom);
print left right top bottom;
p.SetGraphAreaMargins(0.1, 0.15, -1, 0.12);

Figure 9.16. Default Margin Sizes, as a Fraction of the Graph Area

The scatter plot that the program creates is shown in Figure 9.17. In a similar way, you can use the SetPlotAreaMargins method to adjust the margins in the plot area.

Figure 9.17. A Plot with New Margins

9.4 A Module to Add Lines to a Graph

In creating statistical graphs, it is often useful to overlay reference lines. The simplest reference line is a horizontal line. For example, you might want to add a reference line at zero on a residual plot in order to separate positive residuals from negative. It is often useful to display the identity line. In regression diagnostics (see Chapter 12), it is useful to display lines that have statistical significance, such as lines that indicate large residuals or high-leverage values for observations.

Reference lines are used so frequently that it is worthwhile to construct a SAS/IML subroutine to draw a reference line. A convenient syntax for a module that overlays the line defined by the equation y = a + bx might be as follows:

run abline(plot, a, b, attrib);

where the arguments are as follows:

plot

is an object of the Plot2D class such as a ScatterPlot, LinePlot, or Histogram. A line will be added to this graph.

a

specifies the intercept of the line. If this argument is a k × 1 vector, then k lines are drawn.

b

specifies the slope of the line. To specify a vertical line, you can adopt the convention that the slope is a missing value and that the a parameter specifies the x-intercept. If this argument is a k × 1 vector, then k lines are drawn.

attrib

specifies the color, style, and width of the line. You can adopt the convention that if this argument is a missing value, default line attributes are used. If this argument is a scalar integer, assume that it specifies a color parameter. If this argument contains four elements, assume that the last element is either PLOTBACKGROUND or PLOTFOREGROUND.

The following statements define a module that is used in subsequent sections to add reference lines to regression diagnostic plots:

/* Module to draw vertical, horizontal, and diagonal lines
 * on a scatter plot, line plot, or histogram.
 * The form of the line is y = a + b*x.
 * INPUT p: a Plot2D object: ScatterPlot, LinePlot, or Histogram
 *       a: specifies the intercept for the line
 *       b: specifies the slope for the line
 *  attrib: specifies line attributes.
 *          If attrib = ., use default attributes
 *          If attrib = 0xRRGGBB, it specifies a color
 *          if ncol(attrib)=3, attrib = {color,style,width}
 *          if ncol(attrib)=4, attrib = {color,style,width,PlotRegion}
 *
 * For a vertical line, a= x-intercept and b=. (MISSING).
 * To specify multiple lines, the parameters a and b can be column vectors.
 */
start abline(Plot2D p, a, b, attrib);
   if nrow(a)^=nrow(b) then
      Runtime.Error("abline: incompatible intercepts and slopes");

   p.GetAxisViewRange(XAXIS, xMin, xMax);     /* get range of x     */
   x0 = xMin - (xMax-xMin);
   xf = xMax + (xMax-xMin);
   p.GetAxisViewRange(YAXIS, yMin, yMax);     /* get range of y     */
   y0 = yMin - (yMax-yMin);
   yf = yMax + (yMax-yMin);

   p.DrawUseDataCoordinates();
   if attrib=. then
      p.DrawSetPenAttributes(BLUE, DASHED, 1);/* default attributes */
   else if ncol(attrib)=1 then
      p.DrawSetPenColor(attrib);              /* set color          */
   else if ncol(attrib)>=3 then
      p.DrawSetPenAttributes(attrib[1], attrib[2], attrib[3]);
   if ncol(attrib)=4 then
      p.DrawSetRegion(attrib[4]);

   do i = 1 to nrow(a);
      if b[i]=. then                          /* vertical line      */
         p.DrawLine(a[i], y0, a[i], yf);
      else                                    /* horiz or diag line */
         p.DrawLine(x0, a[i]+b[i]*x0, xf, a[i]+b[i]*xf);
   end;
finish;
store module=abline;

The details of the module are straightforward. Notice that the default line is a thin, blue, dashed line. Those are the line attributes that are used if you pass in a missing value for the attrib argument.

As an example, consider Figure 9.13, which features two lines. One line is a solid black line; the other is blue and dashed. You can draw those lines on the scatter plot, p, by using the following statements:

declare ScatterPlot p;
p = ScatterPlot.Create(dobj, "Mpg_City", "Mpg_Hwy");
p.SetMarkerSize(6);

/* draw regression lines for traditional and hybrid-electric */
attrib = BLACK || SOLID || 1;          /* traditional = solid black */
run abline(p, 2.71, 1.20, attrib);     /* y = 2.71 + 1.20*x         */

attrib = BLUE || DASHED || 1;          /* hybrid = dashed blue      */
run abline(p, 7.36, 0.74, attrib);     /* y = 7.36 + 0.74*x         */
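
Because the a and b arguments can be column vectors, a single call can draw several lines. The following sketch assumes that the scatter plot p and the abline module are defined as in this section; the intercepts and slopes are illustrative values, not results from the data. It requests a horizontal line at y = 30, the identity line y = x, and a vertical line at x = 25.

```sas
/* sketch: draw three reference lines with one call (illustrative values) */
a = {30, 0, 25};          /* intercepts; for a vertical line, the x-intercept */
b = { 0, 1,  .};          /* slopes; a missing value means a vertical line    */
run abline(p, a, b, .);   /* missing attrib: default thin, blue, dashed lines */
```

Passing a missing value for the attrib argument applies the default attributes to all three lines. To draw lines with different attributes, call the module once for each set of attributes.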

9.5 Case Study: A Module to Draw a Rug Plot on a Graph

Section 6.11 shows how you can define a module that accepts an object of the Plot class. Because the Plot class is the base class for multiple graph types, you can call the module on a wide variety of graphs: scatter plots, bar charts, and so on.

It is also possible to write a module that examines the type of the incoming Plot object and calls methods based on whether the object is a member of a specific derived class. For example, suppose you want to draw little tick marks that indicate the distribution of the X variable at the bottom of histograms and scatter plots, as shown in Figure 9.18. (This is sometimes called a "rug plot.") This sounds relatively simple, but there are some implementation issues to consider:

  • How can the module determine if a graph is an object of the Histogram or ScatterPlot class? The answer is that the IsInstance method in the DataView class returns true if the argument to the method is an object of a specified class.

  • What should the module do if a variable in the scatter plot is a nominal variable? In this case, the following module returns without drawing the rug plot.

  • The scatter plot has, by default, a nonzero margin at the bottom of the area, whereas the histogram does not. How does that affect the placement of the tick marks for the rug plot? The following module overlays tick marks on the histogram bars, but for a scatter plot it draws the tick marks in the plot area margin.

The following statements implement the RugPlot module, which adds a rug plot to a histogram or to a scatter plot:

/* argument of module can be scatter plot or histogram */
start RugPlot(Plot2D p);
   /* draw a "rug plot" for the X variable */
   if !Histogram.IsInstance(p) &
      !ScatterPlot.IsInstance(p) then                   /* 1 */
      return;

   declare DataObject dobj;
   dobj = p.GetDataObject();                            /* 2 */
   p.GetVars(ROLE_X, xVarName);                         /* 3 */
   dobj.GetVarData(xVarName, x);                        /* 4 */

   /* for scatter plot, check for nominal variables */
   if ScatterPlot.IsInstance(p) then do;
      p.GetVars(ROLE_Y, yVarName);
      if dobj.IsNominal(xVarName) |
         dobj.IsNominal(yVarName) then                  /* 5 */
         return;
   end;

   p.GetAxisViewRange(YAXIS, yMin, yMax);               /* 6 */
   y0 = j(nrow(x), 1, yMin);
   if Histogram.IsInstance(p) then
      y1 = y0 + 0.05*(yMax-yMin);                       /* 7 */
   else
      y1 = y0 - 0.05*(yMax-yMin);
   p.DrawUseDataCoordinates();
   p.DrawLine(x, y0, x, y1);                            /* 8 */
finish;

The main steps in the module are described in the following list:

  1. Use the IsInstance method in the DataView class to determine whether the argument to the module is a scatter plot or a histogram. If it is neither, the module silently returns. (Alternatively, you could use the Runtime.Warning method to display a warning message.)

  2. The module requires getting data values for the X variable, so use the GetDataObject method in the DataView class to obtain the data object that is associated with the graph.

  3. Use the GetVars method in the Plot class to obtain the name of the X variable for the graph.

  4. Get the data for the X variable. The x vector contains the horizontal positions for the tick marks in the rug plot.

  5. If the graph is a scatter plot and either variable is nominal, then the module returns without drawing the rug plot. This is done because the subsequent call to the DrawUseDataCoordinates method fails unless both axes are interval variables. (Alternatively, you could set up a user-defined coordinate system; see the documentation for the Plot.DrawUseNormalizedCoordinates method.)

  6. Get the range of the Y axis. For a scatter plot, the minimum and maximum values of the axis are the minimum and maximum values of the Y data. For a histogram, the minimum value is zero and the maximum value is the height of the tallest bar.

  7. Drawing the rug plot requires drawing many short lines that all begin at the same vertical position and all have the same height. The vector y0 is a vector of starting vertical positions; it consists of repeated values of yMin. The vector y1 is a vector of ending vertical positions. It differs from y0 by an amount equal to 5% of the height of the vertical axis. For a histogram, each tick begins at yMin and is drawn toward the center of the histogram. For a scatter plot, each tick is drawn away from the plot center (into the plot area margin).

  8. Draw the entire rug plot with a single call to the DrawLine method. The ith line is drawn from the point (x[i],y0[i]) to the point (x[i],y1[i]). Notice that the module does not loop over all observations and call the DrawLine method once for each tick mark.
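
The vector computations in steps 6 and 7 can be examined without creating a graph. The following sketch uses hypothetical data values and a hypothetical axis range in place of the values that the module obtains from the plot; the names tickLen, yHist, and yScat are illustrative.

```sas
/* sketch: endpoints of rug-plot ticks for hypothetical data */
x = {1.2, 2.5, 2.7, 4.0};            /* horizontal positions of ticks   */
yMin = 0;  yMax = 20;                /* hypothetical axis range         */
y0 = j(nrow(x), 1, yMin);            /* every tick starts at yMin       */
tickLen = 0.05*(yMax-yMin);          /* 5% of the axis height           */
yHist = y0 + tickLen;                /* histogram: ticks point up       */
yScat = y0 - tickLen;                /* scatter plot: ticks point down  */
print x y0 yHist yScat;
```

Each row of the printed table describes one tick mark: the tick at x[i] runs from y0[i] to yHist[i] on a histogram, or from y0[i] to yScat[i] on a scatter plot.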

The module could be modified to support drawing a rug plot for the Y variable of a scatter plot and even a box plot, but that modification is not shown here. The following statements call the module on a scatter plot and a histogram.

/* draw rug plot on scatter plot and histogram */
declare DataObject dobj;
dobj = DataObject.CreateFromServerDataSet("Sasuser.Movies");

declare ScatterPlot scat;
scat = ScatterPlot.Create(dobj, "ReleaseDate", "Budget");
run RugPlot(scat);

declare Histogram hist;
hist = Histogram.Create(dobj, "Budget");
run RugPlot(hist);

The results are shown in Figure 9.18. The rug plot on the histogram clearly shows the nonuniform distribution of the Budget variable. The rug plot on the scatter plot shows a fairly uniform distribution of dates because movies are typically released on Fridays. (However, notice that the rug plot does not indicate how many movies are released on a given day because of overplotting.) The dark bands in the scatter plot are caused by the movies that are released on non-Fridays.

Figure 9.18. Rug Plots in a Histogram and a Scatter Plot

9.6 Case Study: Plotting a Density Estimate

Chapter 4, "Calling SAS Procedures," describes how to define an IMLPlus module, ComputeKDE, that computes a kernel density estimate (KDE) for univariate data. The module is used to create Figure 4.5, which shows a KDE overlaid on a histogram of the Engine_Liters variable in the Vehicles data set, but no program statements were given.

This section uses the ComputeKDE module to reproduce Figure 9.19. The module either must be defined in the same program window or must be stored in a directory on the IMLPlus module search path, as described in Section 5.7.

Recall that the following statements call the ComputeKDE module:

DSName = "Sasuser.Vehicles";
VarName = "Engine_Liters";
Bandwidth = "MISE";
run ComputeKDE(x, f, DSName, VarName, Bandwidth);

After the module runs, the vector x contains evenly spaced values between the minimum and maximum values of Engine_Liters. The vector f contains corresponding values for the density of Engine_Liters. The following statements overlay this information on a histogram of Engine_Liters:

/* overlay density estimate on histogram */
declare DataObject dobj;
dobj = DataObject.CreateFromServerDataSet(DSName);     /* 1 */

declare Histogram hist;
hist = Histogram.Create(dobj, VarName);                /* 2 */

hist.ShowDensity(); /* display density instead of frequency */
hist.DrawUseDataCoordinates();
hist.DrawLine(x, f);                                   /* 3 */

The program consists of the following steps:

  1. Read the data set. Notice that the string in the DSName vector is passed to the DataObject method.

  2. Create the histogram. Notice that the variable name is not hard-coded, but is set to the value that is contained in VarName.

  3. The ShowDensity method is called so that the vertical axis of the histogram is on the same scale as the KDE. Then the KDE is overlaid on the histogram.

The histogram is shown in Figure 9.19. Notice that this program is "reusable" in the following sense: you can change the value of DSName and VarName and the program creates a histogram and KDE of the new data.

Figure 9.19. Histogram with Kernel Density Estimate

The KDE shown in Figure 9.19 indicates that the density of the Engine_Liters variable is multimodal. Most of the engine displacements are in the range 2.4-3.6 liters, with another group of vehicles having displacements near 5.3 liters. The shape of the KDE suggests that there are at least two distinct groups that comprise the 2.4-3.6 range, since there are two peaks in that range. It is possible that the density for the Engine_Liters variable is a mixture density: the sum of a set of component densities, with the major components being the densities from the four-, six-, and eight-cylinder vehicles.

9.7 Case Study: Plotting a Loess Curve

The techniques presented in this chapter enable you to create dynamically linked statistical graphs, to call methods that change attributes of the graphs, to overlay curves or regions on the graph, and to draw text and figures in the plot area or in the margins of the graph area. This section combines those techniques to overlay a loess curve on a scatter plot.

The right side of Figure 9.18 shows a scatter plot of Budget versus ReleaseDate for the Movies data set. The graph suggests that movies released in the summer months and in the weeks prior to Christmas day have larger budgets than those released at other times. One way to model the relationship between these two variables is with a nonparametric smoother.

This section calls the LOESS procedure on these data to determine how the mean budget of a movie is related to its release date. A loess model estimates the mean budget for a given release date, t0, by local estimation: only the budgets for the k movies released closest to t0 are used to predict the budget at t0. The parameter k can be set explicitly, or the LOESS procedure can choose a value for k that optimizes a criterion such as generalized cross validation (GCV) or the corrected Akaike's information criterion (AICC).

The following statements create the scatter plot:

/* create a scatter plot */
declare DataObject dobj;
dobj = DataObject.CreateFromServerDataSet("Sasuser.Movies");

declare ScatterPlot p;
p = ScatterPlot.Create(dobj, "ReleaseDate", "Budget");

The goal of this section is to add a loess fit to this scatter plot. You can call the LOESS procedure in a SUBMIT block. The LOESS procedure does not have an OUTPUT statement; instead, you create the output data set by using a SCORE statement in conjunction with an ODS OUTPUT statement, which writes a data set that contains the predicted values, as shown in the following statements:

/* compute loess curve for data */
submit;
proc loess data=Sasuser.Movies;
   model Budget = ReleaseDate / select=AICC(presearch); /* 1 */
   score;                                               /* 2 */
   ods output ScoreResults=LoessOut;                    /* 3 */
   ods select SmoothingCriterion FitSummary;
run;

proc sort data=LoessOut;                                /* 4 */
   by ReleaseDate;
run;
endsubmit;

Figure 9.20. Output from the LOESS Procedure

The SUBMIT block consists of the following steps:

  1. Call the LOESS procedure to create a model of Budget by ReleaseDate. The SELECT= option is used to select a value of the smoothing parameter that minimizes the AICC statistic. The PRESEARCH suboption improves the likelihood that the parameter value found by the LOESS procedure is a global minimum of the AICC statistic.

  2. The SCORE statement specifies that the loess model be evaluated at the values of the explanatory variables in the input data set.

  3. The LOESS procedure does not support an OUTPUT statement, but you can use the ODS OUTPUT statement to create a data set, LoessOut, that contains the predicted values of the loess model at each observation in the input data set. The predicted values are contained in the variable P_Budget. (In general, the LOESS procedure names the variable that contains predicted values P_YVar, where YVar is the name of the response variable.)

  4. The output data are sorted according to the values of ReleaseDate in preparation for plotting the predicted values on the scatter plot.

The output from the LOESS procedure is shown in Figure 9.20. The output shows that the AICC statistic is optimized by a smoothing parameter of 0.1518. This implies that approximately 15% of the 359 observations, or 54 points, are used in each local fit.
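
The arithmetic behind the 54 points can be checked directly. The following sketch multiplies the number of observations by the smoothing parameter; the variable names are illustrative, and the sketch does not attempt to reproduce the exact rounding rule that the LOESS procedure uses.

```sas
/* sketch: points per local fit implied by the smoothing parameter */
n = 359;                     /* number of observations            */
s = 0.1518;                  /* selected smoothing parameter      */
nLocal = n * s;              /* about 54.5; reported as 54 points */
print nLocal;
```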

After the output data are created and sorted, the predicted values are read into SAS/IML vectors and plotted on the scatter plot by using the following statements:

/* overlay loess curve on scatter plot */
use LoessOut;
read all var {"ReleaseDate" "P_Budget"};
close LoessOut;
p.DrawUseDataCoordinates();
p.DrawSetPenColor(BLUE);
p.DrawLine(ReleaseDate, P_Budget);

These statements are discussed in previous sections. The result is shown in Figure 9.21.

Figure 9.21. A Loess Curve

As expected, the smooth curve does exhibit some undulations. The curve tends to be highest in the summer months. It also exhibits peaks in December of 2005 and 2006, but it does not seem to peak in December of 2007.

The data exhibits an interesting feature in the spring of 2007: the smoothed curve does not decrease as much in spring 2007 as it did in spring of 2006. There are two reasons for this. First and most importantly, there were two movies released in the spring of 2007 that had relatively large budgets: one released 16FEB07 had a budget of 120 million dollars and another released 02MAR07 had a budget of 85 million dollars. Secondly, recall that this loess fit uses 54 points in each local neighborhood. Because there were relatively few movies released in the spring of 2007, the local neighborhood for, say, 02MAR07 includes movies more than 100 days before and after that date. In particular, the big-budget movies of December 2006 and May 2007 are included in the local neighborhood for 02MAR07. In contrast, the local neighborhood for 02MAR06 includes movies at most 70 days before and after that date.

9.8 Changing Tick Positions for a Date Axis

This section describes how to modify the placement of ticks on a graph that displays a date variable.

Although Figure 9.21 adequately displays the data and the loess fit of the data, the tick marks for the horizontal axis are less than satisfactory. It is difficult to discern the location of Christmas day and of the summer months—dates that are important for the analysis of the data. This section describes how you can modify the placement of tick marks on an axis by using the SetAxisTicks method.

The IMLPlus graphs display numeric axes with evenly spaced tick marks. The two parameters that are used to control the placement of numeric tick marks are the anchor position, x0, and the tick unit, Δx. (The IMLPlus methods that set these parameters are SetAxisTickAnchor and SetAxisTickUnit.) The tick marks are placed at the positions x0 ± kΔx for k = 0, 1, 2, .... This approach does not work well for data that represent dates because the months of the year each have a different number of days, so there is no value for Δx that results in, say, tick marks at the beginning of each month.

The resolution to this difficulty is to use irregularly spaced tick marks. You can use the SetAxisTicks method to specify the positions and labels for tick marks.

For definiteness, suppose you want to modify Figure 9.21 to display a tick mark at the beginning of each month. How can you accomplish this? Recall that the ReleaseDate variable contains the unformatted numeric values, as discussed in Section 7.10. In particular, you can use SAS/IML statements to find the minimum (earliest) and maximum (latest) release date. You can use those values to enumerate each day between those values. You can then apply a DATEw. format to those numbers and use string matching functions to find the values that correspond to the first day of each month. The following statements carry out this algorithm:

/* plot tick marks at beginning of each month when data has DATE format */
xData = ReleaseDate;                                    /* 1 */
fmt = "DATE7.";

allX = min(xData):max(xData);                           /* 2 */
allXText = putn(allX, fmt);                             /* 3 */

q = substr(allXText, 1, 2);                             /* 4 */
idx = loc(q="01");                                      /* 5 */
pos = allX[idx];
values = allXText[idx];
print (pos[1:5])[label= "pos"] (values[1:5])[label= "value"];
p.SetAxisTicks(XAXIS, pos, values);                     /* 6 */

Figure 9.22. First Few Values of Tick Marks

The output from the previous statements is shown in Figure 9.22. The following list explains key steps in the algorithm:

  1. For convenience, a new variable xData is created as a copy of the ReleaseDate variable. This step is not necessary, but it makes the technique easier to reuse in your own programs.

  2. A numerical vector, allX, is created. This vector contains the numeric value for each day between the minimum and maximum values of xData.

  3. A character vector, allXText, is created that contains the formatted values that correspond to allX. The Base SAS function PUTN applies the DATE7. format to the values in allX.

  4. Which dates correspond to the first of the month? The ones whose formatted values start with the string "01". This step uses the Base SAS function SUBSTR to create a character vector, q, that contains the first two characters of the strings contained in allXText. The values of q are the days of the month: "01", "02", and so forth to "31".

  5. The program uses the LOC function to find the indices of q that match the string "01". These indices are the dates that represent the first day of each month. For convenience, the program prints a few of the unformatted and formatted values for these dates, as shown in Figure 9.22.

  6. The SetAxisTicks method sets the new positions and values for the tick marks of the horizontal axis. The resulting plot is shown in Figure 9.23.

Figure 9.23. Modified Tick Marks for a Date Variable
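
To see how the string-matching steps behave, you can apply them to a short date range that does not depend on the Movies data. The following sketch uses a hypothetical range from 28DEC2006 through 02FEB2007, which is constructed with the Base SAS function MDY. The range contains two first-of-month dates.

```sas
/* sketch: apply the same steps to a short, hypothetical date range */
allX = mdy(12,28,2006) : mdy(2,2,2007);   /* each day in the range    */
allXText = putn(allX, "DATE7.");          /* formatted: 28DEC06, ...  */
idx = loc(substr(allXText, 1, 2) = "01"); /* first day of each month  */
pos = allX[idx];                          /* numeric tick positions   */
values = allXText[idx];                   /* tick labels              */
print pos values;                         /* 01JAN07 and 01FEB07      */
```

The pos and values vectors have the same form as the vectors that are passed to the SetAxisTicks method in the previous program.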

9.9 Case Study: Drawing Arbitrary Figures and Diagrams

Although SAS/IML Studio does not provide a "blank canvas" on which to draw diagrams and other figures, you can use the following technique to simulate a blank drawing area:

  1. Define a DataObject with two observations.

  2. Set the marker color of those observations to NOCOLOR (that is, invisible).

  3. Create a scatter plot of these data.

The result is a blank window on which to draw arbitrary figures. The following statements illustrate this technique by drawing the probability density function for the normal distribution on the interval [-3, 3]. The program also draws lines that correspond to the 5th and 95th quantiles of the distribution. The resulting plot is shown in Figure 9.24.

/* create "blank canvas"; draw arbitrary shapes and figures */
coords = {-3 0, 3 0.5};            /* window shows [-3,3]x[0,0.5]   */

declare DataObject dobj;
dobj = DataObject.Create("Canvas", {"x" "y"}, coords);
dobj.SetMarkerColor(OBS_ALL, NOCOLOR); /* make markers invisible    */

declare ScatterPlot canvas;
canvas = ScatterPlot.Create(dobj, "x", "y");
canvas.DrawUseDataCoordinates();   /* window range: min/max of data */

x = do(-3.3, 3.3, 0.05);           /* evenly spaced points          */
canvas.DrawLine(x, 0*x);           /* draw reference line at y=0    */
y = pdf("normal", x);              /* evaluate function at x values */
canvas.DrawSetPenColor(BLUE);      /* set pen color to blue         */
canvas.DrawLine(x, y);             /* draw normal curve             */

/* draw lines at certain locations to indicate quantiles */
canvas.DrawSetPenStyle(DASHED);
canvas.DrawSetPenColor(GRAY);
q = {0.05 0.95};                   /* list of quantiles             */
do i = 1 to ncol(q);
   a = quantile("normal", q[i]);              /* find quantile      */
   canvas.DrawLine(a, 0, a, pdf("normal",a)); /* draw dashed line   */
end;

Figure 9.24. Drawing on an Empty Canvas
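
The positions of the dashed lines can be computed separately from the drawing statements. The following sketch evaluates the two quantiles and the height of the normal density at each quantile; the variable name h is illustrative. The computation is vectorized, so no loop is needed.

```sas
/* sketch: locations and heights of the dashed quantile lines */
q = {0.05 0.95};             /* list of quantiles                   */
a = quantile("normal", q);   /* x locations, about -1.645 and 1.645 */
h = pdf("normal", a);        /* height of each line, about 0.103    */
print a h;
```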

9.10 A Comparison between Drawing in IMLPlus and PROC IML

There is not a one-to-one correspondence between IMLPlus drawing methods and the drawing statements in PROC IML. However, the following table lists the PROC IML statements and similar IMLPlus methods. The IMLPlus methods belong to the Plot class. You can investigate these methods further by choosing Help ► Help Topics and then by expanding the IMLPlus Class Reference ► Plot section.

Table 9.1. Low-Level Drawing Commands in PROC IML

PROC IML Statement    Similar IMLPlus Method

DISPLAY, WINDOW       N/A, although you can display dialog boxes by calling IMLPlus modules. See Section 5.8.

GOPEN, GCLOSE         Plot.DrawBeginBlock, Plot.DrawEndBlock, Plot.DrawResetState

GDELETE               Plot.DrawRemoveCommands

GPIE                  Plot.DrawArc

GDRAW, GDRAWL         Plot.DrawLine

GGRID                 Plot.DrawGrid

GPOINT                Plot.DrawMarker

GPOLY                 Plot.DrawPolygon

GPORT                 Plot.DrawUseNormalizedCoordinates

GPORTPOP              Plot.PopState

GSCRIPT               Plot.DrawText

GSET                  Plot.DrawSetBrushColor, Plot.DrawSetBrushStyle, Plot.DrawSetPenAttributes, Plot.DrawSetPenColor, Plot.DrawSetPenStyle, Plot.DrawSetPenWidth, Plot.DrawSetTextSize, Plot.DrawSetTextColor, Plot.DrawSetTextStyle, Plot.DrawSetTextTypeface

GTEXT, GVTEXT         Plot.DrawText, Plot.DrawSetTextAngle

GWINDOW               Plot.DrawUseDataCoordinates

GXAXIS, GYAXIS        Plot axes are drawn automatically. You can change the way that axes are drawn with a number of Plot class methods that start with the "SetAxis" prefix. You can manually draw additional axes with Plot.DrawAxis and Plot.DrawNumericAxis.
