CHAPTER  8

Visualizing Your Data

8.1    Concepts of ODS Graphics

8.2    Creating Bar Charts with PROC SGPLOT

8.3    Creating Histograms and Density Curves with PROC SGPLOT

8.4    Creating Box Plots with PROC SGPLOT

8.5    Creating Scatter Plots with PROC SGPLOT

8.6    Creating Series Plots with PROC SGPLOT

8.7    Creating Fitted Curves with PROC SGPLOT

8.8    Controlling Axes and Reference Lines in PROC SGPLOT

8.9    Controlling Legends and Insets in PROC SGPLOT

8.10  Customizing Graph Attributes in PROC SGPLOT

8.11  Creating Paneled Graphs with PROC SGPANEL

8.12  Specifying Image Properties and Saving Graphics Output

 

8.1     Concepts of ODS Graphics

ODS Graphics is designed to give you high-quality graphs with a minimum of effort. By its nature, ODS Graphics is a big topic. You have many choices to make about types of graphs, file formats, and options for color, line type, tick marks, and so on. This chapter covers the basic statements and options to get you started.

As you might expect, ODS Graphics is an extension of the Output Delivery System, but instead of creating tabular output, ODS Graphics creates graphs, and it produces them using the same destinations and styles as ODS tabular output. ODS Graphics is different from SAS/GRAPH, which is licensed separately. Starting with SAS 9.3, ODS Graphics is part of Base SAS, so you do not need to license any additional products.

Using ODS Graphics in statistical procedures  Over 80 statistical procedures have the ability to produce graphs using ODS Graphics. When you run these procedures, ODS Graphics will produce graphs that are specially designed for that type of analysis. Starting with SAS 9.3, ODS Graphics is turned on by default when you run SAS interactively in Microsoft Windows and UNIX. ODS Graphics is turned off by default when you run in batch mode or in other operating environments. To turn this feature on, insert this statement into your program before any statistical procedures:

ODS GRAPHICS ON;

Statistical procedures that support ODS Graphics will then create appropriate graphs.

You do not need to turn ODS Graphics off, but if you want to turn it off (either to make your programs run faster, or simply because you do not want the graphs) use this statement:

ODS GRAPHICS OFF;

Note that the ODS GRAPHICS statement does not open a destination (like ODS HTML or ODS PDF statements do). You open and close ODS destinations; you turn ODS GRAPHICS on or off.
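As a minimal sketch of how these statements fit together, the following program turns ODS Graphics on, runs a procedure that supports it, and then turns it off again. The choice of PROC FREQ with the PLOTS=FREQPLOT option and the SASHELP.CLASS sample data set are assumptions made for illustration; they are not part of this chapter's examples.

```sas
* Turn on ODS Graphics before running the procedure;
ODS GRAPHICS ON;

* Request a frequency plot along with the usual table;
PROC FREQ DATA = sashelp.class;
   TABLES Sex / PLOTS = FREQPLOT;
RUN;

* Optional: turn ODS Graphics off again;
ODS GRAPHICS OFF;
```

The graph goes to whatever destination is currently open, because the ODS GRAPHICS statement does not open or close destinations.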

Using ODS Graphics for stand-alone graphs  ODS Graphics also includes a family of procedures designed to create stand-alone graphs (graphs that are not embedded in the output of a statistical procedure). The SGPLOT and SGPANEL procedures are two of these. Because these procedures always produce graphs, you do not need to specify the ODS GRAPHICS ON statement even in batch mode. However, there may still be times when you want to use the ODS GRAPHICS statement to specify graphics options. (See Section 8.12.)

The SGPLOT procedure creates single-celled graphs while SGPANEL creates multicelled graphs based on classification variables. The various types of graphs fall into four general categories:

Category                    Types of Graphs

Basic plots                 band, block, bubble, fringe, heat map, high-low, needle, scatter, series, spline, step, and vector

Fit and confidence plots    ellipse, loess, penalized B-spline, and regression

Distribution plots          box, density, and histogram

Categorization plots        bar, dot, line, and waterfall

You can overlay multiple graphs as long as combining them makes sense. This chapter covers the most common types of graphs. Other graphs use similar syntax and options.

ODS destinations  Unless you specify otherwise, your graphs will be rendered in your default destination. HTML is the default destination for SAS Studio. HTML is also the default destination for SAS Enterprise Guide starting with release 8.1 (earlier versions use SASREPORT). Likewise, HTML is the default destination for the SAS windowing environment in Microsoft Windows and UNIX starting with SAS 9.3 (earlier versions use the LISTING destination). If you run in batch or in other operating environments, the default destination is LISTING. See Chapter 5 for information about controlling destinations.

Saving graphs  Also starting with SAS 9.3, graphs are written in your WORK library and will therefore be deleted when you exit SAS. This is good because it prevents your disks from becoming cluttered with old graphs. For information about how to save graphs, see Section 8.12.

Styles for graphs  ODS style templates control the overall appearance of your output. You can use the same style templates for graphs as for tabular output. However, some styles are better suited to statistical graphics than others. The following table lists styles that are recommended for graphical results:

Desired Output     Style Name     Default for Destination

Color              ANALYSIS

                   HTMLBLUE       HTML

                   LISTING        LISTING (graphs only)

                   PEARL          PDF, PS

                   RTF            RTF

                   STATISTICAL

Gray scale         JOURNAL

Black and white    JOURNAL2

You can change the default style for some destinations using menus in SAS Studio, SAS Enterprise Guide, or the SAS windowing environment. You can also specify a style for your graphs using the STYLE= option in the ODS statement for a destination. For example, to produce a gray-scale graph in the LISTING destination (and save it in a default location) you would use this statement:

ODS LISTING STYLE = JOURNAL;

For the LISTING destination, the STYLE= option applies only to graphical output; tabular output is still rendered as plain text. Also keep in mind that every destination has a default style associated with it, so if you change the destination for a graph, its appearance may change too. For more about specifying image properties and saving graphs, see Section 8.12.
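The same STYLE= option can be used with other destinations. The sketch below applies the black-and-white JOURNAL2 style to the HTML destination and then draws a simple bar chart. The SASHELP.CLASS sample data set and the title text are assumptions chosen only for illustration, and the exact behavior of reopening the HTML destination may vary with your SAS environment.

```sas
* Use the JOURNAL2 style for output sent to the HTML destination;
ODS HTML STYLE = JOURNAL2;

* Any graph created now is rendered in that style;
PROC SGPLOT DATA = sashelp.class;
   VBAR Age;
   TITLE 'Students by Age (JOURNAL2 Style)';
RUN;
```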

 

8.2     Creating Bar Charts with PROC SGPLOT

Bar charts show the distribution of a categorical variable where the length of each bar is proportional to the number of observations in that category. To create a chart with vertical bars, use the SGPLOT procedure with a VBAR statement with this general form:

PROC SGPLOT;

   VBAR variable-name / options;

For horizontal bars, replace the keyword VBAR with HBAR. Possible options include:

ALPHA = n

specifies the level for confidence limits. The value of n must be between 0 (100% confidence) and 1 (0% confidence). The default is 0.05 (95% confidence limits).

BARWIDTH = n

sets the width of bars. Values range from 0 to 1 with a default of 0.8.

DATALABEL = variable-name

displays a label for each bar. If you specify a variable name, then the values of that variable will be used. Otherwise, SAS will calculate appropriate values.

DISCRETEOFFSET = n

offsets bars from midpoints, which is useful for overlaying bar charts. The value must be between -0.5 (left) and +0.5 (right). The default is 0 (no offset).

LIMITSTAT = statistic

specifies the type of limit lines to be shown. Possible values are CLM, STDDEV (standard deviation), or STDERR (standard error). You must specify a RESPONSE= option and STAT=MEAN. To display limits when using the GROUP= option, you must also specify GROUPDISPLAY=CLUSTER.

MISSING

includes a bar for missing values.

GROUP = variable-name

specifies a variable used to group the data.

GROUPDISPLAY = type

specifies how to display grouped bars, either STACK (the default) or CLUSTER.

RESPONSE = variable-name

specifies a numeric variable to be summarized.

STAT = statistic

specifies a statistic, either FREQ, MEAN, MEDIAN, PERCENT, or SUM. FREQ is the default if there is no response variable. SUM is the default when you specify a response variable.

TRANSPARENCY = n

specifies the degree of transparency for the bars. The value of n must be between 0 (the default) and 1, with 1 being completely transparent and 0 being completely opaque.

Example  A chocolate manufacturer is considering whether to add four new varieties of chocolate to its line of products. The company asked volunteers to taste the new flavors. The data contain each person’s age group (A for adult, C for child) followed by their favorite flavor (80%Cacao, Earl Grey, Ginger, or Pear). Notice that each line of data contains six responses.

A Pear A 80%Cacao A EarlGrey C 80%Cacao A Ginger C Pear

C 80%Cacao C Pear C Pear A EarlGrey A 80%Cacao C 80%Cacao

A Ginger A Pear C EarlGrey C 80%Cacao A 80%Cacao A EarlGrey

A 80%Cacao C Pear C Pear A 80%Cacao C Pear C 80%Cacao

The following program reads the raw data and creates a user-defined format. Then PROC SGPLOT creates a bar chart using the GROUP= and GROUPDISPLAY= options.

DATA chocolate;

   INFILE 'c:\MyRawData\Choc.dat';

   INPUT AgeGroup $ FavoriteFlavor $ @@;

RUN;

PROC FORMAT;

   VALUE $AgeGp 'A' = 'Adult' 'C' = 'Child';

RUN;

* Bar chart for favorite flavor;

PROC SGPLOT DATA = chocolate;

   VBAR FavoriteFlavor / GROUP = AgeGroup GROUPDISPLAY = CLUSTER;

   FORMAT AgeGroup $AgeGp.;   

   LABEL FavoriteFlavor = 'Flavor of Chocolate';

   TITLE 'Favorite Chocolate Flavors by Age';

RUN;

This chart has clustered bars showing the number of respondents in each age group who chose each flavor. The LABEL statement replaced the name of the variable FavoriteFlavor with the words "Flavor of Chocolate" in the X-axis label. The FORMAT statement replaced the data values (A and C) with more descriptive values (Adult and Child) in the legend.

[Figure: clustered bar chart, "Favorite Chocolate Flavors by Age"]
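The example above counts observations. To summarize a numeric variable instead, you can combine the RESPONSE=, STAT=, and LIMITSTAT= options. The following is a minimal sketch that assumes the SASHELP.CLASS sample data set (not one of this chapter's data sets); the label and title text are also illustrative.

```sas
* Bar chart of mean height with standard error limits;
PROC SGPLOT DATA = sashelp.class;
   VBAR Sex / RESPONSE = Height STAT = MEAN LIMITSTAT = STDERR;
   LABEL Sex = 'Sex of Student';
   TITLE 'Mean Height by Sex';
RUN;
```

Because LIMITSTAT= requires both a response variable and STAT=MEAN, leaving out either option would mean that no limit lines are drawn.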

8.3     Creating Histograms and Density Curves with PROC SGPLOT

The bar charts in the preceding section show the distribution of categorical data. To show the distribution of continuous data, you can use histograms (or box plots, which are described in the next section). In a histogram, the data are divided into discrete intervals called bins. Each bin is represented by a rectangle, which makes histograms look similar to bar charts. However, bar charts typically have a gap between the bars while histograms do not.

Histograms  To create a histogram, use PROC SGPLOT and a HISTOGRAM statement with this general form:

HISTOGRAM variable-name / options;

Possible options include:

BINSTART = n

specifies the midpoint for the first bin.

BINWIDTH = n

specifies the bin width (in units of the horizontal axis). SAS determines the number of bins. This option is ignored if you specify the NBINS= option.

GROUP = variable-name

specifies a variable used to group the data.

NBINS = n

specifies the number of bins. SAS determines the bin width.

SCALE = scaling-type

specifies the scale for the vertical axis, either PERCENT (the default), COUNT, or PROPORTION.

SHOWBINS

places tick marks at the midpoints of the bins. By default, tick marks are placed at regular intervals based on minimum and maximum values.

TRANSPARENCY = n

specifies the degree of transparency for the histogram. The value of n must be between 0 (the default) and 1, with 1 being completely transparent and 0 completely opaque.

Density curves  You can also plot density curves for your data. The general form of a DENSITY statement is:

DENSITY variable-name / options;

Common options are:

GROUP = variable-name

specifies a variable used to group the data.

TYPE = distribution-type

specifies the type of distribution curve, either NORMAL (the default) or KERNEL.

TRANSPARENCY = n

specifies the degree of transparency for the density curve. The value of n must be between 0 (the default) and 1, with 1 being completely transparent and 0 completely opaque.

The HISTOGRAM and DENSITY statements can be used together. Keep in mind that when you overlay graphs, the order of statements is important because the second graph will be drawn on top of the first and could hide it.

Example  A fourth grade class has a competition to see who can read the most books in one month. For each student, the teacher records the student's name and the number of books read. Notice that each line of data includes six students.

Bella 4 Anthony  9 Joe 10 Chris 6 Beth 5 Daniel 2

David 7 Emily 7 Josh 7 Will 9 Olivia 7 Matt 8

Maddy 8 Sam 13  Jessica 6 Jose 6 Mia 12 Elliott 8

Tyler 15 Lauren 10 Cate 14 Ava 11 Mary 9 Eric 10

Megan 13 Michael 9 John 18 Alex 5 Cody 11 Amy 4

The DATA step below reads the raw data from a file named Reading.dat. Then an SGPLOT procedure creates a histogram for the number of books. The bins will have a width of two, the horizontal axis will have a tick mark at the center of each bin, and the vertical axis will show the count (in this case, the number of students). Two density curves will be overlaid on the histogram: the normal distribution and the kernel density estimate.

DATA contest;

   INFILE 'c:\MyRawData\Reading.dat';

   INPUT Name $ NumberBooks @@;

RUN;

PROC SGPLOT DATA = contest;

   HISTOGRAM NumberBooks / BINWIDTH = 2 SHOWBINS SCALE = COUNT;

   DENSITY NumberBooks;

   DENSITY NumberBooks / TYPE = KERNEL;

   TITLE 'Reading Contest';

RUN;

Here is the graph of books read (shown in the JOURNAL style):

[Figure: histogram with normal and kernel density curves, "Reading Contest"]
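If you would rather choose the number of bins than the bin width, you can use the NBINS= option. Here is a minimal sketch that also makes the histogram partly transparent so that an overlaid kernel density curve stands out. It assumes the SASHELP.CLASS sample data set, which is not part of this chapter's examples.

```sas
* Histogram with a requested number of bins and a kernel density overlay;
PROC SGPLOT DATA = sashelp.class;
   HISTOGRAM Height / NBINS = 5 SCALE = PROPORTION TRANSPARENCY = 0.5;
   DENSITY Height / TYPE = KERNEL;
   TITLE 'Distribution of Height';
RUN;
```

The HISTOGRAM statement comes first so that the density curve is drawn on top of the bars.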

8.4     Creating Box Plots with PROC SGPLOT

Like histograms, box plots show the distribution of continuous data. This type of graph is also called a box-and-whisker plot because of the way it looks. Every part of a box plot tells you something about the distribution of your data.

[Figure: diagram labeling the parts of a box plot]

The ends of the box indicate the 25th and 75th percentiles; the distance between them is the interquartile range. The line inside the box indicates the 50th percentile (the median), and the marker indicates the mean. By default, the whiskers cannot be longer than 1.5 times the length of the box. Any points beyond the whiskers are considered outliers and are marked with circles. If you specify the EXTREME option, then the whiskers will extend across the entire range of the data.

To create a vertical box plot, use PROC SGPLOT and a VBOX statement, like this:

VBOX variable-name / options;

For horizontal box plots, replace the keyword VBOX with HBOX. Possible options include:

CATEGORY = variable-name

specifies a categorical variable. One box plot will be created for each value of this variable.

EXTREME

specifies that the whiskers should extend to the true minimum and maximum values so outliers will not be identified.

GROUP = variable-name

specifies a second categorical variable. One box plot will be created for each value of this variable within the categorical variable.

MISSING

includes a box for missing values for the group or category variable.

TRANSPARENCY = n

specifies the degree of transparency for the box plot. The value of n must be between 0 (the default) and 1, with 1 being completely transparent and 0 being completely opaque.

 

Example  A small town sponsors an annual bicycle criterium. That’s a race where bicyclists go round and round a loop. The racers compete in three divisions: Youth, Adult, and Masters. The data include each bicyclist’s division and the number of laps they completed in one hour. Notice that each line of data contains results for five competitors.

Adult   44 Adult   33 Youth   33 Masters 38 Adult   40

Masters 32 Youth   32 Youth   38 Youth   33 Adult   47

Masters 37 Masters 46 Youth   34 Adult   42 Youth   24

Masters 33 Adult   44 Youth   35 Adult   49 Adult   38

Adult   39 Adult   42 Adult   32 Youth   42 Youth   31

Masters 33 Adult   33 Masters 32 Youth   37 Masters 40

The DATA step below reads the raw data from a file named Criterium.dat. Then an SGPLOT procedure creates vertical box plots of the number of laps. The CATEGORY= option tells SAS to create a separate box plot for each division.

DATA bikerace;

   INFILE 'c:\MyRawData\Criterium.dat';

   INPUT Division $ NumberLaps @@;

RUN;

* Create box plot;

PROC SGPLOT DATA = bikerace;

   VBOX NumberLaps / CATEGORY = Division;

   TITLE 'Bicycle Criterium Results by Division';

RUN;

Here is the box plot showing the number of laps by division:

[Figure: box plots, "Bicycle Criterium Results by Division"]
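For comparison, here is a minimal sketch of a horizontal box plot that uses the EXTREME option so that the whiskers reach the minimum and maximum values. It assumes the SASHELP.CLASS sample data set rather than the bicycle data.

```sas
* Horizontal box plots with whiskers extended to the extremes;
PROC SGPLOT DATA = sashelp.class;
   HBOX Height / CATEGORY = Sex EXTREME;
   TITLE 'Height by Sex';
RUN;
```

With EXTREME specified, no points are marked as outliers.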

8.5     Creating Scatter Plots with PROC SGPLOT

Scatter plots are an effective way to show the relationship between two continuous variables. For experimental data, the independent variable is traditionally assigned to the horizontal axis while the dependent variable is assigned to the vertical axis. To create scatter plots, use PROC SGPLOT and a SCATTER statement, like this:

SCATTER X=horizontal-variable Y=vertical-variable / options;

Possible options include:

DATALABEL = variable-name

displays a label for each data point. If you specify a variable name, the values of that variable will be used as labels. If you do not specify a variable name, then the values of the Y variable will be used.

GROUP = variable-name

specifies a variable to be used for grouping data.

JITTER

offsets the data markers slightly when multiple observations have the same response value.

NOMISSINGGROUP

specifies that observations with missing values for the group variable should not be included.

TRANSPARENCY = n

specifies the degree of transparency for the markers. The value of n must be between 0 (the default) and 1, with 1 being completely transparent and 0 completely opaque.

Example  To illustrate the use of scatter plots, here are data about birds. For each species, there are four variables: name, type (S for songbirds or R for raptors), length in cm (from tip of beak to tip of tail), and wingspan in cm. Note that each line of data includes several birds.

Robin        S  28  41 Bald Eagle   R 102 244 Barn Owl     R  50 110

Osprey       R  66 180 Cardinal     S  23  31 Goldfinch    S  11  19

Golden Eagle R 100 234 Crow         S  53 100 Magpie       S  60  60

Elf Owl      R  15  27 Condor       R 140 300

The following program creates a permanent SAS data set named WINGS in the MySASLib directory on the C drive. Then the program reads the data and produces a scatter plot grouped by type. Since the values S and R are not very descriptive, PROC FORMAT is used to create a user-defined format that is then specified in a FORMAT statement in the SGPLOT procedure to change S to Songbirds and R to Raptors.

 

LIBNAME flight 'c:\MySASLib';

DATA flight.wings;

   INFILE 'c:\MyRawData\Birds.dat';

   INPUT Name $12. Type $ Length Wingspan @@;

RUN;

* Plot Wingspan by Length;

PROC FORMAT;

   VALUE $birdtype

      'S' = 'Songbirds'

      'R' = 'Raptors';

RUN;

PROC SGPLOT DATA = flight.wings;

   SCATTER X = Wingspan Y = Length / GROUP = Type;

   FORMAT Type $birdtype.;

   TITLE 'Comparison of Wingspan vs. Length';

RUN;

Here is the scatter plot (shown in the JOURNAL style):

[Figure: scatter plot, "Comparison of Wingspan vs. Length"]
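The DATALABEL= and TRANSPARENCY= options can be combined with GROUP=. The following minimal sketch labels each point with a name. It assumes the SASHELP.CLASS sample data set, which is not one of this chapter's data sets.

```sas
* Scatter plot with grouped markers and a label for each point;
PROC SGPLOT DATA = sashelp.class;
   SCATTER X = Height Y = Weight / GROUP = Sex DATALABEL = Name
                                   TRANSPARENCY = 0.3;
   TITLE 'Weight vs. Height';
RUN;
```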

8.6     Creating Series Plots with PROC SGPLOT

A series plot is similar to a scatter plot except that instead of marking each data point, SAS connects the data points with a line. Series plots make sense whenever data must be displayed in a particular order. Dates and times of any kind are good candidates for series plots. To create a series plot, use PROC SGPLOT and a SERIES statement with this general form:

SERIES X = horizontal-variable Y = vertical-variable / options;

Possible options include:

CURVELABEL = 'text-string'

adds a label for the curve. If you do not specify a text string, then SAS uses the label from the Y variable.

DATALABEL = variable-name

displays a label for each data point. If you specify a variable name, the values of that variable will be used as labels. If you do not specify a variable name, then the values of the Y variable will be used.

GROUP = variable-name

specifies a variable to be used for grouping the data. A separate line is created for each unique value of the grouping variable.

MARKERS

adds a marker for each data point.

NOMISSINGGROUP

specifies that observations with missing values for the group variable should not be included.

TRANSPARENCY = n

specifies the degree of transparency for the plot line. The value of n must be between 0 (the default) and 1, with 1 being completely transparent and 0 completely opaque.

Note that SAS will connect the points in the order in which they appear in the data set. To have the points connected properly, your data must be sorted by the horizontal variable. If your data are not already sorted, then use PROC SORT before plotting your data.

Example  A technical writer collects data about her use of electricity for one day. Each hour she checks her meter and records the number of kilowatt hours used since the last reading. The data include the time (on a 24-hour clock) and the number of kilowatt hours. Note that each line of data contains six readings.

0 .22 1 .15 2 .17 3 .18 4 .19 5 .23

6 .5 7 .63 8 .61 9 .6 10 .48 11 .45

12 .44 13 .44 14 .39 15 .35 16 .42 17 .47

18 .7 19 .66 20 .7 21 .69 22 .6 23 .4

She could use a scatter plot to display these data, but since one of the variables is time, using a series plot makes more sense. The following program reads the data from a raw data file named Hourly.dat and creates a SAS data set named ELECTRICITY. Then the program creates a series plot of time versus kilowatt hours. The MARKERS option tells SAS to add a marker for each data point on the line.

DATA electricity;

   INFILE 'c:\MyRawData\Hourly.dat';

   INPUT Time kWh @@;

RUN;

* Plot kilowatt hours by time;

PROC SGPLOT DATA = electricity;

   SERIES X = Time Y = kWh / MARKERS;

   TITLE 'Hourly Use of Electricity';

RUN;

The plot looks like this:

[Figure: series plot, "Hourly Use of Electricity"]

Note that the data did not need to be sorted because they were already ordered by the variable on the X-axis (Time).
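When the observations are not in order, a PROC SORT step before the plot fixes the problem. The sketch below is illustrative only: it uses the first six readings from the data above, deliberately rearranged and read with a DATALINES statement, and the data set names SHUFFLED and SORTED are made up for this example.

```sas
* Readings entered out of order (a rearranged subset of the data above);
DATA shuffled;
   INPUT Time kWh @@;
   DATALINES;
3 .18 0 .22 5 .23 1 .15 4 .19 2 .17
;
RUN;

* Sort by the horizontal variable so the line connects points in order;
PROC SORT DATA = shuffled OUT = sorted;
   BY Time;
RUN;

PROC SGPLOT DATA = sorted;
   SERIES X = Time Y = kWh / MARKERS;
   TITLE 'Hourly Use of Electricity (Sorted)';
RUN;
```

If you plotted SHUFFLED directly, SAS would connect the points in the order they were entered, and the line would zigzag back and forth across the graph.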

8.7     Creating Fitted Curves with PROC SGPLOT

Scatter plots show the relationship between two variables. One way to explore that relationship further is to plot a fitted curve. The SGPLOT procedure produces several kinds of fitted curves including regression lines, loess curves, and penalized B-spline curves. To create any of these types of fitted curves, use a statement with this general form:

statement-name X = horizontal-variable Y = vertical-variable / options;

Where the statement-name can be:

REG

regression line or curve

LOESS

loess curve

PBSPLINE

penalized B-spline curve

Options for fitted curves include:

ALPHA = n

specifies the level for the confidence limits. The value of n must be between 0 (100% confidence) and 1 (0% confidence). The default is 0.05 (95% confidence limits).

CLI

adds prediction limits for individual predicted values (for REG and PBSPLINE only).

CLM

adds confidence limits for mean predicted values.

CURVELABEL = 'text'

adds a label for the curve. If you do not specify a text string, then SAS uses the label from the Y variable.

GROUP = variable-name

specifies a variable to be used for grouping the data. A separate line is created for each unique value of the grouping variable.

NOLEGCLI

removes the legend entry for the CLI band.

NOLEGCLM

removes the legend entry for the CLM band.

NOLEGFIT

removes the legend entry for the fit curve.

NOMARKERS

removes markers for data points.

CLMTRANSPARENCY = n

specifies the degree of transparency for the confidence limits. The value of n must be between 0 (the default) and 1, with 1 being completely transparent and 0 completely opaque.

TRANSPARENCY = n

specifies the degree of transparency for the plot line. The value of n must be between 0 (the default) and 1, with 1 being completely transparent and 0 completely opaque.

Each type of fitted curve offers additional options for controlling the parameters for the interpolation. See the SAS Documentation for more information.

Example  A runner decides that she would like to improve her time in the 1500 meter run. In order to track her progress, she records her best time each week. The data values are the week (1 to 28) and time in seconds. Note that each line of data contains several observations.

1 546 2 492 3 490 4 504 5 486 6 474 7 484

8 468 9 466 10 462 11 456 12 460 13 450 14 442

15 432 16 436 17 430 18 432 19 438 20 436 21 426

22 432 23 440 24 432 25 424 26 428 27 426 28 430

This program reads the data and overlays two fitted curves for the times: a loess plot and a regression plot. The CLM option in the LOESS statement generates the 95% confidence limit band for the mean predicted values, while the NOLEGCLM option tells SAS not to include the confidence limit band in the legend.

DATA weekly1500;

   INFILE 'c:\MyRawData\Weekly1500.dat';

   INPUT Week Time @@;

RUN;

PROC SGPLOT DATA = weekly1500;

   LOESS X = Week Y = Time / NOMARKERS CLM NOLEGCLM;

   REG X = Week Y = Time;

   LABEL Time = 'Time in Seconds';

   TITLE 'Times for 1500 Meter Run';

RUN;

Here is the plot of run times (shown in the JOURNAL style):

[Figure: loess curve with confidence band and regression line, "Times for 1500 Meter Run"]

This graph shows the data points with both a regression line and a loess curve. Because the NOMARKERS option was included in the LOESS statement, the data points are plotted only once. The confidence limits for mean predicted values, based on the loess fit, are shown by the gray band surrounding the loess line.
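To see how the ALPHA=, CLM, and CLI options work together, consider this minimal sketch of a regression plot with 90% limits. It assumes the SASHELP.CLASS sample data set instead of the runner's data, and the title text is illustrative.

```sas
* Regression line with 90% confidence and prediction limits;
PROC SGPLOT DATA = sashelp.class;
   REG X = Height Y = Weight / CLM CLI ALPHA = 0.10;
   TITLE 'Weight vs. Height with 90% Limits';
RUN;
```

The CLI band is wider than the CLM band because it describes individual predicted values rather than the mean.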

8.8     Controlling Axes and Reference Lines in PROC SGPLOT

Statements like VBAR and LOESS tell SAS the type of graph to create. However, the SGPLOT procedure also has supporting statements that allow you to control other features of your graph, such as axes and reference lines.

Axes  To specify options for the horizontal axis, use a statement with this general form:

XAXIS options;

For the vertical axis, replace the keyword XAXIS with YAXIS. Options include:

GRID

creates a line at each tick mark on the axis.

LABEL = 'text-string'

specifies a text string enclosed in quotes to be used as the label for the axis. You can also use an ordinary LABEL statement, but a label specified using an XAXIS or YAXIS statement will override one from any other source. If there is no variable label, then SAS uses the variable name.

TYPE = axis-type

specifies the type of axis. DISCRETE is the default for character variables. LINEAR is the default for numeric variables. TIME is the default for variables that have date, time, or datetime formats associated with them. LOG specifies a logarithmic scale.

VALUES = (values-list)

specifies values for tick marks on axes. Values must be enclosed in parentheses, and can be specified either as a list (0 5 10 15 20) or a range (0 TO 20 BY 5).

Reference lines  Adding reference lines to a graph shows which points are above or below important levels. To add a horizontal or vertical reference line, use a REFLINE statement.

REFLINE values / options;

The values are the points at which the reference lines should be drawn. You can specify values as a list, 0 5 10 15 20; a range, 0 TO 20 BY 5; or the name of a variable whose values will be used. Options include:

AXIS = axis

specifies the axis that contains the reference line values, either X or Y. The default is the Y-axis.

LABEL = (label-list)

specifies one or more text strings (each enclosed in quotes and separated by spaces) to be used as labels for the reference lines.

TRANSPARENCY = n

specifies the degree of transparency for the reference line. The value of n must be between 0 (the default) and 1, with 1 being completely transparent and 0 completely opaque.

If the REFLINE statement comes before any plot statements, then the line will be drawn behind the plot elements. If it comes afterward, then the line will be drawn in front of the plot elements.

Example  This example compares average high temperatures in three cities: International Falls, Minnesota; Raleigh, North Carolina; and Yuma, Arizona. The variables are month and the high temperatures for each city. Temperatures are in Fahrenheit. There are three months in each line.

1  12.2 50.7  68.5  2  20.1 54.5  74.1  3  32.4 63.7  79.0  

4  49.6 72.7  86.7  5  64.4 79.7  94.1  6  73.0 85.8 103.1

7  78.1 88.7 106.9  8  75.6 87.4 105.6  9  64.0 82.6 101.5  

10 52.2 72.9  91.0  11 32.5 63.9  77.5  12 17.8 54.1  68.9

The following program has three SERIES statements, one for each city. Reference lines will be drawn at 32 and 75 degrees. In this data set, the variable Month is simply a number (1–12) rather than a SAS date value. To keep fractional values of month, such as 3.5, from appearing on the axis, the axis type has been set to DISCRETE using an XAXIS statement. A YAXIS statement specifies an axis label.

DATA cities;

   INFILE 'c:\MyRawData\ThreeCities.dat';

   INPUT Month IntFalls Raleigh Yuma @@;

RUN;

* Plot average high temperatures by city;

PROC SGPLOT DATA = cities;

   SERIES X = Month Y = IntFalls;

   SERIES X = Month Y = Raleigh;

   SERIES X = Month Y = Yuma;

   REFLINE 32 75 / LABEL = ('32 degrees' '75 degrees') TRANSPARENCY = 0.5;

   XAXIS TYPE = DISCRETE;

   YAXIS LABEL = 'Average High Temperature (F)';

   TITLE 'Temperatures for International Falls, Raleigh, and Yuma';

RUN;

Here is the plot of temperatures (shown in the JOURNAL style):

[Figure: series plot with reference lines, "Temperatures for International Falls, Raleigh, and Yuma"]
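Reference lines can also be drawn on the horizontal axis, and tick marks can be set explicitly. The following minimal sketch assumes the SASHELP.CLASS sample data set (heights in inches and weights in pounds), which is not part of this chapter's examples; the reference value of 60 was chosen arbitrarily.

```sas
* Vertical reference line, explicit tick values, and grid lines;
PROC SGPLOT DATA = sashelp.class;
   REFLINE 60 / AXIS = X LABEL = ('60 inches');
   SCATTER X = Height Y = Weight;
   XAXIS VALUES = (50 TO 75 BY 5) GRID;
   YAXIS LABEL = 'Weight (pounds)';
   TITLE 'Weight vs. Height';
RUN;
```

Because the REFLINE statement comes before the SCATTER statement, the reference line is drawn behind the markers.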

 

8.9     Controlling Legends and Insets in PROC SGPLOT

The SGPLOT procedure generates legends automatically for your plots when appropriate. This is great because then you don’t have to think about them. But sometimes you may want to remove the legend, or move it to a different place, or add a note or comment of your own.

Changing legends  You can change many aspects of a legend using the KEYLEGEND statement with this general form:

KEYLEGEND / options;

Options for legends include:

ACROSS = n

specifies the number of columns in the legend.

DOWN = n

specifies the number of rows in the legend.

LOCATION = value

specifies the location for the legend, either INSIDE the axis area or OUTSIDE (the default).

NOBORDER

removes the border around the legend.

POSITION = value

specifies the position of the legend, either TOP, TOPLEFT, TOPRIGHT, BOTTOM (the default), BOTTOMLEFT, BOTTOMRIGHT, LEFT, or RIGHT.

Removing legends  Sometimes you don't want a legend. To remove it, simply add the option NOAUTOLEGEND to the PROC SGPLOT statement.

PROC SGPLOT DATA = data-set NOAUTOLEGEND;

If you have both a KEYLEGEND statement and the NOAUTOLEGEND option, then the NOAUTOLEGEND option will be ignored.

Adding insets  To place text in the axis area use an INSET statement with this general form:

INSET 'text-string-1' 'text-string-2' ... 'text-string-n' / options;

If you have more than one text string, then the strings will be placed one below the other. Options for insets include:

BORDER

adds a border.

POSITION = value

specifies the position of the inset, either TOP, TOPLEFT, TOPRIGHT, BOTTOM, BOTTOMLEFT, BOTTOMRIGHT, LEFT, or RIGHT. If you do not specify a position, SAS chooses one automatically.

Example  This example uses the permanent SAS data set created in Section 8.5. The data are about birds. For each species, there are four variables: name, type (S for songbirds or R for raptors), length in cm (from tip of beak to tip of tail), and wingspan in cm.

 

In this program, a SCATTER statement plots wingspan versus length. The KEYLEGEND statement specifies that the legend should be located inside the graph area in the bottom right corner. The INSET statement places a note in the top left corner of the graph.

LIBNAME flight 'c:\MySASLib';

* Plot Wingspan by Length;

PROC FORMAT;

   VALUE $birdtype

      'S' = 'Songbirds'

      'R' = 'Raptors';

RUN;

PROC SGPLOT DATA = flight.wings;

   SCATTER X = Wingspan Y = Length / GROUP = Type;

   KEYLEGEND / LOCATION = INSIDE POSITION = BOTTOMRIGHT;

   INSET 'Birds of North America' / POSITION = TOPLEFT;

   FORMAT Type $birdtype.;

   TITLE 'Comparison of Wingspan vs. Length';

RUN;

Here is the plot with an inset and a new legend (shown in the JOURNAL style):

[Figure: scatter plot with inset and repositioned legend, "Comparison of Wingspan vs. Length"]
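Here is another minimal sketch, this time with a one-column legend on the right side of the graph and a bordered inset containing two lines of text. It assumes the SASHELP.CLASS sample data set, and the inset text is illustrative.

```sas
* Legend in a single column at the right, plus a two-line inset;
PROC SGPLOT DATA = sashelp.class;
   SCATTER X = Height Y = Weight / GROUP = Sex;
   KEYLEGEND / POSITION = RIGHT ACROSS = 1 NOBORDER;
   INSET 'Heights in inches' 'Weights in pounds' / POSITION = TOPLEFT BORDER;
   TITLE 'Weight vs. Height';
RUN;
```

The two text strings in the INSET statement are stacked one below the other inside the border.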

 

8.10   Customizing Graph Attributes in PROC SGPLOT

When you create graphs, you want them to be attractive and easy to read. That's why SAS has style templates that have been designed specifically for use with graphs (listed in Section 8.1). Still, there may be times when you want stars instead of circles, or thicker lines, or a different color. Fortunately, the SGPLOT procedure includes options for controlling graph attributes. To use these options, put them after a slash at the end of a basic plot statement. For example, this SCATTER statement tells SAS to use the STAR symbol for markers:

SCATTER X = Score Y = HoursOfStudy / MARKERATTRS = (SYMBOL = STAR);

There are many options for controlling graph attributes. Some common ones are:

FILLATTRS = (attribute = value)

specifies the appearance of a filled area. Attributes include COLOR=.

LABELATTRS = (attribute = value)

specifies the appearance of axis labels. Attributes include COLOR=, SIZE=, STYLE=, and WEIGHT=.

LINEATTRS = (attribute = value)

specifies the appearance of a line. Attributes are COLOR=, PATTERN=, and THICKNESS=.

MARKERATTRS = (attribute = value)

specifies the appearance of a marker. Attributes include COLOR=, SIZE=, and SYMBOL=.

VALUEATTRS = (attribute = value)

specifies the appearance of axis tick labels. Attributes include COLOR=, SIZE=, STYLE=, and WEIGHT=.

Each attribute has many possible values. Here are just a few:

Attribute      Possible Values

COLOR=         RGB notation such as #FF0000 (red) or named values such as RED, plus many others

PATTERN=       SOLID, DASH, SHORTDASH, LONGDASH, DOT, DASHDASHDOT, or DASHDOTDOT

SIZE=          numbers with the units CM, IN, MM, PCT, PT, or PX (the default)

STYLE=         ITALIC or NORMAL (the default)

SYMBOL=        CIRCLE, CIRCLEFILLED, DIAMOND, DIAMONDFILLED, PLUS, SQUARE, SQUAREFILLED, STAR, STARFILLED, TRIANGLE, or TRIANGLEFILLED

THICKNESS=     numbers with the units CM, IN, MM, PCT, PT, or PX (the default)

WEIGHT=        BOLD or NORMAL

Of course, not all types of plots support all graph attributes. For example, you cannot use FILLATTRS= with a scatter plot because scatter plots don't have any filled areas. There are additional graph attributes. For a complete list, check the SAS Documentation.
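To illustrate how several attribute options can be combined, here is a rough sketch that fills histogram bars, draws a dashed density curve, and bolds the tick labels. It assumes the sample data set sashelp.class is available; the colors LIGHTBLUE and NAVY are illustrative choices, not recommendations.

```sas
* Sketch only: sashelp.class and the color choices are assumptions;
PROC SGPLOT DATA = sashelp.class;
   * FILLATTRS= applies here because histogram bars are filled areas;
   HISTOGRAM Height / FILLATTRS = (COLOR = LIGHTBLUE);
   DENSITY Height / LINEATTRS = (COLOR = NAVY PATTERN = DASH THICKNESS = 2PX);
   * VALUEATTRS= controls the tick labels on the axis;
   XAXIS VALUEATTRS = (SIZE = 10PT WEIGHT = BOLD);
RUN;
```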

Example  This example uses the permanent SAS data set created in Section 8.5. The data are about birds. For each species, there are four variables: name, type (S for songbirds or R for raptors), length in cm (from tip of beak to tip of tail), and wingspan in cm.

The following program overlays two plots in a single graph. First, a REG statement plots a regression line for wingspan versus length with a line that is 2mm thick and 75% transparent. Then a SCATTER statement plots the data points using a filled circle 2mm in size. The axis labels and title are bold.

LIBNAME flight 'c:\MySASLib';

* Plot Wingspan by Length;

PROC SGPLOT DATA = flight.wings NOAUTOLEGEND;

   REG X = Wingspan Y = Length /

      LINEATTRS = (THICKNESS = 2MM) TRANSPARENCY = .75;

   SCATTER X = Wingspan Y = Length /

      MARKERATTRS = (SYMBOL = CIRCLEFILLED SIZE = 2MM);

   TITLE BOLD 'Birds of North America';

   XAXIS LABEL = 'Wingspan (in cm)' LABELATTRS = (WEIGHT = BOLD);

   YAXIS LABEL = 'Body Length (in cm)' LABELATTRS = (WEIGHT = BOLD);

RUN;

Here is the graph with new attributes:

image

 

8.11   Creating Paneled Graphs with PROC SGPANEL

The SGPANEL procedure is a close cousin of the SGPLOT procedure. The SGPANEL procedure produces nearly all the same types of graphs as the SGPLOT procedure, but while SGPLOT produces single-celled graphs, SGPANEL can produce multicelled graphs. PROC SGPANEL produces a separate cell for each combination of values of the classification variables that you specify. Each of those cells uses the same variables on its X- and Y-axes.

The syntax for PROC SGPANEL is almost identical to that of PROC SGPLOT, so it is easy to convert one to the other by making just a couple of changes to your code. You simply replace the keyword SGPLOT with SGPANEL and add a PANELBY statement, like this:

PROC SGPANEL;

   PANELBY variable-list / options;

   plot-statement;

The PANELBY statement must appear before any statements that create plots. Possible options include:

COLUMNS = n

specifies the number of columns in the panel.

MISSING

specifies that observations with missing values for the PANELBY variable should be included.

NOVARNAME

removes the variable name from cell headings.

NOHEADERBORDER

removes the border from the cell headings.

ROWS = n

specifies the number of rows in the panel.

SPACING = n

specifies the number of pixels between rows and columns in the panel. The default is 0.

UNISCALE = value

specifies which axes will share the same range of values. Possible values are COLUMN, ROW, and ALL (the default).

Instead of XAXIS and YAXIS statements, the SGPANEL procedure uses COLAXIS and ROWAXIS statements to control axes. See Section 8.8 for axis options.
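The following sketch shows how several PANELBY options and the two axis statements might fit together. It assumes the sample data set sashelp.class is available; the axis labels and the option values are illustrative.

```sas
* Sketch only: sashelp.class and the option values are assumptions;
PROC SGPANEL DATA = sashelp.class;
   * PANELBY comes before any plot statement;
   PANELBY Sex / COLUMNS = 2 NOVARNAME SPACING = 10 UNISCALE = ROW;
   SCATTER X = Height Y = Weight;
   * COLAXIS and ROWAXIS take the place of XAXIS and YAXIS;
   COLAXIS LABEL = 'Height (in inches)';
   ROWAXIS LABEL = 'Weight (in pounds)';
RUN;
```

With UNISCALE = ROW, only the row axes share a range, so each column is free to use its own range of heights.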

Example  This example uses the permanent SAS data set created in Section 8.5. The data are about birds. For each species, there are four variables: name, type (S for songbirds or R for raptors), length in cm (from tip of beak to tip of tail), and wingspan in cm.

The following program produces a paneled plot. This plot is similar to the grouped plot shown in Section 8.5. In that graph, data for songbirds and raptors are overlaid in a single cell. In this example, the two groups are plotted in separate cells. The NOVARNAME option removes the word "Type=" from the column headings, and the SPACING= option inserts a little space between the two cells. This example also uses PROC FORMAT to create a user-defined format for the variable Type so that the column headings are words instead of the coded values R and S.

 

LIBNAME flight 'c:\MySASLib';

* Plot Wingspan by Length;

PROC FORMAT;

   VALUE $birdtype

      'S' = 'Songbirds'

      'R' = 'Raptors';

RUN;

PROC SGPANEL DATA = flight.wings;

   PANELBY Type / NOVARNAME SPACING = 5;

   SCATTER X = Wingspan Y = Length;

   FORMAT Type $birdtype.;

   TITLE 'Comparison of Wingspan vs. Length';

RUN;

Here is the graph with a panel for each type of bird:

image

Note that you could have used a standard BY statement with PROC SGPLOT instead of a PANELBY statement with PROC SGPANEL, but then SAS would have produced two completely separate graphs (one for raptors and another for songbirds) instead of two cells within a single graph. Also, when you use a BY statement, the data must be presorted by the values of the BY variables, but when you use a PANELBY statement, the data do not need to be sorted.
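For comparison, the BY-statement alternative might look something like this sketch. It assumes the flight libref has already been defined as in the program above; the name wings_sorted is a hypothetical temporary data set introduced here for illustration.

```sas
* Sketch only: wings_sorted is a hypothetical data set name;
* BY-group processing requires the data to be sorted first;
PROC SORT DATA = flight.wings OUT = wings_sorted;
   BY Type;
RUN;

* Produces one separate graph for each value of Type;
PROC SGPLOT DATA = wings_sorted;
   BY Type;
   SCATTER X = Wingspan Y = Length;
RUN;
```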

 

8.12   Specifying Image Properties and Saving Graphics Output

If you are writing a paper or creating a presentation, you may need to access individual graphs. You may be able to simply copy and paste images when you view them in SAS, and you can always save or download your results in formats like HTML, PDF, or RTF. Sometimes that may be all you need. However, at other times you may want to specify the properties of your graphs or save them in separate graphics files for later use.

Specifying properties of images  To specify properties for your images, use the ODS GRAPHICS statement with this general form:

ODS GRAPHICS / options;

Options include:

HEIGHT = n

specifies the image height in CM, IN, MM, PT, or PX.

IMAGENAME = 'filename'

specifies the base filename for the image. The default name for an image file is the name of its ODS output object. See Section 5.12 for a discussion of output objects.

OUTPUTFMT = file-type

specifies the graph format. The default varies by destination. Possible values include BMP, GIF, JPEG, PDF, PNG, PS, SVG, TIFF, and many others.

RESET

resets options to their defaults.

WIDTH = n

specifies the image width in CM, IN, MM, PT, or PX.

In most cases, the default size for graphs is 640 pixels wide by 480 pixels high. If you specify only one dimension (width but not height, or vice versa), then SAS will adjust the other dimension to maintain a default aspect ratio of 4:3.

When you save image files, SAS will append numerals to the end of the image name. For example, if you specify an image name of Final, then your files will be named Final, Final1, Final2, and so on. If you rerun your code, SAS will, by default, continue counting so that the new files will not overwrite the old. Specifying the RESET option before the IMAGENAME= option tells SAS to start over each time.
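As a small sketch of these two behaviors, the following statements set only the width and restart the image numbering. The sample data set sashelp.class and the image name Final are assumptions used for illustration.

```sas
* Sketch only: sashelp.class and the name Final are assumptions;
* RESET comes first so that numbering starts over on each run;
* Only WIDTH= is given, so SAS derives the height from the 4:3 ratio;
ODS GRAPHICS / RESET IMAGENAME = 'Final' WIDTH = 4IN;

PROC SGPLOT DATA = sashelp.class;
   SCATTER X = Height Y = Weight;
RUN;

* Restore the defaults for later graphs;
ODS GRAPHICS / RESET;
```

Given the default 4:3 aspect ratio described above, a width of 4 inches should yield a height of 3 inches.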

Saving graphical output  LISTING is a good destination for capturing individual graphs since it offers the most image formats and saves images in separate files. To create stand-alone graphs, use an ODS LISTING statement with a GPATH= option, like this:

ODS LISTING GPATH = 'path' options;

where path is the location where your image should be saved. Options include:

IMAGE_DPI = n

specifies the image resolution in dots per inch. The default for the LISTING destination is 100.

STYLE = style-name

specifies a style template. See Section 8.1 for a list of possible styles.

This statement would save ODS Graphics images in individual files in a folder named MyGraphs on the C drive using the STATISTICAL style and 300 dots per inch:

ODS LISTING GPATH = 'c:\MyGraphs' STYLE = STATISTICAL IMAGE_DPI = 300;

Example  This example uses the permanent SAS data set created in Section 8.5. The data are about birds. For each species, there are four variables: name, type (S for songbirds or R for raptors), length in cm (from tip of beak to tip of tail), and wingspan in cm.

The following program produces a scatter plot and sends it to the ODS LISTING destination using the JOURNAL style. This graph will be saved in the MyGraphs directory on the C drive, be named BirdGraph, be in BMP format, and be two inches high and three inches wide. The final ODS GRAPHICS statement resets all the image properties back to their default values so that any subsequent graphs you generate will use the default settings.

LIBNAME flight 'c:\MySASLib';

* Create BMP image of Wingspan by Length;

ODS LISTING GPATH = 'c:\MyGraphs' STYLE = JOURNAL;

ODS GRAPHICS / RESET IMAGENAME = 'BirdGraph' OUTPUTFMT = BMP

   HEIGHT = 2IN WIDTH = 3IN;

PROC SGPLOT DATA = flight.wings;

   SCATTER X = Wingspan Y = Length;

   TITLE 'Comparison of Wingspan vs. Length';

RUN;

ODS GRAPHICS / RESET;

Here is the new graph:

image

Note that the file named BirdGraph will not open automatically, but you can navigate to the image file in your operating environment and open it using a viewer designed for that type of file.
