Chapter 3: Clinical Graphs Using the SAS 9.3 SGPLOT Procedure

When the SG procedures were first released with SAS 9.2, the underlying GTL functionality was focused on providing the features necessary to create automatic graphs from the SAS analytical procedures. Many features necessary to make clinical graphs easier were not available. So, although many simpler graphs can be built using SAS 9.2, it is not the best platform for clinical graphs.

With SAS 9.3, the SGPLOT procedure supports many useful features that enable the creation of clinical graphs. These include the following statements and features:

• Highlow plots, bubble plots, and parametric lines.

• Cluster groups for many plots like bar, box, scatter, series, highlow, and more.

• Box plot with cluster groups and interval axes.

• Discrete attribute maps.

• SG annotation for SG procedures.

With the availability of the features listed above, it became a lot easier to build most clinical graphs using the SAS 9.3 SGPLOT procedure, as shown in this chapter. This chapter will show you how to build such commonly used clinical graphs and will provide you the code that you can use directly for such graphs. Through the use of these examples, you will gain valuable insight on how to combine the plot statements to create your graph, and how to use the SG Annotate facility to fully customize the graph.

My effort is always to find ways to create the graph by layering multiple plot types to get the job done. This process is scalable and translates easily to other graphs. However, often there are cases where we have to use annotation to get just the right customization of the graph. In such cases, we will use the new SG Annotate facility.

3.1 Box Plot of QTc Change from Baseline

This graph displays the distribution of QTc change from baseline by week and treatment for all subjects in a study. The x-axis has a linear scale and is not discrete. The "Subjects At-Risk" values are displayed by treatment at the right location along the time axis.

3.1.1 Box Plot of QTc Change from Baseline with Outer Risk Table

For the graph in Figure 3.1.1.1, we will use a box plot to display the distribution of QTc change from baseline by week and treatment on a linear x-axis. The "Subjects At-Risk" table at the bottom has been added by using the annotation functions as defined in the "AnnoOut" data set.

Figure 3.1.1.1 – Box Plot of QTc Change from Baseline

title 'QTc Change from Baseline by Week and Treatment';

proc sgplot data=QTcData sganno=AnnoOut pad=(bottom=15pct);

format week qtcweek.;

vbox qtc / category=week group=drug groupdisplay=cluster nofill;

refline 26 / axis=x;

refline 0 30 60 / axis=y lineattrs=(pattern=shortdash);

xaxis type=linear values=(1 2 4 8 12 16 20 24 28) max=29

valueshint display=(nolabel);

yaxis label='QTc change from baseline' values=(-120 to 90 by 30);

keylegend / title='';

run;

The main body of the graph including the boxes by treatment, axes, title, and legend are created by the statements in the procedure above. Normally, a box plot treats the category variable as discrete, which would have placed all the tick values on the x-axis at equally spaced intervals. However, in this case the values on the x-axis represent days from start of study, and we want to place the data at the correctly scaled distance along the x-axis. This can be done be explicitly setting TYPE=LINEAR on the x-axis. Now, each box is placed at the scaled location along the x-axis.

The box plot is classified by treatment, which has two values "Drug A" and "Drug B". The boxes are sized by the smallest interval along the x-axis. In this case it is 1 day at the start of the study. The effective midpoint spacing is set by that interval, and all boxes are drawn to fit in this space.

The box plot uses GROUPDISPLAY=CLUSTER to place the treatment groups side by side. We can use the NOFILL option to draw empty boxes.

Note the use of SGANNO and PAD options on the procedure statement.

proc sgplot data=QTcData sganno=AnnoOut pad=(bottom=15pct);

The PAD option instructs the procedure to leave a blank pad of 15% of the height of the graph at the bottom. We will display the "Subjects At-Risk" values in this space using the "AnnoOut" data set provided with the SGANNO option. If we run the procedure without the SGANNO option, we will get the graph shown in Figure 3.1.1.2 below, with the extra empty space at the bottom (indicated by the arrow).

Figure 3.1.1.2 – Graph without the SGANNO Option

The risk table values, labels, and the header will be drawn using annotation statements defined in the annotation data set "AnnoOut". This data set is provided to the procedure using the SGANNO option. The data set contains a number of observations, each of which defines one item to be drawn. The data set contains predefined column names that determine what is to be drawn, where, and how as shown in Figure 3.1.1.3. Please see Section 2.9 for an introduction to SG annotation.

The data set "AnnoOut" has 22 observations, the first five of which are shown below. Every SGAnno data set must have a column called "Function". This column determines what is to be drawn. Multiple types of items can be drawn, including LINE, OVAL, and TEXT. In this case the risk values are displayed using the TEXT function.

Figure 3.1.1.3 – Annotation Instructions in the Data Set

Additional information might be required to draw the annotation function that is provided in other columns with predefined names as needed by each function. Many of these have default values, so if they are not provided, a system default is used. For drawing text, we need at least the text string in the column named "Label". The X and Y location and the DrawSpace setting may also be required. These are explained below.

The "Function" column contains the name of the element to be drawn; in this case, it is "Text". The text function requires that we provide the text string to be displayed in the column called "Label". Other optional information such as the position of the string and its visual attributes can also be provided.

1. X1 and Y1 provide the coordinates of the location where the text is drawn.

2. The DrawSpace determines how to interpret the X1 and Y1 values provided. DrawSpace applies to all coordinates and provides the "Context" and the "Units". In this case, we want different ways to interpret X1 and Y1, so DrawSpace is not used.

3. X1Space determines the interpretation of the x1 coordinate. In this case, X1Space="DataValue". This means, the x location is in reference to the lower left corner of the "Data" space, and the units are "Value", the actual data values. So, the first text will be drawn at value of "1" along the x-axis.

4. Y1Space determines the interpretation of the y1 coordinate. Here, Y1Space="GraphPercent". This means, the y location is with reference to the lower left corner of the whole graph, and the units are "Percent". So, the first text will be drawn at 11% above the bottom of the graph.

5. All the text attributes are determined by default, but some have been set using TextSize, TextWeight, and TextColor columns.

For the first observation, Function="Text", with X1=1 and Y1=11 and Label="275". Because X1Space is "DataValue", the x location of the label "275" is aligned with "1" on the x-axis. Because Y1Space is "GraphPercent", the y location of the label is at 11% from the bottom of the graph. The label is drawn with size=5, bold weight, and color of "GraphData1:ContrastColor" to match the color for "Drug A" in the data. The risk value for "Drug B" is drawn by the second observation, using the second group color at 8% from the bottom. This process repeats for all observations in the "AnnoOut" data set.

In addition to the risk values, we need to draw additional items to complete the table. These items are defined in rows 19-22 of the "AnnoOut" data set as shown in Figure 3.1.1.4 below.

Figure 3.1.1.4 – Annotation Data Set

The first row of risk values is for "Drug A", so we want to place the string "Drug A" to the left of the Y-axis in line with this row. We do that using the observation # 19 in the table. The Label is "Drug A", placed at X1= -1 (WallPercent), Y1=11% (GraphPercent). The string is anchored on the right, so the string is displayed to the left of the left edge of the wall. Note, the value for x1 is negative.

Even though the x-axis coordinate is to the left of the wall area, the string is still drawn. The DrawSpace determines only the interpretation of the data, but does not clip the graphics to this area. This allows the string to be positioned by the data value in one dimension and still be drawn outside the data area in the other dimension.

Similarly, we can display the "Subjects At-Risk" header above the risk values. Finally, we can display the footnote at the bottom of the graph. For full details, see Program 3_1, available from the author’s page at http://support.sas.com/matange.

3.1.2 Box Plot of QTc Change from Baseline with Inner Risk Table

Graphs are easier to decode when relevant information is placed as close as possible, thus reducing the amount of eye movement needed to decode the graph. Following this principle, it would be more effective to place the risk information inside the graph area, closer to the graphical information.

Figure 3.1.2.1 – Box Plot of QTc Change from Baseline with Inner Risk Table

title 'QTc Change from Baseline by Week and Treatment';

footnote j=l "Note: Increase < 30 msec 'Normal', ..";

proc sgplot data=QTcData sganno=annoIn;

format week qtcweek.;

vbox qtc / category=week group=drug groupdisplay=cluster

nofill name='a';

refline 26 / axis=x;

refline -120 / axis=y;

xaxis type=linear values=(1 2 4 8 12 16 20 24 28) valueshint

min=1 max=29 display=(nolabel);

yaxis label='QTc change from baseline' values=(-120 to 90 by 30)

offsetmin=0.14;

keylegend 'a' / title='Treatment:';

run;

This graph shows an alternate layout of the table. Traditionally, the "At-Risk" table is displayed at the bottom of the graph, just above the footnote with other items in between. Such a layout places the risk data relatively far away from the plot. Even though the values are aligned with the data along the x-axis, the distance and intervening items like the legend and the axis items create a distraction.

The code for creating the box plots of QTc change by treatment and week is the same as before. We can use the VBOX statement to draw the QTc values by drug and week. The XAXIS TYPE=Linear statement is used to display the x values scaled along the x-axis. REFLINE statements are used to draw the reference lines. Options are set on the XAXIS and YAXIS statements to get the desired results.

Now, we want to place the At-Risk table above the x-axis, along with the table header as shown below, marked by the arrow. Note, this brings the table of Subjects At-Risk closer to the data, and above the x-axis and the other items such as the legend.

Figure 3.1.2.2 – Subjects At-Risk Table Placed above the X-Axis

First we need to create some empty space above the x-axis line to draw the risk table. To do this, we will add an offset to the lower end of the Y-axis using the OFFSETMIN=0.14. In this case, there is no need to specify a padding at the bottom of the graph. Also, a regular footnote can be used to display the "Note" as we are not disturbing that part of the graph.

To correctly position the values along the x-axis in the right location, we will still use the X1SPACE of "DataValue". To place the values above the x-axis, we will use Y1Space of "WallPercent".

The first few observations in the data set are used for drawing the risk values as shown in Figure 3.1.2.3 below:

Figure 3.1.2.3 – Annotation Data Set for the Values

The last few observations in the data set are shown in Figure 3.1.2.4 below. These draw the labels for each drug and for the table title. Note the usage of the "Anchor" value for drawing the text strings. The default anchor value is "Center". See Program 3_1 for the full code.

Figure 3.1.2.4 – Annotation Data Set for the Labels

3.1.3 Box Plot of QTc Change from Baseline in Grayscale

The graph shown in Figure 3.1.3.1 below is customized for grayscale medium. We start with a graph that uses the Journal style, using line patterns for the two groups. Then, we customize the graph using annotate for a different look.

Figure 3.1.3.1 – Box Plot of QTc Change from Baseline in Grayscale

ods listing style=journal;

title 'QTc Change from Baseline by Week and Treatment';

footnote j=l "Note: Increase < 30 msec 'Normal', ..";

proc sgplot data=QTcData sganno=annoIn;

format week qtcweek.;

vbox qtc / category=week group=drug groupdisplay=cluster

nofill name='a';

refline 26 / axis=x;

refline -120 / axis=y;

xaxis type=linear values=(1 2 4 8 12 16 20 24 28) valueshint

min=1 max=29 display=(nolabel);

yaxis label='QTc change from baseline' values=(-120 to 90 by 30)

offsetmin=0.14;

keylegend 'a' / title='Treatment:';

run;

The code above is identical to the code in Section 3.1.2, except we have used STYLE=Journal in the ODS LISTING statement. The Journal style is a non-color, black and white style. Since we are using unfilled boxes, the boxes for Drug A and B are both shown using black lines, one solid for Drug A and one dashed for Drug B.

Now, often dashed lines do not result in a clean representation of a box, so it might be desirable to make the lines solid for all treatments. In that case, the legend will show solid lines for both Drug A and Drug B, which is not a desirable result either.

In this case, we will suppress the automatic legend, and draw our own legend as shown in Figure 3.1.3.3 using annotation.

Figure 3.1.3.3 – QTc Change from Baseline Graph in Grayscale

The process for creating the legend is shown below. The data set follows.

1. Create empty space above the footnote using an extra empty Footnote2 statement.

2. Add annotation functions to draw the box in the center of the wall below the axis.

3. Draw the Legend title "Treatment" on the left side.

4. To the right of this, draw the "oval", with the text "Drug A" beside it.

5. To the right of this, draw the "plus", with the text "Drug B" beside it.

Figure 3.1.3.4 – Annotation Data Set

As you can see, creating such annotated details is possible, but requires custom coding that is not scalable to other data. This is not really a solution I would suggest but I provide it mainly to provide an example of things that can be done using annotation. See the full code in Program 3_1.

3.2 Mean Change in QTc by Week and Treatment

This graph displays the mean and confidence of change in QTc from baseline by week and treatment for subjects in a study. The x-axis displays weeks on a linear scale with clustered groups.

3.2.1 Mean Change of QTc by Week and Treatment with Outer Table

The mean values are displayed by week and treatment. The table of subjects at visit is displayed at the bottom of the graph, below the axis in the traditional arrangement.

Figure 3.2.1.1 – Mean Change of QTc by Week and Treatment with Outer Table

title 'Mean Change of QTc by Week and Treatment';

proc sgplot data=QTc_Mean_Group sganno=annoOuter pad=(bottom=14%);

format week qtcmean.;

scatter x=week y=mean / yerrorupper=high yerrorlower=low group=drug

groupdisplay=cluster clusterwidth=0.5

markerattrs=(size=7 symbol=circlefilled);

series x=week y=mean2 / group=drug groupdisplay=cluster

clusterwidth=0.5 lineattrs=(pattern=solid);

refline 26 / axis=x;

refline 0 / axis=y lineattrs=(pattern=shortdash);

xaxis type=linear values=(0 1 2 4 8 12 16 20 24 28) max=29

valueshint display=(nolabel);

yaxis label='Mean change (msec)' values=(-6 to 3 by 1);

run;

The graph uses a SERIES plot statement of "Mean" by "Week" classified by "Drug" to display the curves. It is possible to display the markers as part of the SERIES plot statement, but not the error bars. Hence, we have used an overlay of the SCATTER plot statement to display filled markers and also the error bars for upper and lower confidence limits.

The group values are displayed side-by-side for both the scatter and series plots, using cluster groups with CLUSTERWIDTH=0.5. This ensures tight clustering, and also ensures that the markers for each drug are connected to the line. A reference line is drawn at X=26 to separate the LOCF value and another at Y=0 to display the baseline. The x-axis is set to linear, and specific tick values are requested as shown in the code. A user-defined format is used to display the value '28' as "LOCF".

The values for LOCF are not connected to the other values, so we have used the "Mean2" variable for the series plot, where these values are missing. The HTMLBlue style is used for the graph that has an attribute priority of "color". So, we get color changes only for the two group values. Both groups use filled circle markers and solid line patterns.

The graph includes the display of the "Number of Subjects at Visit" by week and treatment at the bottom of the display. This information is added to the graph using annotation. The procedure statement includes the options SGANNO and PAD.

Using the PAD option, we have reserved 14% of the height of the graph at the bottom as blank space. We will draw the At-Risk table in this space using annotation commands from the "annoOuter" data set. This will include the table header and the footnote using annotation.

proc sgplot data=QTc_Mean_Group sganno=annoOuter pad=(bottom=14%);

The "annoOuter" data set contains all the annotation commands needed to draw the header for the table, the values for the number of subjects in the study by treatment and week, and the footnote at the bottom. The first 20 observations in the data set draw the number of subjects in the study by treatment and week. The next 4 observations draw the table header, the treatment labels, and the footnote. See Section 2.9 for an introduction to annotation functions and data set.

Figure 3.2.1.2 – Annotation Data Set for Values

The values are drawn using X1Space='DataValue' to ensure that the values are correctly positioned along the x-axis with Anchor='Center' and Y1Space='GraphPercent'. The Drug A values are drawn at 9% from bottom using Graphdata1:contrastcolor, and Drug B at 6%, using Graphdata2:contrastcolor to match the colors used for the same groups in the graph.

When all the subject count values are displayed correctly, we now need to draw the labels for Drug A and Drug B to the left of the data along each row. We also need to display the table header "Number of subjects at visit" and the footnote. These annotation commands are in the last four observations in the data set, as shown below.

Figure 3.2.1.3 – Annotation Data Set for Labels

The annotations from the "annoOuter" data set are drawn into the graph as shown below. See Program 3_2 for the full code.

Figure 3.2.1.4 – Subjects at Visit Table Added Using Annotation

3.2.2 Mean Change of QTc by Week and Treatment with Inner Table

The mean values are displayed by week and treatment. The table of subjects at visit is displayed above the x-axis by using a SCATTER plot statement with MARKERCHAR on the Y2-axis. We did not use annotation for this graph. This makes it easier to understand the numbers as they are closer to the rest of the graph.

Figure 3.2.2.1 – Mean Change of QTc by Week and Treatment with Inner Table

This graph shows the mean change of QTc by week and treatment, similar to the graph shown in Section 3.2.1. The main difference here is that we have positioned the "Number of subjects" table above the x-axis, closer to the graphical data. This has the benefit of proximity to the data and reduction of clutter in the graph. The other important benefit is that we do not need any annotation to create this graph.

title 'Mean Change of QTc by Week and Treatment';

footnote j=l h=0.8 'Note: Vertical lines represent 95% confidence intervals. LOCF is last observation carried forward';

proc sgplot data=QTc_Mean_Group_Header;

format week qtcmean. N 3.0;

scatter x=week y=mean / yerrorupper=high yerrorlower=low

group=drug name='a' nomissinggroup groupdisplay=cluster

clusterwidth=0.5 markerattrs=(size=7 symbol=circlefilled);

series x=week y=mean2 / group=drug groupdisplay=cluster

clusterwidth=0.5;

refline 26 / axis=x;

refline 0 / axis=y lineattrs=(pattern=shortdash);

xaxis type=linear values=(0 1 2 4 8 12 16 20 24 28)

max=29 valueshint display=(nolabel);

yaxis label='Mean change (msec)' values=(-6 to 3 by 1)

offsetmin=0.14 offsetmax=0.02;

keylegend 'a' / location=inside position=top;

run;

The code shown above draws the graph shown in Figure 3.2.2.2 shown below without the "Number of Subjects" table just above the x-axis. Note, the extra blank space above the x-axis. This space is created by the YAXIS option OFFSETMIN=0.14, which leaves some blank space at the bottom.

Figure 3.2.2.2 – Creating Space for the Inner Table

Now, we added the HIGHLOW, SCATTER with MARKERCHAR and theY2AXIS statements shown below to populate this space with the "Number of Subjects" data. Some of the options on the other statements are "thinned" to fit the code in the box.

proc sgplot data=QTc_Mean_Group_Header;

format week qtcmean. N 3.0;

scatter x=week y=mean / yerrorupper=high yerrorlower=low

group=drug <opts>;

series x=week y=mean2 / group=drug groupdisplay=cluster <opts>;

highlow y=ylbl low=xlbl high=xlbl / highlabel=label y2axis

lineattrs=(thickness=0);

scatter x=week y=drug / markerchar=n group=drug

markercharattrs=(size=5 weight=bold) y2axis nomissinggroup;

refline -6 / axis=y;

refline 26 / axis=x;

refline 0 / axis=y lineattrs=(pattern=shortdash);

xaxis type=linear values=(0 1 2 4 8 12 16 20 24 28) max=29 <opts>;

yaxis label='Mean change (msec)' values=(-6 to 3 by 1)

offsetmin=0.14 offsetmax=0.02;

y2axis offsetmin=0.88 offsetmax=0.03 display=none reverse;

keylegend 'a' / location=inside position=top;

run;

First, we have three new variables in the data: "xlbl", "ylbl", and "label". These are all missing except for three observations. This is used with the highlow plot HIGHLABEL option to display the table header in the location set by "xlbl" and "ylbl".

Next, we use the second SCATTER plot statement to display the "N" values by week and treatment using the Y2AXIS. We set the extents of the y-axis so that all the other plots are drawn in the upper part of the graph. We set the extents of the Y2-axis so that the related data is drawn in the lower part of the graph.

We draw a reference line at Y= -6 as a separator. See Program 3_2 for the full code.

3.2.3 Mean Change in QTc by Visit in Grayscale

Figure 3.2.3.1 shows the Mean Change in QTc graph in a grayscale format. Here we have derived a custom style from the Journal style using the %MODSTYLE macro.

Figure 3.2.3.1 – Mean Change in QTc by Visit in Grayscale

The new style is derived from the Journal with filled markers. The graph above is created using the same code except we now have lost the color hint for the classifiers for the values in the table. To fix that, we have added the display "A" and "B" labels on the left of the rows of the subjects table using the highlow plot option LOWLABEL. See Program 3_2 for the full code.

%modstyle(name=markers, parent=journal, type=CLM,

markers=circlefilled trianglefilled);

ods listing style=markers;

proc sgplot data=QTc_Mean_Group_Header;

format week qtcmean. N 3.0;

scatter x=week y=mean / yerrorupper=high yerrorlower=low group=drug

name='a' nomissinggroup groupdisplay=cluster

series x=week y=mean2 / group=drug groupdisplay=cluster;

highlow y=ylbl low=xlbl high=xlbl / highlabel=highlabel

lowlabel=lowlabel y2axis lineattrs=(thickness=0);

scatter x=week y=drug / markerchar=n group=drug

markercharattrs=(size=5 weight=bold) y2axis;

refline 26 / axis=x;

refline -6 / axis=y;

refline 0 / axis=y lineattrs=(pattern=shortdash);

xaxis type=linear values=(0 1 2 4 8 12 16 20 24 28) max=29

valueshint display=(nolabel);

yaxis label='Mean change (msec)' values=(-6 to 3 by 1)

offsetmin=0.14 offsetmax=0.02;

y2axis offsetmin=0.88 offsetmax=0.03 display=none reverse;

keylegend 'a' / location=inside position=top;

run;

3.3 Distribution of ASAT by Time and Treatment

This graph consists of three sections. The main body of the graph contains the box plot of ASAT by Week and Treatment in the middle. A table of subjects in the study by treatment is at the bottom, and the number of subjects with value > 2 by treatment is at the top of the graph.

Figure 3.3.1 – Distribution of ASAT by Time and Treatment

title 'Distribution of ASAT by Time and Treatment';

proc sgplot data=asat sganno=annoSort;

vbox asat / category=week group=drug name='box' nofill;

refline 1 / lineattrs=(pattern=shortdash);

refline 2 / lineattrs=(pattern=dash);

refline 2.1 0.16;

refline 25 / axis=x;

xaxis type=linear values=(0 2 4 8 12 24 28) offsetmax=0.05

yaxis offsetmax=0.16 offsetmin=0.16;

keylegend 'box' / position=top;

run;

This graph is created using a VBOX of ASAT by Week and Drug. The XAXIS TYPE is set to Linear so that the week data is plotted to scale by value. A group display of Cluster is used to place the boxes and the outliers side by side.

Reference lines on the y-axis are placed at 1.0 and 2.0. Additional reference lines are placed to act as separators between the plot and the tabular data. The legend is placed at the top, under the title to reduce the clutter with the x-axis.

The tabular data at the bottom and top of the graph are inserted using annotation and included using SGANNO=annoSort proc option. The y-axis OFFSETMIN and OFFSETMAX are set to 0.16 or 16% of the height of the graph. This creates the blank space at the top and bottom of the wall.

Without annotation, the graph looks like the one shown in Figure 3.3.2 below. Note the blank space at the top and bottom of the wall as indicated by the arrows.

Figure 3.3.2 – Creating Space for the Table of Values

The SG Annotation data set is built from the "Asat" data set. The data set contains all the commands necessary to draw the values in the bottom and top regions of the graph and the labels for each row. Snippets of the data set are shown below. The block of observations with "Id" of "Count" is for display of the subject counts. This includes two observations to draw the labels shown in the second block below. Another similar block of observations with "Id" of "GT2" is for the values > 2 at the top of the graph.

Figure 3.3.3 – Annotation Data Set

Figure 3.3.4 – Annotation Data Set

The ID column with values of "Count" and "GT2" is added only for convenience of describing which observations are used for which part of the graph. These values play no part in the actual display of the annotations. See Program 3_3 for the full code.

3.4 Median of Lipid Profile by Visit and Treatment

This graph displays the median of the lipid values by visit and treatment. The visits are at regular intervals and represented as discrete data.

3.4.1 Median of Lipid Profile by Visit and Treatment on Discrete Axis

The values for each treatment are displayed along with the 95% confidence limits as adjacent groups using GROUPDISPLAY of "Cluster" and CLUSTERWIDTH=0.5. The HTMLBlue style is used.

Figure 3.4.1 – Median of Lipid Profile by Visit and Treatment

ods listing style=htmlblue;

title 'Median of Lipid Profile by Visit and Treatment';

proc sgplot data=lipid_grp;

series x=day y=median / group=trt groupdisplay=cluster

clusterwidth=0.5 lineattrs=(thickness=2);

scatter x=day y=median / yerrorlower=lcl yerrorupper=ucl

group=trt name='s' groupdisplay=cluster clusterwidth=0.5

errorbarattrs=(thickness=1)markerattrs=(symbol=circlefilled);

keylegend 's' / title='Treatment';

yaxis label='Median with 95% CL' grid;

xaxis display=(nolabel);

run;

A SCATTER statement is used with filled markers and error bars to display the data using cluster groups. The values across visits are joined using a series plot. Note, the series plot also uses cluster groups with the same cluster width. For full details, see Program 3_4.

3.4.2 Median of Lipid Profile by Visit and Treatment on Linear Axis in Grayscale

This graph displays the graph of the median of the lipid data by treatment in grayscale on a linear x-axis.

Figure 3.4.2 – Median of Lipid Profile by Visit and Treatment on Linear Axis

%modstyle(name=markers, parent=journal, type=CLM,

markers=circlefilled trianglefilled squarefilled

diamondfilled);

options debug=none;

ods listing style=markers;

title 'Median of Lipid Profile by Visit and Treatment';

proc sgplot data=lipid_Liner_grp;

series x=n y=median / group=trt

groupdisplay=cluster clusterwidth=0.5;

scatter x=n y=median / yerrorlower=lcl yerrorupper=ucl group=trt

groupdisplay=cluster clusterwidth=0.5

errorbarattrs=(thickness=1)markerattrs=(size=7) name='s';

keylegend 's' / title='Treatment';

yaxis label='Median with 95% CL' grid;

xaxis display=(nolabel) values=(1 4 8 12 16);

run;

The visits are not at regular intervals and are displayed at the correct scaled location along the x-axis. The visits are at week 1, 2, 4, 8, 12, and 16. These values are formatted to the strings shown on the axis. "Visit 1" collides with "Baseline", causing alternate tick values to be dropped, so I removed "1" from the tick value list.

As you can see, the group values are displayed as clusters, and the "effective midpoint spacing" is the shortest distance between the values. The markers are reduced in size to show the clustering. This can be adjusted by setting marker SIZE=7. The %MODSTYLE macro is used to derive a new style from Journal with filled markers. For full details, see Program 3_4.

3.5 Survival Plot

The survival plot is one of the most popular graphs that users want to customize to their own needs. Here I have run the LIFETEST procedure to generate the data for this graph. The output is saved into the "SurvivalPlotData" data set.

3.5.1 Survival Plot with External "Subjects At-Risk" Table

The survival plot shown below in Figure 3.5.1.1 has the traditional arrangement where the table of Subjects At-Risk is displayed at the bottom of the graph, below the x-axis.

Figure 3.5.1.1 – Survival Plot with External "Subjects At-Risk" Table

ods output Survivalplot=SurvivalPlotData;

proc lifetest data=sashelp.BMT

plots=survival(atrisk=0 to 2500 by 500);

time T * Status(0);

strata Group / test=logrank adjust=sidak;

run;

For more information about the LIFETEST procedure, see the SAS/STAT documentation.

A step plot of survival by time by strata is used to display the survival curves. A scatter overlay is used to draw the censored values. A legend is drawn in the top right corner to show the censored observations, and a legend is drawn at the bottom to indicate the stratum.

title 'Product-Limit Survival Estimates';

title2 h=0.8 'With Number of AML Subjects at Risk';

proc sgplot data=SurvivalPlotData sganno=anno

pad=(bottom=15pct left=6pct);

step x=time y=survival / group=stratum

lineattrs=(pattern=solid) name='s';

scatter x=time y=censored / markerattrs=(symbol=plus) name='c';

scatter x=time y=censored / markerattrs=(symbol=plus) GROUP=stratum;

keylegend 'c' / location=inside position=topright;

keylegend 's';

run;

We have set the bottom pad=15%. This reserves space of 15% of the graph height at the bottom of the graph as indicated by the arrow below. We will use annotation to draw the "Subjects At-Risk" table in this space using SGANNO=anno procedure option.

Figure 3.5.1.2 – Survival Plot with Space for the Table of "Subjects At-Risk"

To write the table values at the bottom of the graph, we have to create an SGAnno data set. This data set contains all the commands needed to draw the values along the axis and the labels on the left. Each observation in the data set provides the information needed to draw each value using the predefined column names. The first 18 observations provide the data for the values in the table as shown below.

A row of values is drawn for each stratum level, at a fixed Y distance from the bottom of the graph. Each row also uses the text color to match the values in the graph. The values are positioned along the x-axis using X1Space=DataValue and Y1Space=GraphPercent.

Figure 3.5.1.3 – Annotate Data Set Observations for Drawing the Table of Subjects At-Risk

The last three observations in the data set provide the information to draw the stratum labels for each row of values. The labels are drawn using X1Space=WallPercent and X1= -1 with labels anchored on the right. The annotated part of the graph is shown below. For full details, see Program 3_5.

Figure 3.5.1.4 – Annotate Data Set Observations for Labels

Figure 3.5.1.5 – Table of Subjects At-Risk at Bottom of Graph

3.5.2 Survival Plot with Internal "Subjects At-Risk" Table

The graph shown here is mostly similar to the graph in Section 3.5.1, with the difference that the "Subjects At-Risk" table in moved above the x-axis, close to the rest of the data. Bringing all the data closer makes it easy to align the values with the data, and that improves the effectiveness of the graph.

Figure 3.5.2 – Survival Plot with the Table of Subjects At-Risk above the X-Axis

title 'Product-Limit Survival Estimates';

title2 h=0.8 'With Number of AML Subjects at Risk';

proc sgplot data=SurvivalPlotData sganno=anno_in pad=(left=6pct);

step x=time y=survival / group=stratum

lineattrs=(pattern=solid) name='s';

scatter x=time y=censored / markerattrs=(symbol=plus) name='c';

scatter x=time y=censored / markerattrs=(symbol=plus) GROUP=stratum;

refline 0.2;

yaxis offsetmin=0.2;

keylegend 'c' / location=inside position=topright;

keylegend 's';

run;

Here we use YAXIS OffsetMin=0.2 to create some space at the bottom of the y-axis. Then, we change the Y1Space from GraphPercent to WallPercent to draw the values above the wall. A reference line at y=0.2 acts as a separator.

Relevant details are shown in the code snippet above. For full details, see Program 3_5.

3.5.3 Survival Plot with Internal "Subjects At-Risk" Table in Grayscale

This graph shows the same survival plot using a grayscale medium.

Figure 3.5.3 – Survival Plot in Grayscale

ods listing style=journal;

title 'Product-Limit Survival Estimates';

title2 h=0.8 'With Number of AML Subjects at Risk';

proc sgplot data=SurvivalPlotData sganno=anno_in pad=(left=6pct);

step x=time y=survival / group=stratumnum lineattrs=(pattern=solid)

curvelabel name='s';

scatter x=time y=censored / name='c'

markerattrs=(symbol=circlefilled size=4);

keylegend 'c' / location=inside position=top;

inset ("1:"="ALL" "2:"="AML-High Risk" "3:"="AML-Low Risk") / border;

refline 0.2;

yaxis offsetmin=0.2;

run;

Displaying the survival plot in a grayscale medium presents some challenges. Here we cannot use colors to identify the strata. Normally, the Journal style uses line patterns to identify the groups. Although line patterns work well for curves, they are not so effective with step plots because of the frequent breaks. So, it is preferable to use solid lines for all the levels of the step plot and use labels to identify the strata.

We would like to use curve labels to identify curves by stratum. However, curve labels can get long. So, we plot the curves by "stratumnum" to identify each curve, and add an inset to identify each ID. The same annotation table as in 3.5.3 is used to add the table above the x-axis.

Relevant details are shown in the code snippet above. For full details, see Program 3_5.

3.6 Simple Forest Plot

A forest plot is a graphical representation of a meta-analysis of the results of randomized controlled trials.

3.6.1 Simple Forest Plot

The graph in Figure 3.6.1 shows a simple forest plot, with the odds ratio plot in the middle by study names on the y-axis along with the tabular display of the statistics on the right.

Figure 3.6.1.1 – Simple Forest Plot

title "Impact of Treatment on Mortality by Study";

title2 h=8pt 'Odds Ratio and 95% CL';

proc sgplot data=forest noautolegend;

refline study_lbl / transparency=0.95

lineattrs=(thickness=13px color=darkgreen);

scatter y=study x=or / xerrorupper=ucl xerrorlower=lcl group=grp;

refline 1 100 / axis=x noclip;

refline 0.01 0.1 10 / axis=x lineattrs=(pattern=shortdash)

transparency=0.5 noclip;

scatter y=study x=xlbl / markerchar=lbl ;

scatter y=study x=or_lbl / markerchar=or x2axis

markercharattrs=(size=6);

scatter y=study x=lcl_lbl / markerchar=lcl x2axis

markercharattrs=(size=6);

scatter y=study x=ucl_lbl / markerchar=ucl x2axis

markercharattrs=(size=6);

scatter y=study x=wt_lbl / markerchar=wt x2axis

markercharattrs=(size=6);

xaxis type=log max=100 minor display=(nolabel) valueattrs=(size=7)

offsetmin=0.05 offsetmax=0.3;

yaxis display=(noticks nolabel) valueattrs=(size=7) reverse;

x2axis display=(noticks nolabel) valueattrs=(size=7)

offsetmin=0.75 offsetmax=0.05;

run;

The data for this graph contains the odds ratio, the confidence limits, and the weight for each study. The studies are classified with '1' for individual study names and '2' for "Overall". We use this information to plot the graph by study using a scatter plot with error bars to plot the odds ratio on a log axis.

The %MODSTYLE macro is used to derive a new style named 'Forest' with parent='Analysis', and using two markers for group classification, "SquareFilled" and "DiamondFilled".

Let us see how we have built this graph.

1. The data contains observations for each study including "Overall".

2. Two additional observations are added for the labels at the bottom.

3. Each observation has "Grp" of 1 if it is a study, and 2 otherwise.

4. A SCATTER statement is used to draw the odds ratio and error bars in the left side of the graph.

5. This is done by setting the OFFSETMAX for the x-axis to 0.3.

6. Reference lines are drawn at 1, 100, 0.01, 0.1, and 1 on a log axis.

title "Impact of Treatment on Mortality by Study";

title2 h=8pt 'Odds Ratio and 95% CL';

proc sgplot data=forest noautolegend;

scatter y=study x=or / xerrorupper=ucl xerrorlower=lcl group=grp;

refline 1 100 / axis=x noclip;

refline 0.01 0.1 10 / axis=x lineattrs=(pattern=shortdash) noclip;

scatter y=study x=xlbl / markerchar=lbl;

xaxis type=log max=100 minor display=(nolabel)

offsetmin=0.05 offsetmax=0.3;

yaxis display=(noticks nolabel) reverse;

run;

Figure 3.6.1.2 – Creating Space for the Statistics

The code and the graph are shown above. The right 30% of the graph is empty, and we will use additional statements to populate this space (marked by an arrow) with the data columns.

To populate the data columns on the right of the graph as shown at the top of this section, we will use the SCATTER statement with the MARKERCHAR option as follows:

scatter y=study x=or_lbl / markerchar=or;

To make this work, we have added four columns to the data set as shown below.

Figure 3.6.1.3 – Data for the Graph

Note the columns "or_lbl", "lcl_lbl", "ucl_lbl", and "wt_lbl". Each of these columns has the same value for all the observations, and it is the text string for the description of the value. We use these columns as the X role for the scatter plot, and the corresponding value for the MARKERCHAR role.

This places the value in the "or_lbl" column as the x-axis tick value "OR", and place the value of the "OR" column as the value in each row of this column. We use four such SCATTER plot statements, one for each of OR, LCL, UCL, and Wt to place all four columns of data on the right side.

The values go on the right side because we associate all these statements with the X2AXIS, and we set the OFFSETMIN of the X2AXIS to 0.75. So, all these values are placed in the right 25% of the graph.

Note in the data shown above, we have a column "Study_Lbl". This column has a copy of the "Study" column for every other observation. We use this variable with the REFLINE statement to draw the alternating bands for ease of reading the graph. For full details, see Program 3_6.

Figure 3.6.1.4 – Forest Plot

3.6.2 Simple Forest Plot with Study Weights

The graph below shows a simple forest plot with Study Weights, where the markers in the odds ratio plot are sized by the weight of the study.

Figure 3.6.2 – Simple Forest Plot with Study Weights

title "Impact of Treatment on Mortality by Study";

title2 h=8pt 'Odds Ratio and 95% CL';

proc sgplot data=forest noautolegend nocycleattrs;

refline study_lbl / transparency=0.95 lineattrs=(color=darkgreen);

scatter y=study x=or2 / markerattrs=graphdata2(symbol=diamondfilled);

highlow y=study low=lcl high=ucl / type=line lineattrs=(pattern=solid);

highlow y=study low=q1 high=q3 / type=bar barwidth=0.6

fillattrs=graphdata1;

refline 1 100 / axis=x noclip;

refline 0.01 0.1 10 / axis=x transparency=0.5 noclip;

scatter y=study x=xlbl / markerchar=lbl ;

scatter y=study x=or_lbl / markerchar=or x2axis

markercharattrs=(size=6);

scatter y=study x=lcl_lbl / markerchar=lcl x2axis

markercharattrs=(size=6);

scatter y=study x=ucl_lbl / markerchar=ucl x2axis

markercharattrs=(size=6);

scatter y=study x=wt_lbl / markerchar=wt x2axis

markercharattrs=(size=6);

xaxis type=log max=100 minor display=(nolabel)

offsetmin=0.05 offsetmax=0.3;

yaxis display=(noticks nolabel) reverse;

x2axis display=(noticks nolabel) offsetmin=0.75 offsetmax=0.05;

run;

This graph uses a highlow plot to display the relative weights for each study and the confidence interval. The scatter plot uses the "OR2" variable, which is non-missing only for the "Overall" study. So, only the diamond marker is drawn by the scatter plot.

Each box is weighted by the "Wt" column. Low and high values are computed for the box and displayed using the highlow plot. This allows for a qualitative comparison between the weights, which might not be displayed as exactly proportional because of the use of the log x-axis. Some options are trimmed in the code above to fit the space. For full details, see Program 3_6.

3.6.3 Simple Forest Plot with Study Weights in Grayscale

Figure 3.6.3 shows the same forest plot in grayscale medium.

Figure 3.6.3 – Forest Plot in Grayscale

ods listing style=journal;

title "Impact of Treatment on Mortality by Study";

title2 h=8pt 'Odds Ratio and 95% CL';

proc sgplot data=forest noautolegend nocycleattrs;

refline study_lbl / transparency=0.95 lineattrs=(thickness=13px);

scatter y=study x=or2 /

markerattrs=graphdata2(symbol=diamondfilled);

highlow y=study low=lcl high=ucl / type=line

lineattrs=(pattern=solid);

highlow y=study low=q1 high=q3 / type=bar barwidth=0.6;

refline 1 100 / axis=x noclip;

refline 0.01 0.1 10 / axis=x transparency=0.5 noclip;

scatter y=study x=xlbl / markerchar=lbl ;

scatter y=study x=or_lbl / markerchar=or x2axis;

scatter y=study x=lcl_lbl / markerchar=lcl x2axis;

scatter y=study x=ucl_lbl / markerchar=ucl x2axis;

scatter y=study x=wt_lbl / markerchar=wt x2axis;

xaxis type=log max=100 minor display=(nolabel)

offsetmin=0.05 offsetmax=0.3;

yaxis display=(noticks nolabel) reverse;

x2axis display=(noticks nolabel) offsetmin=0.75 offsetmax=0.05;

run;

Rendering this graph in a grayscale medium does not pose a lot of challenges. Basically, we have set the ODS style to JOURNAL to produce the graph above. This is structurally similar to the graph shown in Section 3.6.2.

Some options are trimmed in the code above to fit the space. For full details, see Program 3_6.

3.7 Subgrouped Forest Plot

More recently, many of you have been asking for building a forest plot, where each study value has multiple subgroups. The example in Figure 3.7.1 shows two subgroups per observation.

For the graph shown below, only the hazard ratio plot in the middle is displayed using a plot statement. The rest of the tabular data is displayed using annotation.

Figure 3.7.1 – Subgrouped Forest Plot

proc sgplot data=Forest2 nocycleattrs noautolegend

sganno=anno pad=(top=6pct);

refline ref / lineattrs=(thickness=15 color=cxf0f0f0);

highlow y=obsid low=low high=high;

scatter y=obsid x=mean / markerattrs=(symbol=squarefilled);

scatter y=obsid x=mean / markerattrs=(size=0) x2axis;

refline 1 / axis=x;

refline &Rows / noclip;

scatter y=yl x=xl / markerchar=text;

yaxis reverse offsetmax=0 offsetmin=0 display=none;

xaxis display=(noline nolabel) values=(0.0 0.5 1.0 1.5 2.0 2.5)

offsetmin=0.4 offsetmax=0.25;

x2axis display=(noline noticks novalues) offsetmin=0.4 offsetmax=0.25

label=' Hazard Ratio' ;

run;

This graph displays the hazard ratio and confidence limits by subgroup, along with the number of patients in the study and other statistics. The key difference here is the display of the sub groups and values in the first column. The subgroup titles are displayed in a bold font, and the values are displayed in the normal font, indented to the right.

The hazard ratio plot in the middle is displayed using HIGHLOW and SCATTER plot statements. "PCI Better" and "Therapy Better" insets are displayed using SCATTER with MARKERCHAR. All the rest of the information in the graph is populated using annotation. Let us see how that was done.

The annotation data set is derived from the original forest data as shown below. In this data set, I have added two columns just for explanation of the items. These are the "Anno" and the "AnnoType" columns. The actual data has 95 observations, made up of the "Text" functions to draw each of the column values and the headers. I have used the "Anno" column to identify the column and what is in the column – "AnnoType". Then I sorted on "Anno" and kept only three observations, each to derive a reduced data set shown below.

Figure 3.7.2 – Annotation Data Set

The first set of annotations is for the "Subgroup" and contains labels like "Overall", "Age", and "<= 65 yr". Remember, there are actually more observations for subgroups. All text annotations for the first column use x1Space="WallPercent" and y1Space='DataValue" and anchor='Left'. Recall that we have reserved the lower 40% of axis space for the left columns and 25% of upper axis space for the upper columns on the right using axis offsets. Annotated values are aligned along Y with "ObsId" values.

Subgroup headers like "Age" (id=1) are positioned at x=2%, and have bold weight with size=8pt. Values that are not header values (id=2) are indented to start at x=4%, with normal weight and with font size=6pt. We have similarly plotted the values for the CountPct values and the three columns on the right.

The last few observations are for the column headers. Here x1Space and y1Space are both 'WallPercent' as we want to draw these labels above the wall at known percent coordinates. Text is anchored at bottom or BottomLeft as necessary, and lined up with the X2-axis label "Hazard Ratio". See Program 3_7 for all the details.

3.8 Adverse Event Timeline by Severity

An Adverse Event Timeline graph by AEDECOD and Severity is useful to view the history of a specific subject in a study.

Figure 3.8.1 – Adverse Event Timeline by AEDECOD and Severity

title "Adverse Event Timeline Graph for Patient Id = &pid";

proc sgplot data=ae2_stday dattrmap=attrmap;

format stdate date7.;

refline 0 / axis=x lineattrs=(color=black);

highlow y=aedecod low=stday high=enday / type=bar group=aesev

lineattrs=(color=black pattern=solid) barwidth=0.8

lowlabel=aedecod highcap=highcap attrid=Severity

nomissinggroup labelattrs=(color=black size=7);

scatter y=aedecod x=stdate / x2axis markerattrs=(size=0);

xaxis grid display=(nolabel) valueattrs=(size=7)

values=(&minday to &maxday by 2) offsetmax=0.02 ;

x2axis display=(nolabel) type=time valueattrs=(size=7)

values=(&mindate to &maxdate) offsetmax=0.02;

yaxis reverse display=(noticks novalues nolabel);

run;

The graph above displays each adverse event as a bar segment over its duration. The color of the event is set by the severity. The source data is in CDISC, SDTM tabulation model format, as shown below.

Figure 3.8.2 – Data Set

The data has many columns, but the ones that we are using are aeseq, aedecod, aesev, aestdtc, and aeendtc. In the example above, all aestdtc values are present and assumed to be valid. If not, some data cleaning might be needed. In the DATA step, stdate is extracted from aestdtc and endate from aeendtc. If aeendtc is missing, the largest value of endate is used, and highcap is set to "FilledArrow" to indicate the event does not have an end date. A valid end date is required to draw the event in the graph. The data set required for plotting the graph is shown below.

Figure 3.8.3 – Data Set with Added Columns

The data set above is computed for creating the graph. Note in this data set, we do not have any observations with Severity="Severe". However, the legend in the graph does have an entry for "Severe". These dummy observations do not have valid start and end values, so they are not actually drawn in the graph. The top x-axis is enabled by using a scatter plot assigned to the x2-axis. Macro variables are used to align the x- and x2-axes.

Normally, observations with specific group values are assigned the color and other attributes from the GraphData1-12 style elements. These are assigned in the order in which they are encountered in the data. In this case, we are using specific colors for "Mild", "Moderate", and "Severe". If we just change the colors of the style elements, we will get one of the three colors, but the color assignments can shift based on the order of the data.

To ensure consistent and reliable color assignment, we will use the Discrete Attribute Map data set. Colors and the specific values of the group values are explicitly assigned. Now, the group values will get the colors by value, and not based on the order of the values in the data. In this case, LineColor is used both for lines and text.

Figure 3.8.4 – Data Set for Attribute Map

Only the severity values that are present in the data are displayed in the legend. The color representing each severity is obtained from the attribute map. However, often we might want to display all the classification levels of severity that could occur, even if they might not be present in the current data set. To do this, we have to insert three observations at the start of the data, each having one of the three severity values, with missing values for other columns to prevent their inclusion in the graph.

The highlow plot is ideally suited for such a use case, and provides support to draw labels and arrowhead caps at each end. In this case, the LOWLABEL option is used to draw the event names. We have displayed the aedecod label only the first time. The HIGHCAP option is used to draw the arrowhead as shown for "Cough" at the right end. This indicates an event that does not have an end date in the data.

For the grayscale use case, we can change the high-low bar type to the default "Line". This will allow use of the line pattern as the visual element for the different severity values. Here is the graph, along with the appropriate attribute map.

Figure 3.8.5 – Adverse Event Timeline Graph in Grayscale

Figure 3.8.6 – Attribute Map for Grayscale Graph

Relevant details are shown in the code snippet above. For full details, see Program 3_8.

3.9 Change in Tumor Size

The graph shown below is commonly known at a waterfall chart in the oncology domain. The graph displays the change in tumor size by treatment.

Figure 3.9.1 – Change in Tumor Size by Treatment

title 'Change in Tumor Size';

title2 'ITT Population';

proc sgplot data=TumorSizeSorted;

vbar cid / response=change group=group datalabel=label

groupdisplay=cluster clusterwidth=1;

refline 20 -30 / lineattrs=(pattern=shortdash);

xaxis display=none discreteorder=data;

yaxis values=(60 to -100 by -20);

inset ("C="="CR" "R="="PR" "S="="SD" "P="="PD" "N="="NE") / title='BCR'

position=bottomleft border textattrs=(size=6 weight=bold);

keylegend / title='' location=inside position=topright across=1 border;

run;

The graph displays percent change in tumor size in descending order of size increase for the population by treatment. Each bar represents one subject in the study. Values above zero indicate increase in tumor size, but values below zero indicate decrease in tumor size. The response type is shown at the end of the bar. The data is shown below in Figure 3.9.2.

Figure 3.9.2 – Data Set for Waterfall Chart

Confidence limits are shown at +20% and -30%. A partial response is generally indicated for tumor shrinkage of 30% or more. However, the author does not claim domain-specific expertise. See domain-centric papers for more information about such details.

The %MODSTYLE macro is used to create a new style derived from “Listing” for the two groups.

The graph in Figure 3.9.3 uses a different set of colors and presentation aspects, including bars with a textured look. The confidence region is displayed using a band plot with 50% transparency. For both graphs, the %MODSTYLE macro is used to derive a new style with the custom colors.

Figure 3.9.3 – Waterfall Chart with Alternative Appearance

%modstyle(name=waterfall_2, parent=listing, type=CLM,

numberofgroups=2,

colors=black black, fillcolors=cxbf0000 gold);

ods listing style=waterfall_2;

title 'Change in Tumor Size';

title2 'ITT Population';

proc sgplot data=TumorSizeSorted ;

band x=cid upper=20 lower=-30 / transparency=0.5 fill nooutline

legendlabel='Confidence';

vbarparm category=cid response=change / group=group datalabel=label

datalabelattrs=(size=5 weight=bold) dataskin=pressed;

xaxis display=none;

yaxis values=(60 to -100 by -20) grid;

inset ("C="="CR" "R="="PR" "S="="SD" "P="="PD" "N="="NE") /

title='BCR' position=bottomleft border

textattrs=(size=6 weight=bold);

keylegend / title='' location=inside position=topright

across=1 border;

run;

A VBARPARM statement is used instead of a VBAR statement as we want to layer a band plot in the graph. Grid lines are enabled. Group display of "Cluster" is used so that we can display the bar data labels.

Relevant details are shown in the code snippet above, and some options are trimmed to fit the space. For full details, see Program 3_9.

3.10 Injection Site Reaction

The graph in Figure 3.10.1 shows the incidence of injection site reaction by Time and Cohort.

Figure 3.10.1 – Injection Site Reaction

The (simulated) data is shown in Figure 3.10.2, with incidence by group over time.

%modstyle(name=Injection, parent=listing, type=CLM, numberofgroups=4,

colors=black black black black,

fillcolors=gray pink lightgreen lightblue);

ods listing style=styles.Injection;

proc sgplot data=Incidence;

vbar time / response=incidence group=group groupdisplay=cluster;

xaxis discreteorder=data valueattrs=(size=8) display=(nolabel);

yaxis grid display=(noticks);

keylegend / title='' location=inside position=topright across=1 border;

run;

We have used a VBAR statement with Time as the category and Group (Cohort) as the group. The time values are treated as discrete, and each cluster of incidence bars is positioned at equidistant midpoints along the axis.

The %MODSTYLE macro is used to derive a new style from "Listing" having four groups with fill colors as specified. The outline colors for all groups are set to black.

Figure 3.10.2 – Data Set for Injection Site Reaction Graph

Figure 3.10.3 – Injection Site Reaction in Grayscale

title 'Incidence of Injection-site Reaction by Time and Cohort - Erythema';

title2 'As-treated Population';

ods listing style=Journal2;

proc sgplot data=Incidence;

vbar time / response=incidence group=group groupdisplay=cluster;

xaxis discreteorder=data valueattrs=(size=8) display=(nolabel);

yaxis grid display=(noticks);

keylegend / title='' location=inside position=topright across=1 border;

run;

The graph above uses the JOURNAL2 style suitable for submissions to journals that are published in grayscale medium. The group classifications are displayed using fill patterns.

Relevant details are shown in the code snippet above. For full details, see Program 3_10.

3.11 Distribution of Maximum LFT by Treatment

The graph below shows the distribution of LFT values by Test and Treatment.

3.11.1 Distribution of Maximum LFT by Treatment with Multi-Column Data

Figure 3.11.1.1 – Distribution of Maximum LFT by Treatment with Multi-Column Data

title 'Distribution of Maximum LFT by Treatment';

footnote j=l 'Level of concern is 2.0 for ALAT, ASAT and ALKPH and 1.5 for BILTOT';

proc sgplot data=LFT;

refline 1 1.5 2 / lineattrs=(pattern=shortdash);

vbox a / category=test discreteoffset=-0.15 boxwidth=0.2 name='a'

legendlabel='Drug A (N=209)';

vbox b / category=test discreteoffset= 0.15 boxwidth=0.2 name='b'

legendlabel='Drug B (N=405)';

xaxis display=(nolabel);

run;

The graph above shows the distribution of LFT values by Test and Treatment using the VBOX statement. For the graph above, the data is in a multi-column format as shown in Figure 3.11.1.2. Lab test values for each case are shown for two treatments.

Figure 3.11.1.2 – Multi-Column Data for Graph

Specific discrete offset values are used for each treatment to create side-by-side box plots. A single REFLINE statement is used to display all the concern levels.

Data with treatment as a group is more scalable as the groups are automatically positioned by the VBOX statement as shown in the next example.

3.11.2 Distribution of Maximum LFT by Treatment Grayscale with Group Data

This graph displays the Distribution of Maximum LFT graph by Treatment group in grayscale.

Figure 3.11.2.1 – Distribution of Maximum LFT by Treatment Grayscale with Group Data

ods listing style=journal;

title 'Distribution of Maximum LFT by Treatment';

footnote j=l 'Level of concern is 2.0 for ALAT, ASAT and ALKPH and 1.5 for BILTOT';

proc sgplot data=lft_Grp;

refline 1 1.5 2 / lineattrs=(pattern=shortdash);

vbox value / category=test group=drug groupdisplay=cluster

lineattrs=(pattern=solid) medianattrs=(pattern=solid)

whiskerattrs=(pattern=solid);

xaxis display=(nolabel);

run;

Figure 3.11.2.2 – Grouped Data for Graph

In this example, the data is arranged by group, instead of multi-column as in 3.11.1. We are using the Journal style, which uses different gray shades for the fill color for each group. Line patterns for the box, median, and whiskers are set to solid.

Relevant details are shown in the code snippet above. For full details, see Program 3_11.

3.12 Clark Error Grid

The Clark Error Grid graph is used to quantify the clinical accuracy of blood glucose levels generated by the meters. The sensor response and the reference value are plotted on the grid.

3.12.1 Clark Error Grid

The graph includes demarcated zones that indicate the divergence of the meter values from reference values. Zone "A" demarcates the zone where the divergence is < 20%. Zone "B" has divergence > 20%, but not leading to improper treatment. Other zones indicate dangerous or confusing results.

Figure 3.12.1 – Clark Error Grid for Blood Glucose Measurement Accuracy

title 'Clark Error Grid for Blood Glucose';

proc sgplot data=plotZoneCount noautolegend dattrmap=attrmap;

scatter x=x y=y / group=zone attrid=A

markerattrs=(symbol=circlefilled);

series x=rfbg y=sbg / group=id nomissinggroup

lineattrs=graphdatadefault(color=black) ;

scatter x=xl y=yl / markerchar=label markercharattrs=(weight=bold);

xaxis min=0 max=400 offsetmin=0 offsetmax=0

label='Reference Blood Glucose';

yaxis min=0 max=400 offsetmin=0 offsetmax=0

label='Sensor Blood Glucose';

run;

The data for this graph includes the measured and reference glucose level observations, data for zone boundaries, and the zone labels and data for zone labels.

The scatter plot in the program is used to draw the metered glucose values by reference. The series plot is used to display the boundaries of each zone, and the scatter plot is used to display the zone name. A discrete attribute map is used to color the markers in each zone appropriately.

3.12.2 Clark Error Grid in Grayscale

Figure 3.12.2 shows the Clark Error Grid in grayscale. Different marker shapes are used for each zone. Zone labels are highlighted with a circle.

Figure 3.12.2 – Clark Error Grid in Grayscale

%modstyle(name=Clark, parent=journal, type=CLM, numberofgroups=5,

markers=triangle circle square diamond triangledown);

ods listing style=Clark;

title 'Clark Error Grid for Blood Glucose';

proc sgplot data=plotZoneCount noautolegend;

scatter x=x y=y / group=zone attrid=A markerattrs=(size=5);

series x=rfbg y=sbg / group=id nomissinggroup

lineattrs=graphdatadefault(color=black);

bubble x=xl y=yl size=size / bradiusmin=14 bradiusmax=15

fillattrs=(color=white);

scatter x=xl y=yl / markerchar=label

markercharattrs=(size=5 weight=bold);

xaxis min=0 max=400 offsetmin=0 offsetmax=0

label='Reference Blood Glucose';

yaxis min=0 max=400 offsetmin=0 offsetmax=0

label='Sensor Blood Glucose';

run;

The same graph as in Section 3.12.1 is rendered here for a grayscale medium. The key difference is to ensure the correct decoding of the data in the five zones. Here, I have used the %MODSTYLE macro to define five groups, each with a distinct marker shape.

The labels for each zone do not stand out against the markers of the same color--especially in the dense areas. So, I used a bubble plot to draw a filled white bubble behind the zone label. All axis offsets are set to zero to ensure that the zone boundaries touch the axes. This also removes the effect of any offset contributions by the scatter plot with MARKERCHAR.

Relevant details are shown in the code snippet above. For full details, see Program 3_12.

3.13 The Swimmer Plot

This "swimmer plot" displays the response of the tumor to the study drug over time in months. Each horizontal bar represents one subject in the study.

3.13.1 The Swimmer Plot for Tumor Response over Time

This graph shows the tumor response by subject over time.¹ Each horizontal bar in the graph represents one subject. The inset line indicates complete or partial response with start and end times.

Figure 3.13.1.1 – Tumor Response Graph

title 'Tumor Response for Subjects in Study by Month';

proc sgplot data= swimmer dattrmap=attrmap nocycleattrs;

highlow y=item low=low high=high / highcap=highcap type=bar group=stage

fill nooutline name='stage' nomissinggroup transparency=0.3;

highlow y=item low=startline high=endline / group=status

name='status' nomissinggroup attrid=statusC;

scatter y=item x=start / name='s' legendlabel='Response start'

markerattrs=(symbol=trianglefilled size=8 color=darkgray);

scatter y=item x=end / name='e' legendlabel='Response end'

markerattrs=(symbol=circlefilled size=8 color=darkgray);

scatter y=item x=xmin / name='x' legendlabel='Continued response '

markerattrs=(symbol=trianglerightfilled size=12

color=darkgray);

scatter y=item x=durable / name='d' legendlabel='Durable responder'

markerattrs=(symbol=squarefilled size=6 color=black);

scatter y=item x=start / group=status attrid=status

markerattrs=(symbol=trianglefilled size=8);

scatter y=item x=end / group=status attrid=status

markerattrs=(symbol=circlefilled size=8);

xaxis display=(nolabel) label='Months'

values=(0 to 20 by 1) valueshint;

yaxis reverse display=(noticks novalues noline)

label='Subjects Received ...';

keylegend 'stage' / title='Disease Stage';

keylegend 'status' 's' 'e' 'd' / location=inside

position=bottomright across=1;

run;

An arrowhead on the right indicates continuing response. The bar contains durations over which the "Complete" or "Partial" response is indicated, with a start and end time. The disease stage is indicated by the color of the bar, with a legend showing the unique values below the x-axis. An inset is included to decode the different markers in the event bar. A "Durable" response is indicated by the square marker on the left end of the bar.

Note that the start and end points for each response are represented by colored markers inside each event bar. However, the same are shown in grayscale in the inset table. This is achieved by first plotting the markers in a gray color, and overdrawing those by colored markers using GROUP=Status. The scatter plots that plot the gray markers are the ones that are included in the inset.

Also note the existence of a "right arrow" marker in the inset indicating the continuing event. This is done by including a scatter plot with a right triangle marker in the plot, but the data for this marker is missing. However, it is included in the inset.

The structure of the data set needed for the graph is shown below.

Figure 3.13.1.2 – Data for Tumor Response Graph

Note, although the program for this graph is longer than some other ones, it can be built one part at a time.

• First, plot the full duration from Low to High by Item using a grouped High Low plot with a High Cap and TYPE=BAR. Include this in the outside legend.

• Layer the individual "Response" events from Startline to Endline by Status using a High Low bar with the default line type. Include this in the inset legend.

• Layer the Start and End events in gray color. Include these in the inset legend.

• Layer the Start and End events again using GROUP=Status.

• Add a scatter plot with missing data to include the "Right Arrow" in the legend.

The Discrete Attribute Map data set contains two maps, one for the colored graph called "StatusC", and one for the grayscale graph called "StatusJ". AttrId=StatusC is used in this graph. For full details, see Program 3_12.

3.13.2 The Swimmer Plot for Tumor Response in Grayscale

The tumor response graph appears in grayscale. The disease stage is shown on the left as we cannot use a color indicator.

Figure 3.13.1 – Tumor Response Graph in Grayscale

proc sgplot data= swimmer2 dattrmap=attrmap nocycleattrs;

highlow y=item low=low high=high / highcap=highcap type=bar

group=stage fill nooutline lineattrs=(color=black)

fillattrs=(color=lightgray)

name='stage' barwidth=1 nomissinggroup;

highlow y=item low=startline high=endline / group=status

lineattrs=(thickness=2)

name='status' nomissinggroup attrid=statusJ;

scatter y=item x=start / markerattrs=(symbol=trianglefilled size=8)

name='s' legendlabel='Response start';

scatter y=item x=end / markerattrs=(symbol=circlefilled size=8)

name='e' legendlabel='Response end';

scatter y=item x=xmin / name='x' legendlabel='Continued response '

markerattrs=(symbol=trianglerightfilled size=12

color=darkgray);

scatter y=item x=durable / name='d' legendlabel='Durable responder'

markerattrs=(symbol=squarefilled size=6 color=black);

scatter y=item x=start / attrid=statusJ

markerattrs=(symbol=trianglefilled size=8) group=status;

scatter y=item x=end / attrid=statusJ

markerattrs=(symbol=circlefilled size=8) group=status;

highlow y=item low=stagex high=stagex / lowlabel=stage

lineattrs=(thickness=0);

xaxis display=(nolabel) label='Months'

values=(0 to 20 by 1) valueshint;

yaxis reverse display=(noticks novalues noline)

label='Subjects Received Study Drug';

keylegend 'status' 's' 'e' 'd' 'x' / noborder location=inside

position=bottomright across=1;

run;

Patterned lines are used to draw the response events and a highlow plot to draw the stage labels on the left. AttrId=StatusJ is used in this graph. For full details, see Program 3_13.

3.14 CDC Chart for Length and Weight Percentiles

The CDC chart for length and weight for boys and girls from birth to 36 months is widely used in pediatric practices to track vital statistics. This graph is shown in Figure 3.14.1, and the entire chart is created using the SGPLOT procedure. The purpose is primarily to evaluate the features of the procedure.

Figure 3.14.1 – CDC Chart for Length and Weight Percentiles

The graph above renders the full CDC chart for Length and Weight Percentiles from the data for one subject. The original graph was a bit taller, but I shrank it to fit this page. The data required is created by appending the CDC percentile data with the historical data for one subject. The CDC data is included in the file named "3_14_CDC_Cleaned.csv".

The CDC data for the percentile curves is shown below. Only a few of the observations are displayed to conserve space. Also, the data contains all the columns for 5, 10, 25, 50, 75, 90, and 95 percentiles, but only a few columns are included to fit in the space.

Figure 3.14.2 – Data for CDC Graph

The historical data for the subject is appended at the bottom of the curve data, using the column names Sex, Age, Height, and Length, as shown below.

Figure 3.14.3 – Data for CDC Graph

title j=l h=9pt 'Birth to 36 months: Boys' j=r "Name: John Smith";

title2 j=l h=8pt "Length-for-age and Weight-for-age percentiles" j=r "Record # 12345-67890";

footnote j=l h=7pt "Published May 30, 2000 (modified 4/20/01) CDC";

proc sgplot data=Chart_Patient noautolegend;

where sex=1;

refline 3 4 5 6 / axis=y2 lineattrs=graphgridlines;

/*--Curve bands--*/

band x=agemos lower=w5 upper=w95 / y2axis fillattrs=graphdata1

transparency=0.9;

band x=agemos lower=w10 upper=w90 / y2axis fillattrs=graphdata1

transparency=0.8;

band x=agemos lower=w25 upper=w75 / y2axis fillattrs=graphdata1

transparency=0.8;

/*--Curves--*/

series x=agemos y=w5 / y2axis lineattrs=graphdata1 transparency=0.5;

series x=agemos y=w10 / y2axis lineattrs=graphdata1 transparency=0.7;

series x=agemos y=w25 / y2axis lineattrs=graphdata1 transparency=0.7;

series x=agemos y=w50 / y2axis x2axis lineattrs=graphdata1;

series x=agemos y=w75 / y2axis lineattrs=graphdata1 transparency=0.7;

series x=agemos y=w90 / y2axis lineattrs=graphdata1 transparency=0.7;

series x=agemos y=w95 / y2axis lineattrs=graphdata1 transparency=0.5;

The program that is required to draw all the elements of this graph is long, but easy to understand. So, I have shown it is in parts across the following pages. The first part of the program is shown above, with titles, footnotes, and percentile curves for Weight. The bands are drawn with three transparent overlays to create the appearance of color gradation. The curves are overlaid on the bands.

/*--Curve labels--*/

scatter x=agemos y=w5 / markerchar= l5 y2axis textattrs=graphdata1;

scatter x=agemos y=w10 / markerchar=l10 y2axis textattrs=graphdata1;

scatter x=agemos y=w25 / markerchar=l25 y2axis textattrs=graphdata1;

scatter x=agemos y=w50 / markerchar=l50 y2axis textattrs=graphdata1;

scatter x=agemos y=w75 / markerchar=l75 y2axis textattrs=graphdata1;

scatter x=agemos y=w90 / markerchar=l90 y2axis textattrs=graphdata1;

scatter x=agemos y=w95 / markerchar=l95 y2axis textattrs=graphdata1;

/*--Patient datas--*/

series x=age y=weight / y2axis lineattrs=graphdata1(thickness=2)

markers markerattrs=(symbol=circlefilled size=11);

series x=age y=weight / y2axis lineattrs=graphdata1(thickness=2)

markers markerattrs=(symbol=circlefilled size=7, color=white);

The code section above draws the curve labels for the percentile curves on the right. This is overlaid by the historical subject weight data as a series plot. The code for Height is shown below.

/*--Curve bands--*/

band x=agemos lower=h5 upper=h95 / fillattrs=graphdata3

transparency=0.9;

band x=agemos lower=h10 upper=h90 / fillattrs=graphdata3

transparency=0.8;

band x=agemos lower=h25 upper=h75 / fillattrs=graphdata3

transparency=0.8;

/*--Curves--*/

series x=agemos y=h5 / lineattrs=graphdata3(pattern=solid)

transparency=0.5;

series x=agemos y=h10 /lineattrs=graphdata3(pattern=solid)

transparency=0.7;

series x=agemos y=h25 /lineattrs=graphdata3(pattern=solid)

transparency=0.7;

series x=agemos y=h50 /lineattrs=graphdata3(pattern=solid) x2axis;

series x=agemos y=h75 /lineattrs=graphdata3(pattern=solid)

transparency=0.7;

series x=agemos y=h90 /lineattrs=graphdata3(pattern=solid)transparency=0.7;

series x=agemos y=h95 /lineattrs=graphdata3(pattern=solid) transparency=0.5;

/*--Curve labels--*/

scatter x=agemos y=h5 / markerchar = l5 textattrs=graphdata3;

scatter x=agemos y=h10 / markerchar =l10 textattrs=graphdata3;

scatter x=agemos y=h25 / markerchar =l25 textattrs=graphdata3;

scatter x=agemos y=h50 / markerchar =l50 textattrs=graphdata3;

scatter x=agemos y=h75 / markerchar =l75 textattrs=graphdata3;

scatter x=agemos y=h90 / markerchar =l90 textattrs=graphdata3;

scatter x=agemos y=h95 / markerchar =l95 textattrs=graphdata3;

/*--Patient datas--*/

series x=age y=height / lineattrs=graphdata3(pattern=solid thickness=2)

markers markerattrs=(symbol=circlefilled size=11);

series x=age y=height / lineattrs=graphdata3(pattern=solid thickness=2)

markers markerattrs=(symbol=circlefilled size=7, color=white);

The Height (Length) and Weight data ranges are different, and these need to be plotted with different vertical scales and axis details. We can do that by using two separate Y axes for each column. Here we used the Y2AXIS for Weight and YAXIS for Height. This breaks the link between the two variables scales thus allowing us to draw the Height and Weight curves and data independently.

/*--Table--*/

inset " Date Age(Mos) Wt(Kg) Ln(Cm)"

"04 May 2010 Birth 3.5 52"

"02 Aug 2010 3 6.5 63"

"01 Nov 2010 6 8.5 68"

"07 Feb 2011 9 9.5 72"

"02 May 2011 12 10.5 75" / border

textattrs=(family='Courier' size=6 weight=bold)

position=bottomright;

xaxis grid offsetmin=0 integer values=(0 to 36 by 3);

x2axis grid offsetmin=0 integer values=(0 to 36 by 3);

yaxis grid offsetmin=0.25 offsetmax=0.0 label="Length (Cm)"

integer values=(30 to 110 by 5)

labelattrs=graphdata3(weight=bold) valueattrs=graphdata3;

y2axis offsetmin=0.0 offsetmax=0.25 label="Weight (Kg)" integer

values=(2 to 18 by 1) labelattrs=graphdata1(weight=bold)

valueattrs=graphdata1;

run;

Note the options on the YAXIS and the Y2AXIS statements. The Y2AXIS has OFFSETMAX=0.25, which means that all items that are associated with it are displayed only in the lower 75% of the graph height. This causes all the "Weight" related items and the axis (drawn in blue) to be drawn in the lower part.

Similarly, the YAXIS has OFFSETMIN =0.25, so all the "Height or Length" related items are drawn in the upper part of the graph. More importantly, the scaling for each axis is independent, allowing us to draw different tick values on the axes. To make the graph easier to read, we have taken care to position the Y grid lines so that they line up with the values on each side.

The program snippet above also shows how we can include the historical data as a tabular display in the chart for easy reference. I have used the INSET statement to create a tabular display. Although the values here are hardcoded, we can use macro variables assigned from the DATA step.

Relevant details are shown in the code snippet above. For full details, see Program 3_14.

3.15 Summary

The graphs discussed in this chapter represent a large fraction of the graphs commonly used in the clinical trials industry and in Health and Life Sciences in general. Most of these are "single-cell" graphs where the main data is displayed in one cell in the middle, along with other information.

As we have seen in the examples, the SAS 9.3 SGPLOT procedure provides you with a large selection of plot statements and features that can be used to create many graphs. Many of these plot statements can be combined in creative ways to create almost any graph that might be needed.

The SG Annotate facility further enhances your ability to create custom graphs using the SGPLOT procedure. As seen in many examples, statistics tables or tables of subjects in the study can be displayed correctly positioned along the x- or y-axis, either outside the data area, or inside the data area.

Group attributes such as colors or marker symbol shapes can be assigned by specific group values using the Attribute Map feature. This ensures that attributes are correctly mapped regardless of the data order, or whether some groups are present or not.

¹ My graph is based on ideas presented in a paper. See Phillips, Stacey D. 2014. “Swimmer Plot: Tell a Graphical Story of Your Time to Response Data Using PROC SGPLOT.” Proceedings of the Pharmaceutical Industry SAS Users Group (PharmaSug) 2014 Conference. San Diego, CA: SAS Institute Inc. Available at http://www.pharmasug.org/2014-proceedings.html.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Chapter 3: Clinical Graphs Using the SAS 9.3 SGPLOT Procedure

Create new playlist

Sign In

Sign Up