The
case “Appointment Wait Times at Veterans Medical Centers”
presented a variety of univariate, bivariate, and multivariate visualizations
to become acquainted with the appointment backlogs and the characteristics
of each medical center (e.g., bed capacity, location). While graphs
allow us to easily discern patterns, numerical statistics offer a
more precise measure of various characteristics of the distribution
of each variable.
A number of the variables,
such as Provider ID, Hospital Name, and the address columns, are unique
identifiers associated with each veterans medical center, so a statistical
summary of these columns is not warranted. The key variables are
those relating to the appointment backlog.
Tables of numerical
statistics can be easily generated with Tabulate. Drag the columns
containing backlog data to the drop zone for rows. Sum will appear
by default. Drag “Sum” to the empty column heading
above the numbers. Now select N, Mean, Std Dev, Min, and Max and
drag them to the column header occupied by Sum. The result is shown
in
Figure 10.1 Tabulate for Backlog Variables.
The statistics should
be rounded. Given the magnitude of the backlog values, we will round
them all to the nearest integer. At the bottom of the Tabulate dialog
click on the Change Format button and complete the dialog as shown
in
Figure 10.2 Changing Tabulate Formats.
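For readers who want to check the Tabulate output outside JMP, the same five statistics can be sketched in Python. This is only an illustration: the column names and backlog values below are hypothetical, not the case data, and the standard deviation is assumed to be the sample (n - 1) version.

```python
import statistics

# Hypothetical backlog counts for five medical centers (NOT the case data).
backlog = {
    "Backlog 2015": [120, 340, 95, 410, 275],
    "Backlog 2016": [150, 310, 180, 520, 260],
}


def summarize(values):
    """Return N, Mean, Std Dev, Min, and Max, rounded to the nearest integer."""
    return {
        "N": len(values),
        "Mean": round(statistics.mean(values)),
        # statistics.stdev is the sample standard deviation (divides by n - 1).
        "Std Dev": round(statistics.stdev(values)),
        "Min": min(values),
        "Max": max(values),
    }


# One row of statistics per backlog column, mirroring the Tabulate layout.
summary = {name: summarize(values) for name, values in backlog.items()}
for name, stats in summary.items():
    print(name, stats)
```

Rounding is applied only when the statistics are reported, so the underlying values stay unchanged, much as changing the format in Tabulate affects only the display.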
Depending on the problem,
other descriptive statistics may be informative. The minimum observed
backlog difference is negative, which means that at least one VMC reduced
its 31-60 day backlog. Similarly, the positive maximum value indicates
that at least one VMC’s backlog increased from 2015 to 2016.
Calculating the percentages of VMCs in the sample whose backlogs were unchanged,
decreased, or increased will add insight. To find
any VMCs whose backlog has not changed, choose Rows > Row Selection >
Select Where and complete the dialog as shown in
Figure 10.4 Selecting Rows with Backlog Difference of Zero.
At the left of the data
table we see that there are 12 rows selected out of 16. Twelve out
of 16, or 75%, of the VMCs saw increases in their backlogs over the year,
while the other 4, or 25%, reduced their backlogs. These percentages should be included
in the statistical summary of the data. Using Rows > Row Selection
> Select Where is often easier when there are precise numerical
criteria. The slider bar available in Rows > Data Filter can make it
more difficult to select the precise numerical value desired.
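The row-selection counts can also be cross-checked with a short Python sketch. The differences below are made-up values chosen only to mirror the 12-of-16 split described above; they are not the case data, and the variable names are illustrative.

```python
# Hypothetical 2016-minus-2015 backlog differences for 16 centers (NOT the case data).
differences = [35, -12, 80, 4, 17, -60, 22, 9, 51, -3, 14, 66, 28, -45, 7, 19]

n = len(differences)

# Count the centers in each category, as the Select Where dialog would.
counts = {
    "increased": sum(1 for d in differences if d > 0),
    "decreased": sum(1 for d in differences if d < 0),
    "unchanged": sum(1 for d in differences if d == 0),
}

# Convert each count to a percentage of all centers in the sample.
percentages = {label: 100 * count / n for label, count in counts.items()}

for label in counts:
    print(f"{label}: {counts[label]} of {n} ({percentages[label]:.0f}%)")
```

Because the three categories are exhaustive and mutually exclusive, the percentages must sum to 100, which is a quick check on the selections.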
Another way to describe the backlog data is to look at
the relative change from 2015 to 2016. This can be done by creating
a new column, called Percent Backlog Change, using the Formula Editor.
Figure 10.6 Formula Editor to Create Percent Backlog Change Column shows the
completed Formula Editor where the percent change is rounded to one
decimal place.
The percent backlog
change shows that some VMCs more than doubled their backlog, while
in the best case, one VMC decreased the backlog by close to 80%.
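The percent change calculation can be sketched in Python as follows. This assumes the usual definition, (2016 backlog - 2015 backlog) / 2015 backlog × 100, rounded to one decimal place; the function name and the backlog pairs are hypothetical and are not taken from the case data.

```python
def percent_change(old, new):
    """Percent change from old to new, rounded to one decimal place.

    Assumes old is nonzero; a center with a 2015 backlog of zero would
    need separate handling.
    """
    return round((new - old) / old * 100, 1)


# Hypothetical (2015 backlog, 2016 backlog) pairs (NOT the case data).
pairs = [(200, 450), (500, 110), (300, 330)]

changes = [percent_change(old, new) for old, new in pairs]
for (old, new), change in zip(pairs, changes):
    print(f"{old} -> {new}: {change}%")
```

A value above 100 means the backlog more than doubled, and a decrease can never exceed 100% because a backlog cannot fall below zero.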
In this section we have
shown a variety of ways to summarize the backlog data numerically.
When presenting a statistical summary, select the view of the data
that is consistent with the problem statement and will be most easily
understood by your audience.