In the case “Creatinine Levels
in Hospitalized Patients,” we visualized each variable in the
data set individually using the JMP Distribution platform as shown
in Figure 6.5. The average length of stay for these patients is 14.1
days with a minimum of 0 and a maximum of 29. Of the 372 patients,
25% have acute kidney injury. The problem statement asks us to consider
inpatient length of stay. Patients having a length of stay of 0 would
have been treated at the hospital on an outpatient basis; their records
should not be included in this analysis. Select the rows with LOS
= 0 using the JMP Data Filter: choose Rows > Data Filter, highlight
LOS, and click Add. Then drag the right-hand slider to 0. The completed
Data Filter dialog is shown in Figure 7.2 (Completed Data Filter
Dialog to Select Rows Where LOS=0).
The rows where LOS=0
will be highlighted in the data table. Right-click one of the
highlighted rows and select the Exclude and Hide option. This will
exclude these observations from subsequent analyses and hide them
in subsequent graphs.
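
For readers who want to see the same exclusion rule outside JMP, the following is a minimal sketch in Python. It is not part of the JMP workflow described above. The field names LOS and Outcome follow the case, but the function name `split_inpatients` and the records are hypothetical and invented for illustration. The excluded rows are kept in a separate list rather than discarded, so the exclusion can be reported.

```python
# Hypothetical records: the field names follow the case, the values are invented.
patients = [
    {"Patient_ID": 1, "LOS": 0, "Outcome": "No AKI"},
    {"Patient_ID": 2, "LOS": 12, "Outcome": "AKI"},
    {"Patient_ID": 3, "LOS": 0, "Outcome": "AKI"},
    {"Patient_ID": 4, "LOS": 7, "Outcome": "No AKI"},
]


def split_inpatients(records):
    """Separate inpatient records (LOS > 0) from outpatient records (LOS == 0)."""
    included = [r for r in records if r["LOS"] > 0]
    excluded = [r for r in records if r["LOS"] == 0]
    return included, excluded


inpatients, excluded = split_inpatients(patients)

# Report how many rows were set aside, so the exclusion is documented.
print(f"kept {len(inpatients)} rows, excluded {len(excluded)} rows")
```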
Since
this is a comparative analysis, it is beneficial to describe the data
separately for each group, those with AKI and those without AKI.
JMP Graph Builder can easily create two length of stay histograms,
one for those patients with AKI and one for those patients without
AKI. Drag LOS into the X drop zone and Outcome into the Y Group drop
zone. Select the histogram icon from the Control Panel. The resulting
data visualization is shown in Figure 7.3 (Histograms of Length of
Stay by Outcome).
Notice that the resulting
chart includes the count of excluded rows.
Graph Builder uses the
same axis scales for both histograms, allowing an accurate visual comparison.
This is an example of small multiples: a series of graphs plotted
on the same scale, which is a best practice in data
visualization.
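
The idea behind small multiples can be sketched in a few lines of Python: compute one set of bin edges from all groups combined, then count each group against those shared edges. This is only an illustration of the shared-scale principle, not a description of how Graph Builder works internally. The helper names `shared_bins` and `bin_counts`, the bin width of 5 days, and the length-of-stay values are all assumptions made for the example.

```python
def shared_bins(groups, width):
    """Return bin edges that cover every group, so all panels share one scale."""
    all_values = [v for values in groups.values() for v in values]
    start = (min(all_values) // width) * width
    stop = (max(all_values) // width + 1) * width
    return list(range(start, stop + 1, width))


def bin_counts(values, edges):
    """Count the values that fall in each half-open bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts


# Hypothetical lengths of stay (whole days), invented for illustration.
los_by_outcome = {
    "AKI": [23, 25, 27, 24, 18],
    "No AKI": [3, 5, 8, 4, 12, 6],
}

width = 5
edges = shared_bins(los_by_outcome, width)

# One text histogram per group, all drawn against the same bins.
for outcome, values in los_by_outcome.items():
    print(outcome)
    for low, n in zip(edges, bin_counts(values, edges)):
        print(f"  {low:>2}-{low + width - 1:<2} {'#' * n}")
```

Because both groups are counted against the same edges, an empty bin in one panel lines up with a populated bin in the other, which is what makes the visual comparison fair.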
In a box plot, the box indicates the
middle 50% of the data (the first quartile to the third quartile).
The first quartile corresponds to the 25th percentile: 25% of the
lengths of stay are below that value. For the AKI group, the 25th
percentile is 23 days. The vertical line inside the box is the median.
The “whiskers” extend from the box to the most extreme data points
that lie within the first quartile minus 1.5 times
the interquartile range (third quartile minus first quartile) and
the third quartile plus 1.5 times the interquartile range. Outliers
are indicated as dots that lie beyond the ends of the whiskers. Box
plots are a compact way to visualize a data distribution, including
its center, spread, skewness, and outliers.
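
The quantities that a box plot displays can be computed directly, as in the following sketch. It uses Python's `statistics.quantiles` with the "inclusive" method, which is an assumption: JMP may compute quantiles with a different interpolation rule, so values for a real data set could differ slightly. The function name `box_plot_summary` and the data are hypothetical.

```python
import statistics


def box_plot_summary(values):
    """Compute the quartiles, the 1.5 * IQR limits, whisker ends, and outliers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr

    # Whiskers end at the most extreme observations inside the limits.
    inside = [v for v in values if lower_limit <= v <= upper_limit]
    outliers = sorted(v for v in values if v < lower_limit or v > upper_limit)

    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical lengths of stay in days, invented for illustration.
los = [2, 3, 4, 5, 6, 7, 8, 9, 29]
summary = box_plot_summary(los)
print(summary)
```

In this invented sample the quartiles are 4, 6, and 8 days, so the upper limit is 8 + 1.5 × 4 = 14 days. The upper whisker therefore stops at 9 days, the largest value inside the limit, and the 29-day stay is flagged as an outlier.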
In the No AKI box plot
we observe an outlier with a length of stay of 29 days. Click on the
dot to highlight the corresponding patient record in the JMP data
table. This is an example of JMP’s dynamic data linking feature,
where an observation or group of observations highlighted in either
a data table or graph will be highlighted in all other data tables
and graphs. The highlighted record is for Patient_ID = 7581, a
92-year-old African-American man with no co-morbidities. Further investigation
should be conducted with the help of a subject matter expert to determine
whether this outlier should be removed from the data set. The decision
to remove an outlier should rest not on its influence on the statistical
results but on an understanding of the data in the domain context, and
the outlier should be dispositioned accordingly. For example, if investigation revealed
a data collection or recording error, then either a corrected value
should be entered or the observation should be removed. Exclusion of observations
should be documented in accordance with the practices of reproducible
research.
Finally,
Figure 7.5 (Descriptive Statistics for Length of Stay by Outcome) shows a table
of descriptive statistics for length of stay by outcome. This table can
be created using Tabulate, with LOS and the desired statistics
placed in the drop zone for Rows and the nominal variable Outcome
placed in the drop zone for Columns.
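
A grouped summary of this kind can also be sketched in Python. The text does not say which statistics appear in the table, so the set chosen here (N, mean, standard deviation, minimum, maximum) is an assumption, as are the function name `describe_by_group` and the records, which are invented for illustration.

```python
import statistics
from collections import defaultdict


def describe_by_group(records, group_key, value_key):
    """Summarize value_key separately for each level of group_key."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[value_key])

    # The sample standard deviation needs at least two observations per group.
    return {
        level: {
            "N": len(values),
            "Mean": statistics.mean(values),
            "Std Dev": statistics.stdev(values),
            "Min": min(values),
            "Max": max(values),
        }
        for level, values in groups.items()
    }


# Hypothetical records, invented for illustration.
records = [
    {"Outcome": "AKI", "LOS": 20},
    {"Outcome": "AKI", "LOS": 24},
    {"Outcome": "AKI", "LOS": 28},
    {"Outcome": "No AKI", "LOS": 4},
    {"Outcome": "No AKI", "LOS": 6},
    {"Outcome": "No AKI", "LOS": 8},
]

table = describe_by_group(records, "Outcome", "LOS")
for outcome, stats in table.items():
    print(outcome, stats)
```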