At the beginning of an exploratory analysis, visualizing
each variable individually familiarizes the analyst with the variation
in each variable, helps identify outliers that should be investigated,
and suggests which variables may be related to total costs, the primary
focus of our analysis.
Figure 11.7 Histograms for Total Costs, Length of Stay, Birthweight shows
the histograms for total costs, length of stay, and birthweight (in
grams).
The distributions of
total costs and length of stay are both right-skewed. Experience
suggests that the longer stays and higher costs in the upper tails are due to complications;
severe complications requiring lengthy hospital stays are relatively
infrequent for these Adirondack hospitals. The birthweight distribution is roughly
bell-shaped, with several very high values.
Figure 11.8 Descriptive Statistics for Total Costs, Length of Stay, Birthweight gives a table
of descriptive statistics for these three variables.
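A table like the one in Figure 11.8 can also be approximated outside JMP as a cross-check. The following is a minimal Python sketch using only the standard library; the length-of-stay values are hypothetical (not the Adirondack data), the helper names describe and skewness are illustrative, and the quartile and skewness conventions may differ slightly from the ones JMP reports. Comparing the mean with the median, and computing a moment-based skewness coefficient, gives a quick numerical confirmation of the right skew seen in the histograms.

```python
import statistics

def describe(values):
    """Summary statistics similar to a descriptive-statistics table."""
    data = sorted(values)
    q1, _, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "std_dev": statistics.stdev(data),  # sample standard deviation
        "min": data[0],
        "q1": q1,
        "median": statistics.median(data),
        "q3": q3,
        "max": data[-1],
    }

def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (no small-sample adjustment)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n
    m3 = sum((x - mu) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Hypothetical lengths of stay in days (illustrative only, not the study data).
length_of_stay = [1, 2, 2, 2, 2, 3, 3, 3, 4, 5, 8, 14]

summary = describe(length_of_stay)
for name, value in summary.items():
    print(f"{name:>8}: {value:.2f}")

# Right skew: a long upper tail pulls the mean above the median and makes g1 positive.
print("mean > median:", summary["mean"] > summary["median"])
print("skewness:", round(skewness(length_of_stay), 2))
```

For these made-up values the mean (about 4.08 days) sits above the median (3 days), the same pattern a right-skewed variable such as total costs would show.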
The maximum birthweight
of 9600 grams (21.2 lbs.) does not seem to be a reasonable weight
for a newborn. Returning to the JMP data table and examining the
corresponding record shows that this newborn was a female born at
Saratoga Hospital with a one-day length of stay, an extreme severity
of illness, and a moderate mortality risk. The total cost for the
one-day stay was $995.67. When deciding how to handle outliers,
consulting external references can help. The CDC Clinical Growth
Charts give selected percentiles for the distribution of children’s
weights in the US population. The chart shows that the 97th percentile
(the largest percentile available) for newborn females is approximately
9.5 lbs. Because 21.2 lbs is more than twice this 97th percentile,
there may be an error in this particular record. Consulting the data
provider can help the analyst determine how to reconcile this outlier.
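The arithmetic behind this check is easy to reproduce. The sketch below converts grams to pounds using the exact definition 1 lb = 453.59237 g and flags any birthweight above a reference cutoff. The 9.5 lb cutoff is the approximate 97th percentile quoted above and 9600 g is the record discussed in the text; the other birthweights and the helper names are hypothetical. A screen like this only flags records for review; deciding whether a flagged value is an error still requires the data provider.

```python
GRAMS_PER_POUND = 453.59237  # exact, by definition of the avoirdupois pound

def grams_to_pounds(grams):
    """Convert a weight in grams to pounds."""
    return grams / GRAMS_PER_POUND

def flag_above_reference(weights_g, reference_lb):
    """Return the weights (in grams) that exceed a reference value given in pounds."""
    return [w for w in weights_g if grams_to_pounds(w) > reference_lb]

# Approximate CDC 97th percentile for newborn females, as quoted in the text.
REFERENCE_97TH_LB = 9.5

# Hypothetical birthweights in grams; only 9600 comes from the text.
birthweights_g = [3100, 3450, 2890, 9600, 3720]

flagged = flag_above_reference(birthweights_g, REFERENCE_97TH_LB)
print("flagged:", flagged)

for w in flagged:
    pounds = grams_to_pounds(w)
    ratio = pounds / REFERENCE_97TH_LB
    print(f"{w} g = {pounds:.1f} lb, {ratio:.2f} times the 97th percentile")
```

Only the 9600 g record is flagged; it converts to 21.2 lb, about 2.23 times the 9.5 lb reference.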
Examining the distribution of CCS Procedure
Description, we see a large number of procedures that occur only a
few times (e.g., nasogastric tube) and relatively high
frequencies for circumcision, no procedure, and prophylactic vaccination.
The low-frequency procedures will be aggregated into a single level
called “Other Procedures” using Cols > Recode. Highlight
all of the low-frequency procedures, click the Group button, rename
the group “Other Procedures,” and save the result as a new column.
The new column “CCS Procedure Aggregated” has levels
for circumcision, no procedure, prophylactic vaccination, and other
procedures.
Figure 11.10 Frequency Distributions for CCS Procedure Aggregated shows
the distribution of this new column.
This new column will
facilitate analysis of a potential relationship between total costs
and whether a newborn had no procedure, a common procedure (circumcision
or vaccination), or a less commonly performed procedure.
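The recode described above can be sketched outside JMP by replacing every level whose count falls below a threshold with a single label, then tabulating the new column. This is a minimal Python version under stated assumptions: the text selects the low-frequency levels by hand in the Recode dialog, whereas here a hypothetical min_count cutoff stands in for that judgment, and the procedure labels (including "Procedure X" and "Procedure Y") and their counts are made up for illustration.

```python
from collections import Counter

def aggregate_rare_levels(values, min_count, other_label="Other Procedures"):
    """Replace levels that occur fewer than min_count times with other_label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]

# Hypothetical CCS procedure descriptions (labels and counts are illustrative only).
procedures = (
    ["Circumcision"] * 5
    + ["No procedure"] * 6
    + ["Prophylactic vaccination"] * 4
    + ["Nasogastric tube", "Procedure X", "Procedure Y"]
)

aggregated = aggregate_rare_levels(procedures, min_count=3)

# Frequency distribution of the new column, with percentages.
freq = Counter(aggregated)
total = len(aggregated)
for level, count in freq.most_common():
    print(f"{level:<26}{count:>3}{100 * count / total:>7.1f}%")
```

A fixed count threshold is reproducible, but it can lump together rare procedures that differ clinically; selecting levels by hand, as in the Recode dialog, lets the analyst apply that judgment.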
Developing familiarity
with the data one variable at a time helps guide the analyst when
choosing variables for exploring bivariate and multivariate relationships.