JMP Analysis

Univariate Descriptive Analysis

At the beginning of an exploratory analysis, visualizing each variable individually will familiarize the analyst with the variation in each variable, assist in identifying outliers that should be investigated, and suggest variables that are related to total costs, the primary focus of our analysis. Figure 11.7 Histograms for Total Costs, Length of Stay, Birthweight shows the histograms for total costs, length of stay, and birthweight (in grams).
Figure 11.7 Histograms for Total Costs, Length of Stay, Birthweight
The distributions of total costs and length of stay are both right-skewed. Experience suggests that longer lengths of stay and higher costs are due to complications. Severe complications requiring lengthy hospital stays are relatively infrequent for these Adirondack hospitals. The birthweight distribution is roughly bell-shaped, with several very high values. Figure 11.8 Descriptive Statistics for Total Costs, Length of Stay, Birthweight gives a table of descriptive statistics for these three variables.
Figure 11.8 Descriptive Statistics for Total Costs, Length of Stay, Birthweight
The maximum birthweight of 9600 grams (21.2 lbs.) does not appear to be a plausible weight for a newborn. Returning to the JMP data table and examining the corresponding record shows that this newborn was a female born at Saratoga Hospital with a one-day length of stay, an extreme severity of illness, and a moderate mortality risk. The total cost for the one-day stay was $995.67. When deciding how to handle outliers, external references can help. The CDC Clinical Growth Charts give selected percentiles for the distribution of children’s weights in the US population. The chart shows that the 97th percentile (the largest percentile available) for newborn females is approximately 9.5 lbs. Because the recorded weight is more than twice this 97th percentile, this record may contain an error. Further consultation with the data provider can help the analyst determine how to reconcile this outlier.
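The outlier check above is simple arithmetic. The Python sketch below (an illustration outside JMP, with hypothetical helper names) converts the recorded weight to pounds and expresses it as a multiple of the approximate 9.5 lb CDC 97th percentile quoted in the text.

```python
# Sketch: compare a recorded birthweight with an external reference percentile.
# The 9.5 lb reference is the approximate CDC 97th percentile quoted in the
# text; the function names are hypothetical.

GRAMS_PER_POUND = 453.59237  # exact definition of the avoirdupois pound


def grams_to_pounds(grams: float) -> float:
    """Convert a weight in grams to pounds."""
    return grams / GRAMS_PER_POUND


def ratio_to_reference(weight_g: float, reference_lb: float) -> float:
    """Return how many times larger the weight is than the reference value."""
    return grams_to_pounds(weight_g) / reference_lb


if __name__ == "__main__":
    max_birthweight_g = 9600          # maximum from the descriptive statistics
    cdc_p97_newborn_female_lb = 9.5   # approximate value read from the chart
    pounds = grams_to_pounds(max_birthweight_g)
    ratio = ratio_to_reference(max_birthweight_g, cdc_p97_newborn_female_lb)
    # prints "21.2 lb, 2.23 times the 97th percentile"
    print(f"{pounds:.1f} lb, {ratio:.2f} times the 97th percentile")
```

A ratio well above 1 does not prove an error by itself; it only marks the record as one worth confirming with the data provider.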
Figure 11.9 Frequency Distributions for Emergency Department Indicator and CCS Procedure Description shows the distributions of Emergency Department Indicator and CCS Procedure Description. Notice that only one newborn was admitted through an emergency department, so this variable will not be useful in understanding total costs as it has virtually no variability.
Figure 11.9 Frequency Distributions for Emergency Department Indicator and CCS Procedure Description
Examining the CCS Procedure Description, we see a large number of procedures with very few occurrences (e.g., nasogastric tube) and relatively high frequencies for circumcision, no procedure, and prophylactic vaccination. The low-frequency procedures will be aggregated into a single level called “Other Procedures” using Cols > Recode. Highlight all of the low-frequency procedures, click the Group button, rename the group to “Other Procedures,” and save the result as a new column. The new column, “CCS Procedures Aggregated,” has levels for circumcision, no procedure, prophylactic vaccination, and other procedures. Figure 11.10 Frequency Distributions for CCS Procedure Aggregated shows the distribution of this new column.
Figure 11.10 Frequency Distributions for CCS Procedure Aggregated
This new column will facilitate analysis of a potential relationship between total costs and three procedure categories: no procedure, common procedures (circumcision and vaccination), and less commonly performed procedures.
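The Recode and Group step can be mimicked outside JMP to make the logic explicit. The Python sketch below assumes a hypothetical frequency cutoff, `min_count`; in the text the analyst selects the low-frequency procedures by eye rather than by a fixed threshold. The toy column uses level names from the text, but its counts are invented for illustration.

```python
from collections import Counter

# Sketch of collapsing rare levels into a single "Other Procedures" level.
# `min_count` is a hypothetical cutoff, not a JMP setting.


def group_rare_levels(values, min_count, other_label="Other Procedures"):
    """Return a new list in which levels seen fewer than min_count times are replaced."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


if __name__ == "__main__":
    # Hypothetical toy column: level names follow the text, counts are invented.
    procedures = (
        ["Circumcision"] * 5
        + ["No procedure"] * 6
        + ["Prophylactic vaccination"] * 4
        + ["Nasogastric tube", "Other rare procedure"]
    )
    aggregated = group_rare_levels(procedures, min_count=3)
    print(Counter(aggregated))
```

As in JMP, the original column is left unchanged and the aggregated levels are returned as a new column.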
Developing familiarity with the data one variable at a time helps guide the analyst when choosing variables for exploring bivariate and multivariate relationships.

Exploring Relationships with Total Costs: Length of Stay

In data sets with even a modest number of variables, there are many possible combinations that could be examined. A good strategy is to begin by examining bivariate relationships based on the problem statement, domain knowledge, and the univariate analysis. For example, the Emergency Department Indicator can be eliminated from consideration because, as described above, it has virtually no variability. Graph Builder is a very flexible platform for visualizing relationships between variables. It offers a variety of graph types and features for evaluating multivariate relationships.
Experience suggests that total costs are directly related to length of stay, so we will begin by visualizing this relationship. In Graph Builder, drag Length of Stay to the x drop zone and Total Costs to the y drop zone. The resulting scatterplot is shown in Figure 11.11 Scatterplot of Length of Stay and Total Costs.
Figure 11.11 Scatterplot of Length of Stay and Total Costs
The relationship appears to follow a linear trend, but a number of observations are not consistent with this pattern. To identify a particular observation, hover over its marker on the scatterplot; selecting the marker highlights the corresponding row in the data table. The observation with a 24-day length of stay and an approximate cost of $8000 corresponds to a male born at the Mary Imogene Bassett Hospital weighing 3900 grams (8 pounds 9.6 ounces) who had no procedure, an APR DRG Description of NEONATE BIRTHWT >2499G W OTHER SIGNIFICANT CONDITION, a minor APR severity of illness, and a minor APR risk of mortality. The All Patient Refined Diagnosis Related Groups (APR DRG) is a classification system that links conditions to the resources consumed. It has two subclassifications, severity of illness and risk of mortality, which are determined from patient characteristics such as age and comorbidities. These classifications are used when determining payment.
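Hovering over markers finds unusual points visually. A numeric counterpart, sketched below in Python as an assumption rather than a step in the JMP workflow, fits an ordinary least-squares line of total cost on length of stay and lists points whose residual exceeds a cutoff. The data and the 8000 cutoff are hypothetical.

```python
# Sketch: fit an ordinary least-squares line of total cost on length of stay
# and list the points that sit far from it. Data and cutoff are hypothetical.


def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def flag_large_residuals(x, y, cutoff):
    """Return indices of points whose absolute residual from the line exceeds cutoff."""
    a, b = least_squares(x, y)
    return [i for i, (xi, yi) in enumerate(zip(x, y))
            if abs(yi - (a + b * xi)) > cutoff]


if __name__ == "__main__":
    length_of_stay = [1, 2, 3, 5, 10, 20, 30, 24]
    total_costs = [1000, 2000, 3000, 5000, 10000, 20000, 30000, 8000]
    # prints [7], the index of the 24-day, $8000 point
    print(flag_large_residuals(length_of_stay, total_costs, cutoff=8000))
```

A point with a long stay has high leverage and pulls a least-squares line toward itself, so with real data a robust fit or studentized residuals may be a better screen than a fixed cutoff.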
To see how this newborn’s severity of illness and mortality risk compare with those of other newborns with the same APR DRG description, a cross-tabulation (crosstab) can be created with the JMP Tabulate feature. Drag APR DRG Description to the drop zone for rows, and drag APR Severity of Illness Description and APR Risk of Mortality to the drop zone for columns. Figure 11.12 Crosstab for Three Variables Classifying Diagnosis, Illness Severity, and Mortality Risk shows the crosstab.
Figure 11.12 Crosstab for Three Variables Classifying Diagnosis, Illness Severity, and Mortality Risk
In this data set, 67 of the 70 (95.7%) newborns classified as NEONATE BIRTHWT >2499G W OTHER SIGNIFICANT CONDITION were assigned Minor for severity of illness, and all 70 were assigned a minor risk of mortality. The outlying observation is therefore similar to the others in its APR DRG Description category. This type of analysis can help the analyst judge whether an outlier is reasonable when other information is unavailable. Explaining the relatively low total cost for a 24-day length of stay will require information beyond what is available in this data set.
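The percentage above is a row percentage from a two-way table. The Python sketch below builds such a table by counting (row level, column level) pairs and reproduces the 67 of 70 figure. The three non-Minor severity values are labeled "Moderate" here purely as an assumption, since the text does not say which level they fall in.

```python
from collections import Counter

# Sketch of a two-way crosstab: count (row level, column level) pairs,
# then compute the percentage of a row that falls in one column.


def crosstab(rows, cols):
    """Return a Counter keyed by (row_level, col_level)."""
    return Counter(zip(rows, cols))


def row_percent(table, row_level, col_level):
    """Percentage of row_level observations that fall in col_level."""
    row_total = sum(n for (r, _), n in table.items() if r == row_level)
    return 100 * table[(row_level, col_level)] / row_total


if __name__ == "__main__":
    drg = "NEONATE BIRTHWT >2499G W OTHER SIGNIFICANT CONDITION"
    rows = [drg] * 70
    # 67 Minor as in the text; "Moderate" for the other 3 is an assumption.
    severity = ["Minor"] * 67 + ["Moderate"] * 3
    table = crosstab(rows, severity)
    print(f"{row_percent(table, drg, 'Minor'):.1f}%")  # prints 95.7%
```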

Exploring Relationships with Total Costs: Hospital

In this section we will explore the total costs data by hospital. To look at the distribution of total costs for each of the eight hospitals, select Graph Builder and drag Total Costs into the x drop zone and Facility Name into the y drop zone. Right-click over any of the markers and select Add > Line. This line connects a summary statistic (such as the mean or median) for each hospital’s distribution. Drop-down menus on the left allow customization of the line. “Jittering” the points spreads them out to better show data density, which is particularly useful for large data sets. Since the cost distributions are right-skewed, we will choose the median as the summary statistic: the mean is sensitive to large outliers, while the median is not. Figure 11.13 Dot Plot of Total Costs by Hospital shows the completed Graph Builder dialog.
Figure 11.13 Dot Plot of Total Costs by Hospital
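The choice of the median over the mean can be checked with a few numbers. The Python sketch below groups hypothetical costs by hospital and reports both statistics; the hospital names and values are invented to show how a single large cost drags the mean upward while leaving the median unchanged.

```python
import statistics
from collections import defaultdict

# Sketch: summarize costs by group with both the mean and the median.
# Hospital names and costs are hypothetical.


def summarize_by_group(groups, values):
    """Return {group: (mean, median)} for each group level."""
    buckets = defaultdict(list)
    for g, v in zip(groups, values):
        buckets[g].append(v)
    return {g: (statistics.mean(v), statistics.median(v)) for g, v in buckets.items()}


if __name__ == "__main__":
    hospitals = ["Hospital A"] * 5 + ["Hospital B"] * 5
    costs = [900, 1000, 1100, 1200, 40000,   # one large outlier in Hospital A
             1500, 1600, 1700, 1800, 1900]
    for name, (mean, median) in summarize_by_group(hospitals, costs).items():
        print(f"{name}: mean={mean:.0f}, median={median:.0f}")
    # prints "Hospital A: mean=8840, median=1100"
    # and    "Hospital B: mean=1700, median=1700"
```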
In this display, the costs for all hospitals share the same x-axis scale, and the costs span several orders of magnitude. Some of the distributions are highly right-skewed; their largest values determine the x-axis range and make visual comparison between the hospitals difficult. Viewing the cost data on a log scale makes comparison easier because it spreads out the left side of the cost distributions. To change the x-axis to a log scale, right-click on the x-axis and select Axis Settings. In the X Axis Settings dialog, change the Scale Type to Log and set the Minimum to 100. The graph with a log scale is displayed in Figure 11.14 Dot Plot of Total Costs by Hospital on a Log Scale.
Figure 11.14 Dot Plot of Total Costs by Hospital on a Log Scale
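Why a log axis spreads out the low end can be shown by computing where values land along each kind of axis. The Python sketch below uses the axis minimum of 100 from the text and assumes a maximum of 100,000. On a log axis each tenfold increase in cost occupies the same distance, whereas on a linear axis costs below 10,000 are squeezed into the first tenth of the range.

```python
import math

# Sketch: fraction of the way along an axis at which a value is drawn.
# The minimum of 100 follows the text; the maximum of 100,000 is assumed.


def axis_position(value, lo, hi, log=False):
    """Return where value falls on an axis from lo to hi, as a fraction in [0, 1]."""
    if log:
        return (math.log10(value) - math.log10(lo)) / (math.log10(hi) - math.log10(lo))
    return (value - lo) / (hi - lo)


if __name__ == "__main__":
    lo, hi = 100, 100_000
    for cost in (1_000, 10_000, 100_000):
        lin_pos = axis_position(cost, lo, hi)
        log_pos = axis_position(cost, lo, hi, log=True)
        print(f"{cost:>7}: linear {lin_pos:.3f}, log {log_pos:.3f}")
```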
When the costs are plotted on a log scale, we can more clearly see the differences in total cost between the hospitals. The Champlain Valley Physicians, Mary Imogene Bassett, and Glens Falls hospitals display the most variation in total costs. The Adirondack Medical Center – Saranac Lake has the highest median cost, while the Alice Hyde Hospital has the lowest.
The graph in Figure 11.14 Dot Plot of Total Costs by Hospital on a Log Scale can be enhanced by dragging Length of Stay to the x-axis drop zone to create side-by-side plots of Total Costs and Length of Stay by hospital. Because the Length of Stay distributions are right-skewed for some hospitals when plotted on a linear scale, this x-axis was also changed to a log scale. The plot is shown in Figure 11.15 Side-by-side Plots of Total Costs and Length of Stay by Hospital.
Figure 11.15 Side-by-side Plots of Total Costs and Length of Stay by Hospital
The length of stay is much less variable across the hospitals than total costs are; all eight hospitals had a median length of stay of two days. This suggests that the differences in total costs between hospitals are driven by something other than length of stay. Possible explanations include differences in hospital size and in the services provided.
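The side-by-side comparison amounts to computing per-hospital medians for two variables and seeing which one varies. The Python sketch below does this for hypothetical data in which every hospital has the same median length of stay but a different median cost, the pattern described above.

```python
import statistics
from collections import defaultdict

# Sketch: per-group medians for two variables. All data are hypothetical.


def group_medians(groups, values):
    """Return {group: median of that group's values}."""
    buckets = defaultdict(list)
    for g, v in zip(groups, values):
        buckets[g].append(v)
    return {g: statistics.median(v) for g, v in buckets.items()}


if __name__ == "__main__":
    hospital = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
    los_days = [1, 2, 3, 2, 2, 4, 1, 2, 2]
    cost = [700, 900, 1500, 1800, 2100, 3000, 3500, 4200, 5000]
    print("median LOS:", group_medians(hospital, los_days))
    print("median cost:", group_medians(hospital, cost))
```

When the medians of one variable are constant across groups while those of the other differ, the first variable alone cannot account for the between-group differences in the second.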
Last updated: October 12, 2017