A good practice prior to developing
a bivariate predictive model is to visualize the relationship on a
scatterplot. From the Analyze menu, select the Fit Y by X platform
and enter Total Costs in the Y, Response field and Length of Stay
in the X, Factor field. The resulting scatterplot is shown in
Figure 12.5, Scatterplot of Total Costs and Length of Stay.
As expected, as length of stay increases,
total costs increase. The correlation between Total Costs and Length
of Stay is 0.66, as found in the correlation matrix obtained by selecting
Analyze > Multivariate Methods > Multivariate. A simple linear
regression analysis will model this relationship as a straight line
and allow us to quantify the relationship. It is good practice to
start with a simple model, assess the adequacy of the model, and if
necessary proceed to developing more complex models.
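
For readers who want to see what the correlation coefficient measures outside of JMP, the following is a minimal Python sketch of the Pearson correlation computed from its definition. The function name `pearson_r` and the six data points are made up for illustration; they are not the hospital data, so the result will not reproduce the 0.66 reported above.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Made-up illustrative values, NOT the hospital data discussed in the text.
length_of_stay = [1, 2, 2, 3, 4, 6]                 # days
total_costs = [900, 1400, 2100, 2600, 5200, 6800]   # dollars

r = pearson_r(length_of_stay, total_costs)
print(f"r = {r:.2f}")
```

A value of r near +1 corresponds to points that cluster tightly around an upward-sloping line, which is the pattern to look for on the scatterplot before fitting a straight-line model.
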
The estimated regression
equation is:
Total Costs = -1355.061 + 1441.227*Length of Stay
This best-fit line is
found using the ordinary least squares method, where the estimated
slope and intercept are chosen to minimize the sum of the squared
vertical distances from the observations to the fitted line. The intercept
of -1355.06 is the estimated average total cost when length of stay
is zero. Since it does not make sense to have an inpatient hospital
stay of 0 days, the intercept is not interpreted in the problem context,
but serves as a fitting constant. The slope indicates that for each
increase of one day in length of stay there is an estimated average
increase of $1441.23 in total costs. The slope is an estimate of
the daily cost of hospitalization for newborns. Always assess the
slope coefficient for reasonableness in the problem context. Do both
the sign and magnitude make sense? The slope is positive, which indicates
that as length of stay goes up, total costs go up. A quick internet
search for average daily hospital costs will assist in determining
the reasonableness of the magnitude of the slope. It appears that
$1441.23 is a plausible daily cost.
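
The least squares idea and the interpretation of the slope can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `fit_ols`, `sse`, and `predicted_total_cost` are hypothetical helper names, and the five data points are made up rather than taken from the hospital data. Only the two coefficients inside `predicted_total_cost` come from the fitted equation reported above.

```python
def fit_ols(x, y):
    """Return (intercept, slope) that minimize the sum of squared
    vertical distances between the observations and the line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sse(x, y, intercept, slope):
    """Sum of squared vertical residuals for a candidate line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Made-up illustrative values, NOT the hospital data discussed in the text.
days = [1, 2, 3, 4, 5]
costs = [500, 1500, 3500, 4000, 6000]

b0, b1 = fit_ols(days, costs)
best = sse(days, costs, b0, b1)

# Any other line has a larger sum of squared residuals than the OLS line.
print(best < sse(days, costs, b0, b1 + 10))
print(best < sse(days, costs, b0 + 10, b1))


def predicted_total_cost(length_of_stay):
    """Fitted equation reported in the text."""
    return -1355.061 + 1441.227 * length_of_stay


# The slope is the change in predicted cost for one additional day.
one_day_difference = predicted_total_cost(4) - predicted_total_cost(3)
print(f"{one_day_difference:.2f}")
```

The last lines show why the slope reads as a per-day cost: the intercept cancels when two predictions are subtracted, so the difference between any two stays that are one day apart equals the slope.
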
To establish whether length
of stay is a significant predictor of total costs, a test of hypothesis
should be conducted for the slope coefficient. The Parameter Estimates
table in Figure 12.6, Linear Fit for Total Costs and Length of Stay, shows the t-ratio
and p-value (Prob > |t|) associated with a test of hypothesis that
the slope is equal to zero versus the alternative that the slope is
different from zero. The p-value is <0.0001, which indicates that
length of stay is a statistically significant predictor of total costs for infants
born at Champlain Valley Physicians Hospital in 2014.
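
As a rough sketch of where the t-ratio comes from, the slope estimate is divided by its standard error, and the result is compared with a t distribution having n - 2 degrees of freedom to obtain the p-value. The Python below computes the t-ratio only. The function name `slope_t_ratio` and the data are made up for illustration and are not the hospital data, so the numbers will not match the Parameter Estimates table.

```python
from math import sqrt


def slope_t_ratio(x, y):
    """Return (slope, standard_error, t_ratio) for a simple linear
    regression, testing the hypothesis that the slope equals zero."""
    n = len(x)
    if n < 3:
        raise ValueError("at least 3 observations are needed")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Mean squared error: residual sum of squares over n - 2 degrees of freedom.
    residual_ss = sum((yi - (intercept + slope * xi)) ** 2
                      for xi, yi in zip(x, y))
    mse = residual_ss / (n - 2)
    standard_error = sqrt(mse / sxx)
    return slope, standard_error, slope / standard_error


# Made-up illustrative values, NOT the hospital data discussed in the text.
days = [1, 2, 3, 4, 5]
costs = [500, 1500, 3500, 4000, 6000]

slope, standard_error, t_ratio = slope_t_ratio(days, costs)
print(f"slope = {slope:.1f}, SE = {standard_error:.2f}, t = {t_ratio:.2f}")
```

A t-ratio far from zero relative to the t distribution with n - 2 degrees of freedom gives a small p-value, which is the evidence JMP summarizes in the Prob > |t| column.
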