Simple regression revealed that, taken alone, length of stay is a
significant
predictor of total costs. However, the linear model underestimates
the total costs associated with longer lengths of stay. It makes
sense that the longer a newborn stays in the hospital, the higher
the total costs, but this does not tell the whole story. Additional
costs are likely tied to the various diagnoses, and costs will also
differ with the type and quantity of treatments.
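
The underestimation at longer stays can be checked by inspecting the residuals of the fitted line. The following is a minimal Python sketch, not the analysis performed here: the stay and cost values are invented for illustration (they are not SPARCS records) and were chosen so that cost grows faster than linearly, and the helper name fit_simple_ols is hypothetical.

```python
# Hypothetical illustration data: length of stay (days) and total cost (dollars).
# These values are invented for this sketch and are NOT taken from SPARCS.
stays = [1, 2, 2, 3, 3, 4, 5, 7, 10, 14]
costs = [1500, 2900, 3100, 4800, 5000, 7200, 10500, 19000, 36000, 66000]


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_simple_ols(stays, costs)

# Residual = observed cost minus the cost predicted by the line.
residuals = [y - (intercept + slope * x) for x, y in zip(stays, costs)]

for x, r in zip(stays, residuals):
    print(f"stay={x:>2} days  residual={r:>10.1f}")
```

A positive residual at the longest stay means the line predicts less than the observed cost, which is the pattern described above. With real data, a plot of residuals against length of stay would serve the same purpose.
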
Birthweight was shown not to be a significant predictor of total
costs. This contradicts our expectations, since low birthweight is
usually associated with premature birth, and the resulting complications
require additional therapies and hence increase costs. However, the
data set gives no indication of whether the births were premature. To
maintain patient anonymity, the de-identified data is inherently
restricted in the details it provides. This may limit the ability to create
an adequate predictive model. The full SPARCS data contains additional
information that may result in a better predictive model. The limitations
of the data set used should always be considered when determining
if the statistical model adequately addresses the problem posed. Reviewing
the services offered at the Champlain Valley Physicians Hospital (CVPH),
as listed on the New York State Department of Health website, shows
that the hospital is designated as a Level 1 Perinatal Center, which
provides care only for normal and low-risk deliveries and does not
have a neonatal intensive care unit. Since premature infants often
have low birthweight, it seems reasonable that such infants born at
CVPH would be transferred to another hospital with more extensive
neonatal care services, which would leave few costly low-birthweight
cases in this hospital's data. Conducting such additional research can
often help explain statistical results that at first seem unreasonable.
A good strategy for attacking a statistical problem is to begin simply
and proceed to more complicated models. Descriptive, univariate analysis
is a crucial first step in becoming familiar with the data. This is followed by
bivariate and then multivariate analysis. At each stage, a better
understanding of the data and the relationships between variables
is obtained, which guides subsequent, more complicated analyses. As
a next step, a multiple regression analysis would create a predictive
equation for total costs from several independent variables and may
offer improved explanatory power.
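
As a rough illustration of that next step, the sketch below fits a multiple regression by solving the normal equations in plain Python and compares R-squared against a fit on length of stay alone. Everything in it is assumed for illustration: the second predictor (a count of procedures), the data values, and the helper names are hypothetical, and the costs are generated from a known equation so that the fit can be verified. In practice a statistics package would normally be used instead.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations act on both sides at once.
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_multiple_ols(rows, y):
    """Return [intercept, b1, b2, ...] from the normal equations X'X b = X'y."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def r_squared(rows, y, coefs):
    """Proportion of variance in y explained by the fitted equation."""
    preds = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical predictors: (length of stay in days, number of procedures).
rows = [(1, 0), (2, 0), (2, 1), (3, 1), (4, 2), (5, 1), (7, 3), (10, 4)]
# Invented costs generated from: cost = 1000 + 2000*stay + 1500*procedures.
costs = [1000 + 2000 * s + 1500 * p for s, p in rows]

coef_full = fit_multiple_ols(rows, costs)
stay_only = [(s,) for s, _ in rows]
coef_stay = fit_multiple_ols(stay_only, costs)

r2_full = r_squared(rows, costs, coef_full)
r2_stay = r_squared(stay_only, costs, coef_stay)
print("full model coefficients:", [round(c, 2) for c in coef_full])
print("R^2, stay only:", round(r2_stay, 4), " R^2, both predictors:", round(r2_full, 4))
```

Because the invented costs depend on both predictors, the two-variable fit explains more of the variance than length of stay alone. With real data the gain would depend on which variables are available and how strongly they relate to cost.
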