Appendix A

Basic Statistical Concepts

A.1 Introduction

The statistical concepts that are important for understanding what is going on in this book are gathered here, but treated only briefly. The interested reader who wants a deeper understanding of statistical concepts should have no problem finding suitable textbooks. There are, for instance, some recent texts that teach statistics and how to use it in R (Dalgaard 2008).

A.2 Statistical Inference

Statistical inference is the science that helps us draw conclusions about real-world phenomena by observing and analyzing samples from them. The theory rests on probability theory and the concept of random sampling. Statistical analysis never gives absolute truths, only statements coupled to certain measures of their validity. These measures are almost always probability statements.

The crucial concept is that of a model, despite the fact that the present trend in statistical inference is toward nonparametric statistics. It is often stated that with today’s huge data registers, statistical models are unnecessary, but nothing could be more wrong.

The important idea in a statistical model is the concept of a parameter. It is often confused with its estimator from data. For instance, when we talk about mortality in a population, we mean a hypothetical concept that is distinct from the ratio between the observed number of deaths and the population size (or any other measure based on data). The latter is an estimate (at best) of the former. The whole idea of statistical inference is to extract information about a population parameter from observed data.

A.2.1 Point Estimation

The aim of point estimation is to find the best guess (in some sense) of a population parameter from data. That is, we try to find the single value that is closest to the true, but unknown, value of the population parameter.

Of course, a point estimator is useless if it is not connected to some measure of its uncertainty. That takes us to the concept of interval estimation.

A.2.2 Interval Estimation

The philosophy behind interval estimation is that a guess on a single value of the unknown population parameter is useless without an accompanying measure of the uncertainty of that guess. A confidence interval is an interval, in which we say that the true value of the population parameter lies with a certain probability (often 95%).

A.2.3 Hypothesis Testing

We are often interested in a specific value of a parameter, and in regression problems this value is almost always zero (0). The reason is that regression parameters measure effects, and to test for an effect is then equivalent to testing that the corresponding parameter has value zero.

There is a link between interval estimation and hypothesis testing: testing the hypothesis that a parameter value is zero can be done by constructing a confidence interval for the parameter. The test rule is then: if the interval does not cover zero, reject the hypothesis; otherwise, do not.
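As a small illustration in R, assuming a normal approximation for an estimate with a known standard error (the numbers are made up for the example), the rule can be written as:

  beta.hat <- 0.42                                # hypothetical estimate
  se       <- 0.15                                # hypothetical standard error
  ci <- beta.hat + c(-1, 1) * qnorm(0.975) * se   # approximate 95% confidence interval
  ci                                              # roughly (0.126, 0.714)
  (ci[1] > 0) || (ci[2] < 0)                      # TRUE: zero not covered, reject H0: beta = 0

Since the interval does not cover zero, the hypothesis β = 0 is rejected at the 5% level.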

A.2.3.1 The Log-Rank Test

The general hypothesis testing theory behind the log-rank test builds on the hypergeometric distribution. The calculations under the null hypothesis of no difference in survival chances between the two groups are performed conditional on both margins. In Table A.1, if the margins are fixed, there is only one degree of freedom left; for a given value of (say) d1, the three values d2, (n1 - d1), and (n2 - d2) are determined. Utilizing the fact that, under the null hypothesis, d1 is hypergeometrically distributed results in the following algorithm for calculating a test statistic:

  1. Observe O = d1

  2. Calculate the expected value E of O (under the null):

    E = \frac{d\, n_1}{n}.

  3. Calculate the variance V of O (under the null):

    V = \frac{(n - d)\, d\, n_1 n_2}{n^2 (n - 1)}.

  4. Repeat steps 1–3 for all tables and aggregate according to Equation (A.1).

Table A.1
The general table at one event time

  Group    Deaths    Survivors    Total
  I        d1        n1 - d1      n1
  II       d2        n2 - d2      n2
  Total    d         n - d        n

The log-rank test statistic T is

T = \frac{\sum_{i=1}^{k} (O_i - E_i)}{\sqrt{\sum_{i=1}^{k} V_i}} \qquad (A.1)

Note carefully that this procedure is not equivalent to aggregating all tables of raw data!
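In R, the log-rank test is available through the function survdiff in the survival package, and the algorithm above is also easy to program directly. The following is a minimal sketch on simulated data; the helper function logrank is written for this illustration only and is not part of any package.

  library(survival)

  ## Simulated two-group data, for illustration only.
  set.seed(1)
  dat <- data.frame(
    time   = rexp(100, rate = rep(c(0.10, 0.15), each = 50)),
    status = rbinom(100, 1, 0.8),
    group  = rep(c("I", "II"), each = 50)
  )

  ## The standard log-rank test:
  survdiff(Surv(time, status) ~ group, data = dat)

  ## The same computation, following steps 1-4 and Equation (A.1):
  logrank <- function(time, status, group) {
    g1 <- levels(factor(group))[1]
    OE <- 0
    V  <- 0
    for (t in sort(unique(time[status == 1]))) {
      atrisk <- time >= t                               # risk set at t
      n  <- sum(atrisk)
      n1 <- sum(atrisk & group == g1)
      d  <- sum(time == t & status == 1)                # events at t, both groups
      d1 <- sum(time == t & status == 1 & group == g1)  # events at t in group I
      OE <- OE + d1 - d * n1 / n                        # contribution to O - E
      if (n > 1)
        V <- V + (n - d) * d * n1 * (n - n1) / (n^2 * (n - 1))
    }
    c(T = OE / sqrt(V), chisq = OE^2 / V)
  }
  logrank(dat$time, dat$status, dat$group)

The squared statistic should essentially agree with the chi-square value reported by survdiff.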

Properties of the log-rank test:

  • The test statistic T2 is approximately distributed as χ2(1).
  • It is available in most statistical software.
  • It can be generalized to comparisons of more than two groups.
  • For s groups, the test statistic is approximately χ2(s − 1).
  • The test has high power against alternatives with proportional hazards, but can be weak against nonproportional alternatives.

A.3 Asymptotic Theory

A.3.1 Partial Likelihood

Here is a very brief summary of the asymptotics concerning the partial likelihood. Once defined, it turns out that you may treat it as an ordinary likelihood function (Andersen et al. 1993). The setup is as follows.

Let t(1), t(2), ..., t(k) be the ordered observed event times and let Ri = R(t(i)) be the risk set at t(i), i = 1, ..., k; see Chapter 2, Equation (2.4). At t(i), we condition on the composition of Ri and on the fact that exactly one event occurred (for tied events, a correction is necessary).

Then the contribution to the partial likelihood from t(i) is

L_i(\beta) = P(\text{no. } m_i \text{ dies} \mid \text{one event}, R_i) = \frac{h_0(t_{(i)}) \exp(\beta x_{m_i})}{\sum_{\ell \in R_i} h_0(t_{(i)}) \exp(\beta x_\ell)} = \frac{\exp(\beta x_{m_i})}{\sum_{\ell \in R_i} \exp(\beta x_\ell)},

and the full partial likelihood is

L(\beta) = \prod_{i=1}^{k} L_i(\beta) = \prod_{i=1}^{k} \frac{\exp(\beta x_{m_i})}{\sum_{\ell \in R_i} \exp(\beta x_\ell)}.

This is where the doubt about the partial likelihood comes in: the conditional probabilities that are multiplied together do not have a proper joint interpretation as a conditional probability. Nevertheless, it turns out that it is safe to proceed as if the expression really were a likelihood function. The log partial likelihood becomes

\log L(\beta) = \sum_{i=1}^{k} \left\{ \beta x_{m_i} - \log\left( \sum_{\ell \in R_i} \exp(\beta x_\ell) \right) \right\}, \qquad (A.2)

and the components of the score vector are

\frac{\partial}{\partial \beta_j} \log L(\beta) = \sum_{i=1}^{k} x_{m_i j} - \sum_{i=1}^{k} \frac{\sum_{\ell \in R_i} x_{\ell j} \exp(\beta x_\ell)}{\sum_{\ell \in R_i} \exp(\beta x_\ell)}, \qquad j = 1, \ldots, s. \qquad (A.3)

The maximum partial likelihood (MPL) estimator of β, β̂, is found by setting (A.3) equal to zero and solving for β.
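As an illustration in R, the log partial likelihood (A.2) can be maximized numerically for a single covariate and compared with the fit produced by coxph in the survival package. The data are simulated and the helper function logplik is written for this sketch only; it assumes no tied event times.

  library(survival)

  set.seed(2)
  n <- 200
  x <- rnorm(n)
  time <- rexp(n, rate = exp(0.5 * x))   # true beta = 0.5
  status <- rbinom(n, 1, 0.9)

  ## Log partial likelihood (A.2) for a scalar beta, assuming no ties.
  logplik <- function(beta) {
    s <- 0
    for (m in which(status == 1)) {      # one term per event, index m_i
      risk <- time >= time[m]            # risk set R_i at the event time
      s <- s + beta * x[m] - log(sum(exp(beta * x[risk])))
    }
    s
  }

  ## The MPL estimate by direct maximization of (A.2):
  opt <- optimize(logplik, interval = c(-5, 5), maximum = TRUE)
  opt$maximum

  ## Compare with the standard fit:
  coxph(Surv(time, status) ~ x)

In the absence of ties, the two estimates agree closely.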

For inference, we need to calculate the inverse of minus the Hessian, evaluated at β̂; this gives the estimated covariance matrix. The Hessian is the matrix of second partial derivatives of the log partial likelihood. The expectation of minus the Hessian is called the information matrix. The observed information matrix is

\hat{I}(\hat{\beta})_{j,m} = -\left. \frac{\partial^2 \log L(\beta)}{\partial \beta_j \, \partial \beta_m} \right|_{\beta = \hat{\beta}},

and asymptotic theory says that

\hat{\beta} \sim N\bigl(\beta, \; \hat{I}^{-1}(\hat{\beta})\bigr).

This is to say that β̂ is asymptotically unbiased and normally distributed with the given covariance matrix (or the limit of it). Further, β̂ is a consistent estimator of β. These results are used for testing, confidence intervals, and variable selection.
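Continuing the one-covariate sketch above (logplik, opt, time, status, and x are taken from that illustration), the observed information can be approximated by a numerical second derivative, and its inverse gives the estimated variance of β̂:

  ## Numerical second derivative of log L at beta-hat (central differences).
  beta.hat <- opt$maximum
  h <- 1e-4
  I.hat <- -(logplik(beta.hat + h) - 2 * logplik(beta.hat) + logplik(beta.hat - h)) / h^2
  se.hat <- sqrt(1 / I.hat)              # estimated standard error of beta-hat
  c(estimate = beta.hat, se = se.hat)

  ## coxph reports the same quantities; fit$var is the inverse of the observed information.
  fit <- coxph(Surv(time, status) ~ x)
  sqrt(fit$var)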

Note that these are only asymptotic results, that is, useful in large and medium-sized samples. In small samples, bootstrapping is a possibility. This option is available in the R package eha.

Here a warning is in order: Tests based on standard errors (Wald tests) may be highly unreliable, as in all nonlinear regression (Hauck & Donner 1977). A better alternative is the likelihood ratio test.

A.4 Model Selection

In regression modeling, there are often several competing models for describing data. In general, there are no strict rules for “correct selection.” However, for nested models there are some formal guidelines; a precise definition of this concept is given in the next subsection.

A.4.1 Nested Models

The meaning of nesting of models is best described by an example.

Example 31 Two competing models

  • ℳ2: h(t; (x1, x2)) = h0(t) exp(β1 x1 + β2 x2)
  • ℳ1: h(t; (x1, x2)) = h0(t) exp(β1 x1), that is, x2 has no effect.

Thus, the model ℳ1 is a special case of ℳ2 (β2 = 0). We say that ℳ1 is nested in ℳ2. Now, assume that ℳ2 is true. Then, testing the hypothesis H0: ℳ1 is true (as well) is the same as testing the hypothesis H0: β2 = 0.

The formal procedure for performing the likelihood ratio test (LRT) can be summarized as follows:

  1. Maximize log L(β1, β2) under ℳ2; this gives log L(β̂1, β̂2).

  2. Maximize log L(β1, β2) under ℳ1, that is, maximize log L(β1, 0); this gives log L(β1*, 0).

  3. Calculate the test statistic

    T = 2\left(\log L(\hat{\beta}_1, \hat{\beta}_2) - \log L(\beta_1^*, 0)\right)

  4. Under H0, T has a χ2 (chi-square) distribution with d degrees of freedom: T ~ χ2(d), where d is the difference in numbers of parameters in the two competing models, in this case, 2 − 1 = 1.

  5. Reject H0 if T is large enough. Exactly how large depends on the level of significance: if it is α, choose the limit td equal to the 100(1 − α) percentile of the χ2(d) distribution.

This result is a large-sample approximation.
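In R, the LRT for two nested Cox regression models can be carried out as in the following sketch; the data are simulated and the variable names are illustrative only.

  library(survival)

  set.seed(3)
  dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
  dat$time   <- rexp(100, rate = exp(0.7 * dat$x1))   # x2 has no true effect
  dat$status <- rbinom(100, 1, 0.85)

  fit2 <- coxph(Surv(time, status) ~ x1 + x2, data = dat)  # model M2
  fit1 <- coxph(Surv(time, status) ~ x1, data = dat)       # model M1, nested in M2

  ## T = 2 * (log L under M2 - log L under M1), compared with chi-square(1).
  lrt <- 2 * (fit2$loglik[2] - fit1$loglik[2])
  pchisq(lrt, df = 1, lower.tail = FALSE)                  # p-value of the LRT

  ## The same test directly:
  anova(fit1, fit2)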

The Wald test is theoretically performed as follows:

  1. Maximize log L(β1, β2) under ℳ2; this gives log L(β̂1, β̂2), together with β̂2 and se(β̂2).

  2. Calculate the test statistic

    T_W = \frac{\hat{\beta}_2}{\mathrm{se}(\hat{\beta}_2)}

  3. Under H0, TW has a standard normal distribution: TW ~ N(0, 1).

  4. Reject H0 if the absolute value of TW is larger than 1.96, corresponding to a significance level of 5%.

This is a large-sample approximation, with the advantage that it is automatically available in all software. In comparison to the LRT, one model less has to be fitted; this saves time and effort, unfortunately at the expense of accuracy, because the Wald test may occasionally give nonsensical results. This phenomenon is known as the Hauck–Donner effect (Hauck & Donner 1977).
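For comparison, the Wald test of β2 = 0 can be read directly from the fitted model ℳ2 in the LRT sketch above; the z column of the coefficient summary is β̂2/se(β̂2):

  ## Wald statistic and p-value for x2 (continuing the LRT sketch above).
  co <- summary(fit2)$coefficients
  co["x2", "coef"] / co["x2", "se(coef)"]   # T_W
  co["x2", ]                                # includes z and Pr(>|z|)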
