Chapter 18 Sample Size Calculation for Observational Studies

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter 18

Sample Size Calculation for Observational Studies

Sin-Ho Jung
Taiyeong Lee
Elizabeth DeLong

Abstract

18.1 Introduction

18.2 Continuous Variables

18.3 Binary Variables

18.4 Two-Sample Log-Rank Test for Survival Data

18.5 Two-Sample Longitudinal Data

18.6 Discussion

Appendix: Asymptotic Distribution of Wilcoxon Rank Sum Test under H_a

References

Abstract

For a number of reasons, observational studies are currently being used to provide evidence to support medical decisions. Because these studies do not carry the same level of credibility conferred to randomized trials, appropriate attention to statistical issues is paramount. In particular, considerations of event rate, sample size, and power frequently occur. This chapter provides mechanisms for dealing with these considerations under a variety of scenarios. Additionally, the methods presented here are applicable more generally to clinical trials as well.

18.1 Introduction

The increasing availability of computerized clinical patient data has stimulated greater interest in research on observational data. Individual medical practices and hospitals are now attempting to draw inferences on the basis of data gathered on their own patients. Additionally, the proliferation of large registry databases that harvest clinical data from hundreds of sites has created opportunities to study questions regarding rare diseases and rare events. However, even studies using these large databases, when reduced to the appropriate study populations and endpoints, encounter issues of sample size and power.

Observational studies can be designed to address a variety of objectives. If the primary analysis of the study is simply to estimate some parameter, then one can use standard sample size calculations for requiring a confidence interval of at most a certain width. If the objective involves the comparison of cohorts, then in many cases the statistical analysis will involve some form of selection bias adjustment due to the non-randomized nature of observational research. Chapters 1 through 11 cover the issues of selection bias adjustment and standard analytical methods for comparing cohorts in these settings, including propensity score regression, stratification, matching, doubly robust adjustment, instrumental variables, local control, and various longitudinal analyses. Regarding the computation of sample sizes in study planning in these scenarios, such complex designs and analyses may require custom sample size and power estimate calculations through simulations. In some cases, the primary hypothesis may be simplified for the purpose of estimating power and sample size, and generalized SAS macros can perform the needed calculations.

In this chapter, we present a series of SAS macros for computing sample sizes for studies with various types of outcomes and comparisons (all for comparisons between two groups):

• t-test (Section 18.2.1)

• Wilcoxon Rank Sum test (Section 18.2.2)

• Two-Sample Tests on Binary outcomes (Section 18.3.1)

• Weighted Mantel-Haenszel test (Section 18.3.2)

• Log-Rank test for survival data (Section 18.4)

• Longitudinal Data—Continuous outcomes (Section 18.5.3.3)

• Longitudinal Data—Binary outcomes (Section 18.5.3.3)

The sample size macro for the Weighted Mantel-Haenszel test (provided in Section 18.3.1) is directly applicable for observational studies for which the analysis is based on the commonly utilized propensity score stratification approach. This propensity score stratification analysis methodology is described in detail in Chapter 2. In brief, propensity score stratification involves estimating a propensity score for each study participant, stratifying by this score, estimating the difference in outcomes between the two cohorts within each of the strata, and then combining the estimated cohort differences over strata.

Other sections in this chapter present sample size macros based on analysis methods without specific selection bias adjustments. However, these can be very useful for observational study planning when simple cohort comparisons are planned (for example, such as when causal inference is not the objective) or as a very easy-to-use initial calculation when other methods are being utilized such as propensity score matching. In this example, one would supplement the sample sizes computed in these macros with an estimate of the number of patients excluded from the analysis due to a lack of propensity score matched patients in the other cohort. In addition, most of the sample size calculation methods in this chapter can be used for randomized clinical trials as well.

18.2 Continuous Variables

In this section, we consider two two-sample tests for continuous variables: t-test and Wilcoxon rank test.

18.2.1 Two-Sample -test

Suppose that, for group k(=1,2), X_k1,..., X_{k,n_k} are independent and identically distributed (IID) normal random variables with mean μ_k and variance σ². We want to test H₀ : μ₁ = μ₂ vs. Ha:μ1≠μ2 $H_{a} : μ_{1} \neq μ_{2}$ . X¯¯¯k=n−1k∑nki=1Xki ${\bar{X}}_{k} = n_{k}^{- 1} \sum_{i = 1}^{n_{k}} X_{k i}$ Let denote the sample mean for group k, and

s2p=∑n1i=1(X1i−X¯¯¯1)2+∑n2i=1(X2i−X¯¯¯2)2n1+n2−2 $s_{p}^{2} = \frac{\sum_{i = 1}^{n_{1}} {(X_{1 i} - {\bar{X}}_{1})}^{2} + \sum_{i = 1}^{n_{2}} {(X_{2 i} - {\bar{X}}_{2})}^{2}}{n_{1} + n_{2} - 2}$

denote the pooled sample variance. Then, under H₀,

T=X¯¯¯1−X¯¯¯2spn−11+n−12√ $T = \frac{{\bar{X}}_{1} - {\bar{X}}_{2}}{s_{p} \sqrt{n_{1}^{- 1} + n_{2}^{- 1}}}$

follows the t-distribution with n₁ + n₂ − 2 degrees of freedom. Hence, given type I error probability α, we reject H₀ if ∣∣T∣∣>tn1+n2−2,1−α/2 $| T | > t_{n_{1} + n_{2} - 2, 1 - α / 2}$ , where t_v,γ is the 100γ-th percentile of the t-distribution with v degrees of freedom.

If n₁ and n₂ are large, then we do not require the normal distribution assumption, and the critical value from the t-test, tn1+n2−2,1−α/2 $t_{n_{1} + n_{2} - 2, 1 - α / 2}$ , can be approximated by the 100γ-th percentile of the standard normal distribution, z_1−α/2. How large the sample sizes should be for the large approximation depends on how close the distribution. of the observations for each group is to a normal distribution. To improve the normality, we often apply a transform to the raw data, such as a log-transformation for positive variables (Carroll and Ruppert, 1988). We derive the sample size formula based on the large sample approximation.

We consider a specific alternative hypothesis H_a : |μ₁ − μ₂|= ∆σ. Note that ∆ = σ^-1 (μ₁ − μ₂) denotes a standardized effect size under the alternative hypothesis. Without loss of generality, we assume that μ₁ > μ₂ under H_a. Then, under H_a,

T≈X¯¯¯1−X¯¯¯2−σΔσn−11+n−12√+Δn−11+n−12√ $T \approx \frac{{\bar{X}}_{1} - {\bar{X}}_{2} - σ Δ}{σ \sqrt{n_{1}^{- 1} + n_{2}^{- 1}}} + \frac{Δ}{\sqrt{n_{1}^{- 1} + n_{2}^{- 1}}}$ =Z+Δnr1r2−−−−−√, $= Z + Δ \sqrt{n r_{1} r_{2}},$

where Z : N(0, 1), r_k = n_k/n, and n = n₁ + n₂. For a given n, the power is calculated as

1−β=ϕ¯(z1−α/2−Δnr1r2−−−−−√), $1 - β = \bar{ϕ} (z_{1 - α / 2} - Δ \sqrt{n r_{1} r_{2}}),$ (1)

where ϕ¯(z)=1−ϕ(z) $\bar{ϕ} (z) = 1 - ϕ (z)$ and ϕ(z)=P(Z≤z) $ϕ (z) = P (Z \leq z)$ is the cumulative distribution function of the standard normal distribution, N(0, 1). By solving (1) with respect to n, we obtain the required total sample size as

n=(z1−α/2+z1−β)2r1r2Δ2, $n = \frac{{(z_{1 - α / 2} + z_{1 - β})}^{2}}{r_{1} r_{2} Δ^{2}},$

or n_k = n х r_k for group k. For one-sided testing, replace α/2 with α. This sample size is based on a large sample approximation to a t-distribution, so that this formula underestimates the sample size for small sample sizes. If the final sample size by the formula based on normal approximation is smaller than 30, we may consider increasing it by about 10%.

In summary, the sample size calculation for a two-sample t-test is conducted as follows:

• In addition to type I error probability α and power 1 − β, specify

(a) Δ=σ−1∣∣μ1−μ2∣∣ $Δ = σ^{- 1} | μ_{1} - μ_{2} |$ , standardized effect size

(b) the prevalence of group k (r1+r2=1) $(r_{1} + r_{2} = 1)$

• Calculate the sample size

n=(z1−α/2+z1−β)2r1r2Δ2. $n = \frac{{(z_{1 - α / 2} + z_{1 - β})}^{2}}{r_{1} r_{2} Δ^{2}} .$

Example 18.1

Suppose that we want to detect a difference of 0.5σ in the population means with two-sided α= 0.05 and power 1 − β = 0.9. Then z1−α/2=1.96 $z_{1 - α / 2} = 1.96$ and z1−β=1.282 $z_{1 - β} = 1.282$ . Assuming an equal proportion of two groups in the population (that is r1=r2=0.5 $r_{1} = r_{2} = 0.5$ ), we obtain the required sample size . PROC POWER in SAS/STAT can directly handle this scenario, as shown in the following code:

Program 18.1 SAS Code for Two-Sample t-Test

proc power;
     twosamplemeans
     nfractional
     meandiff = 0.5              /* mean difference                */
     stddev = 1                 /* sigma                          */
     groupweights = (0.5 0.5)   /* r1 r2                          */
     sides = 2                   /* 1: one sided 2: 2: two-sided   */
     power =0.9                  /* power                          */
     alpha =0.05                 /* alpha                          */
     ntotal=.;                   /* Sample Size                    */
run;

Output from Program 18.1

The POWER Procedure
Two-sample t Test for Mean Difference

Fixed Scenario Elements

Distribution	Normal
Method	Exact
Number of Sides	2
Alpha	0.05
Mean Difference	0.5
Standard Deviation	1
Group 1 Weight	0.5
Group 2 Weight	0.5
Nominal Power	0.9
Null Difference	0

Computed Ceiling N Total

Fractional N Total	Actual Power	Ceiling N Total
170.062568	0.902	171

18.2.2 Wilcoxon Rank Sum Test

It is known that -tests are sensitive to outliers. The Wilcoxon rank sum test (WRST) has been widely used as a robust test. We assume that for group k(=1,2) $k (= 1, 2)$ , Xk1,...,Xk,nk $X_{k 1}, ..., X_{k, n_{k}}$ are IID random variables from a distribution with cumulative distribution function F(x−θk)= P (Xki≤x) $F (x - θ_{k}) = P (X_{k i} \leq x)$ . For Δ=θ1−θ2 $Δ = θ_{1} - θ_{2}$ , we want to test H0:Δ=0 $H_{0} : Δ = 0$ vs. Ha:Δ≠0 $H_{a} : Δ \neq 0$ . Mann and Whitney (1947) propose to use W=(n1n2)−1∑n1i=1∑n2j=1I(X1i>X2j) $W = (n_{1} n_{2})^{- 1} \sum_{i = 1}^{n_{1}} \sum_{j = 1}^{n_{2}} I (X_{1 i} > X_{2 j})$ for testing the hypothesis. The expected value of W is P (X1i>X2j) $P (X_{1 i} > X_{2 j})$ , the probability that a randomly chosen measurement from group 1 is greater than a randomly chosen measurement from group 2. Thus, the test statistic will be close to 1/2 if H₀ is true, and closer to 0 or 1 if H_a is true.

For large n(=n₁ + n₂), we reject if the absolute value of

T=W−ν0σ0 $T = \frac{W - ν_{0}}{σ_{0}}$

is larger than z_1−α/2, where v₀ = 1/2 and σ20=(n+1)/(12n1n2) $σ_{0}^{2} = (n + 1) / (12 n_{1} n_{2})$ are the mean and variance of W under H₀, respectively.

Let f(x)=∂F/∂x $f (x) = \partial F / \partial x$ denote the probability density function of F(x). The appendix shows that, under H_a, W has mean

νa=∫∞−∞F(x+Δ)f(x)dx $ν_{a} = \int_{- \infty}^{\infty} F (x + Δ) f (x) d x$

and variance

σ2a=σ21n1+σ22n2, $σ_{a}^{2} = \frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}},$

where

σ21=∫∞−∞F2(x+Δ)f(x)dx−{∫∞−∞F(x+Δ)f(x)dx}2, $σ_{1}^{2} = \int_{- \infty}^{\infty} F^{2} (x + Δ) f (x) d x - {\int_{- \infty}^{\infty} F (x + Δ) f (x) d x}^{2},$

and

σ22=∫∞−∞F2(x−Δ)f(x)dx−{∫∞−∞F(x−Δ)f(x)dx}2. $σ_{2}^{2} = \int_{- \infty}^{\infty} F^{2} (x - Δ) f (x) d x - {\int_{- \infty}^{\infty} F (x - Δ) f (x) d x}^{2} .$

Then,

T=Zσaσ0+νa−ν0σ0, $T = Z \frac{σ_{a}}{σ_{0}} + \frac{ν_{a} - ν_{0}}{σ_{0}},$

where Z=(W−νa)/σa $Z = (W - ν_{a}) / σ_{a}$ is an asymptotically random variable. Given power 1 − β,

1−β= P (∣∣T∣∣>z1−α/2∣∣Ha)=P(Zσaσ0+|νa−ν0|σ0>z1−α/2). $1 - β = P (| T | > z_{1 - α / 2} | H_{a}) = P (Z \frac{σ_{a}}{σ_{0}} + \frac{| ν_{a} - ν_{0} |}{σ_{0}} > z_{1 - α / 2}).$

=ϕ¯(σ0σa(z1−α/2−|νa−ν0|σ0)), $= \bar{ϕ} (\frac{σ_{0}}{σ_{a}} (z_{1 - α / 2} - \frac{| ν_{a} - ν_{0} |}{σ_{0}})),$

where ϕ¯(z)= P (Z>z) $\bar{ϕ} (z) = P (Z > z)$ . By replacing n_k with nr_k in σ20 $σ_{0}^{2}$ and σ2a $σ_{a}^{2}$ , and solving with respect to n, we obtain the required sample size

n=112r1r2{z1−α/2+z1−β12(r2σ21+r1σ22)√νa−1/2}2. $n = \frac{1}{12 r_{1} r_{2}} {\frac{z_{1 - α / 2} + z_{1 - β} \sqrt{12 (r_{2} σ_{1}^{2} + r_{1} σ_{2}^{2})}}{ν_{a} - 1 / 2}}^{2} .$

Although the testing does not require the specification of the distribution function F(x), a sample size calculation does. The alternative hypothesis is specified by two distributions with a location shift. The assumption of a location shift model is not required in order to carry out the rank sum test, but it is needed if we want to make inferences about medians or means. It is, of course, needed in the sample size calculation as presented here.

While a t-test requires a finite variance of the observations, WRST does not. For example, if has a Cauchy distribution, we can use WRST, but not a t-test.

A sample size calculation for WRST assuming normal distributions can be described as follows:

• In addition to type I error probability α and power 1 − β, specify

(a) Δ=σ−1(μ1−μ2) $Δ = σ^{- 1} (μ_{1} - μ_{2})$ , standardized effect size

(b) r_k = the prevalence of group k (r1+r2=1) $(r_{1} + r_{2} = 1)$

• Calculate

νa=∫∞−∞ϕ(x+Δ)ϕ(x)dx $ν_{a} = \int_{- \infty}^{\infty} ϕ (x + Δ) ϕ (x) d x$

σ21=∫∞−∞ϕ2(x+Δ)ϕ(x)dx−{∫∞−∞ϕ(x+Δ)ϕ(x)dx}2 $σ_{1}^{2} = \int_{- \infty}^{\infty} ϕ^{2} (x + Δ) ϕ (x) d x - {\int_{- \infty}^{\infty} ϕ (x + Δ) ϕ (x) d x}^{2}$

and

σ22=∫∞−∞ϕ2(x−Δ)ϕ(x)dx−{∫∞−∞ϕ(x−Δ)ϕ(x)dx}2, $σ_{2}^{2} = \int_{- \infty}^{\infty} ϕ^{2} (x - Δ) ϕ (x) d x - {\int_{- \infty}^{\infty} ϕ (x - Δ) ϕ (x) d x}^{2},$

where ϕ(x) $ϕ (x)$ and ϕ(x) $ϕ (x)$ are the probability density function and the cumulative distribution function of , respectively.

• Calculate the sample size

n=112r1r2{z1−α/2+z1−β12(r2σ21+r1σ22)√νa−1/2}2. $n = \frac{1}{12 r_{1} r_{2}} {\frac{z_{1 - α / 2} + z_{1 - β} \sqrt{12 (r_{2} σ_{1}^{2} + r_{1} σ_{2}^{2})}}{ν_{a} - 1 / 2}}^{2} .$

Example 18.2

Suppose that we want to detect a difference of 0.5σ in the population means with two-sided α = 0.05 and power 1 − β = 0.9. Then z1−α/2=1.96 $z_{1 - α / 2} = 1.96$ and z1−β=1.282 $z_{1 - β} = 1.282$ . Assuming an equal proportion of two groups in the population (that is, r1=r2=0.5 $r_{1} = r_{2} = 0.5$ ) , we obtain the required sample size n = 177. Note that the two-sample t-test requires a slightly smaller n = 171, but, in case there are outliers in the final data, the Wilcoxon rank test may be more powerful.

Program 18.2 SAS Code for Wilcoxon Rank Sum Test

%macro WilcoxonRankSumTest( delta=,    /* Standardized effect size     */
                             r1 = ,    /* prevalence of group 1        */
                             r2 = ,    /* prevalence of group 2        */
                             sides=,    /* number of test sides: 1 or 2 */
                             alpha=,    /* alpha                        */
                             power=     /* power                        */
                             );
proc iml;
start CDFPDFDELTA(t);
      m=t+δ c=cdf('normal',m,0,1); p=pdf('normal',t,0,1); v=c*p;
return(v);
finish;
start CDF2PDFDELTA(t);
      m=t+δ c=cdf('normal',m,0,1)**2; p=pdf('normal',t,0,1); v=c*p;
return(v);
finish;

start CDFPDF_DELTA(t);
      m=t-δ c=cdf('normal',m,0,1); p=pdf('normal',t,0,1); v=c*p;
return(v);
finish;
start CDF2PDF_DELTA(t);
      m=t-δ c=cdf('normal',m,0,1)**2; p=pdf('normal',t,0,1); v=c*p;
return(v);
finish;
start s1(t);
      m=t+δ c=cdf('normal',m,0,1); p=pdf('normal',t,0,1); v=c*p;
return(v);
finish;

interval = .M || .P;
call quad(nu1,"CDFPDFDELTA", interval);
call quad(s1, "CDF2PDFDELTA", interval);
call quad(nu2,"CDFPDF_DELTA", interval);
call quad(s2, "CDF2PDF_DELTA", interval);

sigma1 = s1-nu1**2; sigma2 = s2-nu2**2;
p1=1-&alpha/&sides; z_alpha = probit(p1);
p2=&power; z_beta = probit(p2);
n =
(1/(12*&r1*&r2))*((z_alpha+z_beta*sqrt(12*(&r2*sigma1+&r1*sigma2)))/(nu1-
0.5))**2;
delta=δr1 =&r1; r2=&r2;sides =&sides;alpha=αpower=&power;
print 'Sample Size';
print 'Wilcoxon Rank Sum Test for Mean Difference';
print delta r1 r2 sides alpha power;
print n;
quit;
run;
%mend WilcoxonRankSumTest;

/*---------------------- Run the macro for Example 2. -------------------*/

%WilcoxonRankSumTest(delta=0.5, r1 = 0.5, r2 = 0.5, sides=2, alpha=0.05, power=0.9);

Output from Program 18.2

Sample Size
Wilcoxon Rank Sum Test for Mean Difference

delta	r1	r2	sides	alpha	power
0.5	0.5	0.5	2	0.05	0.9

n
176.41709

18.3 Binary Variables

In this section, we investigate two-sample tests for binary outcome variables with or without stratification. We use a weighted Mantel-Haenszel test for stratified analysis to adjust for covariates that are unbalanced between two groups.

18.3.1 Two-Sample Test on a Binary Outcome

Suppose that, for group k(=1, 2), X_k denotes the number of responders from n_k independent subjects. If group k has a response probability p_k, then X_k is a binomial distribution with n_k independent trials and response probability p_k. We want to test H0:p1=p2 $H_{0} : p_{1} = p_{2}$ vs. Ha:p1≠p2 $H_{a} : p_{1} \neq p_{2}$ . Let pˆk=Xk/nk ${\hat{p}}_{k} = X_{k} / n_{k}$ and pˆ=(X1+X2)/(n1+n2) $\hat{p} = (X_{1} + X_{2}) / (n_{1} + n_{2})$ denote the sample proportion for group k and the pooled data, respectively. Then, for large n₁ and n₂ (say, n₁,n₂ > 30) ,

T=pˆ1−pˆ2pˆqˆ(n−11+n−12)√ $T = \frac{{\hat{p}}_{1} - {\hat{p}}_{2}}{\sqrt{\hat{p} \hat{q} (n_{1}^{- 1} + n_{2}^{- 1})}}$

follows the standard normal distribution under H₀, where qˆ=1−pˆ $\hat{q} = 1 - \hat{p}$ . Hence, given type I error probability α, we reject H₀ if ∣∣T∣∣>z1−α/2 $| T | > z_{1 - α / 2}$ . It is easy to show that T² is identical to the chi-squared test with 1 degree of freedom for a 2 х 2 table.

Let p₁, p₂ denote the response probability under a specific alternative hypothesis for sample size calculation, and Δ=p1−p2 $Δ = p_{1} - p_{2}$ . Then, for large n(=n1+n2) $n (= n_{1} + n_{2})$ , pˆ $\hat{p}$ converges to p¯=r1p1+r2p2 $\bar{p} = r_{1} p_{1} + r_{2} p_{2}$ , where rk=nk/n $r_{k} = n_{k} / n$ . Under H_a,

T≈pˆ1−pˆ2−Δp¯q¯(n−11+n−12)√+Δp¯q¯(n−11+n−12)√ $T \approx \frac{{\hat{p}}_{1} - {\hat{p}}_{2} - Δ}{\sqrt{\bar{p} \bar{q} (n_{1}^{- 1} + n_{2}^{- 1})}} + \frac{Δ}{\sqrt{\bar{p} \bar{q} (n_{1}^{- 1} + n_{2}^{- 1})}}$

=Z+Δnr1r2p¯q¯−−−−√, $= Z + Δ \sqrt{\frac{n r_{1} r_{2}}{\bar{p} \bar{q}}},$

where Z : Z(0, 1) and q¯=1−p¯ $\bar{q} = 1 - \bar{p}$ . Hence, given n, the power is calculated as

1−β=ϕ¯(z1−α/2−Δnr1r2−−−−−√), $1 - β = \bar{ϕ} (z_{1 - α / 2} - Δ \sqrt{n r_{1} r_{2}}),$ (2)

By solving (2) with respect to n, we obtain the required total sample size as

n=(z1−α/2+z1−β)2r1r2Δ2, $n = \frac{{(z_{1 - α / 2} + z_{1 - β})}^{2}}{r_{1} r_{2} Δ^{2}},$

In summary, the sample size calculation for two-sample binomial proportions is conducted as follows:

• In addition to type I error probability α and power 1 − β, specify

(a) p₁, p₂ = binomial proportions under H_a

(b) r_k the prevalence of group k (r₁ + r₂ = 1)

• Calculate the sample size

n=(z1−α/2+z1−β)2p¯q¯r1r2(p1−p2)2, $n = \frac{{(z_{1 - α / 2} + z_{1 - β})}^{2} \bar{p} \bar{q}}{r_{1} r_{2} {(p_{1} - p_{2})}^{2}},$

where p¯=r1p1+r2p2 $\bar{p} = r_{1} p_{1} + r_{2} p_{2}$ and q¯=1−p¯ $\bar{q} = 1 - \bar{p}$ .

Example 18.3

Suppose that we want to detect a difference of p₁ = 0.4 and p₂ = 0.5 with two-sided α = 0.05 and power 1 − β = 0.9. Assuming an equal proportion of two groups in the population (that is, r₁ = r₂ = 0.5), we obtain the required sample size n = 10.38.

Program 18.3 SAS Code for Two-Sample Binomial Proportions

proc power;
     twosamplefreq test=pchi
     groupproportions =(0.4 0.5) /* binomial proportions to be detected */
     groupweights = (1 1)        /* (w1, w2):Weight of two groups,(r1,r2)*/
     sides = 2                   /* 1: one sided 2: 2: two-sided        */
     power =0.9                  /* power                                */
     alpha =0.05                 /* alpha                                */
     ntotal=.;                   /* sample size                         */
    run;

Output from Program 18.3

The POWER Procedure
Pearson Chi-square Test for Two Proportions

Fixed Scenario Elements

Distribution	Asymptotic normal
Method	Normal approximation
Number of Sides	2
Alpha	0.05
Group 1 Proportion	0.4
Group 2 Proportion	0.5
Group 1 Weight	1
Group 2 Weight	1
Nominal Power	0.9
Null Proportion Difference	0

Computed N Total

Actual Power	N Total
0.901	1038

18.3.2 Weighted Mantel-Haenszel Test

In an observational study, patients are not randomly assigned to treatment groups with equal probability. Instead, the probability of assignment varies from patient to patient, possibly depending on the patient’s baseline covariates. This often results in non-comparable treatment groups due to imbalance of the baseline covariates and consequently invalidates the standard methods commonly employed in data analysis. To overcome this problem, Rosenbaum and Rubin (1983, 1984) developed the propensity score method.

Suppose that there are j strata formed by propensity score stratification. Let n denote the total sample size and n_j the sample size in stratum j ∑Jj=1nj=n $\sum_{j = 1}^{J} n_{j} = n$ . The data on each subject comprise the response variable x = 1 for response and 0 for no response and j and k for the stratum and treatment group, respectively, to which the subject is assigned (1≤j≤J;k=1,2 $1 \leq j \leq J; k = 1, 2$ ). Frequency data in stratum j can be described as follows:

	Group
Response	1	2	Total
Yes	x_j11	x_j12	x_j1
No	x_j21	x_j22	x_j2
Total	n_j1	n_j2	n_j

Let Oj=xj11 $O_{j} = x_{j 11}$ , Ej=nj1xj1/nj $E_{j} = n_{j 1} x_{j 1} / n_{j}$ , and

Vj=nj1nj2xj1xj2n2j(nj−1). $V_{j} = \frac{n_{j 1} n_{j 2} x_{j 1} x_{j 2}}{n_{j}^{2} (n_{j} - 1)} .$

Then, the weighted Mantel-Haenszel (WMH) test is given by

T=∑Jj=1wˆj(Oj−Ej)∑Jj=1wˆ2jVj√, $T = \frac{\sum_{j = 1}^{J} {\hat{w}}_{j} (O_{j} - E_{j})}{\sqrt{\sum_{j = 1}^{J} {\hat{w}}_{j}^{2} V_{j}}},$

where the weights wˆj ${\hat{w}}_{j}$ converge to a constant w_j as n→∞ $n \to \infty$ . The weights are wˆj=1 ${\hat{w}}_{j} = 1$ for the original Mantel-Haenszel test and wˆj=qˆj=xj2/nj ${\hat{w}}_{j} = {\hat{q}}_{j} = x_{j 2} / n_{j}$ for the statistic proposed by Gart (1985).

Let a_j = n_j/n denote the allocation proportion for stratum (∑Jj=1aj=1) $(\sum_{j = 1}^{J} a_{j} = 1)$ , and bjk=njk/nj $b_{j k} = n_{j k} / n_{j}$ denote the allocation proportion for group k within stratum j (bj1+bj2=1) $(b_{j 1} + b_{j 2} = 1)$ . Let p_jk denote the response probability for group k in stratum j and qjk=1−pjk $q_{j k} = 1 - p_{j k}$ . Under H0:pj1=pj2,1≤j≤J $H_{0} : p_{j 1} = p_{j 2}, 1 \leq j \leq J$ , T is approximately N(0, 1). The optimal weights maximizing the power depend on the allocation proportions {(aj,b1j,b2j),j=1,...,J} ${(a_{j}, b_{1 j}, b_{2 j}), j = 1,..., J}$ and effect sizes (pj1−pj2,1,...,J) $(p_{j 1} - p_{j 2}, 1,..., J)$ under H₁.

In order to calculate the power of WMH, we have to derive the asymptotic distribution of ∑Jj=1wˆj(Oj−Ej) $\sum_{j = 1}^{J} {\hat{w}}_{j} (O_{j} - E_{j})$ and the limit of ∑Jj=1wˆ2jVj $\sum_{j = 1}^{J} {\hat{w}}_{j}^{2} V_{j}$ under H₁. We assume that the success probabilities satisfy for under . Note that (pjk,1≤j≤J,j=1,2) $(p_{j k}, 1 \leq j \leq J, j = 1, 2)$ satisfy pj2qj1/(pj1qj2)=ϕ $p_{j 2} q_{j 1} / (p_{j 1} q_{j 2}) = ϕ$ for ϕ≠1 $ϕ \neq 1$ under H₁. Note that a constant odds ratio across strata holds if there exists no interaction between the treatment and the propensity score when the binary response is regressed on the treatment indicator and the propensity score using a logistic regression. The following derivations are based on H₁. It can be verified that

Oj−Ej=nj1nj2nj(pˆj1−pˆj2) $O_{j} - E_{j} = \frac{n_{j 1} n_{j 2}}{n_{j}} ({\hat{p}}_{j 1} - {\hat{p}}_{j 2})$

=nj1nj2nj(pˆj1−pj1−pˆj2+pj2)+nj1nj2nj(pj1−pj2) $= \frac{n_{j 1} n_{j 2}}{n_{j}} ({\hat{p}}_{j 1} - p_{j 1} - {\hat{p}}_{j 2} + p_{j 2}) + \frac{n_{j 1} n_{j 2}}{n_{j}} (p_{j 1} - p_{j 2})$

=najbj1bj2(pˆj1−pj1−pˆj2+pj2)+najbj1bj2(pj1−pj2). $= n a_{j} b_{j 1} b_{j 2} ({\hat{p}}_{j 1} - p_{j 1} - {\hat{p}}_{j 2} + p_{j 2}) + n a_{j} b_{j 1} b_{j 2} (p_{j 1} - p_{j 2}).$

Thus, under H₁, ∑Jj=1wˆj(Oj−Ej) $\sum_{j = 1}^{J} {\hat{w}}_{j} (O_{j} - E_{j})$ is approximately normal with mean nΔ and variance nσ21 $n σ_{1}^{2}$ , where

Δ=∑Jj=1wjajbj1bj2(pj1−pj2) $Δ = \sum_{j = 1}^{J} w_{j} a_{j} b_{j 1} b_{j 2} (p_{j 1} - p_{j 2})$

=(1−Φ)∑Jj=1wjajbj1bj2pj1qj1qj1+Φpj1 $= (1 - Φ) \sum_{j = 1}^{J} w_{j} a_{j} b_{j 1} b_{j 2} \frac{p_{j 1} q_{j 1}}{q_{j 1} + Φ p_{j 1}}$

and

σ21=n−1∑Jj=1w2jn2j1n2j2n2j(pj1qj1nj1+pj2qj2nj2) $σ_{1}^{2} = n^{- 1} \sum_{j = 1}^{J} w_{j}^{2} \frac{n_{j 1}^{2} n_{j 2}^{2}}{n_{j}^{2}} (\frac{p_{j 1} q_{j 1}}{n_{j 1}} + \frac{p_{j 2} q_{j 2}}{n_{j 2}})$

=∑Jj=1w2jajbj1bj2(bj2pj1qj1+bj1pj2qj2). $= \sum_{j = 1}^{J} w_{j}^{2} a_{j} b_{j 1} b_{j 2} (b_{j 2} p_{j 1} q_{j 1} + b_{j 1} p_{j 2} q_{j 2}).$

Also under H₁, we have

\sum j = 1 J w 2 j V j = n σ 20 + o p (n),

$\sum_{j = 1}^{J} w_{j}^{2} V_{j} = n σ_{0}^{2} + o_{p} (n),$

where

σ 20 = \sum j = 1 J w 2 j a j b j 1 b j 2 (b j 1 p j 1 + b j 2 p j 2) (b j 1 q j 1 + b j 2 q j 2).

$σ_{0}^{2} = \sum_{j = 1}^{J} w_{j}^{2} a_{j} b_{j 1} b_{j 2} (b_{j 1} p_{j 1} + b_{j 2} p_{j 2}) (b_{j 1} q_{j 1} + b_{j 2} q_{j 2}).$

Hence, the power of WMH is given as

1 - β = P (∣ ∣ T ∣ ∣ > z 1 - α / 2 ∣ ∣ H 1)

$1 - β = P (| T | > z_{1 - α / 2} | H_{1})$

= P (σ 1 σ 0 Z + n - - \sqrt | Δ | σ 0 > z 1 - α / 2)

$= P (\frac{σ_{1}}{σ_{0}} Z + \sqrt{n} \frac{| Δ |}{σ_{0}} > z_{1 - α / 2})$

= Φ ¯ ¯ (σ 0 σ 1 z 1 - α / 2 - n - - \sqrt | Δ | σ 1),

$= \bar{Φ} (\frac{σ_{0}}{σ_{1}} z_{1 - α / 2} - \sqrt{n} \frac{| Δ |}{σ_{1}}),$

where Z is a standard normal random variable and Φ¯¯(z)=P(Z>z) $\bar{Φ} (z) = P (Z > z)$ . Thus, the sample size required for achieving a desired power of 1 − β can be obtained as

n = ( σ 0 z 1 - α / 2 + σ 1 z 1 - β ) 2 Δ 2 .

$n = \frac{{(σ_{0} z_{1 - α / 2} + σ_{1} z_{1 - β})}^{2}}{Δ^{2}} .$ (3)

Following Jung, Chow, and Chi (2007), the sample size calculation for the weighted Mantel-Haenszel test can be carried out as follows:

1. Specify the input variables

- Type I and II error probabilities (α β).

- Success probabilities for group 1 p₁₁,...p_J1, and the odds ratio Φ under H₁. Note that pj2=Φpj1/(qj1+Φpj1) $p_{j 2} = Φ p_{j 1} / (q_{j 1} + Φ p_{j 1})$ .

- Incidence rates for the strata, (a_j, j = 1,...,J). (Yue [2007] proposes to use aj≈1/J $a_{j} \approx 1 / J$ .

- Allocation probability for group 1 within each stratum, (bj1,j=1,...,J) $(b_{j 1}, j = 1,..., J)$ .

2. Calculate n by

n = ( σ 0 z 1 - α / 2 + σ 1 z 1 - β ) 2 Δ 2,

$n = \frac{{(σ_{0} z_{1 - α / 2} + σ_{1} z_{1 - β})}^{2}}{Δ^{2}},$

where

Δ=∑Jj=1ajbj1bj2(pj1−pj2) $Δ = \sum_{j = 1}^{J} a_{j} b_{j 1} b_{j 2} (p_{j 1} - p_{j 2})$

σ21=∑Jj=1ajbj1bj2(bj2pj1qj1+bj1pj2qj2) $σ_{1}^{2} = \sum_{j = 1}^{J} a_{j} b_{j 1} b_{j 2} (b_{j 2} p_{j 1} q_{j 1} + b_{j 1} p_{j 2} q_{j 2})$

σ20=∑Jj=1ajbj1bj2(bj1pj1+bj2pj2)(bj1qj1+bj2qj2). $σ_{0}^{2} = \sum_{j = 1}^{J} a_{j} b_{j 1} b_{j 2} (b_{j 1} p_{j 1} + b_{j 2} p_{j 2}) (b_{j 1} q_{j 1} + b_{j 2} q_{j 2}).$

Example 18.4

Suppose that we want to compare the response probabilities between control (k = 1) (and experimental (k = 2) (groups. We consider partitioning the combined data into J = 5 strata, and the allocation proportions are projected as and (a1,a2,a3,a4,a5)=(.15,.15,.2,.25,.25) $(a_{1}, a_{2}, a_{3}, a_{4}, a_{5}) = (.15,.15, .2, .25, .25)$ and (b11,b21,b31,b41,b51)=(.4,.4,.5,.6,.6) $(b_{11}, b_{21}, b_{31}, b_{41}, b_{51}) = (.4, .4, .5, .6, .6)$ . Also, suppose that the response probabilities for the control group are given as (p11,p21,p31,p41,p51)=(.5,.6,.7,.8,.9) $(p_{11}, p_{21}, p_{31}, p_{41}, p_{51}) = (.5, .6, .7, .8, .9)$ , and we want to calculate the sample size required for a power of 1 − β = 0.8 to detect an odds ratio of Φ=2 $Φ = 2$ using two-sided α = 0.05. For Φ=2 $Φ = 2$ , the success probabilities for the experimental group are given as (p12,p22,p32,p42,p52)=(.6667,.7500,.8235,.8889,.9474) $(p_{12}, p_{22}, p_{32}, p_{42}, p_{52}) = (.6667, .7500, .8235, .8889, .9474)$ . Under these settings, by (3), we need n = 447 for Mantel-Haenszel.

Program 18.4 SAS Code for Weighted Mantel-Haenszel Test with Strata

%macro WMHTestwithStrata ( J = , /* number of strata                         */
                          inA= , /* incidence rates for the strata           */
                          inB= , /* allocation probability for control group */
                          inP1=, /* success probability for control group    */
                          phi =, /* odds ratio under H1                      */
                          power=, /* power                                   */
                          alpha=, /* alpha                                   */
                          sides=  /* 1: one-sided test 2: Two-sided test     */
                         );

proc iml;
   %let K = 2; /* two groups */
   A=&inA; B =&inB;P1=&inP1;
   P2 =J(&J,1,0);Q1 =J(&J,1,0); Q2 =J(&J,1,0);
   do j=1 to &J;
      Q1[j]=1-P1[j];
      P2[j]=&phi*P1[j]/(Q1[j]+&phi*P1[j]);
      Q2[j]=1-P2[j];
   end;
   z_p1=1-&alpha/&sides; z_alpha = probit(z_p1);
   z_p2=&power; z_beta = probit(z_p2);
   delta = 0; s0_sq= 0; s1_sq= 0;

   do j=1 to &J;
    delta = delta+A[j]*B[j]*(1-B[j])*(P1[j]-P2[j]);
    s1_sq= s1_sq+A[j]*B[j]*(1-B[j])*((1-B[j])*P1[j]*Q1[j]+B[j]*P2[j]*Q2[j]);
    s0_sq= s0_sq+A[j]*B[j]*(1-B[j])*(B[j]*P1[j]+(1-B[j])*P2[j])*(B[j]*Q1[j]+(1-B[j])*Q2[j]);
   end;
   n = (1/(delta**2))*((sqrt(s0_sq)*z_alpha+sqrt(s1_sq)*z_beta)**2);

   print 'Sample Size Calculation';
   print 'Weighted Mantel-Haenszel Test with Strata';
   alpha = α power = &power; phi= φ sides = &Sides;
   print  alpha power phi sides ;
   print A B P1 P2;
   print delta s0_sq s1_sq;
   print n;
  quit;
  run;
%mend WMHTestwithStrata;

/*---------------- Run the macro for Example 18.4. ---------------------*/

%WMHTestwithStrata(
   J = 5 ,                                  /* number of strata        */
   inA= %str({0.15, 0.15, 0.2, 0.25, 0.25}), /*Incidence Rates for Strata*/
   inB= %str({0.4 , 0.4, 0.5, 0.6, 0.6}), /* Allocation Probability  */
   inP1=%str({0.5 , 0.6, 0.7, 0.8, 0.9}), /* Success Probability     */
   phi = 2,                                 /* odds Ratio Under H1     */
   power =0.8,                              /* Power                   */
   alpha = 0.05,                            /* Alpha                   */
   sides = 2                                /* Two-sided test          */
  );

Output from Program 18.4

Sample Size Calculation

Weighted Mantel-Haenszel Test with Strata

alpha 0.05	power 0.8	phi 2	sides 2

A	B	P1	P2
0.15	0.4	0.5	0.6666667
0.15	0.4	0.6	0.75
0.2	0.5	0.7	0.8235294
0.25	0.6	0.8	0.8888889
0.25	0.6	0.9	0.9473684

delta	s0_sq	s1_sq
-0.025752	0.0381275	0.0367178

446.21501

18.4 Two-Sample Log-Rank Test for Survival Data

Suppose that n_k subjects are accrued to a study from group k(=1,2). For subject i(=1,...,n_k) in group k(=1,2), let T_ki denote the survival variable (that is, time to an event of interest), with marginal cumulative hazard function Λk(t) $Λ_{k} (t)$ . We want to test the null hypothesis,

H0:Λ1(t)=Λ2(t) for all t≥0 $H_{0} : Λ_{1} (t) = Λ_{2} (t) f o r a l l t \geq 0$

against the alternative hypothesis,

Ha:Λ1(t)≠Λ2(t) for some t≥0. $H_{a} : Λ_{1} (t) \neq Λ_{2} (t) f o r s o m e t \geq 0.$

Because a cumulative hazard function uniquely determines the distribution, H₀ implies that (T1i,i=1,...,n1) $(T_{1 i}, i = 1,..., n_{1})$ and (T2i,i=1,...,n2) $(T_{2 i}, i = 1,..., n_{2})$ have the same distributions.

Due to loss to follow up or termination of study before all subjects experience events, survival times are censored for some subjects. Let C_ki denote the censoring time for the subject i in group k. Then, we observe {(Xki,δki),1≤i≤nk,k=1,2} ${(X_{k i}, δ_{k i}), 1 \leq i \leq n_{k}, k = 1, 2}$ , where Xki=min(Tki,Cki) $X_{k i} = m i n (T_{k i}, C_{k i})$ and δki=I(Tki≤Cki) $δ_{k i} = I (T_{k i} \leq C_{k i})$ . We assume that C_ki and T_1i are independent within each group.

The log-rank test (Peto and Peto, 1972) is given by

W=∫∞0Y1(t)Y2(t)Y1(t)+Y2(t){dΛˆ1(t)−dΛˆ2(t)}, $W = \int_{0}^{\infty} \frac{Y_{1} (t) Y_{2} (t)}{Y_{1} (t) + Y_{2} (t)} {d {\hat{Λ}}_{1} (t) - d {\hat{Λ}}_{2} (t)},$

where Λˆk(t)=∫t0Yk(s)−1dNk(s) ${\hat{Λ}}_{k} (t) = \int_{0}^{t} Y_{k} {(s)}^{- 1} d N_{k} (s)$ is the Nelson-Aalen estimator (Nelson, 1969; Aalen, 1978) of Λk(t) $Λ_{k} (t)$ , Nk(t)=∑nki=1Nki(t) $N_{k} (t) = \sum_{i = 1}^{n_{k}} N_{k i} (t)$ , Nki(t)=δkiI(Xki≤t) $N_{k i} (t) = δ_{k i} I (X_{k i} \leq t)$ , Yk(t)=∑nki=1Yki(t) $Y_{k} (t) = \sum_{i = 1}^{n_{k}} Y_{k i} (t)$ , and Yki(t)=I(Xki≥t) $Y_{k i} (t) = I (X_{k i} \geq t)$ . For large n with and nk/n=rk $n_{k} / n = r_{k}$ , W/σˆ $W / \hat{σ}$ is approximately normal with mean 0 and variance 1 under H₀, where

σˆ2=∫∞0Y1(t)Y2(t){Y1(t)+Y2(t)}{dN1(t)+dN2(t)}, ${\hat{σ}}^{2} = \int_{0}^{\infty} \frac{Y_{1} (t) Y_{2} (t)}{{Y_{1} (t) + Y_{2} (t)}} {d N_{1} (t) + d N_{2} (t)},$

Refer to Fleming and Harrington (1991), for example. We reject H₀ if the absolute value of W/Σˆ $W / \hat{Σ}$ is larger than z1−α/2 $z_{1 - α / 2}$ .

For sample size and power calculation, we assume a proportional hazards model Δ=λ2(t)/λ1(t) $Δ = λ_{2} (t) / λ_{1} (t)$ . The power of the log-rank test depends on the number of events, rather than the number of patients. By Rubinstein and colleagues (1981), the number of events D₁ and D₂ in the two groups, required for a two-sided α test to detect a hazard ratio of Δ(<1) with power 1 − β, is given as

(logΔz1−α/2+z1−β)2=D−11+D−12. ${(\frac{l o g Δ}{z_{1 - α / 2} + z_{1 - β}})}^{2} = D_{1}^{- 1} + D_{2}^{- 1} .$ (4)

In order to calculate D_k, we need to specify the survival distributions under H_a and the common censoring distribution. Suppose that the subjects are accrued at a constant rate for a period of A and followed for an additional period of B after the completion of accrual. The total study period is A + B. Then, assuming no loss to follow up, the censoring distribution is uniform between B and A + B, as in the following:

G(t)= P (Cki≥t)=⎧⎩⎨⎪⎪11+(B−t)/A0 if t<B if B≤t≤A+B otherwise. $G (t) = P (C_{k i} \geq t) = {\begin{array}{l} 1 & i f t < B \\ 1 + (B - t) / A & i f B \leq t \leq A + B \\ 0 & o t h e r w i s e . \end{array}$

One of the most popular survival models in sample size calculation is the exponential distribution,

Sk(t)= P (Tki≥t)=exp(−λkt) $S_{k} (t) = P (T_{k i} \geq t) = e x p (- λ_{k} t)$

for group k. Then, the probability that a subject in arm k experiences an event is given as

dk= P (Tki≤Ci)=∫∞0Sk(t)dG(t)=1−exp(−λkB){1−exp(−λkA)}Aλk, $d_{k} = P (T_{k i} \leq C_{i}) = \int_{0}^{\infty} S_{k} (t) d G (t) = 1 - \frac{e x p (- λ_{k} B) {1 - e x p (- λ_{k} A)}}{A λ_{k}},$

so that we have Dk=nkdk=nrkdk $D_{k} = n_{k} d_{k} = n r_{k} d_{k}$ . By plugging this in (4) and solving with respect to n, we obtain

n=(1r1d1+1r2d2)(z1−α/2+z1−βlogΔ)2. $n = (\frac{1}{r_{1} d_{1}} + \frac{1}{r_{2} d_{2}}) (\frac{z_{1 - α / 2} + z_{1 - β}}{\log Δ})^{2} .$

We assumed no loss to follow up in calculating d_k, but this assumption can be easily loosened by assuming a distribution for the time to loss to follow up.

In summary, a sample size is calculated as follows:

• In addition to type I error probability α and power 1 − β, specify

(a) λ1,λ2= $λ_{1}, λ_{2} =$ hazard rates under H_a (Δ = λ₂/λ₁)

(b) r_k = the prevalence of group k (r₁ + r₂ = 1)

• Calculate the probability of an event for a subject in group k(=1,2)

dk=1−exp(−λkB){1−exp(−λkA)}Aλk, $d_{k} = 1 - \frac{\exp (- λ_{k} B) {1 - \exp (- λ_{k} A)}}{A λ_{k}},$

• Calculate the sample size

n=(1r1d1+1r2d2)(z1−α/2+z1−βlogΔ)2. $n = (\frac{1}{r_{1} d_{1}} + \frac{1}{r_{2} d_{2}}) (\frac{z_{1 - α / 2} + z_{1 - β}}{\log Δ})^{2} .$

Example 18.5

Suppose that the control group (k = 1) has a median survival time of μ1=3 $μ_{1} = 3$ years, and the experimental group will be considered acceptable if it extends the median survival by 50% (that is, μ2=4.5 $μ_{2} = 4.5$ years). Under the exponential survival model, the annual hazard rates are λ1=0.231 $λ_{1} = 0.231$ and λ2=0.154 $λ_{2} = 0.154$ (Δ = 0.667). Also, suppose that (r₁,r₂) = (0.3,0.7), and patients were uniformly accrued for A = 3 years and the final analysis is conducted B = 2 years after the completion of accrual. Then, we have (d₁,d₂) = (0.5455,0.4115), n = 613 and (n₁ = 184,n₂ = 429), and (D₁,D₂) = (100,177). PROC POWER in SAS/STAT(Program 18.5.2) also provides sample sizes for the log-rank test based on Lakatos (1988), which, unlike our formula, calculates the limit of the variance estimator of the log-rank test under the null hypothesis. As demonstrated here, it gives a slightly smaller sample size, n = 593, under the design setting.

Program 18.5.1 SAS Code for Two-Sample Log-Rank Tests for Survival Data

%macro SS_TwoSmpleLogRank(
          Accrual =,   /* accrual period                          */
          Follow  =,   /* additional follow-up period             */
          inR     =,   /* group allocation proportion(a1, a2)     */
          inLambda=,   /* hazard rates under the alternative      */
          alpha   =,   /* alpha                                   */
          power   =,   /* power                                   */
          sides   =    /* 1: One-sided test 2: Two-sided test     */
                     );
proc iml;
  %let K = 2;  r = &inR; Group = J(&K,1,0); lambda = &inLambda;
  delta = lambda[2]/lambda[1]; d_prob = J(&K, 1,0);
  n = J(&K, 1,0); D = J(&K, 1,0);
  do i=1 to &K;
    group[i] = i;
    d_prob[i]=1-(exp(-lambda[i]*&Follow))*(1-exp(-lambda[i]*&Accrual))
              /(&Accrual*lambda[i]);
  end;
  z_p1=1-&alpha/&sides; z_alpha = probit(z_p1);
  z_p2=&power; z_beta = probit(z_p2);
  Total =
int((1/(r[1]*d_prob[1])+1/(r[2]*d_prob[2]))*((z_alpha+z_beta)/log(delta))**
2)+1;
  do i=1 to &K; n[i]= r[i]*Total; D[i]= d_prob[i]*n[i]; end;
  print ' Sample Size Calculation';
  print 'for Two-Sample Log-Rank Test for Survival Data';
  Accrual=&Accrual; Follow_Up= &Follow;
  print 'Accrual Period : ' &Accrual;
  print ' Follow Up period : ' &Follow;
  print Group lambda delta r d_prob;
  print Group n D;
  print Total;
  quit;
run;
%mend SS_TwoSmpleLogRank;

/*-------------- Run the macro for Example 18.5.1 ----------------------------*/

%SS_TwoSmpleLogRank(
  Accrual =3,                  /* accrual period                         */
  Follow  =2,                  /* additional follow-up period            */
  inR     =%str({0.3, 0.7}),   /* group allocation proportion(r1, r2)    */
  inLambda=%str({0.231, 0.154}),/* hazard rates under the alternative    */
  alpha   =0.05,               /* alpha                                  */
  power   =0.9,                /* power                                  */
  sides   =2                   /* two-sided test                         */
      );

Output from Program 18.5.1

Sample Size Calculation
for Two-Sample Log-Rank Test for Survival Data

Accrual Period :	3
Follow Up period :	2

GROUP	LAMBDA	DELTA	R	D_PROB
1	0.231	0.6666667	0.3	0.5455053
2	0.154		0.7	0.411467

GROUP	N	D
1	183.9	100.31843
2	429.1	176.56049

TOTAL
613

Program 18.5.2 SAS Code (PROC POWER) for Example 18.5

      proc power;
          twosamplesurvival test=logrank
          nfractional
          groupmedsurvtimes = 3 | 4.5
          accrualtime = 3
          followuptime = 2
          groupweights = (0.3,0.7)
          alpha = 0.05
          sides=2
          power = 0.9
          ntotal = .
         ;
      run;

Output from Program 18.5.2

The POWER Procedure
Log-Rank Test for Two Survival Curves

Fixed Scenario Elements

Method	Lakatos normal approximation
Form of Survival Curve 1	Exponential
Form of Survival Curve 2	Exponential
Number of Sides	2
Accrual Time	3
Follow-up Time	2
Alpha	0.05
Group 1 Median Survival Time	3
Group 2 Median Survival Time	4.5
Group 1 Weight	0.3
Group 2 Weight	0.7
Nominal Power	0.9
Number of Time Sub-Intervals	12
Group 1 Loss Exponential Hazard	0
Group 2 Loss Exponential Hazard	0

Computed Ceiling N Total

Fractional N Total	Actual Power	Ceiling N Total
592.792138	0.900	593

18.5 Two-Sample Longitudinal Data

In a longitudinal study, we measure the outcome of research interest repeatedly from each subject over a time period. When there are two groups of subjects, one popular primary objective is the between-group comparison of change rate in the expected value of a response variable over time. Typically, the repeated measurements within each subject are correlated. Because of the robustness to possible misspecification of the correlation structure among the repeated measurements, the generalized estimating equation (GEE) method has been one of the most popular methods to fit the regression models and to test on the change rates (Liang and Zeger, 1986). In this section, we discuss sample size estimation methods for such testing. We consider cases where the response variable is continuous or dichotomous.

18.5.1 Generalized Estimating Equations

Suppose that there are n_k subjects in treatment group k(=1,2), n₁,n₂ = n. Let, for group k, Nk=∑nki=1mki $N_{k} = \sum_{i = 1}^{n_{k}} m_{k i}$ denote the total number of observations and r_k = n_k/n the allocation proportion (r₁ + r₂ = 1). For subject i(i = 1,...,n_k) in group k, let y_kij denote the outcome variable at measurement time t_kij (j = 1,..., m_ki) with μkij= E (ykij) $μ_{k i j} = E (y_{k i j})$ that is expressed as

g(μkij)=ak+bktkij, $g (μ_{k i j}) = a_{k} + b_{k} t_{k i j},$

where g(.) is a known link function. In other words, we assume that there exists a link function g(.) linearizing the trajectory in response over time. To simplify the discussions, we use the identity link g(μ)=μ $g (μ) = μ$ for continuous outcome variables and the logit link g(μ)=log{μ/(1-μ)} $g (μ) = log{μ /(1- μ)}$ for binary outcome variables, but the generalization to the use of other links is simple.

The coefficient b_k represents the change rate per unit time in mean response if y is continuous and the change rate in log-odds if y is binary. The measurement times may vary subject by subject due to missing measurements, patients’ visits for measurements at unscheduled times, loss to follow up, or other causes. In this chapter, we assume that any missing data is missing completely at random (Rubin, 1976).

The repeated measurements (ykij,1≤j≤mki) $(y_{k i j}, 1 \leq j \leq m_{k i})$ within each subject tend to be correlated. However, the true correlation structure is usually unknown or of secondary interest. Liang and Zeger (1986) proposed a consistent estimator, called the GEE estimator, based on a working correlation structure. Using either the identity or logit link, the GEE estimator (aˆk,bˆk) $({\hat{a}}_{k}, {\hat{b}}_{k})$ based on the working independence structure solves U_k(a,b) = 0, where

Uk(a,b)=1n√k∑nki=1∑mkij=1{ykij−μkij(a,b)}(1tkij) $U_{k} (a, b) = \frac{1}{{\sqrt{n}}_{k}} \sum_{i = 1}^{n_{k}} \sum_{j = 1}^{m_{k i}} {y_{k i j} - μ_{k i j} (a, b)} (\begin{matrix} 1 \\ t_{k i j} \end{matrix})$

and μkij(a,b)=g−1(a+btkij) $μ_{k i j} (a, b) = g^{- 1} (a + b t_{k i j})$ .

18.5.1.1 Continuous Outcome Variable Case

In the continuous outcome variable case, we have a closed form solution to U_k (a,b) = 0

bˆk=∑nki=1∑mkij=1(tkij−t¯k)ykij∑nki=1∑mkij=1(tkij−t¯k)2 ${\hat{b}}_{k} = \frac{\sum_{i = 1}^{n_{k}} \sum_{j = 1}^{m_{k i}} (t_{k i j} - {\bar{t}}_{k}) y_{k i j}}{\sum_{i = 1}^{n_{k}} \sum_{j = 1}^{m_{k i}} {(t_{k i j} - {\bar{t}}_{k})}^{2}}$

and aˆk=y¯k−bˆkt¯k ${\hat{a}}_{k} = {\bar{y}}_{k} - {\hat{b}}_{k} {\bar{t}}_{k}$ , where t¯k=N−1k∑nki=1∑mkij=1tkij ${\bar{t}}_{k} = N_{k}^{- 1} \sum_{i = 1}^{n_{k}} \sum_{j = 1}^{m_{k i}} t_{k i j}$ and y¯k=N−1k∑nki=1∑mkij=1ykij ${\bar{y}}_{k} = N_{k}^{- 1} \sum_{i = 1}^{n_{k}} \sum_{j = 1}^{m_{k i}} y_{k i j}$ .

By Liang and Zeger (1986), as n→∞ $n \to \infty$ , n−−√k(bˆk−bk)→N(0,vk) ${\sqrt{n}}_{k} ({\hat{b}}_{k} - b_{k}) \to N (0, v_{k})$ in distribution. Here v_k is consistently estimated by

vˆk=∑nki=1{∑mkij=1(tkij−t¯k)ɛˆkij}2{∑nki=1∑mkij=1(tkij−t¯k)2}2 ${\hat{v}}_{k} = \frac{\sum_{i = 1}^{n_{k}} {\sum_{j = 1}^{m_{k i}} (t_{k i j} - {\bar{t}}_{k}) {\hat{ɛ}}_{k i j}}^{2}}{{\sum_{i = 1}^{n_{k}} \sum_{j = 1}^{m_{k i}} {(t_{k i j} - {\bar{t}}_{k})}^{2}}^{2}}$

where ɛˆkij=ykij−aˆk−bˆktkij ${\hat{ɛ}}_{k i j} = y_{k i j} - {\hat{a}}_{k} - {\hat{b}}_{k} t_{k i j}$ .

18.5.1.2 Binary Outcome Variable Case

Let pkij=μkij $p_{k i j} = μ_{k i j}$ in the binary outcome variable case. We have to solve the equation using a numerical method, such as the Newton-Raphson algorithm: at the l-th iteration,

⎛⎝aˆ(l)kbˆ(l)k⎞⎠=⎛⎝aˆ(l−1)kbˆ(l−1)k⎞⎠+n−1/2kA−1k(aˆ(l−1)k,bˆ(l−1)k)Uk(aˆ(l−1)k,bˆ(l−1)k), $(\begin{matrix} {\hat{a}}_{k}^{(l)} \\ {\hat{b}}_{k}^{(l)} \end{matrix}) = (\begin{matrix} {\hat{a}}_{k}^{(l - 1)} \\ {\hat{b}}_{k}^{(l - 1)} \end{matrix}) + n_{k}^{- 1 / 2} A_{k}^{- 1} ({\hat{a}}_{k}^{(l - 1)}, {\hat{b}}_{k}^{(l - 1)}) U_{k} ({\hat{a}}_{k}^{(l - 1)}, {\hat{b}}_{k}^{(l - 1)}),$

where

$A_{k} (a, b) = - n_{k}^{- 1 / 2} \frac{\partial U_{k} (a, b)}{\partial (a, b)} = \frac{1}{n_{k}} \sum_{i = 1}^{n_{k}} \sum_{j = 1}^{m_{k i}} p_{k i j} (a, b) q_{k i j} (a, b) (\begin{matrix} 1 & t_{k i j} \\ t_{k i j} & t_{k i j}^{2} \end{matrix})$

$p_{k i j} (a, b) = g^{- 1} (a + b t_{k i j}) = \frac{e^{a + b t_{k i j}}}{1 + e^{a + b t_{k i j}}}$ (5)

and $q_{k i j} (a, b) = 1 - p_{k i j} (a, b)$ .

By Liang and Zeger (1986), for large n, ${\sqrt{n}}_{k} {({\hat{a}}_{k} - a_{k}, {\hat{b}}_{k} - b_{k})}^{T}$ is asymptotically normal with mean 0 and variance V_k that can be consistently estimated by

${\hat{V}}_{k} = A_{k}^{- 1} ({\hat{a}}_{k}, {\hat{b}}_{k}) {\hat{Σ}}_{k} A_{k}^{- 1} ({\hat{a}}_{k}, {\hat{b}}_{k})$

where

${\hat{Σ}}_{k} = \frac{1}{n_{k}} \sum_{i = 1}^{n_{k}} {\sum_{j = 1}^{m_{k i}} {\hat{ɛ}}_{k i j} (\begin{matrix} 1 \\ t_{k i j} \end{matrix})}^{\otimes 2},$

${\hat{ɛ}}_{k i j} = y_{k i j} - p_{k i j} ({\hat{a}}_{k}, {\hat{b}}_{k})$ and $c^{\otimes 2} = c c^{T}$ for a vector . Let ${\hat{v}}_{k}$ be the (2,2)-component of ${\hat{V}}_{k}$ .

18.5.2 Sample Size Calculation

Suppose that we want to test the rate of change between two groups (that is, H₀: b₁ = b₂) . Based on the asymptotic results from the previous section, we can reject H₀: b₁ = b₂ in favor of H₀: b₁ ≠ b₂ when

$| \frac{{\hat{b}}_{1} - {\hat{b}}_{2}}{\sqrt{{\hat{v}}_{1} / n_{1} + {\hat{v}}_{2} / n_{2}}} | > z_{1 - α / 2},$ (6)

Where Z_1−α/2 is the 100(1−α/2) percentile of a standard normal distribution.

In this section, we derive a sample size formula for the two-sided α test (6) to detect $H_{1} : | b_{2} - b_{1} | = Δ (> 0)$ with power 1−β. When designing a study, we usually schedule fixed visit times $t_{1} < \dots < t_{m}$ for m repeated measurements from each subject. We often set t₁ = 0 for the baseline measurement time. When the study is conducted, however, the subjects may skip some visit times due to various reasons, which results in missing values, or they may not follow the visit schedule correctly so that the observed visit times may be variable. Jung and Ahn (2003) show through simulations in a continuous outcome variable case that the sample size formula based on fixed measurement times is very accurate even when the observed measurement times are widely distributed around the scheduled times.

By simple algebra, we can derive a sample size formula to detect the specified difference $| b_{2} - b_{1} | = Δ$ with power 1−β,

$n = \frac{{(z_{1 - α / 2} + z_{1 - β})}^{2} (v_{1} / r_{1} + v_{2} / r_{2})}{Δ^{2}},$ (7)

where $v_{k} = {l i m}_{n \to \infty} {\hat{v}}_{k}$ . The expression of v_k is slightly different between continuous and binary outcome variable cases, as shown in the following subsections.

In order to allow for missing values, let δ_j denote the proportion of patients with observations at t_j and δ_j the proportion of patients with observations at both t_j and t_j’. Also, let . $ρ_{j j^{'}} = c o r r (y_{k i j}, y_{k i j^{'}})$ We assume that the missing pattern and correlation structure are common in two treatment groups. We discuss specific models for missing pattern and correlation structure in Section 18.5.3.

18.5.2.1 Continuous Variable Case

We assume that the continuous outcome variable has a constant variance $v a r (y_{k i j}) = σ^{2}$ over time, which is common between two treatment groups. Jung and Ahn (2003) show that is v₁ = v₂ = v expressed as

$v = \frac{σ^{2} (s^{2} + c)}{s^{4}},$

where

$s^{2} = \sum_{j = 1}^{m} δ_{j} {(t_{j} - τ)}^{2}$

$c = \sum \sum_{j \neq j^{'}} δ_{j j^{'}} ρ_{j j^{'}} (t_{j} - τ) (t_{j^{'}} - τ)$

and

$τ = \frac{\sum_{j = 1}^{m} δ_{j} t_{j}}{\sum_{j = 1}^{m} δ_{j}} .$

Hence, (7) is simplified to

$n = \frac{v {(z_{1 - α / 2} + z_{1 - β})}^{2}}{Δ^{2} r_{1} r_{2}} .$ (8)

Note that we do not have to specify the true values for ${(a_{k}, b_{k}), k = 1, 2}$ but only the difference $| b_{2} - b_{1} | = Δ$ in sample size calculation. The sample size is proportional to the variance of measurement error and decreases as r₁ approaches 1/2.

18.5.2.2 Binary Variable Case

Let P_kj = a_k + b_kt_k denote the success probabilities under H₁. Jung and Ahn (2005) show that

$v_{k} = \frac{s_{k}^{2} + c_{k}}{s_{k}^{4}},$

where

$s_{k}^{2} = \sum_{j = 1}^{m} δ_{j} p_{k j} q_{k j} {(t_{j} - τ_{k})}^{2}$

$c_{k} = \sum \sum_{j \neq j^{'}} δ_{j j^{'}} ρ_{j j^{'}} \sqrt{p_{k j} q_{k j} p_{k j^{'}} q_{k j^{'}}} (t_{j} - τ_{k}) (t_{j^{'}} - τ_{k}),$

and

$τ_{k} = \frac{\sum_{j = 1}^{m} δ_{j} p_{k j} q_{k j} t_{j}}{\sum_{j = 1}^{m} δ_{j} p_{k j} q_{k j}} .$

As shown here, the sample size formula under the binary outcome variable case depends on the probabilities (P_kj, j = 1, . . . , m) under H₁, so that we need to specify all regression parameters ${(a_{k}, b_{k}), k = 1, 2}$ . Let Δ = b₁−b₂ denote the difference in slope between two groups under H₁. If pilot data exist for the control group (group 1), then we may use the estimates as the parameter values, a₁ and b₁. If t = 0 is the baseline, the intercepts in the two groups are set the same (that is, a₁ = a₂) . By setting b₂ = b₁ + Δ, we can specify all regression coefficients under H₁.

If no pilot data are available, we may specify the binary probabilities at the baseline, p₁₁, and at the end of follow up, p_1m , for the control group. Then we obtain (a₁b₁) by

$b_{1} = \frac{g (p_{1 m}) - g (p_{11})}{t_{m} - t_{1}},$ (9)

$a_{1} = g (p_{11}) - b_{1} t_{1} = g (p_{11}).$

And we set a₂ = a₁(since p₁₁ = p₂₁ at the baseline) and b₂ = b₁ + Δ.

18.5.3 Modelling Missing Pattern and Correlation Structure

Calculation of (or) requires projection of the missing probabilities and the true correlation structure.

18.5.3.1 Missing Pattern

In order to specify δ_jj, we need to estimate the missing pattern. If missing at time t_j is independent of missing at time t_j for each patient, then we have δ_jj’ = δ_j_j, and we call this type independent missing.

In some studies, subjects missing at a measurement time may be missing at all following measurement times, as in the labor pain study discussed in Example 18.6. This type of missing is called monotone missing. In this case, we have δ_jj’ = δ_j for ( $j < j^{'} (δ_{1} \geq \dots \geq δ_{m})$ ). In monotone missing cases, one may want to specify the proportion of patients who will contribute exactly the first j observations, say £j. Then, noting that $η_{j} = Δ_{j} - Δ_{j + 1}$ for j = 1,...,m−1 and μ_m = δ_m , we can obtain δ_j recursively starting from δ_m. Note also that $\sum_{j} η_{j} = Δ_{1}$ , which equals 1 if all patients have measurements at the first measurement time t₁.

In summary, we consider two missing patterns as candidates to the true one: (a) independent missing, where $δ_{j j^{'}} = δ_{j} δ_{j^{'}}$ , and (b) monotone missing, where $δ_{j j^{'}} = δ_{j^{'}}$ for $j < j^{'}$ ( $δ_{1} \geq \dots \geq δ_{m}$ ).

18.5.3.2 Correlation Structure

Now, we specify the true correlation structure $ρ_{j j^{'}}$ . A reasonable model for $ɛ_{i j} = y_{i j} - g^{- 1} (a + b t_{i j})$ may be

$ɛ_{i j} = u_{i} + e_{i j},$

where u_i is a subject-specific error term with variance $σ_{u}^{2}$ and e_ij is a serially correlated within-subject error term with variance $σ_{e}^{2}$ and correlation coefficients ${\tilde{ρ}}_{j j^{'}}$ . Assuming that u_i and e_ij are independent, we have

$c o v (ɛ_{i j}, ɛ_{i j^{'}}) = σ_{a}^{2} + σ_{e}^{2} {\tilde{ρ}}_{j j^{'}} .$

Often, the variation between subjects ( $σ_{u}^{2}$ ) is much larger than that within subjects ( $σ_{e}^{2}$ ). In this case, we have

$c o v (ɛ_{i j}, ɛ_{i j^{'}}) \approx σ_{u}^{2}$

and an exchangeable correlation structure, $ρ_{j j^{'}} = ρ$ for $j \neq j^{'}$ , may be a reasonable approximation to the true one.

On the other hand, if the variation within the subject ( $σ_{e}^{2}$ ) dominates over the variation between subjects ( $σ_{u}^{2}$ ), then we will have

$c o v (ɛ_{i j}, ɛ_{i j^{'}}) \approx σ_{e}^{2} {\tilde{ρ}}_{j j^{'}},$

so that a serial correlation structure may be a reasonable approximation to the true one. One of the most popular serial correlation structures, especially when measurement times are not equidistant, is a continuous autocorrelation model with order 1, AR(1), for which $ρ_{j j^{'}} = ρ^{| t_{j} - t_{j^{'}} |}$ .

We consider two correlation structures as candidate approximations to the true one: (i) exchangeable, where $ρ_{j j^{'}} = ρ$ , and (ii) AR(1), where $ρ_{j j^{'}} = ρ^{| t_{j} - t_{j^{'}} |}$ .

18.5.3.3 Examples

For sample size calculation, the following parameters need to be specified commonly in both continuous and discrete outcome variable cases:

• Type I and II errors α and β, respectively.

• The size of difference to detect, $| b_{2} - b_{1} | = Δ$ .

• Allocation proportion for group k, r_k.

• Correlation structure and the associated correlation parameter ρ, (that is, (i) $ρ_{j j^{'}} = ρ$ for exchangeable or (ii) $ρ_{j j^{'}} = ρ^{| t_{j} - t_{j^{'}} |}$ for AR(1)).

• Proportion of patients with an observation at t_j, δ_j.

• Missing pattern: (a) independent missing ( $δ_{j j^{'}} = δ_{j} δ_{j^{'}}$ ) or (b) monotone missing ( $δ_{j j^{'}} = δ_{j^{'}}$ for $j < j^{'}$ ) .

In addition, we need to specify $σ^{2} = v a r (ɛ_{i j})$ in the continuous outcome variable case and the regression coefficients ${(a_{k}, b_{k}), k = 1, 2}$ under H₁ in the binary outcome variable case.

We demonstrate our sample size formula with real longitudinal studies.

Example 18.6 Continuous Outcome Case

In a study on labor pain (Davis, 1991), 83 women in labor were assigned to either a pain medication group (43 women) or a placebo group (40 women). At 30-minute intervals, the self-reported amount of pain was marked on a 100mm line, where 0 = no pain and 100 = extreme pain. The maximum number of measurements for each woman was , but there were numerous missing values at later measurement times with monotone missing pattern. A simple approach to such a study objective might be to estimate and compare the slopes of pain scores over time in the two treatment groups. In this study, the outcome variable is continuous. Suppose that we want to design a new study on labor pain based on the data reported by Davis (1991). As in the original study, we assume monotone missing. In this study, the measurement times were equispaced, so that we set t_j = j − 1 (j = 1,…6) for convenience. From the data, we obtained σ² = 815.84. Suppose we want to detect a difference of σ in mean pain score between two groups at t₆. So, we project Δ = σ/(t₆ − t₁ = 5.71 in a new study. We consider a balanced design (that is, r₁ = r₂ = 1/2) . Also, from the data, the proportion of observed measurements are

$(δ_{1}, δ_{2}, δ_{3}, δ_{4}, δ_{5}, δ_{6}) = (1, .90, .78, .67, .54, .41).$

From these results τ = 2.02, we obtain and s² = 11.42.

Suppose that we want to detect a difference of Δ = 5.71 with 80% of power (z_1−β = 0.84) using a two-sided α = .05 (z_1−α/2 = 1.96) test. Under an exchangeable correlation structure, we obtain ρ = .64 and c = −3.13 from the data, so that we have v = 815.84×(11.42−3.13)/11.42² = 51.84. Hence, from (8), the required sample size is calculated as

$n = [\frac{51.84 \times {(1.96 + 0.84)}^{2}}{{5.71}^{2}}] + 1 = 50.$

where [x] is the largest integer not exceeding x.

Under AR(1), we obtain from the data ρ = .80 and c = 2.31, so that the required sample size for detecting Δ = 5.71 with 80% of power using a two-sided α = .05 test is given as n = 83. The sample size under AR(1) is larger than that under an exchangeable correlation, by about 66%, in this example.

Program 18.6 SAS Code for Two-Group Comparision of Repeated Continuous Measurement

%macro SS_RepeatedContinuousMeasurement(
               missingPattern = , /* 1: independent missing 2: monotone missing     */
               corrStructure = , /* 1: compound symetric , 2: AR(1)                */
               rho            = , /* associated correlation parameter               */
               m              = , /* number of measurement time points              */
               sigma_sq       = , /* variance                                       */
               inR            = , /* group allocation proportion(r1, r2)            */
               inDelta        = , /* proportion of observed measurements            */
               alpha          = , /* alpha                                          */
               power          = , /* power                                          */
               sides         = , /* 1: one-sided test 2: two-sided test           */
               print   = 0       /* 0:default 1: detail                            */
                               );
proc iml;
%let K = 2; r = &inR; delta = &inDelta; t = J(&m,1,0);
do j=1 to &m; t[j] = j-1; end;
%if &inDelta eq %then %do; delta=J(&m,1,0);do j=1 to &m; delta[j]=1-(j-1)/20; end; %end;
d = sqrt(&sigma_sq)/(t[&m]-t[1]);
start g(p); gp = log(p/(1-p)); return (gp); finish g;
start prob(a,b,t); p = 1/(1+exp(-a-b*t)); return (p); finish prob;
start rho(i,j,r,c);
    if c=1 then do; /* CS */ if i=j then rho_ij =1;else rho_ij = r; end;
    else do; /* AR(1) */ dist = abs(i-j); rho_ij = r**dist; end;
    return (rho_ij);
finish rho;
tau_num =0; tau_denum = 0;
do j=1 to &m; tau_num = tau_num+delta[j]*t[j]; tau_denum = tau_denum+delta[j]; end;
tau= tau_num/tau_denum; s_sq=0;
do j=1 to &m; s_sq = s_sq+delta[j]*((t[j]-tau)**2); end;
c=0;
do i=1 to &m;
     do j=1 to &m;
           if i ^= j then do;
              if &missingPattern = 1 then do; delta_ij = delta[i]*delta[j]; end;
          else do; if j > i then max_ij=j; else max_ij=i; delta_ij=delta[max_ij]; end;
          c = c+delta_ij*rho(i,j,&rho,&corrStructure)*(t[i]-tau)*(t[j]-tau);
                end;
      end;
   end;
   v = &sigma_sq*(s_sq+c)/(s_sq**2);
   z_p1=1-&alpha/&sides; z_alpha = probit(z_p1); z_p2=&power; z_beta = probit(z_p2);
   n = int((v*(z_alpha+z_beta)**2)/(d**2*r[1]*r[2]))+1;
   print 'Sample Size Calculation for a Two-Group Comparision';
   print ' of Repeated Continous Measurements';
   alpha =α power=&power; rho=ρ sides=&sides;
   sigma_sq=&sigma_sq;
   print alpha power rho sides;
   if &missingPattern = 1 then do;print ' Missing Pattern: Independent '; end;
   else if &missingPattern = 2 then do; print ' Missing Pattern: Monotone '; end;
   if &corrStructure=1 then do; print ' Correlation Structure : Compound Symetric ';end;
   else if &corrStructure = 2 then do; print ' Correlation Structure : AR(1) '; end;
   %if &print = 1 %then %do; print delta; print tau s_sq c v ; %end;
print d sigma_sq;
   print n;
   quit;
run;
%mend SS_RepeatedContinuousMeasurement;

/*--Run the macro for Example 18.6 when Correlation Structure = Compound Symetric --*/

%SS_RepeatedContinuousMeasurement(
       missingPattern = 2,              /* monotone missing                      */
       corrStructure = 1,              /* compound symetric                     */
       rho            = 0.64,           /* associated correlation parameter     */
       m              = 6,              /* number of measurement time points    */
      sigma_sq       = 815.84,         /* variance                              */
      inR          = %str({0.5, 0.5}), /* group allocation proportion(r1, r2)   */
      inDelta      = %str({1, 0.9, 0.78, 0.67, 0.54, 0.41}),
                                        /* proportion of observed measurements   */
       alpha        = 0.05,             /* alpha                                 */
       power        = 0.8,             /* power                                 */
       sides         = 2,               /* two-sided test                        */
       print         = 1
        );

/*--- Run the macro for Example 18.6 when Correlation Structure = AR(1) ---------*/

%SS_RepeatedContinuousMeasurement(
     missingPattern = 2,              /* monotone missing                      */
     corrStructure = 2,              /* AR(1)                                  */
     rho             = 0.8,            /* associated correlation parameter      */
   m              = 6,               /* number of measurement time points     */
     sigma_sq        = 815.84,         /* variance                              */
     inR            = %str({0.5, 0.5}),/* group allocation proportion(r1, r2)   */
     inDelta        = %str({1, 0.9, 0.78, 0.67, 0.54, 0.41}),
                                       /* proportion of observed measurements   */
     alpha          = 0.05,            /* alpha                                  */
     power          = 0.8,             /* power                                  */
     sides           = 2,              /* two-sided test                         */
     print = 1
        );

Output from Program 18.6

/*------------------ When Correlation Structure = Compound Symetric --------*/

Sample Size Calculation for a Two-Group Comparision
of Repeated Continous Measurements

ALPHA	POWER	RHO	SIDES
0.05	0.8	0.64	2

Missing Pattern: Monotone
Correlation Structure : Compound Symetric

TAU	S_SQ	C	V
2.0186047	11.418512	-3.133345	51.842656

D	SIGMA_SQ
5.7125826	815.84

N
50

/*--------------------- When Correlation Structure = AR(1) -----------------*/

Sample Size Calculation for a Two-Group Comparision
of Repeated Continous Measurements

ALPHA	POWER	RHO	SIDES
0.05	0.8	0.8	2

Missing Pattern: Monotone
Correlation Structure : AR(1)

TAU	S_SQ	C	V
2.0186047	11.418512	2.3055968	85.87567

D	SIGMA_SQ
5.7125826	815.84

N
83

Example 18.7 Binary Outcome Case

The Genetics vs. Environment In Scleroderma Outcome Study (GENISOS) is an observational study designed as a collaboration of the University of Texas-Houston Health Science Center with the University of Texas Medical Branch at Galveston and the University of Texas-San Antonio Health Science Center (Reveille et al., 2001). Scleroderma, or systemic sclerosis, is a multisystem disease of unknown etiology characterized by cutaneous and visceral fibrosis, small blood vessel damage, and autoimmune features (Medsger, 1997). The study subjects are regularly followed to check for the occurrence of pulmonary fibrosis. In this case, the outcome is a binary variable.

Suppose that we want to develop a study to examine the effect of a new drug in preventing the occurrence of pulmonary fibrosis in subjects with scleroderma compared to no intervention. The parameter values specified here for sample size calculation are approximated by the estimates from the current data set of GENISOS.

We want to estimate the sample size for the new study using α = .05 and 1−β = .8 . As in GENISOS, presence or absence of pulmonary fibrosis is assessed at baseline and at months 6, 12, 18, 24, and 30. Because the measurement times are equidistant, we set t_j = j − 1 j ( j = 1,…,6 ) for the m = 6 time points. The within-group correlation structure of the repeated measurements conforms to AR(1) with the adjacent correlation equal to ρ = .8 (that is, $ρ_{j j^{'}} {= .8}^{| j - j^{'} |}$ ). We consider assigning an equal number of scleroderma patients in each of two groups (that is, r₁ = r₂ =1/2).

Approximately 75% of scleroderma patients do not have pulmonary fibrosis at the baseline in the ongoing GENISOS. We project that the proportion of subjects without pulmonary fibrosis is 75% at baseline (that is, p_1,1 = .75) and 50% at 30 months (that is, p_1,j = .50) in a placebo group. We assume that a new therapy will prevent or delay further occurrence of pulmonary fibrosis. That is, the proportion of subjects without pulmonary fibrosis will remain 75% during the 30-month study in a new therapy group (in other words, p_2,1 = p_2,6 = .75 ). From these values and (9), we obtain

$b_{1} = \frac{g (.5) - g (.75)}{5 - 0} = - 0.220,$

and a₁ = g(.75) = 1.099. Similarly, we obtain (a₂, b₂)=(1.099, 0), so that Δ = 0 − (−0.220) = 0.220. By (5), the probabilities of no pulmonary fibrosis at the 6 time points are obtained as (.750,.707,.659,.608,.555,.500) for the placebo group and (.750,.750,.750,.750,.750,.750) for the treatment group.

The proportions of observed measurements are expected to be

$(δ_{1}, δ_{2}, δ_{3}, δ_{4}, δ_{5}, δ_{6}) = (1, .95, .90, .85, .80, .75).$

Suppose that we expect independent missing. Then, we obtain $δ_{j j^{'}}$ from the specified δ_j values using $δ_{j j^{'}} = δ_{j} δ_{j^{'}}$ . Now we have all the parameter values required, and we obtain v₁ = 0.305 and v₁ = 0.353. Finally, from (7), we obtain

$n = [\frac{{(1.96 + 0.84)}^{2} (0.305 / 0.5 + 0.353 / 0.5)}{{0.220}^{2}}] + 1 = 215.$

If we assume monotone missing ( $δ_{j j^{'}} = δ_{j \lor j^{'}}$ ), we obtain v₁ = 0.324 , , v₂ = 0.308 and n = 229 .

Program 18.7 SAS Code for Two-Group Comparision of Repeated Binary Measurement

%macro SS_RepeatedBinaryMeasurement(
       missingPattern =   , /* 1: independent missing 2: monotone missing       */
       corrStructure =    , /* 1: compound symetric , 2: AR(1)                  */
       rho           =    , /* associated correlation parameter               */
       m             =    , /* number of measurement time points                */
       inP1          =    , /* proportion of subjects for control group         */
       inP2          =    , /* proportion of subjects                           */
       inR           =    , /* group allocation proportion(r1, r2)              */
       inDelta =          , /* proportion of observed measurements              */
       alpha   =          , /* alpha                                            */
       power   =          , /* power                                            */
       sides   =          /* 1: one-sided test 2: two-sided test             */
     );
proc iml;
   start g(p); gp = log(p/(1-p)); return (gp); finish g;
   start prob(a,b,t); p = 1/(1+exp(-a-b*t)); return (p); finish prob;
   start rho(i,j,r,c);
    if c=1 then do; /* CS */ if i=j then rho_ij =1; else rho_ij = r; end;
    else if c = 2 then do; /* AR(1) */ dist = abs(i-j); rho_ij = r**dist; end;
    return (rho_ij);
   finish rho;
   %let K = 2;
   P1=&inP1; P2=&inP2; r =&inR; a=J(&K,1,0); b=J(&K,1,0); t=J(&m,1,0);
   tau = J(&K,1,0); s_sq = J(&K,1,0); v = J(&K,1,0); c_sq = J(&K,1,0);
   do j=1 to &m; t[j] = j-1; end;
   %if &inDelta eq %then %do;
       delta = J(&m,1,0);
       do j=1 to &m; delta[j] = 1-(j-1)/20; end;
   %end;
   b[1] = (g(P1[&m])-g(P1[1]))/(t[&m]-t[1]); a[1] = g(P1[1])-B[1]*t[1];
   b[2] = (g(P2[&m])-g(P2[1]))/(t[&m]-t[1]); a[2] = g(P2[1])-B[1]*t[1];
   d = b[2]-b[1];
   do j=1 to &m;
      T[j] = j-1; P1[j]= prob(a[1], b[1], t[j]); P2[j]= prob(a[2], b[2], t[j]);
   end;
   tau1_num =0; tau1_denum = 0; tau2_num =0; tau2_denum = 0;
   do j=1 to &m;
     tau1_num = tau1_num+delta[j]*P1[j]*(1-P1[j])*t[j];
     tau1_denum = tau1_denum+delta[j]*P1[j]*(1-P1[j]);

     tau2_num = tau2_num+delta[j]*P2[j]*(1-P2[j])*t[j];
     tau2_denum = tau2_denum+delta[j]*P2[j]*(1-P2[j]);
   end;
   tau[1]= tau1_num/tau1_denum; tau[2]= tau2_num/tau2_denum;
   s1=0;s2=0;
   do j=1 to &m;
      s1 = s1+delta[j]*P1[j]*(1-P1[j])*((t[j]-tau[1])**2);
      s2 = s2+delta[j]*P2[j]*(1-P2[j])*((t[j]-tau[2])**2);
   end;
   s_sq[1]=s1; s_sq[2]=s2; c1=0; c2=0; c12=0;
   do i=1 to &m;
      do j=1 to &m;
           if i ^= j then do;
              if &missingPattern = 1 then do;
             delta_ij = delta[i]*delta[j];
              end;else do;
                 if j > i then max_ij = j; else max_ij = i; delta_ij = delta[max_ij];
          end;
          c1=c1+delta_ij*rho(i,j,&rho,&corrStructure)*
             sqrt(P1[i]*(1-P1[i])*P1[j]*(1-P1[j]))*((t[i]-tau[1])*(t[j]-tau[1]));
         c2=c2+delta_ij*rho(i,j,&rho,&corrStructure)*
             sqrt(P2[i]*(1-P2[i])*P2[j]*(1-P2[j]))*((t[i]-tau[2])*(t[j]-tau[2]));
         end;
      end;
   end;
   c_sq[1]=c1; c_sq[2]=c2;
   do k =1 to &K; v[k] = (s_sq[k]+c_sq[k])/(s_sq[k]**2); end;
   z_p1=1-&alpha/&sides; z_alpha = probit(z_p1);
   z_p2=&power; z_beta = probit(z_p2);
   n = int((z_alpha+z_beta)**2*(v[1]/r[1]+v[2]/r[2])/d**2)+1;

   print 'Sample Size Calculation for a Two-Group Comparision';
   print ' of Repeated Binary Measurements';
   alpha =α power=&power; rho=ρ sides=&sides;
   print alpha power rho sides;
   if &missingPattern = 1 then do; print ' Missing Pattern: Independent '; end;
   else if &missingPattern = 2 then do;print ' Missing Pattern: Monotone '; end;
   if &corrStructure = 1 then do; print ' Correlation Structure : Compound Symetric ';
   end; else if &corrStructure = 2 then do; print ' Correlation Structure : AR(1) ';
   end;
   print P1 P2 delta v ; print d; print n;
   quit;
run;
%mend SS_RepeatedBinaryMeasurement;

/*--------------- Run the macro for Example 18.7 ------------------------------*/

%SS_RepeatedBinaryMeasurement(
     missingPattern = 1,          /* 1: independent missing 2: monotone missing     */
     corrStructure = 2,          /* 1: compound symetric , 2: AR(1)                */
     rho    = 0.8,               /* associated correlation parameter              */
     m       = 6,                 /* number of measurement time points              */
    inP1    = %str({0.75, 0, 0, 0, 0, 0.5}),
                                  /* proportion of subjects for control group       */
    inP2    = %str({0.75, 0.75, 0.75, 0.75, 0.75, 0.75}),
                                  /* proportion of subjects                         */
     inR     = %str({0.5, 0.5}), /* group allocation proportion(r1, r2)            */
     inDelta = ,      /* delta[j] = 1-(j-1)/20, proportion of observed measurements */
     alpha   = 0.05,              /* alpha                                          */
     power   = 0.8,               /* power                                          */
     sides   = 2                  /* 1: one-sided test 2: two-sided test           */
     );

%SS_RepeatedBinaryMeasurement(
    missingPattern = 2,          /* monotone missing                               */
    corrStructure = 2,           /* AR(1)                                           */
     rho    = 0.8,                 /* associated correlation parameter             */
     m       = 6,                   /* number of measurement time points            */
    inP1   = %str({0.75, 0, 0, 0, 0, 0.5}),

                                    /* proportion of subjects for control group     */
    inP2    = %str({0.75, 0.75, 0.75, 0.75, 0.75, 0.75}),
                                 /* proportion of subjects                          */
     inR     = %str({0.5, 0.5}),    /* group allocation proportion(r1, r2)          */
     inDelta = ,     /* delta[j] = 1-(j-1)/20, proportion of observed measurements */
     alpha   = 0.05,                /* alpha                                       */
     power   = 0.8,                 /* power                                       */
     sides   = 2                    /* 1: one-sided test 2: two-sided test         */
     );

Output from Program 18.7

/*--------------When Missing Pattern Is Independent Missing ----------------*/

Sample Size Calculation for a Two-Group Comparision
of Repeated Binary Measurements

ALPHA	POWER	RHO	SIDES
0.05	0.8	0.8	2

Missing Pattern: Independent
Correlation Structure : AR(1)

P1	P2	DELTA	V
0.75	0.75	1	0.3048798
0.7065921	0.75	0.95	0.3534175
0.6590733	0.75	0.9
0.6081268	0.75	0.85
0.5547107	0.75	0.8
0.5	0.75	0.75

D
0.2197225

N
215

/*-------------- When Missing Pattern Is Monotone Missing ------------------*/

Sample Size Calculation for a Two-Group Comparision
of Repeated Binary Measurements

ALPHA	POWER	RHO	SIDES
0.05	0.8	0.8	2

Missing Pattern: Monotone
Correlation Structure : AR(1)

P1	P2	DELTA	V
0.75	0.75	1	0.3236844
0.7065921	0.75	0.95	0.3804059
0.6590733	0.75	0.9
0.6081268	0.75	0.85
0.5547107	0.75	0.8
0.5	0.75	0.75

D
0.2197225

N
229

18.6 Discussion

In designing an observational study, we should choose a statistical testing method that will be used when analyzing the final data. For an accurate sample size calculation, the sample size formula should be directly derived from the power function of the chosen testing method.

There are two types of parameters included in sample size formulas: primary and nuisance parameters. The primary parameters are those shown in the statistical hypotheses (for example, means, binomial proportions, and hazard rates), and the nuisance parameters (for example, prevalence rates, accrual period in survival data, correlation coefficients, and missing type and probabilities in longitudinal data) are those of no or secondary interest. Because both types of parameters determine the final sample size, it is important to specify the parameter values as accurately as possible. If a database exists, we usually estimate the values of nuisance parameters and the primary parameter value of the control group from the database. Although r_k may be chosen by the prevalence of a disease, we may decide to use different allocation proportions from the prevalence rates in the natural population depending on the relative cost to accrue subjects in two groups. The primary parameter value for the case group is usually chosen based on the clinical significance of the associated intervention on the outcome variable, which is measured in terms of the difference or ratio of the primary parameter values between the two groups. When there is uncertainty in some of these parameters, we may conduct a sensitivity analysis on the power to demonstrate that the power would be adequate under a range of scenarios.

Appendix: Asymptotic Distribution of Wilcoxon Rank Sum Test under H_α

Let ${\tilde{X}}_{k i} = X_{k i} - θ_{k}$ . Then ${\tilde{X}}_{11},\dots, {\tilde{X}}_{1 n_{1}}, {\tilde{X}}_{21},\dots, {\tilde{X}}_{2 n_{2}}$ are IID with cumulative density function (CDF) F(x). Noting that

$W = \frac{1}{n_{1} n_{2}} \sum_{i = 1}^{n_{1}} \sum_{j = 1}^{n_{2}} I ({\tilde{X}}_{1 i} > {\tilde{X}}_{2 j} - Δ),$

we have $W = \int_{- \infty}^{\infty} {\hat{F}}_{2} (x + Δ) d {\hat{F}}_{1} (x)$ , where ${\hat{F}}_{k} (x) = n_{k}^{- 1} \sum_{i = 1}^{n_{k}} I ({\tilde{X}}_{k i} \leq x)$ is the empirical CDF of ${\tilde{X}}_{k 1},\dots, {\tilde{X}}_{k n_{k}}$ that uniformly converges to F(x) as $n_{k} \to \infty$ . Let

$ν_{a} = P ({\tilde{X}}_{1 i} > {\tilde{X}}_{2 j} - Δ) = \int_{- \infty}^{\infty} F (x + Δ) d F (x)$ . For a large n, $W - ν_{a}$ is expressed as

$\int_{- \infty}^{\infty} {\hat{F}}_{2} (x + Δ) d {\hat{F}}_{1} (x) - \int_{- \infty}^{\infty} F (x + Δ) d F (x)$

$= \int_{- \infty}^{\infty} F (x + Δ) d {{\hat{F}}_{1} (x) - F (x)} + \int_{- \infty}^{\infty} {{\hat{F}}_{2} (x + Δ) - F (x + Δ)} d F (x) + o_{p} (n^{- 1}),$

where $o_{p} (n^{- 1}) = \int_{- \infty}^{\infty} {{\hat{F}}_{2} (x + Δ) - F (x + δ)} d {{\hat{F}}_{1} (x) - F (x)}$ is negligible for a large n.

Because

$\int_{- \infty}^{\infty} F (x + Δ) {\hat{F}}_{1} (x) = \frac{1}{n_{1}} \sum_{i = 1}^{n_{1}} F ({\tilde{X}}_{1 i} + Δ)$

and

$\int_{- \infty}^{\infty} F (x + Δ) {\hat{F}}_{1} (x) = \frac{1}{n_{1}} \sum_{i = 1}^{n_{1}} F ({\tilde{X}}_{1 i} + Δ)$

we have $v a r (W) = σ_{1}^{2} / n_{1} + σ_{2}^{2} / n_{2}$ , where

$σ_{1}^{2} = v a r {F ({\tilde{X}}_{1 i} + Δ)}$

$σ_{2}^{2} = v a r {\bar{F} ({\tilde{X}}_{2 i} - Δ)} = v a r {F ({\tilde{X}}_{2 i} - Δ)}.$

References

Aalen, O. 1978. “Nonparametric inference for a family of counting processes.” Annals of Statistics 6: 701–726.

Carroll, R. J., and D. Ruppert. 1988. Transformation and Weighting in Regression. New York: Chapman and Hall/CRC.

Davis, C. S. 1991. “Semi-parametric and non-parametric methods for the analysis of repeated measurements with applications to clinical trials.” Statistics in Medicine 10: 1959–1980.

Fleming, T. R., and D. P. Harrington. 1991. Counting Processes and Survival Analysis. New York: John Wiley & Sons, Inc.

Gart, J. J. 1985. “Approximate tests and interval estimation of the common relative risk in the combination of 2 × 2 tables.” Biometrika >72: 673–677.

Jung, S. H., and C. Ahn. 2003. “Sample size estimation for GEE method for comparing slopes in repeated measurements data.” Statistics in Medicine >22: 1305–1315.

Jung, S. H., and C. W. Ahn. 2005. “Sample size for a two-group comparison of repeated binary measurements using GEE.” Statistics in Medicine 24(17): 2583–2596.

Jung, S. H., S. C. Chow, and E. Chi. 2007. “A note on sample size calculation based on propensity analysis in non-randomized trials.” Journal of Biopharmaceutical Statistics >17: 35–41.

Lakatos, E. 1988. “Sample sizes based on the log-rank statistic in complex clinical trials.” Biometrics 44: 229–241.

Liang, K. Y., and S. L. Zeger. 1986. “Longitudinal data analysis using generalized linear models.” Biometrika >73: 13–22.

Mann, H. B., and D. R. Whitney. 1947. “On a test of whether one of two random variables is stochastically larger than the other.” The Annals of Mathematical Statistics 18: 50-60.

Medsger, Jr., T. A. 1997. “Systemic sclerosis (scleroderma): clinical aspects. In Arthritis and allied conditions: a textbook of rheumatology. 13th ed. Edited by W. J. Koopman. Baltimore, MD: Williams & Wilkins, pp 1433–1464.

Nelson, W. 1969. “Hazard plotting for incomplete failure data.” Journal of Quality Technology 1(1): 27–52.

Peto, R., and J. Peto. 1972. “Asymptotically efficient rank invariant test procedures (with discussion).” Journal of the Royal Statistical Society, Series A: General 135: 185–206.

Reveille, J. D., M. Fischbach, T. McNearney, et al. 2001. “Systemic sclerosis in 3 US ethnic groups: a comparison of clinical, sociodemographic, serologic, and immunogenetic determinants.” Seminars in Arthritis and Rheumatism >30(5): 332–346.

Rosenbaum, P. R., and D. B. Rubin. 1983. “The central role of the propensity score in observational studies for causal effects.” Biometrika >70: 41–55.

Rosenbaum, P. R., and D. B. Rubin. 1984. “Reducing bias in observational studies using subclassification on the propensity score.” Journal of the American Statistical Association 79: 516–524.

Rubin, D. B. 1976. “Inference and missing data.” Biometrika >63: 581–590.

Rubinstein, L. V., M. H. Gail, and T. J. Santner. 1981. “Planning the duration of a comparative clinical trial with loss to follow-up and a period of continued observation.” Journal of Chronic Diseases >34(9/10): 469–479.

SAS Institute Inc. 2009. SAS/STAT9.2 User’s Guide, Second Edition. Cary, NC: SAS Institute Inc.

Yue, L. Q. 2007. “Statistical and regulatory issues with the application of propensity score analysis to non-randomized medical device clinical studies.” Journal of Biopharmaceutical Statistics 17: 1–13.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Chapter 18 Sample Size Calculation for Observational Studies

Create new playlist

Sign In

Sign Up

18.2.1 Two-Sample -test

Example 18.1

Output from Program 18.1

18.2.2 Wilcoxon Rank Sum Test

Example 18.2

Output from Program 18.2

18.3.1 Two-Sample Test on a Binary Outcome

Example 18.3

Output from Program 18.3

18.3.2 Weighted Mantel-Haenszel Test

Example 18.4

Output from Program 18.4

Example 18.5

Output from Program 18.5.1

Output from Program 18.5.2

18.5.1 Generalized Estimating Equations

18.5.1.1 Continuous Outcome Variable Case

18.5.1.2 Binary Outcome Variable Case

18.5.2 Sample Size Calculation

18.5.2.1 Continuous Variable Case

18.5.2.2 Binary Variable Case

18.5.3 Modelling Missing Pattern and Correlation Structure

18.5.3.1 Missing Pattern

18.5.3.2 Correlation Structure

18.5.3.3 Examples

Example 18.6 Continuous Outcome Case

Output from Program 18.6

Example 18.7 Binary Outcome Case

Output from Program 18.7

Table of Contents for
Chapter 18 Sample Size Calculation for Observational Studies