Chapter 18

Sample Size Calculation for Observational Studies

Sin-Ho Jung
Taiyeong Lee
Elizabeth DeLong

Abstract

18.1 Introduction

18.2 Continuous Variables

18.3 Binary Variables

18.4 Two-Sample Log-Rank Test for Survival Data

18.5 Two-Sample Longitudinal Data

18.6 Discussion

Appendix: Asymptotic Distribution of Wilcoxon Rank Sum Test under Ha

References

 

Abstract

For a number of reasons, observational studies are currently being used to provide evidence to support medical decisions. Because these studies do not carry the same level of credibility conferred to randomized trials, appropriate attention to statistical issues is paramount. In particular, considerations of event rate, sample size, and power frequently occur. This chapter provides mechanisms for dealing with these considerations under a variety of scenarios. Additionally, the methods presented here are applicable more generally to clinical trials as well.

 

18.1 Introduction

The increasing availability of computerized clinical patient data has stimulated greater interest in research on observational data. Individual medical practices and hospitals are now attempting to draw inferences on the basis of data gathered on their own patients. Additionally, the proliferation of large registry databases that harvest clinical data from hundreds of sites has created opportunities to study questions regarding rare diseases and rare events. However, even studies using these large databases, when reduced to the appropriate study populations and endpoints, encounter issues of sample size and power.

Observational studies can be designed to address a variety of objectives. If the primary analysis of the study is simply to estimate some parameter, then one can use standard sample size calculations for requiring a confidence interval of at most a certain width. If the objective involves the comparison of cohorts, then in many cases the statistical analysis will involve some form of selection bias adjustment due to the non-randomized nature of observational research. Chapters 1 through 11 cover the issues of selection bias adjustment and standard analytical methods for comparing cohorts in these settings, including propensity score regression, stratification, matching, doubly robust adjustment, instrumental variables, local control, and various longitudinal analyses. Regarding the computation of sample sizes in study planning in these scenarios, such complex designs and analyses may require custom sample size and power estimate calculations through simulations. In some cases, the primary hypothesis may be simplified for the purpose of estimating power and sample size, and generalized SAS macros can perform the needed calculations.

In this chapter, we present a series of SAS macros for computing sample sizes for studies with various types of outcomes and comparisons (all for comparisons between two groups):

t-test (Section 18.2.1)

• Wilcoxon Rank Sum test (Section 18.2.2)

• Two-Sample Tests on Binary outcomes (Section 18.3.1)

• Weighted Mantel-Haenszel test (Section 18.3.2)

• Log-Rank test for survival data (Section 18.4)

• Longitudinal Data—Continuous outcomes (Section 18.5.3.3)

• Longitudinal Data—Binary outcomes (Section 18.5.3.3)

The sample size macro for the Weighted Mantel-Haenszel test (provided in Section 18.3.1) is directly applicable for observational studies for which the analysis is based on the commonly utilized propensity score stratification approach. This propensity score stratification analysis methodology is described in detail in Chapter 2. In brief, propensity score stratification involves estimating a propensity score for each study participant, stratifying by this score, estimating the difference in outcomes between the two cohorts within each of the strata, and then combining the estimated cohort differences over strata.

Other sections in this chapter present sample size macros based on analysis methods without specific selection bias adjustments. However, these can be very useful for observational study planning when simple cohort comparisons are planned (for example, such as when causal inference is not the objective) or as a very easy-to-use initial calculation when other methods are being utilized such as propensity score matching. In this example, one would supplement the sample sizes computed in these macros with an estimate of the number of patients excluded from the analysis due to a lack of propensity score matched patients in the other cohort. In addition, most of the sample size calculation methods in this chapter can be used for randomized clinical trials as well.

 

18.2 Continuous Variables

In this section, we consider two two-sample tests for continuous variables: t-test and Wilcoxon rank test.

 

18.2.1 Two-Sample -test

Suppose that, for group k(=1,2), Xk1,..., Xk,nk are independent and identically distributed (IID) normal random variables with mean μk and variance σ2. We want to test H0 : μ1 = μ2 vs. Ha:μ1μ2Ha:μ1μ2. ˉXk=n-1knki=1XkiX¯¯¯k=n1knki=1Xki Let denote the sample mean for group k, and

s2p=n1i=1(X1i-ˉX1)2+n2i=1(X2i-ˉX2)2n1+n2-2s2p=n1i=1(X1iX¯¯¯1)2+n2i=1(X2iX¯¯¯2)2n1+n22

denote the pooled sample variance. Then, under H0,

T=ˉX1-ˉX2spn-11+n-12T=X¯¯¯1X¯¯¯2spn11+n12

follows the t-distribution with n1 + n2 − 2 degrees of freedom. Hence, given type I error probability α, we reject H0 if |T|>tn1+n2-2,1-α/2T>tn1+n22,1α/2, where tv,γ is the 100γ-th percentile of the t-distribution with v degrees of freedom.

If n1 and n2 are large, then we do not require the normal distribution assumption, and the critical value from the t-test, tn1+n2-2,1-α/2tn1+n22,1α/2 , can be approximated by the 100γ-th percentile of the standard normal distribution, z1−α/2. How large the sample sizes should be for the large approximation depends on how close the distribution. of the observations for each group is to a normal distribution. To improve the normality, we often apply a transform to the raw data, such as a log-transformation for positive variables (Carroll and Ruppert, 1988). We derive the sample size formula based on the large sample approximation.

We consider a specific alternative hypothesis Ha : |μ1μ2|= ∆σ. Note that ∆ = σ-1 (μ1μ2) denotes a standardized effect size under the alternative hypothesis. Without loss of generality, we assume that μ1 > μ2 under Ha. Then, under Ha,

TˉX1-ˉX2-σΔσn-11+n-12+Δn-11+n-12TX¯¯¯1X¯¯¯2σΔσn11+n12+Δn11+n12 =Z+Δnr1r2,=Z+Δnr1r2,

where Z : N(0, 1), rk = nk/n, and n = n1 + n2. For a given n, the power is calculated as

1-β=ˉϕ(z1-α/2-Δnr1r2),1β=ϕ¯(z1α/2Δnr1r2),                                    (1)

where ˉϕ(z)=1-ϕ(z)ϕ¯(z)=1ϕ(z) and ϕ(z)=P(Zz)ϕ(z)=P(Zz) is the cumulative distribution function of the standard normal distribution, N(0, 1). By solving (1) with respect to n, we obtain the required total sample size as

n=(z1-α/2+z1-β)2r1r2Δ2,n=(z1α/2+z1β)2r1r2Δ2,

or nk = n х rk for group k. For one-sided testing, replace α/2 with α. This sample size is based on a large sample approximation to a t-distribution, so that this formula underestimates the sample size for small sample sizes. If the final sample size by the formula based on normal approximation is smaller than 30, we may consider increasing it by about 10%.

In summary, the sample size calculation for a two-sample t-test is conducted as follows:

• In addition to type I error probability α and power 1 − β, specify

(a) Δ=σ-1|μ1-μ2|Δ=σ1μ1μ2, standardized effect size

(b) the prevalence of group k (r1+r2=1)(r1+r2=1)

• Calculate the sample size

n=(z1-α/2+z1-β)2r1r2Δ2.n=(z1α/2+z1β)2r1r2Δ2.

Example 18.1

Suppose that we want to detect a difference of 0.5σ in the population means with two-sided α= 0.05 and power 1 − β = 0.9. Then z1-α/2=1.96z1α/2=1.96 and z1-β=1.282z1β=1.282. Assuming an equal proportion of two groups in the population (that is r1=r2=0.5r1=r2=0.5), we obtain the required sample size . PROC POWER in SAS/STAT can directly handle this scenario, as shown in the following code:

Program 18.1 SAS Code for Two-Sample t-Test

proc power;
     twosamplemeans
     nfractional
     meandiff = 0.5              /* mean difference                */
     stddev  = 1                 /* sigma                          */
     groupweights = (0.5 0.5)    /* r1 r2                          */
     sides = 2                   /* 1: one sided 2: 2: two-sided   */
     power =0.9                  /* power                          */
     alpha =0.05                 /* alpha                          */
     ntotal=.;                   /* Sample Size                    */
run;

 

Output from Program 18.1

The POWER Procedure
Two-sample t Test for Mean Difference

Fixed Scenario Elements

 

Distribution Normal
Method Exact
Number of Sides 2
Alpha 0.05
Mean Difference 0.5
Standard Deviation 1
Group 1 Weight 0.5
Group 2 Weight 0.5
Nominal Power 0.9
Null Difference 0

Computed Ceiling N Total

 

Fractional
N Total
Actual
Power  
Ceiling
N Total  
170.062568 0.902 171

 

18.2.2 Wilcoxon Rank Sum Test

It is known that -tests are sensitive to outliers. The Wilcoxon rank sum test (WRST) has been widely used as a robust test. We assume that for group k(=1,2)k(=1,2), Xk1,...,Xk,nkXk1,...,Xk,nk are IID random variables from a distribution with cumulative distribution function F(x-θk)=P(Xkix)F(xθk)=P(Xkix). For Δ=θ1-θ2Δ=θ1θ2, we want to test H0:Δ=0H0:Δ=0vs. Ha:Δ0Ha:Δ0. Mann and Whitney (1947) propose to use W=(n1n2)-1n1i=1n2j=1I(X1i>X2j)W=(n1n2)1n1i=1n2j=1I(X1i>X2j) for testing the hypothesis. The expected value of W is P(X1i>X2j)P(X1i>X2j), the probability that a randomly chosen measurement from group 1 is greater than a randomly chosen measurement from group 2. Thus, the test statistic will be close to 1/2 if H0 is true, and closer to 0 or 1 if Ha is true.

For large n(=n1 + n2), we reject if the absolute value of

T=W-ν0σ0T=Wν0σ0

is larger than z1−α/2, where v0 = 1/2 and σ20=(n+1)/(12n1n2)σ20=(n+1)/(12n1n2) are the mean and variance of W under H0, respectively.

Let f(x)=F/xf(x)=F/x denote the probability density function of F(x). The appendix shows that, under Ha, W has mean

νa=-F(x+Δ)f(x)dxνa=F(x+Δ)f(x)dx

and variance

σ2a=σ21n1+σ22n2,σ2a=σ21n1+σ22n2,

where

σ21=-F2(x+Δ)f(x)dx-{-F(x+Δ)f(x)dx}2,σ21=F2(x+Δ)f(x)dx{F(x+Δ)f(x)dx}2,

and

σ22=-F2(x-Δ)f(x)dx-{-F(x-Δ)f(x)dx}2.σ22=F2(xΔ)f(x)dx{F(xΔ)f(x)dx}2.

Then,

T=Zσaσ0+νa-ν0σ0,T=Zσaσ0+νaν0σ0,

where Z=(W-νa)/σaZ=(Wνa)/σa is an asymptotically random variable. Given power 1 − β,

1-β=P(|T|>z1-α/2|Ha)=P(Zσaσ0+|νa-ν0|σ0>z1-α/2).1β=P(T>z1α/2Ha)=P(Zσaσ0+|νaν0|σ0>z1α/2).

=ˉϕ(σ0σa(z1-α/2-|νa-ν0|σ0)),=ϕ¯(σ0σa(z1α/2|νaν0|σ0)),

where ˉϕ(z)=P(Z>z)ϕ¯(z)=P(Z>z). By replacing nk with nrk in σ20σ20 and σ2aσ2a, and solving with respect to n, we obtain the required sample size

n=112r1r2{z1-α/2+z1-β12(r2σ21+r1σ22)νa-1/2}2.n=112r1r2{z1α/2+z1β12(r2σ21+r1σ22)νa1/2}2.

Although the testing does not require the specification of the distribution function F(x), a sample size calculation does. The alternative hypothesis is specified by two distributions with a location shift. The assumption of a location shift model is not required in order to carry out the rank sum test, but it is needed if we want to make inferences about medians or means. It is, of course, needed in the sample size calculation as presented here.

While a t-test requires a finite variance of the observations, WRST does not. For example, if has a Cauchy distribution, we can use WRST, but not a t-test.

A sample size calculation for WRST assuming normal distributions can be described as follows:

• In addition to type I error probability α and power 1 − β, specify

(a) Δ=σ-1(μ1-μ2)Δ=σ1(μ1μ2), standardized effect size

(b) rk = the prevalence of group k (r1+r2=1)(r1+r2=1)

• Calculate

νa=-ϕ(x+Δ)ϕ(x)dxνa=ϕ(x+Δ)ϕ(x)dx

σ21=-ϕ2(x+Δ)ϕ(x)dx-{-ϕ(x+Δ)ϕ(x)dx}2σ21=ϕ2(x+Δ)ϕ(x)dx{ϕ(x+Δ)ϕ(x)dx}2

and

σ22=-ϕ2(x-Δ)ϕ(x)dx-{-ϕ(x-Δ)ϕ(x)dx}2,σ22=ϕ2(xΔ)ϕ(x)dx{ϕ(xΔ)ϕ(x)dx}2,

where ϕ(x)ϕ(x) and ϕ(x)ϕ(x) are the probability density function and the cumulative distribution function of , respectively.

• Calculate the sample size

n=112r1r2{z1-α/2+z1-β12(r2σ21+r1σ22)νa-1/2}2.n=112r1r2{z1α/2+z1β12(r2σ21+r1σ22)νa1/2}2.

Example 18.2

Suppose that we want to detect a difference of 0.5σ in the population means with two-sided α = 0.05 and power 1 − β = 0.9. Then z1-α/2=1.96z1α/2=1.96 and z1-β=1.282z1β=1.282. Assuming an equal proportion of two groups in the population (that is, r1=r2=0.5r1=r2=0.5) , we obtain the required sample size n = 177. Note that the two-sample t-test requires a slightly smaller n = 171, but, in case there are outliers in the final data, the Wilcoxon rank test may be more powerful.

Program 18.2 SAS Code for Wilcoxon Rank Sum Test

%macro  WilcoxonRankSumTest( delta=,    /* Standardized effect size     */
                             r1 =  ,    /* prevalence of group 1        */
                             r2 =  ,    /* prevalence of group 2        */
                             sides=,    /* number of test sides: 1 or 2 */
                             alpha=,    /* alpha                        */
                             power=     /* power                        */
                             );
proc iml;
  start CDFPDFDELTA(t);
      m=t+δ c=cdf('normal',m,0,1); p=pdf('normal',t,0,1); v=c*p;
return(v);
  finish;
  start CDF2PDFDELTA(t);
      m=t+δ c=cdf('normal',m,0,1)**2; p=pdf('normal',t,0,1); v=c*p;
return(v);
  finish;

  start CDFPDF_DELTA(t);
      m=t-δ c=cdf('normal',m,0,1); p=pdf('normal',t,0,1); v=c*p;
return(v);
  finish;
  start CDF2PDF_DELTA(t);
      m=t-δ c=cdf('normal',m,0,1)**2; p=pdf('normal',t,0,1); v=c*p;
return(v);
  finish;
  start s1(t);
      m=t+δ c=cdf('normal',m,0,1); p=pdf('normal',t,0,1); v=c*p;
return(v);
  finish;

  interval = .M || .P;
  call quad(nu1,"CDFPDFDELTA",   interval);
  call quad(s1, "CDF2PDFDELTA",  interval);
  call quad(nu2,"CDFPDF_DELTA",  interval);
  call quad(s2, "CDF2PDF_DELTA", interval);

  sigma1 = s1-nu1**2; sigma2 = s2-nu2**2;
  p1=1-&alpha/&sides; z_alpha = probit(p1);
  p2=&power; z_beta  = probit(p2);
  n =
(1/(12*&r1*&r2))*((z_alpha+z_beta*sqrt(12*(&r2*sigma1+&r1*sigma2)))/(nu1-
0.5))**2;
  delta=δr1 =&r1; r2=&r2;sides =&sides;alpha=αpower=&power;
  print  'Sample Size';
  print 'Wilcoxon Rank Sum Test for Mean Difference';
  print delta r1 r2 sides alpha power;
  print n;
quit;
run;
%mend WilcoxonRankSumTest;

/*---------------------- Run the macro for Example 2. -------------------*/

%WilcoxonRankSumTest(delta=0.5, r1 = 0.5, r2 = 0.5, sides=2, alpha=0.05, power=0.9);

 

Output from Program 18.2

Sample Size
Wilcoxon Rank Sum Test for Mean Difference

 

delta r1 r2 sides alpha power
0.5 0.5 0.5 2 0.05 0.9

n
176.41709

 

18.3 Binary Variables

In this section, we investigate two-sample tests for binary outcome variables with or without stratification. We use a weighted Mantel-Haenszel test for stratified analysis to adjust for covariates that are unbalanced between two groups.

 

18.3.1 Two-Sample Test on a Binary Outcome

Suppose that, for group k(=1, 2), Xk denotes the number of responders from nk independent subjects. If group k has a response probability pk, then Xk is a binomial distribution with nk independent trials and response probability pk. We want to test H0:p1=p2H0:p1=p2 vs. Ha:p1p2Ha:p1p2. Let ˆpk=Xk/nkpˆk=Xk/nk and ˆp=(X1+X2)/(n1+n2)pˆ=(X1+X2)/(n1+n2) denote the sample proportion for group k and the pooled data, respectively. Then, for large n1 and n2 (say, n1,n2 > 30) ,

T=ˆp1-ˆp2ˆpˆq(n-11+n-12)T=pˆ1pˆ2pˆqˆ(n11+n12)

follows the standard normal distribution under H0, where ˆq=1-ˆpqˆ=1pˆ. Hence, given type I error probability α, we reject H0 if |T|>z1-α/2T>z1α/2. It is easy to show that T2 is identical to the chi-squared test with 1 degree of freedom for a 2 х 2 table.

Let p1, p2 denote the response probability under a specific alternative hypothesis for sample size calculation, and Δ=p1-p2Δ=p1p2. Then, for large n(=n1+n2)n(=n1+n2), ˆppˆ converges to ˉp=r1p1+r2p2p¯=r1p1+r2p2, where rk=nk/nrk=nk/n. Under Ha,

Tˆp1-ˆp2-Δˉpˉq(n-11+n-12)+Δˉpˉq(n-11+n-12)Tpˆ1pˆ2Δp¯q¯(n11+n12)+Δp¯q¯(n11+n12)

=Z+Δnr1r2ˉpˉq,=Z+Δnr1r2p¯q¯,

where Z : Z(0, 1) and ˉq=1-ˉpq¯=1p¯. Hence, given n, the power is calculated as

1-β=ˉϕ(z1-α/2-Δnr1r2),1β=ϕ¯(z1α/2Δnr1r2),                                       (2)

By solving (2) with respect to n, we obtain the required total sample size as

n=(z1-α/2+z1-β)2r1r2Δ2,n=(z1α/2+z1β)2r1r2Δ2,

In summary, the sample size calculation for two-sample binomial proportions is conducted as follows:

• In addition to type I error probability α and power 1 − β, specify

(a) p1, p2 = binomial proportions under Ha

(b) rk the prevalence of group k (r1 + r2 = 1)

• Calculate the sample size

n=(z1-α/2+z1-β)2ˉpˉqr1r2(p1-p2)2,n=(z1α/2+z1β)2p¯q¯r1r2(p1p2)2,

where ˉp=r1p1+r2p2p¯=r1p1+r2p2and ˉq=1-ˉpq¯=1p¯.

Example 18.3

Suppose that we want to detect a difference of p1 = 0.4 and p2 = 0.5 with two-sided α = 0.05 and power 1 − β = 0.9. Assuming an equal proportion of two groups in the population (that is, r1 = r2 = 0.5), we obtain the required sample size n = 10.38.

Program 18.3 SAS Code for Two-Sample Binomial Proportions

proc power;
     twosamplefreq  test=pchi
     groupproportions =(0.4 0.5) /* binomial proportions to be detected  */
     groupweights = (1 1)        /* (w1, w2):Weight of two groups,(r1,r2)*/
     sides = 2                   /* 1: one sided 2: 2: two-sided         */
     power =0.9                  /* power                                */
     alpha =0.05                 /* alpha                                */
     ntotal=.;                   /* sample size                         */
    run;

 

Output from Program 18.3

  The POWER Procedure
  Pearson Chi-square Test for Two Proportions

  Fixed Scenario Elements

 

Distribution Asymptotic normal
Method Normal approximation
Number of Sides 2
Alpha 0.05
Group 1 Proportion 0.4
Group 2 Proportion 0.5
Group 1 Weight 1
Group 2 Weight 1
Nominal Power 0.9
Null Proportion Difference 0

  Computed N Total

 

Actual
Power
     N
Total
0.901 1038

 

18.3.2 Weighted Mantel-Haenszel Test

In an observational study, patients are not randomly assigned to treatment groups with equal probability. Instead, the probability of assignment varies from patient to patient, possibly depending on the patient’s baseline covariates. This often results in non-comparable treatment groups due to imbalance of the baseline covariates and consequently invalidates the standard methods commonly employed in data analysis. To overcome this problem, Rosenbaum and Rubin (1983, 1984) developed the propensity score method.

Suppose that there are j strata formed by propensity score stratification. Let n denote the total sample size and nj the sample size in stratum j Jj=1nj=nJj=1nj=n. The data on each subject comprise the response variable x = 1 for response and 0 for no response and j and k for the stratum and treatment group, respectively, to which the subject is assigned (1jJ;k=1,21jJ;k=1,2). Frequency data in stratum j can be described as follows:

  Group  
Response 1 2 Total
Yes xj11 xj12 xj1
No xj21 xj22 xj2
Total nj1 nj2 nj

Let Oj=xj11Oj=xj11, Ej=nj1xj1/njEj=nj1xj1/nj, and

Vj=nj1nj2xj1xj2n2j(nj-1).Vj=nj1nj2xj1xj2n2j(nj1).

Then, the weighted Mantel-Haenszel (WMH) test is given by

T=Jj=1ˆwj(Oj-Ej)Jj=1ˆw2jVj,T=Jj=1wˆj(OjEj)Jj=1wˆ2jVj,

where the weights ˆwjwˆj converge to a constant wj as nn. The weights are ˆwj=1wˆj=1 for the original Mantel-Haenszel test and ˆwj=ˆqj=xj2/njwˆj=qˆj=xj2/nj for the statistic proposed by Gart (1985).

Let aj = nj/n denote the allocation proportion for stratum (Jj=1aj=1)(Jj=1aj=1), and bjk=njk/njbjk=njk/nj denote the allocation proportion for group k within stratum j (bj1+bj2=1)(bj1+bj2=1). Let pjk denote the response probability for group k in stratum j and qjk=1-pjkqjk=1pjk. Under H0:pj1=pj2,1jJH0:pj1=pj2,1jJ, T is approximately N(0, 1). The optimal weights maximizing the power depend on the allocation proportions {(aj,b1j,b2j),j=1,...,J}{(aj,b1j,b2j),j=1,...,J} and effect sizes (pj1-pj2,1,...,J)(pj1pj2,1,...,J) under H1.

In order to calculate the power of WMH, we have to derive the asymptotic distribution of Jj=1ˆwj(Oj-Ej)Jj=1wˆj(OjEj) and the limit of Jj=1ˆw2jVjJj=1wˆ2jVj under H1. We assume that the success probabilities satisfy for under . Note that (pjk,1jJ,j=1,2)(pjk,1jJ,j=1,2) satisfy pj2qj1/(pj1qj2)=ϕpj2qj1/(pj1qj2)=ϕ for ϕ1ϕ1 under H1. Note that a constant odds ratio across strata holds if there exists no interaction between the treatment and the propensity score when the binary response is regressed on the treatment indicator and the propensity score using a logistic regression. The following derivations are based on H1. It can be verified that

   Oj-Ej=nj1nj2nj(ˆpj1-ˆpj2)OjEj=nj1nj2nj(pˆj1pˆj2)

   =nj1nj2nj(ˆpj1-pj1-ˆpj2+pj2)+nj1nj2nj(pj1-pj2)=nj1nj2nj(pˆj1pj1pˆj2+pj2)+nj1nj2nj(pj1pj2)

   =najbj1bj2(ˆpj1-pj1-ˆpj2+pj2)+najbj1bj2(pj1-pj2).=najbj1bj2(pˆj1pj1pˆj2+pj2)+najbj1bj2(pj1pj2).

Thus, under H1, Jj=1ˆwj(Oj-Ej)Jj=1wˆj(OjEj) is approximately normal with mean nΔ and variance nσ21nσ21, where

   Δ=Jj=1wjajbj1bj2(pj1-pj2)Δ=Jj=1wjajbj1bj2(pj1pj2)

   =(1-Φ)Jj=1wjajbj1bj2pj1qj1qj1+Φpj1=(1Φ)Jj=1wjajbj1bj2pj1qj1qj1+Φpj1

and

   σ21=n-1Jj=1w2jn2j1n2j2n2j(pj1qj1nj1+pj2qj2nj2)σ21=n1Jj=1w2jn2j1n2j2n2j(pj1qj1nj1+pj2qj2nj2)

   =Jj=1w2jajbj1bj2(bj2pj1qj1+bj1pj2qj2).=Jj=1w2jajbj1bj2(bj2pj1qj1+bj1pj2qj2).

Also under H1, we have

Jj=1w2jVj=nσ20+op(n),

j=1Jw2jVj=nσ20+op(n),

where

σ20=Jj=1w2jajbj1bj2(bj1pj1+bj2pj2)(bj1qj1+bj2qj2).

σ20=j=1Jw2jajbj1bj2(bj1pj1+bj2pj2)(bj1qj1+bj2qj2).

Hence, the power of WMH is given as

1-β=P(|T|>z1-α/2|H1)

1β=P(T>z1α/2H1)

=P(σ1σ0Z+n|Δ|σ0>z1-α/2)

=P(σ1σ0Z+n|Δ|σ0>z1α/2)

=ˉΦ(σ0σ1z1-α/2-n|Δ|σ1),

=Φ¯¯(σ0σ1z1α/2n|Δ|σ1),

where Z is a standard normal random variable and ˉΦ(z)=P(Z>z)Φ¯¯(z)=P(Z>z) . Thus, the sample size required for achieving a desired power of 1 − β can be obtained as

n=(σ0z1-α/2+σ1z1-β)2Δ2.

n=(σ0z1α/2+σ1z1β)2Δ2.
                        (3)

Following Jung, Chow, and Chi (2007), the sample size calculation for the weighted Mantel-Haenszel test can be carried out as follows:

1. Specify the input variables

- Type I and II error probabilities (α β).

- Success probabilities for group 1 p11,...pJ1, and the odds ratio Φ under H1. Note that pj2=Φpj1/(qj1+Φpj1)pj2=Φpj1/(qj1+Φpj1) .

- Incidence rates for the strata, (aj, j = 1,...,J). (Yue [2007] proposes to use aj1/Jaj1/J.

- Allocation probability for group 1 within each stratum, (bj1,j=1,...,J)(bj1,j=1,...,J).

2. Calculate n by

n=(σ0z1-α/2+σ1z1-β)2Δ2,

n=(σ0z1α/2+σ1z1β)2Δ2,

    where

Δ=Jj=1ajbj1bj2(pj1-pj2)Δ=Jj=1ajbj1bj2(pj1pj2)

σ21=Jj=1ajbj1bj2(bj2pj1qj1+bj1pj2qj2)σ21=Jj=1ajbj1bj2(bj2pj1qj1+bj1pj2qj2)

σ20=Jj=1ajbj1bj2(bj1pj1+bj2pj2)(bj1qj1+bj2qj2).σ20=Jj=1ajbj1bj2(bj1pj1+bj2pj2)(bj1qj1+bj2qj2).

Example 18.4

Suppose that we want to compare the response probabilities between control (k = 1) (and experimental (k = 2) (groups. We consider partitioning the combined data into J = 5 strata, and the allocation proportions are projected as and (a1,a2,a3,a4,a5)=(.15,.15,.2,.25,.25)(a1,a2,a3,a4,a5)=(.15,.15,.2,.25,.25) and (b11,b21,b31,b41,b51)=(.4,.4,.5,.6,.6)(b11,b21,b31,b41,b51)=(.4,.4,.5,.6,.6). Also, suppose that the response probabilities for the control group are given as (p11,p21,p31,p41,p51)=(.5,.6,.7,.8,.9)(p11,p21,p31,p41,p51)=(.5,.6,.7,.8,.9), and we want to calculate the sample size required for a power of 1 − β = 0.8 to detect an odds ratio of Φ=2Φ=2 using two-sided α = 0.05. For Φ=2Φ=2, the success probabilities for the experimental group are given as (p12,p22,p32,p42,p52)=(.6667,.7500,.8235,.8889,.9474)(p12,p22,p32,p42,p52)=(.6667,.7500,.8235,.8889,.9474). Under these settings, by (3), we need n = 447 for Mantel-Haenszel.

Program 18.4 SAS Code for Weighted Mantel-Haenszel Test with Strata

%macro WMHTestwithStrata ( J = , /* number of strata                         */
                          inA= , /* incidence rates for the strata           */
                          inB= , /* allocation probability for control group */
                          inP1=, /* success probability for control group    */
                          phi =, /* odds ratio under H1                      */
                          power=, /* power                                   */
                          alpha=, /* alpha                                   */
                          sides=  /* 1: one-sided test 2: Two-sided test     */
                         );

 proc iml;
   %let K = 2; /* two groups */
   A=&inA; B =&inB;P1=&inP1;
   P2 =J(&J,1,0);Q1 =J(&J,1,0); Q2 =J(&J,1,0);
   do j=1 to &J;
      Q1[j]=1-P1[j];
      P2[j]=&phi*P1[j]/(Q1[j]+&phi*P1[j]);
      Q2[j]=1-P2[j];
   end;
   z_p1=1-&alpha/&sides; z_alpha = probit(z_p1);
   z_p2=&power; z_beta = probit(z_p2);
   delta = 0; s0_sq= 0; s1_sq= 0;

   do j=1 to &J;
    delta = delta+A[j]*B[j]*(1-B[j])*(P1[j]-P2[j]);
    s1_sq= s1_sq+A[j]*B[j]*(1-B[j])*((1-B[j])*P1[j]*Q1[j]+B[j]*P2[j]*Q2[j]);
    s0_sq= s0_sq+A[j]*B[j]*(1-B[j])*(B[j]*P1[j]+(1-B[j])*P2[j])*(B[j]*Q1[j]+(1-B[j])*Q2[j]);
   end;
   n = (1/(delta**2))*((sqrt(s0_sq)*z_alpha+sqrt(s1_sq)*z_beta)**2);

   print 'Sample Size Calculation';
   print 'Weighted Mantel-Haenszel Test with Strata';
   alpha = α power = &power; phi= φ sides = &Sides;
   print  alpha power phi sides ;
   print A B P1 P2;
   print delta s0_sq s1_sq;
   print n;
  quit;
  run;
 %mend WMHTestwithStrata;

 /*---------------- Run the macro for Example 18.4. ---------------------*/

 %WMHTestwithStrata(
   J = 5 ,                                  /* number of strata        */
   inA= %str({0.15, 0.15, 0.2, 0.25, 0.25}), /*Incidence Rates for Strata*/
   inB= %str({0.4 , 0.4, 0.5, 0.6, 0.6}),  /* Allocation Probability  */
   inP1=%str({0.5 , 0.6, 0.7, 0.8, 0.9}),  /* Success Probability     */
   phi = 2,                                 /* odds Ratio Under H1     */
   power =0.8,                              /* Power                   */
   alpha = 0.05,                            /* Alpha                   */
   sides = 2                                /* Two-sided test          */
  );

 

Output from Program 18.4

Sample Size Calculation

Weighted Mantel-Haenszel Test with Strata

 

alpha
0.05
power
0.8
phi
2
sides
2
 
A B P1 P2
0.15 0.4 0.5 0.6666667
0.15 0.4 0.6 0.75
0.2 0.5 0.7 0.8235294
0.25 0.6 0.8 0.8888889
0.25 0.6 0.9 0.9473684
delta s0_sq s1_sq
-0.025752 0.0381275 0.0367178

 

n

446.21501

 

18.4 Two-Sample Log-Rank Test for Survival Data

Suppose that nk subjects are accrued to a study from group k(=1,2). For subject i(=1,...,nk) in group k(=1,2), let Tki denote the survival variable (that is, time to an event of interest), with marginal cumulative hazard function Λk(t)Λk(t). We want to test the null hypothesis,

H0:Λ1(t)=Λ2(t)forallt0H0:Λ1(t)=Λ2(t)forallt0

against the alternative hypothesis,

Ha:Λ1(t)Λ2(t)forsomet0.Ha:Λ1(t)Λ2(t)forsomet0.

Because a cumulative hazard function uniquely determines the distribution, H0 implies that (T1i,i=1,...,n1)(T1i,i=1,...,n1) and (T2i,i=1,...,n2)(T2i,i=1,...,n2) have the same distributions.

Due to loss to follow up or termination of study before all subjects experience events, survival times are censored for some subjects. Let Cki denote the censoring time for the subject i in group k. Then, we observe {(Xki,δki),1ink,k=1,2}{(Xki,δki),1ink,k=1,2}, where Xki=min(Tki,Cki)Xki=min(Tki,Cki) and δki=I(TkiCki)δki=I(TkiCki). We assume that Cki and T1i are independent within each group.

The log-rank test (Peto and Peto, 1972) is given by

W=0Y1(t)Y2(t)Y1(t)+Y2(t){dˆΛ1(t)-dˆΛ2(t)},W=0Y1(t)Y2(t)Y1(t)+Y2(t){dΛˆ1(t)dΛˆ2(t)},

where ˆΛk(t)=t0Yk(s)-1dNk(s)Λˆk(t)=t0Yk(s)1dNk(s) is the Nelson-Aalen estimator (Nelson, 1969; Aalen, 1978) of Λk(t)Λk(t), Nk(t)=nki=1Nki(t)Nk(t)=nki=1Nki(t), Nki(t)=δkiI(Xkit)Nki(t)=δkiI(Xkit), Yk(t)=nki=1Yki(t)Yk(t)=nki=1Yki(t), and Yki(t)=I(Xkit)Yki(t)=I(Xkit). For large n with and nk/n=rknk/n=rk, W/ˆσW/σˆ is approximately normal with mean 0 and variance 1 under H0, where

ˆσ2=0Y1(t)Y2(t){Y1(t)+Y2(t)}{dN1(t)+dN2(t)},σˆ2=0Y1(t)Y2(t){Y1(t)+Y2(t)}{dN1(t)+dN2(t)},

Refer to Fleming and Harrington (1991), for example. We reject H0 if the absolute value of W/ˆΣW/Σˆ is larger than z1-α/2z1α/2 .

For sample size and power calculation, we assume a proportional hazards model Δ=λ2(t)/λ1(t)Δ=λ2(t)/λ1(t). The power of the log-rank test depends on the number of events, rather than the number of patients. By Rubinstein and colleagues (1981), the number of events D1 and D2 in the two groups, required for a two-sided α test to detect a hazard ratio of Δ(<1) with power 1 − β, is given as

(logΔz1-α/2+z1-β)2=D-11+D-12.(logΔz1α/2+z1β)2=D11+D12.                          (4)

In order to calculate Dk, we need to specify the survival distributions under Ha and the common censoring distribution. Suppose that the subjects are accrued at a constant rate for a period of A and followed for an additional period of B after the completion of accrual. The total study period is A + B. Then, assuming no loss to follow up, the censoring distribution is uniform between B and A + B, as in the following:

G(t)=P(Ckit)={1ift<B1+(B-t)/AifBtA+B0otherwise.G(t)=P(Ckit)=11+(Bt)/A0ift<BifBtA+Botherwise.

One of the most popular survival models in sample size calculation is the exponential distribution,

Sk(t)=P(Tkit)=exp(-λkt)Sk(t)=P(Tkit)=exp(λkt)

for group k. Then, the probability that a subject in arm k experiences an event is given as

dk=P(TkiCi)=0Sk(t)dG(t)=1-exp(-λkB){1-exp(-λkA)}Aλk,dk=P(TkiCi)=0Sk(t)dG(t)=1exp(λkB){1exp(λkA)}Aλk,

so that we have Dk=nkdk=nrkdkDk=nkdk=nrkdk. By plugging this in (4) and solving with respect to n, we obtain

n=(1r1d1+1r2d2)(z1-α/2+z1-βlogΔ)2.n=(1r1d1+1r2d2)(z1α/2+z1βlogΔ)2.

We assumed no loss to follow up in calculating dk, but this assumption can be easily loosened by assuming a distribution for the time to loss to follow up.

In summary, a sample size is calculated as follows:

• In addition to type I error probability α and power 1 − β, specify

(a) λ1,λ2=λ1,λ2= hazard rates under Ha (Δ = λ2/λ1)

(b) rk = the prevalence of group k (r1 + r2 = 1)

(c) A = accrual period, B = additional follow-up period

• Calculate the probability of an event for a subject in group k(=1,2)

dk=1-exp(-λkB){1-exp(-λkA)}Aλk,dk=1exp(λkB){1exp(λkA)}Aλk,

• Calculate the sample size

n=(1r1d1+1r2d2)(z1-α/2+z1-βlogΔ)2.n=(1r1d1+1r2d2)(z1α/2+z1βlogΔ)2.

 

Example 18.5

Suppose that the control group (k = 1) has a median survival time of μ1=3μ1=3 years, and the experimental group will be considered acceptable if it extends the median survival by 50% (that is, μ2=4.5μ2=4.5 years). Under the exponential survival model, the annual hazard rates are λ1=0.231λ1=0.231 and λ2=0.154λ2=0.154 (Δ = 0.667). Also, suppose that (r1,r2) = (0.3,0.7), and patients were uniformly accrued for A = 3 years and the final analysis is conducted B = 2 years after the completion of accrual. Then, we have (d1,d2) = (0.5455,0.4115), n = 613 and (n1 = 184,n2 = 429), and (D1,D2) = (100,177). PROC POWER in SAS/STAT(Program 18.5.2) also provides sample sizes for the log-rank test based on Lakatos (1988), which, unlike our formula, calculates the limit of the variance estimator of the log-rank test under the null hypothesis. As demonstrated here, it gives a slightly smaller sample size, n = 593, under the design setting.

Program 18.5.1 SAS Code for Two-Sample Log-Rank Tests for Survival Data

%macro SS_TwoSmpleLogRank(
          Accrual =,   /* accrual period                          */
          Follow  =,   /* additional follow-up period             */
          inR     =,   /* group allocation proportion(a1, a2)     */
          inLambda=,   /* hazard rates under the alternative      */
          alpha   =,   /* alpha                                   */
          power   =,   /* power                                   */
          sides   =    /* 1: One-sided test 2: Two-sided test     */
                     );
 proc iml;
  %let K = 2;  r = &inR; Group = J(&K,1,0); lambda = &inLambda;
  delta = lambda[2]/lambda[1]; d_prob = J(&K, 1,0);
  n = J(&K, 1,0); D = J(&K, 1,0);
  do i=1 to &K;
    group[i] = i;
    d_prob[i]=1-(exp(-lambda[i]*&Follow))*(1-exp(-lambda[i]*&Accrual))
              /(&Accrual*lambda[i]);
  end;
  z_p1=1-&alpha/&sides; z_alpha = probit(z_p1);
  z_p2=&power; z_beta = probit(z_p2);
  Total =
int((1/(r[1]*d_prob[1])+1/(r[2]*d_prob[2]))*((z_alpha+z_beta)/log(delta))**
2)+1;
  do i=1 to &K; n[i]= r[i]*Total; D[i]= d_prob[i]*n[i]; end;
  print ' Sample Size Calculation';
  print 'for Two-Sample Log-Rank Test for Survival Data';
  Accrual=&Accrual; Follow_Up= &Follow;
  print 'Accrual Period : ' &Accrual;
  print ' Follow Up period : ' &Follow;
  print Group lambda delta r d_prob;
  print Group n D;
  print Total;
  quit;
 run;
%mend SS_TwoSmpleLogRank;

/*-------------- Run the macro for Example 18.5.1 ----------------------------*/

%SS_TwoSmpleLogRank(
  Accrual =3,                  /* accrual period                         */
  Follow  =2,                  /* additional follow-up period            */
  inR     =%str({0.3, 0.7}),   /* group allocation proportion(r1, r2)    */
  inLambda=%str({0.231, 0.154}),/* hazard rates under the alternative    */
  alpha   =0.05,               /* alpha                                  */
  power   =0.9,                /* power                                  */
  sides   =2                   /* two-sided test                         */
      );

 

Output from Program 18.5.1

 

Sample Size Calculation
for Two-Sample Log-Rank Test for Survival Data

 

Accrual Period : 3
Follow Up period : 2

 

GROUP LAMBDA DELTA R D_PROB
1 0.231 0.6666667 0.3 0.5455053
2 0.154   0.7 0.411467 

 

GROUP N D
1 183.9 100.31843
2 429.1 176.56049

TOTAL
613

Program 18.5.2 SAS Code (PROC POWER) for Example 18.5

      proc power;
          twosamplesurvival test=logrank
          nfractional
          groupmedsurvtimes = 3 | 4.5
          accrualtime = 3
          followuptime = 2
          groupweights = (0.3,0.7)
          alpha = 0.05
          sides=2
          power = 0.9
          ntotal = .
         ;
      run;

 

Output from Program 18.5.2

The POWER Procedure
Log-Rank Test for Two Survival Curves

Fixed Scenario Elements

 

Method Lakatos normal approximation
Form of Survival Curve 1 Exponential
Form of Survival Curve 2 Exponential
Number of Sides 2
Accrual Time 3
Follow-up Time 2
Alpha 0.05
Group 1 Median Survival Time 3
Group 2 Median Survival Time 4.5
Group 1 Weight 0.3
Group 2 Weight 0.7
Nominal Power 0.9
Number of Time Sub-Intervals 12
Group 1 Loss Exponential Hazard 0
Group 2 Loss Exponential Hazard 0

 

Computed Ceiling N Total

 

Fractional
N Total
Actual
Power
Ceiling
N Total
592.792138 0.900 593

 

18.5 Two-Sample Longitudinal Data

In a longitudinal study, we measure the outcome of research interest repeatedly from each subject over a time period. When there are two groups of subjects, one popular primary objective is the between-group comparison of change rate in the expected value of a response variable over time. Typically, the repeated measurements within each subject are correlated. Because of the robustness to possible misspecification of the correlation structure among the repeated measurements, the generalized estimating equation (GEE) method has been one of the most popular methods to fit the regression models and to test on the change rates (Liang and Zeger, 1986). In this section, we discuss sample size estimation methods for such testing. We consider cases where the response variable is continuous or dichotomous.

 

18.5.1 Generalized Estimating Equations

Suppose that there are nk subjects in treatment group k(=1,2), n1,n2 = n. Let, for group k, Nk=nki=1mkiNk=nki=1mki denote the total number of observations and rk = nk/n the allocation proportion (r1 + r2 = 1). For subject i(i = 1,...,nk) in group k, let ykij denote the outcome variable at measurement time tkij (j = 1,..., mki) with μkij=E(ykij)μkij=E(ykij) that is expressed as

g(μkij)=ak+bktkij,g(μkij)=ak+bktkij,

where g(.) is a known link function. In other words, we assume that there exists a link function g(.) linearizing the trajectory in response over time. To simplify the discussions, we use the identity link g(μ)=μg(μ)=μ for continuous outcome variables and the logit link g(μ)=log{μ/(1-μ)}g(μ)=log{μ/(1-μ)} for binary outcome variables, but the generalization to the use of other links is simple.

The coefficient bk represents the change rate per unit time in mean response if y is continuous and the change rate in log-odds if y is binary. The measurement times may vary subject by subject due to missing measurements, patients’ visits for measurements at unscheduled times, loss to follow up, or other causes. In this chapter, we assume that any missing data is missing completely at random (Rubin, 1976).

The repeated measurements (ykij,1jmki)(ykij,1jmki) within each subject tend to be correlated. However, the true correlation structure is usually unknown or of secondary interest. Liang and Zeger (1986) proposed a consistent estimator, called the GEE estimator, based on a working correlation structure. Using either the identity or logit link, the GEE estimator (ˆak,ˆbk)(aˆk,bˆk) based on the working independence structure solves Uk(a,b) = 0, where

Uk(a,b)=1nknki=1mkij=1{ykij-μkij(a,b)}(1tkij)Uk(a,b)=1nknki=1mkij=1{ykijμkij(a,b)}(1tkij)

and μkij(a,b)=g-1(a+btkij)μkij(a,b)=g1(a+btkij).

18.5.1.1 Continuous Outcome Variable Case

In the continuous outcome variable case, we have a closed form solution to Uk (a,b) = 0

ˆbk=nki=1mkij=1(tkij-ˉtk)ykijnki=1mkij=1(tkij-ˉtk)2bˆk=nki=1mkij=1(tkijt¯k)ykijnki=1mkij=1(tkijt¯k)2

and ˆak=ˉyk-ˆbkˉtkaˆk=y¯kbˆkt¯k, where ˉtk=N-1knki=1mkij=1tkijt¯k=N1knki=1mkij=1tkij and ˉyk=N-1knki=1mkij=1ykijy¯k=N1knki=1mkij=1ykij.

By Liang and Zeger (1986), as nn, nk(ˆbk-bk)N(0,vk)nk(bˆkbk)N(0,vk) in distribution. Here vk is consistently estimated by

ˆvk=nki=1{mkij=1(tkij-ˉtk)ˆɛkij}2{nki=1mkij=1(tkij-ˉtk)2}2vˆk=nki=1{mkij=1(tkijt¯k)ɛˆkij}2{nki=1mkij=1(tkijt¯k)2}2

where ˆɛkij=ykij-ˆak-ˆbktkijɛˆkij=ykijaˆkbˆktkij.

18.5.1.2 Binary Outcome Variable Case

Let pkij=μkijpkij=μkij in the binary outcome variable case. We have to solve the equation using a numerical method, such as the Newton-Raphson algorithm: at the l-th iteration,

(ˆa(l)kˆb(l)k)=(ˆa(l-1)kˆb(l-1)k)+n-1/2kA-1k(ˆa(l-1)k,ˆb(l-1)k)Uk(ˆa(l-1)k,ˆb(l-1)k),aˆ(l)kbˆ(l)k=aˆ(l1)kbˆ(l1)k+n1/2kA1k(aˆ(l1)k,bˆ(l1)k)Uk(aˆ(l1)k,bˆ(l1)k),

where

Ak(a,b)=-n-1/2kUk(a,b)(a,b)=1nknki=1mkij=1pkij(a,b)qkij(a,b)(1tkijtkijt2kij)

pkij(a,b)=g-1(a+btkij)=ea+btkij1+ea+btkij                 (5)

and qkij(a,b)=1-pkij(a,b).

By Liang and Zeger (1986), for large n, nk(ˆak-ak,ˆbk-bk)T is asymptotically normal with mean 0 and variance Vk that can be consistently estimated by

ˆVk=A-1k(ˆak,ˆbk)ˆΣkA-1k(ˆak,ˆbk)

where

ˆΣk=1nknki=1{mkij=1ˆɛkij(1tkij)}2,

ˆɛkij=ykij-pkij(ˆak,ˆbk) and c2=ccT for a vector . Let ˆvk be the (2,2)-component of ˆVk.

 

18.5.2 Sample Size Calculation

Suppose that we want to test the rate of change between two groups (that is, H0: b1 = b2) . Based on the asymptotic results from the previous section, we can reject H0: b1 = b2 in favor of H0: b1b2 when

|ˆb1-ˆb2ˆv1/n1+ˆv2/n2|>z1-α/2,                   (6)

Where Z1−α/2 is the 100(1−α/2) percentile of a standard normal distribution.

In this section, we derive a sample size formula for the two-sided α test (6) to detect H1:|b2-b1|=Δ(>0) with power 1−β. When designing a study, we usually schedule fixed visit times t1<<tmfor m repeated measurements from each subject. We often set t1 = 0 for the baseline measurement time. When the study is conducted, however, the subjects may skip some visit times due to various reasons, which results in missing values, or they may not follow the visit schedule correctly so that the observed visit times may be variable. Jung and Ahn (2003) show through simulations in a continuous outcome variable case that the sample size formula based on fixed measurement times is very accurate even when the observed measurement times are widely distributed around the scheduled times.

By simple algebra, we can derive a sample size formula to detect the specified difference|b2-b1|=Δ with power 1−β,

n=(z1-α/2+z1-β)2(v1/r1+v2/r2)Δ2,          (7)

where vk=limnˆvk. The expression of vk is slightly different between continuous and binary outcome variable cases, as shown in the following subsections.

In order to allow for missing values, let δj denote the proportion of patients with observations at tj and δj the proportion of patients with observations at both tj and tj’. Also, let . ρjj=corr(ykij,ykij)We assume that the missing pattern and correlation structure are common in two treatment groups. We discuss specific models for missing pattern and correlation structure in Section 18.5.3.

18.5.2.1 Continuous Variable Case

We assume that the continuous outcome variable has a constant variance var(ykij)=σ2over time, which is common between two treatment groups. Jung and Ahn (2003) show that is v1 = v2 = v expressed as

v=σ2(s2+c)s4,

where

s2=mj=1δj(tj-τ)2

c=jjδjjρjj(tj-τ)(tj-τ)

and

τ=mj=1δjtjmj=1δj.

Hence, (7) is simplified to

n=v(z1-α/2+z1-β)2Δ2r1r2.                    (8)

Note that we do not have to specify the true values for {(ak,bk),k=1,2}but only the difference |b2-b1|=Δin sample size calculation. The sample size is proportional to the variance of measurement error and decreases as r1 approaches 1/2.

18.5.2.2 Binary Variable Case

Let Pkj = ak + bktk denote the success probabilities under H1. Jung and Ahn (2005) show that

vk=s2k+cks4k,

where

s2k=mj=1δjpkjqkj(tj-τk)2

ck=jjδjjρjjpkjqkjpkjqkj(tj-τk)(tj-τk),

and

τk=mj=1δjpkjqkjtjmj=1δjpkjqkj.

As shown here, the sample size formula under the binary outcome variable case depends on the probabilities (Pkj, j = 1, . . . , m) under H1, so that we need to specify all regression parameters {(ak,bk),k=1,2}. Let Δ = b1b2 denote the difference in slope between two groups under H1. If pilot data exist for the control group (group 1), then we may use the estimates as the parameter values, a1 and b1. If t = 0 is the baseline, the intercepts in the two groups are set the same (that is, a1 = a2) . By setting b2 = b1 + Δ, we can specify all regression coefficients under H1.

If no pilot data are available, we may specify the binary probabilities at the baseline, p11, and at the end of follow up, p1m , for the control group. Then we obtain (a1b1) by

b1=g(p1m)-g(p11)tm-t1,               (9)

a1=g(p11)-b1t1=g(p11).

And we set a2 = a1(since p11 = p21 at the baseline) and b2 = b1 + Δ.

 

18.5.3 Modelling Missing Pattern and Correlation Structure

Calculation of (or) requires projection of the missing probabilities and the true correlation structure.

18.5.3.1 Missing Pattern

In order to specify δjj, we need to estimate the missing pattern. If missing at time tj is independent of missing at time tj for each patient, then we have δjj’ = δjj, and we call this type independent missing.

In some studies, subjects missing at a measurement time may be missing at all following measurement times, as in the labor pain study discussed in Example 18.6. This type of missing is called monotone missing. In this case, we have δjj’ = δj for (j<j(δ1δm)). In monotone missing cases, one may want to specify the proportion of patients who will contribute exactly the first j observations, say £j. Then, noting that ηj=Δj-Δj+1for j = 1,...,m−1 and μm = δm , we can obtain δj recursively starting from δm. Note also that jηj=Δ1 , which equals 1 if all patients have measurements at the first measurement time t1.

In summary, we consider two missing patterns as candidates to the true one: (a) independent missing, where δjj=δjδj , and (b) monotone missing, where δjj=δj for j<j (δ1δm ).

18.5.3.2 Correlation Structure

Now, we specify the true correlation structure ρjj . A reasonable model for ɛij=yij-g-1(a+btij) may be

ɛij=ui+eij,

where ui is a subject-specific error term with variance σ2u and eij is a serially correlated within-subject error term with variance σ2e and correlation coefficients ˜ρjj. Assuming that ui and eij are independent, we have

cov(ɛij,ɛij)=σ2a+σ2e˜ρjj.

Often, the variation between subjects (σ2u) is much larger than that within subjects (σ2e). In this case, we have

cov(ɛij,ɛij)σ2u

and an exchangeable correlation structure, ρjj=ρ for jj , may be a reasonable approximation to the true one.

On the other hand, if the variation within the subject (σ2e ) dominates over the variation between subjects (σ2u ), then we will have

cov(ɛij,ɛij)σ2e˜ρjj,

so that a serial correlation structure may be a reasonable approximation to the true one. One of the most popular serial correlation structures, especially when measurement times are not equidistant, is a continuous autocorrelation model with order 1, AR(1), for which ρjj=ρ|tj-tj| .

We consider two correlation structures as candidate approximations to the true one: (i) exchangeable, where ρjj=ρ , and (ii) AR(1), where ρjj=ρ|tj-tj| .

18.5.3.3 Examples

For sample size calculation, the following parameters need to be specified commonly in both continuous and discrete outcome variable cases:

• Type I and II errors α and β, respectively.

• The size of difference to detect, |b2-b1|=Δ .

• Allocation proportion for group k, rk.

• Correlation structure and the associated correlation parameter ρ, (that is, (i) ρjj=ρ for exchangeable or (ii) ρjj=ρ|tj-tj| for AR(1)).

• Proportion of patients with an observation at tj, δj.

• Missing pattern: (a) independent missing (δjj=δjδj ) or (b) monotone missing (δjj=δj for j<j ) .

In addition, we need to specify σ2=var(ɛij) in the continuous outcome variable case and the regression coefficients {(ak,bk),k=1,2} under H1 in the binary outcome variable case.

We demonstrate our sample size formula with real longitudinal studies.

Example 18.6 Continuous Outcome Case

In a study on labor pain (Davis, 1991), 83 women in labor were assigned to either a pain medication group (43 women) or a placebo group (40 women). At 30-minute intervals, the self-reported amount of pain was marked on a 100mm line, where 0 = no pain and 100 = extreme pain. The maximum number of measurements for each woman was , but there were numerous missing values at later measurement times with monotone missing pattern. A simple approach to such a study objective might be to estimate and compare the slopes of pain scores over time in the two treatment groups. In this study, the outcome variable is continuous. Suppose that we want to design a new study on labor pain based on the data reported by Davis (1991). As in the original study, we assume monotone missing. In this study, the measurement times were equispaced, so that we set tj = j − 1 (j = 1,…6) for convenience. From the data, we obtained σ2 = 815.84. Suppose we want to detect a difference of σ in mean pain score between two groups at t6. So, we project Δ = σ/(t6t1 = 5.71 in a new study. We consider a balanced design (that is, r1 = r2 = 1/2) . Also, from the data, the proportion of observed measurements are

(δ1,δ2,δ3,δ4,δ5,δ6)=(1,.90,.78,.67,.54,.41).

From these results τ = 2.02, we obtain and s2 = 11.42.

Suppose that we want to detect a difference of Δ = 5.71 with 80% of power (z1−β = 0.84) using a two-sided α = .05 (z1−α/2 = 1.96) test. Under an exchangeable correlation structure, we obtain ρ = .64 and c = −3.13 from the data, so that we have v = 815.84×(11.42−3.13)/11.422 = 51.84. Hence, from (8), the required sample size is calculated as

n=[51.84×(1.96+0.84)25.712]+1=50.

where [x] is the largest integer not exceeding x.

Under AR(1), we obtain from the data ρ = .80 and c = 2.31, so that the required sample size for detecting Δ = 5.71 with 80% of power using a two-sided α = .05 test is given as n = 83. The sample size under AR(1) is larger than that under an exchangeable correlation, by about 66%, in this example.

Program 18.6 SAS Code for Two-Group Comparision of Repeated Continuous Measurement

%macro SS_RepeatedContinuousMeasurement(
               missingPattern = , /* 1: independent missing 2: monotone missing     */
               corrStructure  = , /* 1: compound symetric , 2: AR(1)                */
               rho            = , /* associated correlation parameter               */
               m              = , /* number of measurement time points              */
               sigma_sq       = , /* variance                                       */
               inR            = , /* group allocation proportion(r1, r2)            */
               inDelta        = , /* proportion of observed measurements            */
               alpha          = , /* alpha                                          */
               power          = , /* power                                          */
               sides          = , /* 1: one-sided test  2: two-sided test           */
               print    = 0       /* 0:default 1: detail                            */
                               );
proc iml;
  %let K = 2; r = &inR; delta = &inDelta; t = J(&m,1,0);
  do j=1 to &m; t[j] = j-1; end;
  %if &inDelta eq %then %do; delta=J(&m,1,0);do j=1 to &m; delta[j]=1-(j-1)/20; end; %end;
  d = sqrt(&sigma_sq)/(t[&m]-t[1]);
  start g(p); gp = log(p/(1-p)); return (gp); finish g;
  start prob(a,b,t); p = 1/(1+exp(-a-b*t)); return (p); finish prob;
  start rho(i,j,r,c);
    if c=1 then do; /* CS */ if i=j then rho_ij =1;else rho_ij = r; end;
    else do; /* AR(1) */ dist = abs(i-j); rho_ij = r**dist; end;
    return (rho_ij);
  finish rho;
  tau_num =0;  tau_denum = 0;
  do j=1 to &m; tau_num = tau_num+delta[j]*t[j]; tau_denum = tau_denum+delta[j]; end;
  tau= tau_num/tau_denum; s_sq=0;
  do j=1 to &m; s_sq = s_sq+delta[j]*((t[j]-tau)**2); end;
  c=0;
  do i=1 to &m;
     do j=1 to &m;
           if i ^= j then do;
              if &missingPattern = 1 then do; delta_ij = delta[i]*delta[j]; end;
          else do; if j > i then max_ij=j; else max_ij=i; delta_ij=delta[max_ij]; end;
          c = c+delta_ij*rho(i,j,&rho,&corrStructure)*(t[i]-tau)*(t[j]-tau);
                end;
      end;
   end;
   v = &sigma_sq*(s_sq+c)/(s_sq**2);
   z_p1=1-&alpha/&sides; z_alpha = probit(z_p1); z_p2=&power;  z_beta  = probit(z_p2);
   n = int((v*(z_alpha+z_beta)**2)/(d**2*r[1]*r[2]))+1;
   print 'Sample Size Calculation for a Two-Group Comparision';
   print ' of Repeated Continous Measurements';
   alpha  =&alpha; power=&power; rho=&rho; sides=&sides;
   sigma_sq=&sigma_sq;
   print alpha power rho sides;
   if &missingPattern = 1 then do;print ' Missing Pattern: Independent '; end;
   else if &missingPattern = 2 then do; print ' Missing Pattern: Monotone '; end;
   if &corrStructure=1 then do; print ' Correlation Structure : Compound Symetric ';end;
   else if &corrStructure = 2 then do; print ' Correlation Structure : AR(1) '; end;
   %if &print = 1 %then %do; print delta; print tau s_sq c v ; %end;
   print d sigma_sq;
   print n;
   quit;
run;
%mend SS_RepeatedContinuousMeasurement;

/*--Run the macro for Example 18.6 when Correlation Structure = Compound Symetric --*/

%SS_RepeatedContinuousMeasurement(
       missingPattern = 2,              /* monotone missing                      */
       corrStructure  = 1,              /* compound symetric                     */
       rho            = 0.64,           /* associated correlation parameter      */
       m              = 6,              /* number of measurement time points     */
       sigma_sq       = 815.84,         /* variance                              */
       inR          = %str({0.5, 0.5}), /* group allocation proportion(r1, r2)   */
       inDelta      = %str({1, 0.9, 0.78, 0.67, 0.54, 0.41}),
                                        /* proportion of observed measurements   */
       alpha        = 0.05,             /* alpha                                 */
       power        = 0.8,              /* power                                 */
       sides         = 2,               /* two-sided test                        */
       print         = 1
        );

/*--- Run the macro for Example 18.6 when Correlation Structure = AR(1) ---------*/

%SS_RepeatedContinuousMeasurement(
     missingPattern  = 2,              /*  monotone missing                      */
     corrStructure  = 2,               /* AR(1)                                  */
     rho             = 0.8,            /*  associated correlation parameter      */
     m              = 6,               /* number of measurement time points      */
     sigma_sq        = 815.84,         /* variance                               */
     inR            = %str({0.5, 0.5}),/* group allocation proportion(r1, r2)    */
     inDelta        = %str({1, 0.9, 0.78, 0.67, 0.54, 0.41}),
                                       /* proportion of observed measurements    */
     alpha          = 0.05,            /* alpha                                  */
     power          = 0.8,             /* power                                  */
     sides           = 2,              /* two-sided test                         */
     print = 1
        );

Output from Program 18.6

/*------------------ When Correlation Structure = Compound Symetric --------*/

 

Sample Size Calculation for a Two-Group Comparision
of Repeated Continous Measurements

 

ALPHA POWER RHO SIDES
0.05 0.8 0.64 2

Missing Pattern: Monotone
Correlation Structure : Compound Symetric

 

TAU S_SQ C V
2.0186047 11.418512 -3.133345 51.842656

 

D SIGMA_SQ
5.7125826 815.84

N
50

/*--------------------- When Correlation Structure = AR(1) -----------------*/

Sample Size Calculation for a Two-Group Comparision
of Repeated Continous Measurements

 

ALPHA POWER RHO SIDES
0.05 0.8 0.8 2

Missing Pattern: Monotone
Correlation Structure : AR(1)

 

TAU S_SQ C V
2.0186047 11.418512 2.3055968 85.87567

 

D SIGMA_SQ
5.7125826 815.84

N
  83

Example 18.7 Binary Outcome Case

The Genetics vs. Environment In Scleroderma Outcome Study (GENISOS) is an observational study designed as a collaboration of the University of Texas-Houston Health Science Center with the University of Texas Medical Branch at Galveston and the University of Texas-San Antonio Health Science Center (Reveille et al., 2001). Scleroderma, or systemic sclerosis, is a multisystem disease of unknown etiology characterized by cutaneous and visceral fibrosis, small blood vessel damage, and autoimmune features (Medsger, 1997). The study subjects are regularly followed to check for the occurrence of pulmonary fibrosis. In this case, the outcome is a binary variable.

Suppose that we want to develop a study to examine the effect of a new drug in preventing the occurrence of pulmonary fibrosis in subjects with scleroderma compared to no intervention. The parameter values specified here for sample size calculation are approximated by the estimates from the current data set of GENISOS.

We want to estimate the sample size for the new study using α = .05 and 1−β = .8 . As in GENISOS, presence or absence of pulmonary fibrosis is assessed at baseline and at months 6, 12, 18, 24, and 30. Because the measurement times are equidistant, we set tj = j − 1 j ( j = 1,…,6 ) for the m = 6 time points. The within-group correlation structure of the repeated measurements conforms to AR(1) with the adjacent correlation equal to ρ = .8 (that is, ρjj=.8|j-j| ). We consider assigning an equal number of scleroderma patients in each of two groups (that is, r1 = r2 =1/2).

Approximately 75% of scleroderma patients do not have pulmonary fibrosis at the baseline in the ongoing GENISOS. We project that the proportion of subjects without pulmonary fibrosis is 75% at baseline (that is, p1,1 = .75) and 50% at 30 months (that is, p1,j = .50) in a placebo group. We assume that a new therapy will prevent or delay further occurrence of pulmonary fibrosis. That is, the proportion of subjects without pulmonary fibrosis will remain 75% during the 30-month study in a new therapy group (in other words, p2,1 = p2,6 = .75 ). From these values and (9), we obtain

b1=g(.5)-g(.75)5-0=-0.220,

and a1 = g(.75) = 1.099. Similarly, we obtain (a2, b2)=(1.099, 0), so that Δ = 0 − (−0.220) = 0.220. By (5), the probabilities of no pulmonary fibrosis at the 6 time points are obtained as (.750,.707,.659,.608,.555,.500) for the placebo group and (.750,.750,.750,.750,.750,.750) for the treatment group.

The proportions of observed measurements are expected to be

(δ1,δ2,δ3,δ4,δ5,δ6)=(1,.95,.90,.85,.80,.75).

Suppose that we expect independent missing. Then, we obtain δjj from the specified δj values using δjj=δjδj . Now we have all the parameter values required, and we obtain v1 = 0.305 and v1 = 0.353. Finally, from (7), we obtain

n=[(1.96+0.84)2(0.305/0.5+0.353/0.5)0.2202]+1=215.

If we assume monotone missing ( δjj=δjj ), we obtain v1 = 0.324 , , v2 = 0.308 and n = 229 .

Program 18.7 SAS Code for Two-Group Comparision of Repeated Binary Measurement

%macro SS_RepeatedBinaryMeasurement(
       missingPattern =   , /* 1: independent missing 2: monotone missing       */
       corrStructure =    , /* 1: compound symetric , 2: AR(1)                  */
       rho           =    , /*  associated correlation parameter                */
       m             =    , /* number of measurement time points                */
       inP1          =    , /* proportion of subjects for control group         */
       inP2          =    , /* proportion of subjects                           */
       inR           =    , /* group allocation proportion(r1, r2)              */
       inDelta =          , /* proportion of observed measurements              */
       alpha   =          , /* alpha                                            */
       power   =          , /* power                                            */
       sides    =           /* 1: one-sided test  2: two-sided test             */
     );
  proc iml;
   start g(p); gp = log(p/(1-p)); return (gp); finish g;
   start prob(a,b,t); p = 1/(1+exp(-a-b*t)); return (p); finish prob;
   start rho(i,j,r,c);
    if c=1 then do; /* CS */ if i=j then rho_ij =1; else rho_ij = r; end;
    else if c = 2 then do; /* AR(1) */ dist = abs(i-j); rho_ij = r**dist; end;
    return (rho_ij);
   finish rho;
   %let K = 2;
   P1=&inP1;  P2=&inP2; r =&inR; a=J(&K,1,0);  b=J(&K,1,0); t=J(&m,1,0);
   tau = J(&K,1,0);  s_sq = J(&K,1,0); v = J(&K,1,0); c_sq = J(&K,1,0);
   do j=1 to &m;  t[j] = j-1; end;
   %if &inDelta eq %then %do;
       delta = J(&m,1,0);
       do j=1 to &m; delta[j] = 1-(j-1)/20; end;
   %end;
   b[1] = (g(P1[&m])-g(P1[1]))/(t[&m]-t[1]); a[1] = g(P1[1])-B[1]*t[1];
   b[2] = (g(P2[&m])-g(P2[1]))/(t[&m]-t[1]); a[2] = g(P2[1])-B[1]*t[1];
   d = b[2]-b[1];
   do j=1 to &m;
      T[j] = j-1; P1[j]= prob(a[1], b[1], t[j]); P2[j]= prob(a[2], b[2], t[j]);
   end;
   tau1_num =0; tau1_denum = 0; tau2_num =0; tau2_denum = 0;
   do j=1 to &m;
     tau1_num = tau1_num+delta[j]*P1[j]*(1-P1[j])*t[j];
     tau1_denum = tau1_denum+delta[j]*P1[j]*(1-P1[j]);

     tau2_num = tau2_num+delta[j]*P2[j]*(1-P2[j])*t[j];
     tau2_denum = tau2_denum+delta[j]*P2[j]*(1-P2[j]);
   end;
   tau[1]= tau1_num/tau1_denum; tau[2]= tau2_num/tau2_denum;
   s1=0;s2=0;
   do j=1 to &m;
      s1 = s1+delta[j]*P1[j]*(1-P1[j])*((t[j]-tau[1])**2);
      s2 = s2+delta[j]*P2[j]*(1-P2[j])*((t[j]-tau[2])**2);
   end;
   s_sq[1]=s1; s_sq[2]=s2; c1=0; c2=0; c12=0;
   do i=1 to &m;
      do j=1 to &m;
           if i ^= j then do;
              if &missingPattern = 1 then do;
             delta_ij = delta[i]*delta[j];
              end;else do;
                 if j > i then max_ij = j; else max_ij = i; delta_ij = delta[max_ij];
          end;
          c1=c1+delta_ij*rho(i,j,&rho,&corrStructure)*
             sqrt(P1[i]*(1-P1[i])*P1[j]*(1-P1[j]))*((t[i]-tau[1])*(t[j]-tau[1]));
          c2=c2+delta_ij*rho(i,j,&rho,&corrStructure)*
             sqrt(P2[i]*(1-P2[i])*P2[j]*(1-P2[j]))*((t[i]-tau[2])*(t[j]-tau[2]));
         end;
      end;
   end;
   c_sq[1]=c1; c_sq[2]=c2;
   do k =1 to &K; v[k] = (s_sq[k]+c_sq[k])/(s_sq[k]**2); end;
   z_p1=1-&alpha/&sides; z_alpha = probit(z_p1);
   z_p2=&power; z_beta  = probit(z_p2);
   n = int((z_alpha+z_beta)**2*(v[1]/r[1]+v[2]/r[2])/d**2)+1;

   print 'Sample Size Calculation for a Two-Group Comparision';
   print ' of Repeated Binary Measurements';
   alpha  =&alpha; power=&power; rho=&rho; sides=&sides;
   print alpha power rho sides;
   if &missingPattern = 1 then do; print ' Missing Pattern: Independent '; end;
   else if &missingPattern = 2 then do;print ' Missing Pattern: Monotone '; end;
   if &corrStructure = 1 then do; print ' Correlation Structure : Compound Symetric ';
   end; else if &corrStructure = 2 then do; print ' Correlation Structure : AR(1) ';
   end;
   print P1 P2 delta v ; print d; print n;
   quit;
run;
%mend  SS_RepeatedBinaryMeasurement;

/*--------------- Run the macro for Example 18.7 ------------------------------*/

%SS_RepeatedBinaryMeasurement(
     missingPattern = 1,          /* 1: independent missing 2: monotone missing     */
     corrStructure  = 2,          /* 1: compound symetric , 2: AR(1)                */
     rho     = 0.8,               /*  associated correlation parameter              */
     m       = 6,                 /* number of measurement time points              */
    inP1    = %str({0.75, 0, 0, 0, 0, 0.5}),
                                  /* proportion of subjects for control group       */
    inP2    = %str({0.75, 0.75, 0.75, 0.75, 0.75, 0.75}), 
                                  /* proportion of subjects                         */
     inR     = %str({0.5, 0.5}),  /* group allocation proportion(r1, r2)            */
     inDelta = ,      /* delta[j] = 1-(j-1)/20, proportion of observed measurements */
     alpha   = 0.05,              /* alpha                                          */
     power   = 0.8,               /* power                                          */
     sides   = 2                  /* 1: one-sided test  2: two-sided test           */
     );

%SS_RepeatedBinaryMeasurement(
    missingPattern = 2,          /*  monotone missing                               */
    corrStructure = 2,           /* AR(1)                                           */
     rho     = 0.8,                 /* associated correlation parameter             */
     m       = 6,                   /* number of measurement time points            */
    inP1   = %str({0.75, 0, 0, 0, 0, 0.5}),   

                                    /* proportion of subjects for control group     */
    inP2    = %str({0.75, 0.75, 0.75, 0.75, 0.75, 0.75}), 
                                 /* proportion of subjects                          */
     inR     = %str({0.5, 0.5}),    /* group allocation proportion(r1, r2)          */
     inDelta = ,      /* delta[j] = 1-(j-1)/20, proportion of observed measurements */
     alpha   = 0.05,                /* alpha                                        */
     power   = 0.8,                 /* power                                        */
     sides   = 2                    /* 1: one-sided test  2: two-sided test         */
     );

 

Output from Program 18.7

/*--------------When Missing Pattern Is Independent Missing ----------------*/

Sample Size Calculation for a Two-Group Comparision
of Repeated Binary Measurements

 

ALPHA POWER RHO SIDES
0.05 0.8 0.8 2

Missing Pattern: Independent
Correlation Structure : AR(1)

 

P1 P2 DELTA V
0.75 0.75 1 0.3048798
0.7065921 0.75 0.95 0.3534175
0.6590733 0.75 0.9  
0.6081268 0.75 0.85  
0.5547107 0.75 0.8  
0.5 0.75 0.75  

D
0.2197225

N
215

 

/*-------------- When Missing Pattern Is Monotone Missing ------------------*/

Sample Size Calculation for a Two-Group Comparision
of Repeated Binary Measurements

 

ALPHA POWER RHO SIDES
0.05 0.8 0.8 2

Missing Pattern: Monotone
Correlation Structure : AR(1)

 

P1 P2 DELTA V
0.75 0.75 1 0.3236844
0.7065921 0.75 0.95 0.3804059
0.6590733 0.75 0.9  
0.6081268 0.75 0.85  
0.5547107 0.75 0.8  
0.5 0.75 0.75  

D
0.2197225

N
229

 

18.6 Discussion

In designing an observational study, we should choose a statistical testing method that will be used when analyzing the final data. For an accurate sample size calculation, the sample size formula should be directly derived from the power function of the chosen testing method.

There are two types of parameters included in sample size formulas: primary and nuisance parameters. The primary parameters are those shown in the statistical hypotheses (for example, means, binomial proportions, and hazard rates), and the nuisance parameters (for example, prevalence rates, accrual period in survival data, correlation coefficients, and missing type and probabilities in longitudinal data) are those of no or secondary interest. Because both types of parameters determine the final sample size, it is important to specify the parameter values as accurately as possible. If a database exists, we usually estimate the values of nuisance parameters and the primary parameter value of the control group from the database. Although rk may be chosen by the prevalence of a disease, we may decide to use different allocation proportions from the prevalence rates in the natural population depending on the relative cost to accrue subjects in two groups. The primary parameter value for the case group is usually chosen based on the clinical significance of the associated intervention on the outcome variable, which is measured in terms of the difference or ratio of the primary parameter values between the two groups. When there is uncertainty in some of these parameters, we may conduct a sensitivity analysis on the power to demonstrate that the power would be adequate under a range of scenarios.

 

Appendix: Asymptotic Distribution of Wilcoxon Rank Sum Test under Hα

Let ˜Xki=Xki-θk. Then ˜X11,…,˜X1n1,˜X21,…,˜X2n2 are IID with cumulative density function (CDF) F(x). Noting that

W=1n1n2n1i=1n2j=1I(˜X1i>˜X2j-Δ),

we have W=-ˆF2(x+Δ)dˆF1(x), where ˆFk(x)=n-1knki=1I(˜Xkix) is the empirical CDF of ˜Xk1,…,˜Xknk that uniformly converges to F(x) as nk . Let

νa=P(˜X1i>˜X2j-Δ)=-F(x+Δ)dF(x) . For a large n, W-νa is expressed as

-ˆF2(x+Δ)dˆF1(x)--F(x+Δ)dF(x)

=-F(x+Δ)d{ˆF1(x)-F(x)}+-{ˆF2(x+Δ)-F(x+Δ)}dF(x)+op(n-1),

where op(n-1)=-{ˆF2(x+Δ)-F(x+δ)}d{ˆF1(x)-F(x)} is negligible for a large n.

Because

-F(x+Δ)ˆF1(x)=1n1n1i=1F(˜X1i+Δ)

and

-F(x+Δ)ˆF1(x)=1n1n1i=1F(˜X1i+Δ)

we have var(W)=σ21/n1+σ22/n2, where

σ21=var{F(˜X1i+Δ)}

σ22=var{ˉF(˜X2i-Δ)}=var{F(˜X2i-Δ)}.

 

References

Aalen, O. 1978. “Nonparametric inference for a family of counting processes.” Annals of Statistics 6: 701–726.

Carroll, R. J., and D. Ruppert. 1988. Transformation and Weighting in Regression. New York: Chapman and Hall/CRC.

Davis, C. S. 1991. “Semi-parametric and non-parametric methods for the analysis of repeated measurements with applications to clinical trials.” Statistics in Medicine 10: 1959–1980.

Fleming, T. R., and D. P. Harrington. 1991. Counting Processes and Survival Analysis. New York: John Wiley & Sons, Inc.

Gart, J. J. 1985. “Approximate tests and interval estimation of the common relative risk in the combination of 2 × 2 tables.” Biometrika >72: 673–677.

Jung, S. H., and C. Ahn. 2003. “Sample size estimation for GEE method for comparing slopes in repeated measurements data.” Statistics in Medicine >22: 1305–1315.

Jung, S. H., and C. W. Ahn. 2005. “Sample size for a two-group comparison of repeated binary measurements using GEE.” Statistics in Medicine 24(17): 2583–2596.

Jung, S. H., S. C. Chow, and E. Chi. 2007. “A note on sample size calculation based on propensity analysis in non-randomized trials.” Journal of Biopharmaceutical Statistics >17: 35–41.

Lakatos, E. 1988. “Sample sizes based on the log-rank statistic in complex clinical trials.” Biometrics 44: 229–241.

Liang, K. Y., and S. L. Zeger. 1986. “Longitudinal data analysis using generalized linear models.” Biometrika >73: 13–22.

Mann, H. B., and D. R. Whitney. 1947. “On a test of whether one of two random variables is stochastically larger than the other.” The Annals of Mathematical Statistics 18: 50-60.

Medsger, Jr., T. A. 1997. “Systemic sclerosis (scleroderma): clinical aspects. In  Arthritis and allied conditions: a textbook of rheumatology. 13th ed. Edited by W. J. Koopman. Baltimore, MD: Williams &  Wilkins, pp 1433–1464.

Nelson, W. 1969. “Hazard plotting for incomplete failure data.” Journal of Quality Technology 1(1): 27–52.

Peto, R., and J. Peto. 1972. “Asymptotically efficient rank invariant test procedures (with discussion).” Journal of the Royal Statistical Society, Series A: General 135: 185–206.

Reveille, J. D., M. Fischbach, T. McNearney, et al. 2001. “Systemic sclerosis in 3 US ethnic groups: a comparison of clinical, sociodemographic, serologic, and  immunogenetic determinants.” Seminars in Arthritis and Rheumatism >30(5): 332–346.

Rosenbaum, P. R., and D. B. Rubin. 1983. “The central role of the propensity score in observational studies for causal effects.” Biometrika >70: 41–55.

Rosenbaum, P. R., and D. B. Rubin. 1984. “Reducing bias in observational studies using subclassification on the propensity score.” Journal of the American Statistical Association 79: 516–524.

Rubin, D. B. 1976. “Inference and missing data.” Biometrika >63: 581–590.

Rubinstein, L. V., M. H. Gail, and T. J. Santner. 1981. “Planning the duration of a comparative clinical trial with loss to follow-up and a period of continued observation.” Journal of Chronic Diseases >34(9/10): 469–479.

SAS Institute Inc. 2009. SAS/STAT9.2 User’s Guide, Second Edition. Cary, NC: SAS Institute Inc.

Yue, L. Q. 2007. “Statistical and regulatory issues with the application of propensity score analysis to non-randomized medical device clinical studies.” Journal of Biopharmaceutical Statistics 17: 1–13.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset