Sample Size Calculation for Observational Studies
18.4 Two-Sample Log-Rank Test for Survival Data
18.5 Two-Sample Longitudinal Data
Appendix: Asymptotic Distribution of Wilcoxon Rank Sum Test under Ha
For a number of reasons, observational studies are currently being used to provide evidence to support medical decisions. Because these studies do not carry the same level of credibility conferred to randomized trials, appropriate attention to statistical issues is paramount. In particular, considerations of event rate, sample size, and power frequently occur. This chapter provides mechanisms for dealing with these considerations under a variety of scenarios. Additionally, the methods presented here are applicable more generally to clinical trials as well.
The increasing availability of computerized clinical patient data has stimulated greater interest in research on observational data. Individual medical practices and hospitals are now attempting to draw inferences on the basis of data gathered on their own patients. Additionally, the proliferation of large registry databases that harvest clinical data from hundreds of sites has created opportunities to study questions regarding rare diseases and rare events. However, even studies using these large databases, when reduced to the appropriate study populations and endpoints, encounter issues of sample size and power.
Observational studies can be designed to address a variety of objectives. If the primary analysis of the study is simply to estimate some parameter, then one can use standard sample size calculations for requiring a confidence interval of at most a certain width. If the objective involves the comparison of cohorts, then in many cases the statistical analysis will involve some form of selection bias adjustment due to the non-randomized nature of observational research. Chapters 1 through 11 cover the issues of selection bias adjustment and standard analytical methods for comparing cohorts in these settings, including propensity score regression, stratification, matching, doubly robust adjustment, instrumental variables, local control, and various longitudinal analyses. Regarding the computation of sample sizes in study planning in these scenarios, such complex designs and analyses may require custom sample size and power estimate calculations through simulations. In some cases, the primary hypothesis may be simplified for the purpose of estimating power and sample size, and generalized SAS macros can perform the needed calculations.
In this chapter, we present a series of SAS macros for computing sample sizes for studies with various types of outcomes and comparisons (all for comparisons between two groups):
• t-test (Section 18.2.1)
• Wilcoxon Rank Sum test (Section 18.2.2)
• Two-Sample Tests on Binary outcomes (Section 18.3.1)
• Weighted Mantel-Haenszel test (Section 18.3.2)
• Log-Rank test for survival data (Section 18.4)
• Longitudinal Data—Continuous outcomes (Section 18.5.3.3)
• Longitudinal Data—Binary outcomes (Section 18.5.3.3)
The sample size macro for the Weighted Mantel-Haenszel test (provided in Section 18.3.1) is directly applicable for observational studies for which the analysis is based on the commonly utilized propensity score stratification approach. This propensity score stratification analysis methodology is described in detail in Chapter 2. In brief, propensity score stratification involves estimating a propensity score for each study participant, stratifying by this score, estimating the difference in outcomes between the two cohorts within each of the strata, and then combining the estimated cohort differences over strata.
Other sections in this chapter present sample size macros based on analysis methods without specific selection bias adjustments. However, these can be very useful for observational study planning when simple cohort comparisons are planned (for example, such as when causal inference is not the objective) or as a very easy-to-use initial calculation when other methods are being utilized such as propensity score matching. In this example, one would supplement the sample sizes computed in these macros with an estimate of the number of patients excluded from the analysis due to a lack of propensity score matched patients in the other cohort. In addition, most of the sample size calculation methods in this chapter can be used for randomized clinical trials as well.
In this section, we consider two two-sample tests for continuous variables: t-test and Wilcoxon rank test.
Suppose that, for group k(=1,2), Xk1,..., Xk,nk are independent and identically distributed (IID) normal random variables with mean μk and variance σ2. We want to test H0 : μ1 = μ2 vs. Ha:μ1≠μ2
s2p=∑n1i=1(X1i-ˉX1)2+∑n2i=1(X2i-ˉX2)2n1+n2-2
denote the pooled sample variance. Then, under H0,
T=ˉX1-ˉX2sp√n-11+n-12
follows the t-distribution with n1 + n2 − 2 degrees of freedom. Hence, given type I error probability α, we reject H0 if |T|>tn1+n2-2,1-α/2
If n1 and n2 are large, then we do not require the normal distribution assumption, and the critical value from the t-test, tn1+n2-2,1-α/2
We consider a specific alternative hypothesis Ha : |μ1 − μ2|= ∆σ. Note that ∆ = σ-1 (μ1 − μ2) denotes a standardized effect size under the alternative hypothesis. Without loss of generality, we assume that μ1 > μ2 under Ha. Then, under Ha,
T≈ˉX1-ˉX2-σΔσ√n-11+n-12+Δ√n-11+n-12
where Z : N(0, 1), rk = nk/n, and n = n1 + n2. For a given n, the power is calculated as
1-β=ˉϕ(z1-α/2-Δ√nr1r2),
where ˉϕ(z)=1-ϕ(z)
n=(z1-α/2+z1-β)2r1r2Δ2,
or nk = n х rk for group k. For one-sided testing, replace α/2 with α. This sample size is based on a large sample approximation to a t-distribution, so that this formula underestimates the sample size for small sample sizes. If the final sample size by the formula based on normal approximation is smaller than 30, we may consider increasing it by about 10%.
In summary, the sample size calculation for a two-sample t-test is conducted as follows:
• In addition to type I error probability α and power 1 − β, specify
(a) Δ=σ-1|μ1-μ2|
(b) the prevalence of group k (r1+r2=1)
• Calculate the sample size
n=(z1-α/2+z1-β)2r1r2Δ2.
Suppose that we want to detect a difference of 0.5σ in the population means with two-sided α= 0.05 and power 1 − β = 0.9. Then z1-α/2=1.96
proc power;
twosamplemeans
nfractional
meandiff = 0.5 /* mean difference */
stddev = 1 /* sigma */
groupweights = (0.5 0.5) /* r1 r2 */
sides = 2 /* 1: one sided 2: 2: two-sided */
power =0.9 /* power */
alpha =0.05 /* alpha */
ntotal=.; /* Sample Size */
run;
The POWER Procedure
Two-sample t Test for Mean Difference
Fixed Scenario Elements
Distribution | Normal |
Method | Exact |
Number of Sides | 2 |
Alpha | 0.05 |
Mean Difference | 0.5 |
Standard Deviation | 1 |
Group 1 Weight | 0.5 |
Group 2 Weight | 0.5 |
Nominal Power | 0.9 |
Null Difference | 0 |
Computed Ceiling N Total
Fractional N Total |
Actual Power |
Ceiling N Total |
170.062568 | 0.902 | 171 |
It is known that -tests are sensitive to outliers. The Wilcoxon rank sum test (WRST) has been widely used as a robust test. We assume that for group k(=1,2)
For large n(=n1 + n2), we reject if the absolute value of
T=W-ν0σ0
is larger than z1−α/2, where v0 = 1/2 and σ20=(n+1)/(12n1n2)
Let f(x)=∂F/∂x
νa=∫∞-∞F(x+Δ)f(x)dx
and variance
σ2a=σ21n1+σ22n2,
where
σ21=∫∞-∞F2(x+Δ)f(x)dx-{∫∞-∞F(x+Δ)f(x)dx}2,
and
σ22=∫∞-∞F2(x-Δ)f(x)dx-{∫∞-∞F(x-Δ)f(x)dx}2.
Then,
T=Zσaσ0+νa-ν0σ0,
where Z=(W-νa)/σa
1-β= P (|T|>z1-α/2|Ha)=P(Zσaσ0+|νa-ν0|σ0>z1-α/2).
=ˉϕ(σ0σa(z1-α/2-|νa-ν0|σ0)),
where ˉϕ(z)= P (Z>z)
n=112r1r2{z1-α/2+z1-β√12(r2σ21+r1σ22)νa-1/2}2.
Although the testing does not require the specification of the distribution function F(x), a sample size calculation does. The alternative hypothesis is specified by two distributions with a location shift. The assumption of a location shift model is not required in order to carry out the rank sum test, but it is needed if we want to make inferences about medians or means. It is, of course, needed in the sample size calculation as presented here.
While a t-test requires a finite variance of the observations, WRST does not. For example, if has a Cauchy distribution, we can use WRST, but not a t-test.
A sample size calculation for WRST assuming normal distributions can be described as follows:
• In addition to type I error probability α and power 1 − β, specify
(a) Δ=σ-1(μ1-μ2)
(b) rk = the prevalence of group k (r1+r2=1)
• Calculate
νa=∫∞-∞ϕ(x+Δ)ϕ(x)dx
σ21=∫∞-∞ϕ2(x+Δ)ϕ(x)dx-{∫∞-∞ϕ(x+Δ)ϕ(x)dx}2
and
σ22=∫∞-∞ϕ2(x-Δ)ϕ(x)dx-{∫∞-∞ϕ(x-Δ)ϕ(x)dx}2,
where ϕ(x)
• Calculate the sample size
n=112r1r2{z1-α/2+z1-β√12(r2σ21+r1σ22)νa-1/2}2.
Suppose that we want to detect a difference of 0.5σ in the population means with two-sided α = 0.05 and power 1 − β = 0.9. Then z1-α/2=1.96
%macro WilcoxonRankSumTest( delta=, /* Standardized effect size */
r1 = , /* prevalence of group 1 */
r2 = , /* prevalence of group 2 */
sides=, /* number of test sides: 1 or 2 */
alpha=, /* alpha */
power= /* power */
);
proc iml;
start CDFPDFDELTA(t);
m=t+δ c=cdf('normal',m,0,1); p=pdf('normal',t,0,1); v=c*p;
return(v);
finish;
start CDF2PDFDELTA(t);
m=t+δ c=cdf('normal',m,0,1)**2; p=pdf('normal',t,0,1); v=c*p;
return(v);
finish;
start CDFPDF_DELTA(t);
m=t-δ c=cdf('normal',m,0,1); p=pdf('normal',t,0,1); v=c*p;
return(v);
finish;
start CDF2PDF_DELTA(t);
m=t-δ c=cdf('normal',m,0,1)**2; p=pdf('normal',t,0,1); v=c*p;
return(v);
finish;
start s1(t);
m=t+δ c=cdf('normal',m,0,1); p=pdf('normal',t,0,1); v=c*p;
return(v);
finish;
interval = .M || .P;
call quad(nu1,"CDFPDFDELTA", interval);
call quad(s1, "CDF2PDFDELTA", interval);
call quad(nu2,"CDFPDF_DELTA", interval);
call quad(s2, "CDF2PDF_DELTA", interval);
sigma1 = s1-nu1**2; sigma2 = s2-nu2**2;
p1=1-&alpha/&sides; z_alpha = probit(p1);
p2=&power; z_beta = probit(p2);
n =
(1/(12*&r1*&r2))*((z_alpha+z_beta*sqrt(12*(&r2*sigma1+&r1*sigma2)))/(nu1-
0.5))**2;
delta=δr1 =&r1; r2=&r2;sides =&sides;alpha=αpower=&power;
print 'Sample Size';
print 'Wilcoxon Rank Sum Test for Mean Difference';
print delta r1 r2 sides alpha power;
print n;
quit;
run;
%mend WilcoxonRankSumTest;
/*---------------------- Run the macro for Example 2. -------------------*/
%WilcoxonRankSumTest(delta=0.5, r1 = 0.5, r2 = 0.5, sides=2, alpha=0.05, power=0.9);
Sample Size
Wilcoxon Rank Sum Test for Mean Difference
delta | r1 | r2 | sides | alpha | power |
0.5 | 0.5 | 0.5 | 2 | 0.05 | 0.9 |
n
176.41709
In this section, we investigate two-sample tests for binary outcome variables with or without stratification. We use a weighted Mantel-Haenszel test for stratified analysis to adjust for covariates that are unbalanced between two groups.
Suppose that, for group k(=1, 2), Xk denotes the number of responders from nk independent subjects. If group k has a response probability pk, then Xk is a binomial distribution with nk independent trials and response probability pk. We want to test H0:p1=p2
T=ˆp1-ˆp2√ˆpˆq(n-11+n-12)
follows the standard normal distribution under H0, where ˆq=1-ˆp
Let p1, p2 denote the response probability under a specific alternative hypothesis for sample size calculation, and Δ=p1-p2
T≈ˆp1-ˆp2-Δ√ˉpˉq(n-11+n-12)+Δ√ˉpˉq(n-11+n-12)
=Z+Δ√nr1r2ˉpˉq,
where Z : Z(0, 1) and ˉq=1-ˉp
1-β=ˉϕ(z1-α/2-Δ√nr1r2),
By solving (2) with respect to n, we obtain the required total sample size as
n=(z1-α/2+z1-β)2r1r2Δ2,
In summary, the sample size calculation for two-sample binomial proportions is conducted as follows:
• In addition to type I error probability α and power 1 − β, specify
(a) p1, p2 = binomial proportions under Ha
(b) rk the prevalence of group k (r1 + r2 = 1)
• Calculate the sample size
n=(z1-α/2+z1-β)2ˉpˉqr1r2(p1-p2)2,
where ˉp=r1p1+r2p2
Suppose that we want to detect a difference of p1 = 0.4 and p2 = 0.5 with two-sided α = 0.05 and power 1 − β = 0.9. Assuming an equal proportion of two groups in the population (that is, r1 = r2 = 0.5), we obtain the required sample size n = 10.38.
proc power;
twosamplefreq test=pchi
groupproportions =(0.4 0.5) /* binomial proportions to be detected */
groupweights = (1 1) /* (w1, w2):Weight of two groups,(r1,r2)*/
sides = 2 /* 1: one sided 2: 2: two-sided */
power =0.9 /* power */
alpha =0.05 /* alpha */
ntotal=.; /* sample size */
run;
The POWER Procedure
Pearson Chi-square Test for Two Proportions
Fixed Scenario Elements
Distribution | Asymptotic normal |
Method | Normal approximation |
Number of Sides | 2 |
Alpha | 0.05 |
Group 1 Proportion | 0.4 |
Group 2 Proportion | 0.5 |
Group 1 Weight | 1 |
Group 2 Weight | 1 |
Nominal Power | 0.9 |
Null Proportion Difference | 0 |
Computed N Total
Actual Power |
N Total |
0.901 | 1038 |
In an observational study, patients are not randomly assigned to treatment groups with equal probability. Instead, the probability of assignment varies from patient to patient, possibly depending on the patient’s baseline covariates. This often results in non-comparable treatment groups due to imbalance of the baseline covariates and consequently invalidates the standard methods commonly employed in data analysis. To overcome this problem, Rosenbaum and Rubin (1983, 1984) developed the propensity score method.
Suppose that there are j strata formed by propensity score stratification. Let n denote the total sample size and nj the sample size in stratum j ∑Jj=1nj=n
Group | |||
Response | 1 | 2 | Total |
Yes | xj11 | xj12 | xj1 |
No | xj21 | xj22 | xj2 |
Total | nj1 | nj2 | nj |
Let Oj=xj11
Vj=nj1nj2xj1xj2n2j(nj-1).
Then, the weighted Mantel-Haenszel (WMH) test is given by
T=∑Jj=1ˆwj(Oj-Ej)√∑Jj=1ˆw2jVj,
where the weights ˆwj
Let aj = nj/n denote the allocation proportion for stratum (∑Jj=1aj=1)
In order to calculate the power of WMH, we have to derive the asymptotic distribution of ∑Jj=1ˆwj(Oj-Ej)
Oj-Ej=nj1nj2nj(ˆpj1-ˆpj2)
=nj1nj2nj(ˆpj1-pj1-ˆpj2+pj2)+nj1nj2nj(pj1-pj2)
=najbj1bj2(ˆpj1-pj1-ˆpj2+pj2)+najbj1bj2(pj1-pj2).
Thus, under H1, ∑Jj=1ˆwj(Oj-Ej)
Δ=∑Jj=1wjajbj1bj2(pj1-pj2)
=(1-Φ)∑Jj=1wjajbj1bj2pj1qj1qj1+Φpj1
and
σ21=n-1∑Jj=1w2jn2j1n2j2n2j(pj1qj1nj1+pj2qj2nj2)
=∑Jj=1w2jajbj1bj2(bj2pj1qj1+bj1pj2qj2).
Also under H1, we have
J∑j=1w2jVj=nσ20+op(n),
where
σ20=J∑j=1w2jajbj1bj2(bj1pj1+bj2pj2)(bj1qj1+bj2qj2).
Hence, the power of WMH is given as
1-β= P (|T|>z1-α/2|H1)
= P (σ1σ0Z+√n|Δ|σ0>z1-α/2)
=ˉΦ(σ0σ1z1-α/2-√n|Δ|σ1),
where Z is a standard normal random variable and ˉΦ(z)=P(Z>z)
n=(σ0z1-α/2+σ1z1-β)2Δ2.
Following Jung, Chow, and Chi (2007), the sample size calculation for the weighted Mantel-Haenszel test can be carried out as follows:
1. Specify the input variables
- Type I and II error probabilities (α β).
- Success probabilities for group 1 p11,...pJ1, and the odds ratio Φ under H1. Note that pj2=Φpj1/(qj1+Φpj1)
- Incidence rates for the strata, (aj, j = 1,...,J). (Yue [2007] proposes to use aj≈1/J
- Allocation probability for group 1 within each stratum, (bj1,j=1,...,J)
2. Calculate n by
n=(σ0z1-α/2+σ1z1-β)2Δ2,
where
Δ=∑Jj=1ajbj1bj2(pj1-pj2)
σ21=∑Jj=1ajbj1bj2(bj2pj1qj1+bj1pj2qj2)
σ20=∑Jj=1ajbj1bj2(bj1pj1+bj2pj2)(bj1qj1+bj2qj2).
Suppose that we want to compare the response probabilities between control (k = 1) (and experimental (k = 2) (groups. We consider partitioning the combined data into J = 5 strata, and the allocation proportions are projected as and (a1,a2,a3,a4,a5)=(.15,.15,.2,.25,.25)
%macro WMHTestwithStrata ( J = , /* number of strata */
inA= , /* incidence rates for the strata */
inB= , /* allocation probability for control group */
inP1=, /* success probability for control group */
phi =, /* odds ratio under H1 */
power=, /* power */
alpha=, /* alpha */
sides= /* 1: one-sided test 2: Two-sided test */
);
proc iml;
%let K = 2; /* two groups */
A=&inA; B =&inB;P1=&inP1;
P2 =J(&J,1,0);Q1 =J(&J,1,0); Q2 =J(&J,1,0);
do j=1 to &J;
Q1[j]=1-P1[j];
P2[j]=&phi*P1[j]/(Q1[j]+&phi*P1[j]);
Q2[j]=1-P2[j];
end;
z_p1=1-&alpha/&sides; z_alpha = probit(z_p1);
z_p2=&power; z_beta = probit(z_p2);
delta = 0; s0_sq= 0; s1_sq= 0;
do j=1 to &J;
delta = delta+A[j]*B[j]*(1-B[j])*(P1[j]-P2[j]);
s1_sq= s1_sq+A[j]*B[j]*(1-B[j])*((1-B[j])*P1[j]*Q1[j]+B[j]*P2[j]*Q2[j]);
s0_sq= s0_sq+A[j]*B[j]*(1-B[j])*(B[j]*P1[j]+(1-B[j])*P2[j])*(B[j]*Q1[j]+(1-B[j])*Q2[j]);
end;
n = (1/(delta**2))*((sqrt(s0_sq)*z_alpha+sqrt(s1_sq)*z_beta)**2);
print 'Sample Size Calculation';
print 'Weighted Mantel-Haenszel Test with Strata';
alpha = α power = &power; phi= φ sides = &Sides;
print alpha power phi sides ;
print A B P1 P2;
print delta s0_sq s1_sq;
print n;
quit;
run;
%mend WMHTestwithStrata;
/*---------------- Run the macro for Example 18.4. ---------------------*/
%WMHTestwithStrata(
J = 5 , /* number of strata */
inA= %str({0.15, 0.15, 0.2, 0.25, 0.25}), /*Incidence Rates for Strata*/
inB= %str({0.4 , 0.4, 0.5, 0.6, 0.6}), /* Allocation Probability */
inP1=%str({0.5 , 0.6, 0.7, 0.8, 0.9}), /* Success Probability */
phi = 2, /* odds Ratio Under H1 */
power =0.8, /* Power */
alpha = 0.05, /* Alpha */
sides = 2 /* Two-sided test */
);
Sample Size Calculation
Weighted Mantel-Haenszel Test with Strata
alpha 0.05 |
power 0.8 |
phi 2 |
sides 2 |
A | B | P1 | P2 |
0.15 | 0.4 | 0.5 | 0.6666667 |
0.15 | 0.4 | 0.6 | 0.75 |
0.2 | 0.5 | 0.7 | 0.8235294 |
0.25 | 0.6 | 0.8 | 0.8888889 |
0.25 | 0.6 | 0.9 | 0.9473684 |
delta | s0_sq | s1_sq |
-0.025752 | 0.0381275 | 0.0367178 |
n
446.21501
18.4 Two-Sample Log-Rank Test for Survival Data
Suppose that nk subjects are accrued to a study from group k(=1,2). For subject i(=1,...,nk) in group k(=1,2), let Tki denote the survival variable (that is, time to an event of interest), with marginal cumulative hazard function Λk(t)
H0:Λ1(t)=Λ2(t) for all t≥0
against the alternative hypothesis,
Ha:Λ1(t)≠Λ2(t) for some t≥0.
Because a cumulative hazard function uniquely determines the distribution, H0 implies that (T1i,i=1,...,n1)
Due to loss to follow up or termination of study before all subjects experience events, survival times are censored for some subjects. Let Cki denote the censoring time for the subject i in group k. Then, we observe {(Xki,δki),1≤i≤nk,k=1,2}
The log-rank test (Peto and Peto, 1972) is given by
W=∫∞0Y1(t)Y2(t)Y1(t)+Y2(t){dˆΛ1(t)-dˆΛ2(t)},
where ˆΛk(t)=∫t0Yk(s)-1dNk(s)
ˆσ2=∫∞0Y1(t)Y2(t){Y1(t)+Y2(t)}{dN1(t)+dN2(t)},
Refer to Fleming and Harrington (1991), for example. We reject H0 if the absolute value of W/ˆΣ
For sample size and power calculation, we assume a proportional hazards model Δ=λ2(t)/λ1(t)
(logΔz1-α/2+z1-β)2=D-11+D-12.
In order to calculate Dk, we need to specify the survival distributions under Ha and the common censoring distribution. Suppose that the subjects are accrued at a constant rate for a period of A and followed for an additional period of B after the completion of accrual. The total study period is A + B. Then, assuming no loss to follow up, the censoring distribution is uniform between B and A + B, as in the following:
G(t)= P (Cki≥t)={1 if t<B1+(B-t)/A if B≤t≤A+B0 otherwise.
One of the most popular survival models in sample size calculation is the exponential distribution,
Sk(t)= P (Tki≥t)=exp(-λkt)
for group k. Then, the probability that a subject in arm k experiences an event is given as
dk= P (Tki≤Ci)=∫∞0Sk(t)dG(t)=1-exp(-λkB){1-exp(-λkA)}Aλk,
so that we have Dk=nkdk=nrkdk
n=(1r1d1+1r2d2)(z1-α/2+z1-βlogΔ)2.
We assumed no loss to follow up in calculating dk, but this assumption can be easily loosened by assuming a distribution for the time to loss to follow up.
In summary, a sample size is calculated as follows:
• In addition to type I error probability α and power 1 − β, specify
(a) λ1,λ2=
(b) rk = the prevalence of group k (r1 + r2 = 1)
(c) A = accrual period, B = additional follow-up period
• Calculate the probability of an event for a subject in group k(=1,2)
dk=1-exp(-λkB){1-exp(-λkA)}Aλk,
• Calculate the sample size
n=(1r1d1+1r2d2)(z1-α/2+z1-βlogΔ)2.
Suppose that the control group (k = 1) has a median survival time of μ1=3
%macro SS_TwoSmpleLogRank(
Accrual =, /* accrual period */
Follow =, /* additional follow-up period */
inR =, /* group allocation proportion(a1, a2) */
inLambda=, /* hazard rates under the alternative */
alpha =, /* alpha */
power =, /* power */
sides = /* 1: One-sided test 2: Two-sided test */
);
proc iml;
%let K = 2; r = &inR; Group = J(&K,1,0); lambda = &inLambda;
delta = lambda[2]/lambda[1]; d_prob = J(&K, 1,0);
n = J(&K, 1,0); D = J(&K, 1,0);
do i=1 to &K;
group[i] = i;
d_prob[i]=1-(exp(-lambda[i]*&Follow))*(1-exp(-lambda[i]*&Accrual))
/(&Accrual*lambda[i]);
end;
z_p1=1-&alpha/&sides; z_alpha = probit(z_p1);
z_p2=&power; z_beta = probit(z_p2);
Total =
int((1/(r[1]*d_prob[1])+1/(r[2]*d_prob[2]))*((z_alpha+z_beta)/log(delta))**
2)+1;
do i=1 to &K; n[i]= r[i]*Total; D[i]= d_prob[i]*n[i]; end;
print ' Sample Size Calculation';
print 'for Two-Sample Log-Rank Test for Survival Data';
Accrual=&Accrual; Follow_Up= &Follow;
print 'Accrual Period : ' &Accrual;
print ' Follow Up period : ' &Follow;
print Group lambda delta r d_prob;
print Group n D;
print Total;
quit;
run;
%mend SS_TwoSmpleLogRank;
/*-------------- Run the macro for Example 18.5.1 ----------------------------*/
%SS_TwoSmpleLogRank(
Accrual =3, /* accrual period */
Follow =2, /* additional follow-up period */
inR =%str({0.3, 0.7}), /* group allocation proportion(r1, r2) */
inLambda=%str({0.231, 0.154}),/* hazard rates under the alternative */
alpha =0.05, /* alpha */
power =0.9, /* power */
sides =2 /* two-sided test */
);
Sample Size Calculation
for Two-Sample Log-Rank Test for Survival Data
Accrual Period : | 3 |
Follow Up period : | 2 |
GROUP | LAMBDA | DELTA | R | D_PROB |
1 | 0.231 | 0.6666667 | 0.3 | 0.5455053 |
2 | 0.154 | 0.7 | 0.411467 |
GROUP | N | D |
1 | 183.9 | 100.31843 |
2 | 429.1 | 176.56049 |
TOTAL
613
proc power;
twosamplesurvival test=logrank
nfractional
groupmedsurvtimes = 3 | 4.5
accrualtime = 3
followuptime = 2
groupweights = (0.3,0.7)
alpha = 0.05
sides=2
power = 0.9
ntotal = .
;
run;
The POWER Procedure
Log-Rank Test for Two Survival Curves
Fixed Scenario Elements
Method | Lakatos normal approximation |
Form of Survival Curve 1 | Exponential |
Form of Survival Curve 2 | Exponential |
Number of Sides | 2 |
Accrual Time | 3 |
Follow-up Time | 2 |
Alpha | 0.05 |
Group 1 Median Survival Time | 3 |
Group 2 Median Survival Time | 4.5 |
Group 1 Weight | 0.3 |
Group 2 Weight | 0.7 |
Nominal Power | 0.9 |
Number of Time Sub-Intervals | 12 |
Group 1 Loss Exponential Hazard | 0 |
Group 2 Loss Exponential Hazard | 0 |
Computed Ceiling N Total
Fractional N Total |
Actual Power |
Ceiling N Total |
592.792138 | 0.900 | 593 |
18.5 Two-Sample Longitudinal Data
In a longitudinal study, we measure the outcome of research interest repeatedly from each subject over a time period. When there are two groups of subjects, one popular primary objective is the between-group comparison of change rate in the expected value of a response variable over time. Typically, the repeated measurements within each subject are correlated. Because of the robustness to possible misspecification of the correlation structure among the repeated measurements, the generalized estimating equation (GEE) method has been one of the most popular methods to fit the regression models and to test on the change rates (Liang and Zeger, 1986). In this section, we discuss sample size estimation methods for such testing. We consider cases where the response variable is continuous or dichotomous.
Suppose that there are nk subjects in treatment group k(=1,2), n1,n2 = n. Let, for group k, Nk=∑nki=1mki
g(μkij)=ak+bktkij,
where g(.) is a known link function. In other words, we assume that there exists a link function g(.) linearizing the trajectory in response over time. To simplify the discussions, we use the identity link g(μ)=μ
The coefficient bk represents the change rate per unit time in mean response if y is continuous and the change rate in log-odds if y is binary. The measurement times may vary subject by subject due to missing measurements, patients’ visits for measurements at unscheduled times, loss to follow up, or other causes. In this chapter, we assume that any missing data is missing completely at random (Rubin, 1976).
The repeated measurements (ykij,1≤j≤mki)
Uk(a,b)=1√nk∑nki=1∑mkij=1{ykij-μkij(a,b)}(1tkij)
and μkij(a,b)=g-1(a+btkij)
In the continuous outcome variable case, we have a closed form solution to Uk (a,b) = 0
ˆbk=∑nki=1∑mkij=1(tkij-ˉtk)ykij∑nki=1∑mkij=1(tkij-ˉtk)2
and ˆak=ˉyk-ˆbkˉtk
By Liang and Zeger (1986), as n→∞
ˆvk=∑nki=1{∑mkij=1(tkij-ˉtk)ˆɛkij}2{∑nki=1∑mkij=1(tkij-ˉtk)2}2
where ˆɛkij=ykij-ˆak-ˆbktkij
Let pkij=μkij
(ˆa(l)kˆb(l)k)=(ˆa(l-1)kˆb(l-1)k)+n-1/2kA-1k(ˆa(l-1)k,ˆb(l-1)k)Uk(ˆa(l-1)k,ˆb(l-1)k),
where
Ak(a,b)=-n-1/2k∂Uk(a,b)∂(a,b)=1nk∑nki=1∑mkij=1pkij(a,b)qkij(a,b)(1tkijtkijt2kij)
pkij(a,b)=g-1(a+btkij)=ea+btkij1+ea+btkij (5)
and qkij(a,b)=1-pkij(a,b).
By Liang and Zeger (1986), for large n, √nk(ˆak-ak,ˆbk-bk)T is asymptotically normal with mean 0 and variance Vk that can be consistently estimated by
ˆVk=A-1k(ˆak,ˆbk)ˆΣkA-1k(ˆak,ˆbk)
where
ˆΣk=1nknk∑i=1{mki∑j=1ˆɛkij(1tkij)}⊗2,
ˆɛkij=ykij-pkij(ˆak,ˆbk) and c⊗2=ccT for a vector . Let ˆvk be the (2,2)-component of ˆVk.
Suppose that we want to test the rate of change between two groups (that is, H0: b1 = b2) . Based on the asymptotic results from the previous section, we can reject H0: b1 = b2 in favor of H0: b1 ≠ b2 when
|ˆb1-ˆb2√ˆv1/n1+ˆv2/n2|>z1-α/2, (6)
Where Z1−α/2 is the 100(1−α/2) percentile of a standard normal distribution.
In this section, we derive a sample size formula for the two-sided α test (6) to detect H1:|b2-b1|=Δ(>0) with power 1−β. When designing a study, we usually schedule fixed visit times t1<⋯<tmfor m repeated measurements from each subject. We often set t1 = 0 for the baseline measurement time. When the study is conducted, however, the subjects may skip some visit times due to various reasons, which results in missing values, or they may not follow the visit schedule correctly so that the observed visit times may be variable. Jung and Ahn (2003) show through simulations in a continuous outcome variable case that the sample size formula based on fixed measurement times is very accurate even when the observed measurement times are widely distributed around the scheduled times.
By simple algebra, we can derive a sample size formula to detect the specified difference|b2-b1|=Δ with power 1−β,
n=(z1-α/2+z1-β)2(v1/r1+v2/r2)Δ2, (7)
where vk=limn→∞ˆvk. The expression of vk is slightly different between continuous and binary outcome variable cases, as shown in the following subsections.
In order to allow for missing values, let δj denote the proportion of patients with observations at tj and δj the proportion of patients with observations at both tj and tj’. Also, let . ρjj′= corr (ykij,ykij′)We assume that the missing pattern and correlation structure are common in two treatment groups. We discuss specific models for missing pattern and correlation structure in Section 18.5.3.
We assume that the continuous outcome variable has a constant variance var (ykij)=σ2over time, which is common between two treatment groups. Jung and Ahn (2003) show that is v1 = v2 = v expressed as
v=σ2(s2+c)s4,
where
s2=m∑j=1δj(tj-τ)2
and
τ=∑mj=1δjtj∑mj=1δj.
Hence, (7) is simplified to
n=v(z1-α/2+z1-β)2Δ2r1r2. (8)
Note that we do not have to specify the true values for {(ak,bk),k=1,2}but only the difference |b2-b1|=Δin sample size calculation. The sample size is proportional to the variance of measurement error and decreases as r1 approaches 1/2.
Let Pkj = ak + bktk denote the success probabilities under H1. Jung and Ahn (2005) show that
vk=s2k+cks4k,
where
s2k=m∑j=1δjpkjqkj(tj-τk)2
and
τk=∑mj=1δjpkjqkjtj∑mj=1δjpkjqkj.
As shown here, the sample size formula under the binary outcome variable case depends on the probabilities (Pkj, j = 1, . . . , m) under H1, so that we need to specify all regression parameters {(ak,bk),k=1,2}. Let Δ = b1−b2 denote the difference in slope between two groups under H1. If pilot data exist for the control group (group 1), then we may use the estimates as the parameter values, a1 and b1. If t = 0 is the baseline, the intercepts in the two groups are set the same (that is, a1 = a2) . By setting b2 = b1 + Δ, we can specify all regression coefficients under H1.
If no pilot data are available, we may specify the binary probabilities at the baseline, p11, and at the end of follow up, p1m , for the control group. Then we obtain (a1b1) by
b1=g(p1m)-g(p11)tm-t1, (9)
a1=g(p11)-b1t1=g(p11).
And we set a2 = a1(since p11 = p21 at the baseline) and b2 = b1 + Δ.
Calculation of (or) requires projection of the missing probabilities and the true correlation structure.
In order to specify δjj, we need to estimate the missing pattern. If missing at time tj is independent of missing at time tj for each patient, then we have δjj’ = δjj, and we call this type independent missing.
In some studies, subjects missing at a measurement time may be missing at all following measurement times, as in the labor pain study discussed in Example 18.6. This type of missing is called monotone missing. In this case, we have δjj’ = δj for (j<j′(δ1≥⋯≥δm)). In monotone missing cases, one may want to specify the proportion of patients who will contribute exactly the first j observations, say £j. Then, noting that ηj=Δj-Δj+1for j = 1,...,m−1 and μm = δm , we can obtain δj recursively starting from δm. Note also that ∑jηj=Δ1 , which equals 1 if all patients have measurements at the first measurement time t1.
In summary, we consider two missing patterns as candidates to the true one: (a) independent missing, where δjj′=δjδj′ , and (b) monotone missing, where δjj′=δj′ for j<j′ (δ1≥⋯≥δm ).
Now, we specify the true correlation structure ρjj′ . A reasonable model for ɛij=yij-g-1(a+btij) may be
ɛij=ui+eij,
where ui is a subject-specific error term with variance σ2u and eij is a serially correlated within-subject error term with variance σ2e and correlation coefficients ˜ρjj′. Assuming that ui and eij are independent, we have
cov (ɛij,ɛij′)=σ2a+σ2e˜ρjj′.
Often, the variation between subjects (σ2u) is much larger than that within subjects (σ2e). In this case, we have
cov (ɛij,ɛij′)≈σ2u
and an exchangeable correlation structure, ρjj′=ρ for j≠j′ , may be a reasonable approximation to the true one.
On the other hand, if the variation within the subject (σ2e ) dominates over the variation between subjects (σ2u ), then we will have
cov (ɛij,ɛij′)≈σ2e˜ρjj′,
so that a serial correlation structure may be a reasonable approximation to the true one. One of the most popular serial correlation structures, especially when measurement times are not equidistant, is a continuous autocorrelation model with order 1, AR(1), for which ρjj′=ρ|tj-tj′| .
We consider two correlation structures as candidate approximations to the true one: (i) exchangeable, where ρjj′=ρ , and (ii) AR(1), where ρjj′=ρ|tj-tj′| .
For sample size calculation, the following parameters need to be specified commonly in both continuous and discrete outcome variable cases:
• Type I and II errors α and β, respectively.
• The size of difference to detect, |b2-b1|=Δ .
• Allocation proportion for group k, rk.
• Correlation structure and the associated correlation parameter ρ, (that is, (i) ρjj′=ρ for exchangeable or (ii) ρjj′=ρ|tj-tj′| for AR(1)).
• Proportion of patients with an observation at tj, δj.
• Missing pattern: (a) independent missing (δjj′=δjδj′ ) or (b) monotone missing (δjj′=δj′ for j<j′ ) .
In addition, we need to specify σ2= var (ɛij) in the continuous outcome variable case and the regression coefficients {(ak,bk),k=1,2} under H1 in the binary outcome variable case.
We demonstrate our sample size formula with real longitudinal studies.
In a study on labor pain (Davis, 1991), 83 women in labor were assigned to either a pain medication group (43 women) or a placebo group (40 women). At 30-minute intervals, the self-reported amount of pain was marked on a 100mm line, where 0 = no pain and 100 = extreme pain. The maximum number of measurements for each woman was , but there were numerous missing values at later measurement times with monotone missing pattern. A simple approach to such a study objective might be to estimate and compare the slopes of pain scores over time in the two treatment groups. In this study, the outcome variable is continuous. Suppose that we want to design a new study on labor pain based on the data reported by Davis (1991). As in the original study, we assume monotone missing. In this study, the measurement times were equispaced, so that we set tj = j − 1 (j = 1,…6) for convenience. From the data, we obtained σ2 = 815.84. Suppose we want to detect a difference of σ in mean pain score between two groups at t6. So, we project Δ = σ/(t6 − t1 = 5.71 in a new study. We consider a balanced design (that is, r1 = r2 = 1/2) . Also, from the data, the proportion of observed measurements are
(δ1,δ2,δ3,δ4,δ5,δ6)=(1,.90,.78,.67,.54,.41).
From these results τ = 2.02, we obtain and s2 = 11.42.
Suppose that we want to detect a difference of Δ = 5.71 with 80% of power (z1−β = 0.84) using a two-sided α = .05 (z1−α/2 = 1.96) test. Under an exchangeable correlation structure, we obtain ρ = .64 and c = −3.13 from the data, so that we have v = 815.84×(11.42−3.13)/11.422 = 51.84. Hence, from (8), the required sample size is calculated as
n=[51.84×(1.96+0.84)25.712]+1=50.
where [x] is the largest integer not exceeding x.
Under AR(1), we obtain from the data ρ = .80 and c = 2.31, so that the required sample size for detecting Δ = 5.71 with 80% of power using a two-sided α = .05 test is given as n = 83. The sample size under AR(1) is larger than that under an exchangeable correlation, by about 66%, in this example.
%macro SS_RepeatedContinuousMeasurement(
missingPattern = , /* 1: independent missing 2: monotone missing */
corrStructure = , /* 1: compound symetric , 2: AR(1) */
rho = , /* associated correlation parameter */
m = , /* number of measurement time points */
sigma_sq = , /* variance */
inR = , /* group allocation proportion(r1, r2) */
inDelta = , /* proportion of observed measurements */
alpha = , /* alpha */
power = , /* power */
sides = , /* 1: one-sided test 2: two-sided test */
print = 0 /* 0:default 1: detail */
);
proc iml;
%let K = 2; r = &inR; delta = &inDelta; t = J(&m,1,0);
do j=1 to &m; t[j] = j-1; end;
%if &inDelta eq %then %do; delta=J(&m,1,0);do j=1 to &m; delta[j]=1-(j-1)/20; end; %end;
d = sqrt(&sigma_sq)/(t[&m]-t[1]);
start g(p); gp = log(p/(1-p)); return (gp); finish g;
start prob(a,b,t); p = 1/(1+exp(-a-b*t)); return (p); finish prob;
start rho(i,j,r,c);
if c=1 then do; /* CS */ if i=j then rho_ij =1;else rho_ij = r; end;
else do; /* AR(1) */ dist = abs(i-j); rho_ij = r**dist; end;
return (rho_ij);
finish rho;
tau_num =0; tau_denum = 0;
do j=1 to &m; tau_num = tau_num+delta[j]*t[j]; tau_denum = tau_denum+delta[j]; end;
tau= tau_num/tau_denum; s_sq=0;
do j=1 to &m; s_sq = s_sq+delta[j]*((t[j]-tau)**2); end;
c=0;
do i=1 to &m;
do j=1 to &m;
if i ^= j then do;
if &missingPattern = 1 then do; delta_ij = delta[i]*delta[j]; end;
else do; if j > i then max_ij=j; else max_ij=i; delta_ij=delta[max_ij]; end;
c = c+delta_ij*rho(i,j,&rho,&corrStructure)*(t[i]-tau)*(t[j]-tau);
end;
end;
end;
v = &sigma_sq*(s_sq+c)/(s_sq**2);
z_p1=1-&alpha/&sides; z_alpha = probit(z_p1); z_p2=&power; z_beta = probit(z_p2);
n = int((v*(z_alpha+z_beta)**2)/(d**2*r[1]*r[2]))+1;
print 'Sample Size Calculation for a Two-Group Comparision';
print ' of Repeated Continous Measurements';
alpha =α power=&power; rho=ρ sides=&sides;
sigma_sq=&sigma_sq;
print alpha power rho sides;
if &missingPattern = 1 then do;print ' Missing Pattern: Independent '; end;
else if &missingPattern = 2 then do; print ' Missing Pattern: Monotone '; end;
if &corrStructure=1 then do; print ' Correlation Structure : Compound Symetric ';end;
else if &corrStructure = 2 then do; print ' Correlation Structure : AR(1) '; end;
%if &print = 1 %then %do; print delta; print tau s_sq c v ; %end;
print d sigma_sq;
print n;
quit;
run;
%mend SS_RepeatedContinuousMeasurement;
/*--Run the macro for Example 18.6 when Correlation Structure = Compound Symetric --*/
%SS_RepeatedContinuousMeasurement(
missingPattern = 2, /* monotone missing */
corrStructure = 1, /* compound symetric */
rho = 0.64, /* associated correlation parameter */
m = 6, /* number of measurement time points */
sigma_sq = 815.84, /* variance */
inR = %str({0.5, 0.5}), /* group allocation proportion(r1, r2) */
inDelta = %str({1, 0.9, 0.78, 0.67, 0.54, 0.41}),
/* proportion of observed measurements */
alpha = 0.05, /* alpha */
power = 0.8, /* power */
sides = 2, /* two-sided test */
print = 1
);
/*--- Run the macro for Example 18.6 when Correlation Structure = AR(1) ---------*/
%SS_RepeatedContinuousMeasurement(
missingPattern = 2, /* monotone missing */
corrStructure = 2, /* AR(1) */
rho = 0.8, /* associated correlation parameter */
m = 6, /* number of measurement time points */
sigma_sq = 815.84, /* variance */
inR = %str({0.5, 0.5}),/* group allocation proportion(r1, r2) */
inDelta = %str({1, 0.9, 0.78, 0.67, 0.54, 0.41}),
/* proportion of observed measurements */
alpha = 0.05, /* alpha */
power = 0.8, /* power */
sides = 2, /* two-sided test */
print = 1
);
/*------------------ When Correlation Structure = Compound Symetric --------*/
Sample Size Calculation for a Two-Group Comparision
of Repeated Continous Measurements
ALPHA | POWER | RHO | SIDES |
0.05 | 0.8 | 0.64 | 2 |
Missing Pattern: Monotone
Correlation Structure : Compound Symetric
TAU | S_SQ | C | V |
2.0186047 | 11.418512 | -3.133345 | 51.842656 |
D | SIGMA_SQ |
5.7125826 | 815.84 |
N
50
/*--------------------- When Correlation Structure = AR(1) -----------------*/
Sample Size Calculation for a Two-Group Comparision
of Repeated Continous Measurements
ALPHA | POWER | RHO | SIDES |
0.05 | 0.8 | 0.8 | 2 |
Missing Pattern: Monotone
Correlation Structure : AR(1)
TAU | S_SQ | C | V |
2.0186047 | 11.418512 | 2.3055968 | 85.87567 |
D | SIGMA_SQ |
5.7125826 | 815.84 |
N
83
The Genetics vs. Environment In Scleroderma Outcome Study (GENISOS) is an observational study designed as a collaboration of the University of Texas-Houston Health Science Center with the University of Texas Medical Branch at Galveston and the University of Texas-San Antonio Health Science Center (Reveille et al., 2001). Scleroderma, or systemic sclerosis, is a multisystem disease of unknown etiology characterized by cutaneous and visceral fibrosis, small blood vessel damage, and autoimmune features (Medsger, 1997). The study subjects are regularly followed to check for the occurrence of pulmonary fibrosis. In this case, the outcome is a binary variable.
Suppose that we want to develop a study to examine the effect of a new drug in preventing the occurrence of pulmonary fibrosis in subjects with scleroderma compared to no intervention. The parameter values specified here for sample size calculation are approximated by the estimates from the current data set of GENISOS.
We want to estimate the sample size for the new study using α = .05 and 1−β = .8 . As in GENISOS, presence or absence of pulmonary fibrosis is assessed at baseline and at months 6, 12, 18, 24, and 30. Because the measurement times are equidistant, we set tj = j − 1 j ( j = 1,…,6 ) for the m = 6 time points. The within-group correlation structure of the repeated measurements conforms to AR(1) with the adjacent correlation equal to ρ = .8 (that is, ρjj′=.8|j-j′| ). We consider assigning an equal number of scleroderma patients in each of two groups (that is, r1 = r2 =1/2).
Approximately 75% of scleroderma patients do not have pulmonary fibrosis at the baseline in the ongoing GENISOS. We project that the proportion of subjects without pulmonary fibrosis is 75% at baseline (that is, p1,1 = .75) and 50% at 30 months (that is, p1,j = .50) in a placebo group. We assume that a new therapy will prevent or delay further occurrence of pulmonary fibrosis. That is, the proportion of subjects without pulmonary fibrosis will remain 75% during the 30-month study in a new therapy group (in other words, p2,1 = p2,6 = .75 ). From these values and (9), we obtain
b1=g(.5)-g(.75)5-0=-0.220,
and a1 = g(.75) = 1.099. Similarly, we obtain (a2, b2)=(1.099, 0), so that Δ = 0 − (−0.220) = 0.220. By (5), the probabilities of no pulmonary fibrosis at the 6 time points are obtained as (.750,.707,.659,.608,.555,.500) for the placebo group and (.750,.750,.750,.750,.750,.750) for the treatment group.
The proportions of observed measurements are expected to be
(δ1,δ2,δ3,δ4,δ5,δ6)=(1,.95,.90,.85,.80,.75).
Suppose that we expect independent missing. Then, we obtain δjj′ from the specified δj values using δjj′=δjδj′ . Now we have all the parameter values required, and we obtain v1 = 0.305 and v1 = 0.353. Finally, from (7), we obtain
n=[(1.96+0.84)2(0.305/0.5+0.353/0.5)0.2202]+1=215.
If we assume monotone missing ( δjj′=δj∨j′ ), we obtain v1 = 0.324 , , v2 = 0.308 and n = 229 .
%macro SS_RepeatedBinaryMeasurement(
missingPattern = , /* 1: independent missing 2: monotone missing */
corrStructure = , /* 1: compound symetric , 2: AR(1) */
rho = , /* associated correlation parameter */
m = , /* number of measurement time points */
inP1 = , /* proportion of subjects for control group */
inP2 = , /* proportion of subjects */
inR = , /* group allocation proportion(r1, r2) */
inDelta = , /* proportion of observed measurements */
alpha = , /* alpha */
power = , /* power */
sides = /* 1: one-sided test 2: two-sided test */
);
proc iml;
start g(p); gp = log(p/(1-p)); return (gp); finish g;
start prob(a,b,t); p = 1/(1+exp(-a-b*t)); return (p); finish prob;
start rho(i,j,r,c);
if c=1 then do; /* CS */ if i=j then rho_ij =1; else rho_ij = r; end;
else if c = 2 then do; /* AR(1) */ dist = abs(i-j); rho_ij = r**dist; end;
return (rho_ij);
finish rho;
%let K = 2;
P1=&inP1; P2=&inP2; r =&inR; a=J(&K,1,0); b=J(&K,1,0); t=J(&m,1,0);
tau = J(&K,1,0); s_sq = J(&K,1,0); v = J(&K,1,0); c_sq = J(&K,1,0);
do j=1 to &m; t[j] = j-1; end;
%if &inDelta eq %then %do;
delta = J(&m,1,0);
do j=1 to &m; delta[j] = 1-(j-1)/20; end;
%end;
b[1] = (g(P1[&m])-g(P1[1]))/(t[&m]-t[1]); a[1] = g(P1[1])-B[1]*t[1];
b[2] = (g(P2[&m])-g(P2[1]))/(t[&m]-t[1]); a[2] = g(P2[1])-B[1]*t[1];
d = b[2]-b[1];
do j=1 to &m;
T[j] = j-1; P1[j]= prob(a[1], b[1], t[j]); P2[j]= prob(a[2], b[2], t[j]);
end;
tau1_num =0; tau1_denum = 0; tau2_num =0; tau2_denum = 0;
do j=1 to &m;
tau1_num = tau1_num+delta[j]*P1[j]*(1-P1[j])*t[j];
tau1_denum = tau1_denum+delta[j]*P1[j]*(1-P1[j]);
tau2_num = tau2_num+delta[j]*P2[j]*(1-P2[j])*t[j];
tau2_denum = tau2_denum+delta[j]*P2[j]*(1-P2[j]);
end;
tau[1]= tau1_num/tau1_denum; tau[2]= tau2_num/tau2_denum;
s1=0;s2=0;
do j=1 to &m;
s1 = s1+delta[j]*P1[j]*(1-P1[j])*((t[j]-tau[1])**2);
s2 = s2+delta[j]*P2[j]*(1-P2[j])*((t[j]-tau[2])**2);
end;
s_sq[1]=s1; s_sq[2]=s2; c1=0; c2=0; c12=0;
do i=1 to &m;
do j=1 to &m;
if i ^= j then do;
if &missingPattern = 1 then do;
delta_ij = delta[i]*delta[j];
end;else do;
if j > i then max_ij = j; else max_ij = i; delta_ij = delta[max_ij];
end;
c1=c1+delta_ij*rho(i,j,&rho,&corrStructure)*
sqrt(P1[i]*(1-P1[i])*P1[j]*(1-P1[j]))*((t[i]-tau[1])*(t[j]-tau[1]));
c2=c2+delta_ij*rho(i,j,&rho,&corrStructure)*
sqrt(P2[i]*(1-P2[i])*P2[j]*(1-P2[j]))*((t[i]-tau[2])*(t[j]-tau[2]));
end;
end;
end;
c_sq[1]=c1; c_sq[2]=c2;
do k =1 to &K; v[k] = (s_sq[k]+c_sq[k])/(s_sq[k]**2); end;
z_p1=1-&alpha/&sides; z_alpha = probit(z_p1);
z_p2=&power; z_beta = probit(z_p2);
n = int((z_alpha+z_beta)**2*(v[1]/r[1]+v[2]/r[2])/d**2)+1;
print 'Sample Size Calculation for a Two-Group Comparision';
print ' of Repeated Binary Measurements';
alpha =α power=&power; rho=ρ sides=&sides;
print alpha power rho sides;
if &missingPattern = 1 then do; print ' Missing Pattern: Independent '; end;
else if &missingPattern = 2 then do;print ' Missing Pattern: Monotone '; end;
if &corrStructure = 1 then do; print ' Correlation Structure : Compound Symetric ';
end; else if &corrStructure = 2 then do; print ' Correlation Structure : AR(1) ';
end;
print P1 P2 delta v ; print d; print n;
quit;
run;
%mend SS_RepeatedBinaryMeasurement;
/*--------------- Run the macro for Example 18.7 ------------------------------*/
%SS_RepeatedBinaryMeasurement(
missingPattern = 1, /* 1: independent missing 2: monotone missing */
corrStructure = 2, /* 1: compound symetric , 2: AR(1) */
rho = 0.8, /* associated correlation parameter */
m = 6, /* number of measurement time points */
inP1 = %str({0.75, 0, 0, 0, 0, 0.5}),
/* proportion of subjects for control group */
inP2 = %str({0.75, 0.75, 0.75, 0.75, 0.75, 0.75}),
/* proportion of subjects */
inR = %str({0.5, 0.5}), /* group allocation proportion(r1, r2) */
inDelta = , /* delta[j] = 1-(j-1)/20, proportion of observed measurements */
alpha = 0.05, /* alpha */
power = 0.8, /* power */
sides = 2 /* 1: one-sided test 2: two-sided test */
);
%SS_RepeatedBinaryMeasurement(
missingPattern = 2, /* monotone missing */
corrStructure = 2, /* AR(1) */
rho = 0.8, /* associated correlation parameter */
m = 6, /* number of measurement time points */
inP1 = %str({0.75, 0, 0, 0, 0, 0.5}),
/* proportion of subjects for control group */
inP2 = %str({0.75, 0.75, 0.75, 0.75, 0.75, 0.75}),
/* proportion of subjects */
inR = %str({0.5, 0.5}), /* group allocation proportion(r1, r2) */
inDelta = , /* delta[j] = 1-(j-1)/20, proportion of observed measurements */
alpha = 0.05, /* alpha */
power = 0.8, /* power */
sides = 2 /* 1: one-sided test 2: two-sided test */
);
/*--------------When Missing Pattern Is Independent Missing ----------------*/
Sample Size Calculation for a Two-Group Comparision
of Repeated Binary Measurements
ALPHA | POWER | RHO | SIDES |
0.05 | 0.8 | 0.8 | 2 |
Missing Pattern: Independent
Correlation Structure : AR(1)
P1 | P2 | DELTA | V |
0.75 | 0.75 | 1 | 0.3048798 |
0.7065921 | 0.75 | 0.95 | 0.3534175 |
0.6590733 | 0.75 | 0.9 | |
0.6081268 | 0.75 | 0.85 | |
0.5547107 | 0.75 | 0.8 | |
0.5 | 0.75 | 0.75 |
D
0.2197225
N
215
/*-------------- When Missing Pattern Is Monotone Missing ------------------*/
Sample Size Calculation for a Two-Group Comparision
of Repeated Binary Measurements
ALPHA | POWER | RHO | SIDES |
0.05 | 0.8 | 0.8 | 2 |
Missing Pattern: Monotone
Correlation Structure : AR(1)
P1 | P2 | DELTA | V |
0.75 | 0.75 | 1 | 0.3236844 |
0.7065921 | 0.75 | 0.95 | 0.3804059 |
0.6590733 | 0.75 | 0.9 | |
0.6081268 | 0.75 | 0.85 | |
0.5547107 | 0.75 | 0.8 | |
0.5 | 0.75 | 0.75 |
D
0.2197225
N
229
In designing an observational study, we should choose a statistical testing method that will be used when analyzing the final data. For an accurate sample size calculation, the sample size formula should be directly derived from the power function of the chosen testing method.
There are two types of parameters included in sample size formulas: primary and nuisance parameters. The primary parameters are those shown in the statistical hypotheses (for example, means, binomial proportions, and hazard rates), and the nuisance parameters (for example, prevalence rates, accrual period in survival data, correlation coefficients, and missing type and probabilities in longitudinal data) are those of no or secondary interest. Because both types of parameters determine the final sample size, it is important to specify the parameter values as accurately as possible. If a database exists, we usually estimate the values of nuisance parameters and the primary parameter value of the control group from the database. Although rk may be chosen by the prevalence of a disease, we may decide to use different allocation proportions from the prevalence rates in the natural population depending on the relative cost to accrue subjects in two groups. The primary parameter value for the case group is usually chosen based on the clinical significance of the associated intervention on the outcome variable, which is measured in terms of the difference or ratio of the primary parameter values between the two groups. When there is uncertainty in some of these parameters, we may conduct a sensitivity analysis on the power to demonstrate that the power would be adequate under a range of scenarios.
Appendix: Asymptotic Distribution of Wilcoxon Rank Sum Test under Hα
Let ˜Xki=Xki-θk. Then ˜X11,…,˜X1n1,˜X21,…,˜X2n2 are IID with cumulative density function (CDF) F(x). Noting that
W=1n1n2n1∑i=1n2∑j=1I(˜X1i>˜X2j-Δ),
we have W=∫∞-∞ˆF2(x+Δ)dˆF1(x), where ˆFk(x)=n-1k∑nki=1I(˜Xki≤x) is the empirical CDF of ˜Xk1,…,˜Xknk that uniformly converges to F(x) as nk→∞ . Let
νa= P (˜X1i>˜X2j-Δ)=∫∞-∞F(x+Δ)dF(x) . For a large n, W-νa is expressed as
∫∞-∞ˆF2(x+Δ)dˆF1(x)-∫∞-∞F(x+Δ)dF(x)
=∫∞-∞F(x+Δ)d{ˆF1(x)-F(x)}+∫∞-∞{ˆF2(x+Δ)-F(x+Δ)}dF(x)+op(n-1),
where op(n-1)=∫∞-∞{ˆF2(x+Δ)-F(x+δ)}d{ˆF1(x)-F(x)} is negligible for a large n.
Because
∫∞-∞F(x+Δ)ˆF1(x)=1n1n1∑i=1F(˜X1i+Δ)
and
∫∞-∞F(x+Δ)ˆF1(x)=1n1n1∑i=1F(˜X1i+Δ)
we have var (W)=σ21/n1+σ22/n2, where
σ21= var {F(˜X1i+Δ)}
σ22= var {ˉF(˜X2i-Δ)}= var {F(˜X2i-Δ)}.
Aalen, O. 1978. “Nonparametric inference for a family of counting processes.” Annals of Statistics 6: 701–726.
Carroll, R. J., and D. Ruppert. 1988. Transformation and Weighting in Regression. New York: Chapman and Hall/CRC.
Davis, C. S. 1991. “Semi-parametric and non-parametric methods for the analysis of repeated measurements with applications to clinical trials.” Statistics in Medicine 10: 1959–1980.
Fleming, T. R., and D. P. Harrington. 1991. Counting Processes and Survival Analysis. New York: John Wiley & Sons, Inc.
Gart, J. J. 1985. “Approximate tests and interval estimation of the common relative risk in the combination of 2 × 2 tables.” Biometrika >72: 673–677.
Jung, S. H., and C. Ahn. 2003. “Sample size estimation for GEE method for comparing slopes in repeated measurements data.” Statistics in Medicine >22: 1305–1315.
Jung, S. H., and C. W. Ahn. 2005. “Sample size for a two-group comparison of repeated binary measurements using GEE.” Statistics in Medicine 24(17): 2583–2596.
Jung, S. H., S. C. Chow, and E. Chi. 2007. “A note on sample size calculation based on propensity analysis in non-randomized trials.” Journal of Biopharmaceutical Statistics >17: 35–41.
Lakatos, E. 1988. “Sample sizes based on the log-rank statistic in complex clinical trials.” Biometrics 44: 229–241.
Liang, K. Y., and S. L. Zeger. 1986. “Longitudinal data analysis using generalized linear models.” Biometrika >73: 13–22.
Mann, H. B., and D. R. Whitney. 1947. “On a test of whether one of two random variables is stochastically larger than the other.” The Annals of Mathematical Statistics 18: 50-60.
Medsger, Jr., T. A. 1997. “Systemic sclerosis (scleroderma): clinical aspects. In Arthritis and allied conditions: a textbook of rheumatology. 13th ed. Edited by W. J. Koopman. Baltimore, MD: Williams & Wilkins, pp 1433–1464.
Nelson, W. 1969. “Hazard plotting for incomplete failure data.” Journal of Quality Technology 1(1): 27–52.
Peto, R., and J. Peto. 1972. “Asymptotically efficient rank invariant test procedures (with discussion).” Journal of the Royal Statistical Society, Series A: General 135: 185–206.
Reveille, J. D., M. Fischbach, T. McNearney, et al. 2001. “Systemic sclerosis in 3 US ethnic groups: a comparison of clinical, sociodemographic, serologic, and immunogenetic determinants.” Seminars in Arthritis and Rheumatism >30(5): 332–346.
Rosenbaum, P. R., and D. B. Rubin. 1983. “The central role of the propensity score in observational studies for causal effects.” Biometrika >70: 41–55.
Rosenbaum, P. R., and D. B. Rubin. 1984. “Reducing bias in observational studies using subclassification on the propensity score.” Journal of the American Statistical Association 79: 516–524.
Rubin, D. B. 1976. “Inference and missing data.” Biometrika >63: 581–590.
Rubinstein, L. V., M. H. Gail, and T. J. Santner. 1981. “Planning the duration of a comparative clinical trial with loss to follow-up and a period of continued observation.” Journal of Chronic Diseases >34(9/10): 469–479.
SAS Institute Inc. 2009. SAS/STAT9.2 User’s Guide, Second Edition. Cary, NC: SAS Institute Inc.
Yue, L. Q. 2007. “Statistical and regulatory issues with the application of propensity score analysis to non-randomized medical device clinical studies.” Journal of Biopharmaceutical Statistics 17: 1–13.