Power Estimation for Randomized Controlled Trials: clusterPower

Alexandria C. Sakrejda, Jon Moyer, Ken Kleinman


The clusterPower package calculates power using analytical and/or simulation approaches for a wide range of cluster-randomized trial designs:

  1. Simple two-arm comparison designs
  2. Difference in difference designs
  3. Individually randomized group treatment trials
  4. Stepped wedge designs
  5. Multi-arm trial designs

Cluster-randomized trials are a common clinical trial design, often even a required design for study validity. Investigators may choose cluster randomization design for convenience, for practical reasons, or from necessity. In cluster-randomized trials, subjects that are administratively grouped together are all randomized to the same study treatment. For example, all the patients of each clinic involved in a trial may be grouped within the same treatment or control arm. In this case, the clinic “cluster” is randomized to the treatment arm, which determines the treatment group of each participant who attends that clinic.

“Statistical power” is usually defined as the probability of rejecting the null hypothesis given that some particular alternative hypothesis is true. All trials must have accurate power and sample size calculations for practical, economic, and ethical reasons—it is ethically unjustifiable to randomize more subjects than necessary to yield sufficient power, and equally unjustifiable to expose subjects to the potential for harm in an underpowered trial.

However, for cluster-randomized trials, power calculations are complex, relying on approximations, and/or on more assumptions than simpler designs and closed-form solutions may not exist. Many available routines are crude approximations, and few are available in free software or in webapps for people without highly technical statistical training. This document introduces you to clusterPower’s basic set of tools, and shows you how to implement them with your trial design.

ICC considerations

Investigators should proceed cautiously when transferring intracluster correlation coefficient (ICC) from previous studies. This statistic, commonly denoted as \(\rho\), is—in general terms—a metric for assessing the impact of clustering in cluster randomized trials. The ICC represents the proportion of total variability due to cluster-level variability. The equation for the ICC is \(\sigma_{b}^2/(\sigma_{b}^2+\sigma^2)\) for normal outcomes. For other outcome distributions, e.g., for dichotomous outcomes, the definition of ICC may not be clear. Notably, extrapolating sample size or power estimates using the ICC for observed dichotomous outcomes may be particularly fraught (Eldridge, Ukoumunne, and Carlin (2009) and @Gatsonis (2017)). For this reason, we prefer to specify models using natural parameters such as the variance of the random effects. If you use the ICC for non-normal outcomes, make sure you understand how previous investigators determined their ICC values before applying their statistics in your calculations.

Outcome types

This package is focused on calculating effect and sample size-related parameters for cluster randomized trials in which two or more interventions are compared with respect to continuous, binary, or count (Poisson-distributed) outcomes. Incidence rate and time-to-event outcome variables are also potential outcome structures which may influence how effective sample size or cluster number is calculated. See Gao et al. (2015) for more details on methodological approaches for different outcome types.

Analytic vs. simulation approaches

This package has options to implement either analytic or simulation-based power estimation methods. Analytic solutions to power and sample size relationships exist in certain settings, including cases of equal cluster sizes and normally-distributed outcomes. As an example, for a two-arm study with a normal outcome:

\[\begin{equation} W=P(t_{d,f,\lambda}<t_{d,f}^{-1}(\frac{\alpha}{2}))+P(t_{df,\lambda}>t_{d,f}^{-1}(\frac{1-\alpha}{2})) \end{equation}\]

where \(df=2(N-1)\), \(\lambda=d/[2(1+(N-1)\rho))/MN]^{\frac{1}{2}}\), \(d=(\mu_{1}-\mu_{2})/\sigma\), \(\mu_{g}\) is the arm \(g\) mean, \(M\) is the number of study participants per cluster, and \(N\) is the number of clusters in each arm. Furthermore, \(t_{k,\lambda}\) is a variate from the noncentral \(t\) distribution with \(k\) degrees of freedom and noncentrality parameter \(\lambda\), so \(t_{k}^{-1}(a)\) is the \(a\)th quantile from the central \(t\) distribution with \(k\) degrees of freedom. The intracluster correlation coefficient (ICC) is the correlation between two subjects within any cluster, denoted \(\rho\) in the above equation. In general, as the number of clusters \(N\) increases, the analytic power approximation becomes more accurate. The approximation worsens slightly as \(M\) increases and the ICC \(\rho\) decreases.

(check definition of \(M\), also shown as “cluster sizes”)

When clusters differ in size or the study outcome is non-normal, power can instead be estimated by simulation. (Approximate analytic results may also exist.) This procedure entails simulating data under the alternative, evaluating whether the null hypothesis was rejected, and repeating. The proportion of rejections observed is an estimate of power. As opposed to the analytic approach shown in the previous paragraph, a basic algorithm for assessing power for a normal outcome across two arms using a \(K\)-fold simulation approach is as follows:

  1. Sample cluster sizes \(M_{i}\)
  2. Sample random effects \(b_{i}\)
  3. For each cluster \(i\):
  1. Perform data analysis, record whether null hypothesis was rejected.
  2. Repeat previous steps \(K\) times.

Power is estimated by \(\sum r_{k}/K\), where \(r_{k}=1\) if the null was rejected in iteration \(k\). The confidence interval (CI) for the simulated power approximation is calculated based on binomial distribution properties. The approach can be easily modified to accommodate Poisson or dichotomous outcomes, and to adjust for differing cluster sizes, which marks an advantage of resampling methodology over some analytical approaches.

Simple two-arm comparison designs

In a simple two-arm parallel design setting, our random effects model looks like this:

\[\begin{equation} y_{itj}= \beta_{0}+x_{ij}\beta_{1}+b_{i}+e_{ij} \end{equation}\]

where \(i\) indicates a cluster, \(j\) indicates a subject within the cluster, and \(x_{ij}=1\) if subject \(ij\) is in the treatment group and 0 if not. The error term \(e_{ij}\) is assumed to be distributed normal with mean = 0 and variance \(\sigma^2\), while \(b_{i}\), uncorrelated with \(e_{ij}\), is assumed to be normal with mean 0 and variance \(\sigma_b^{2}\). For assessing power by simulation, we determine whether \(H_0: \beta_1=0\) is rejected or not.

Difference in difference designs

In this design, all clusters are observed in a baseline period, then some clusters are randomized to a treatment. The design accommodates pre-existing differences between the arms and provides increased power. The difference in differences trial can be modeled as:

\[\begin{equation} y_{itj}= \beta_{0}+x_{itj}\beta_{1}+p_{itj}\beta_{2}+x_{itj}p_{itj}\beta_{3}+b_{i}+e_{ij} \end{equation}\]

where \(t\) indexes period (baseline or intervention) and \(x_{itj}=1\) if cluster \(i\) is in the treatment group and \(p_{itj}=1\) if time \(t\) is in the intervention period, and \(0\) otherwise. Notably, \(\beta_{3}\) is the difference in period effect in the treatment group relative to the non-treatment group. For assessing power by simulation, we determine whether \(H_0: \beta_3=0\) is rejected or not.

Individually randomized group treatment trials

In the other RCT designs discussed here, nesting of individuals within clusters should be established prior to randomization. However, clustering may also occur when randomization is individual but treatment is administered in groups, or there are effects associated with the clinicians offering treatment, i.e. groups of individuals are treated by the same physician. A two-arm individually randomized group treatment trial can be modeled as:

\[\begin{equation} y_{ij}= \beta_{0}+x_{ij}\beta_{1}+x_{ij}b_{i1}+e_{ij} \end{equation}\]

i.e., exactly as the simple two arm trial with different between-cluster variances per arm, except that the between-group variance in one arm may be 0, because there are no groups. \(x_{ij}\) is included as a treatment indicator. As in the parallel treatment group design, for assessing power by simulation, we determine whether \(H_0: \beta_1=0\) is rejected or not.

Stepped wedge designs

In an stepped wedge trial, all participants typically receive the intervention. However, participants in the treatment arm begin the intervention in groups at different times, with the intervention continued until the end of the study. The stepped wedge trial design is a type of cross-over trial in which clusters switch treatments in waves. Initially all the clusters recieve the same standard treatment, and at the end of the trial all of the clusters recieve the treatment of interest. More than one cluster can change treatments in a wave, but the order in which clusters change treatments is randomly determined. During each wave, the outcome of interest is assessed for all clusters.

(eqn) (see Hussey and Hughes 2007).

Multi-arm trial designs

Frequently investigators need to design trials with three, four, or more arms. For a thorough discussion of the benefits of using mutli-arm approaches, see Parmar, Carpenter, and Sydes (2014) and O’Dowd (2014). Consider the following linear model, where \(k\) represents the treatment arm \(k=1...k-1\):

\[\begin{equation} y_{ij}= \beta_{0}+x_{1ij}\beta_{1}+x_{2ij}\beta_{2}+...x_{(k-1)ij}\beta_{(k-1)}+b_{i}+e_{ij} \end{equation}\]

Here, \(x_{rij}\) is an indicator variable identifying whether subject \(ij\) is in treatment group \(r\) or not. For assessing power by simulation, we determine whether \(H_0: \beta_1= \beta_2= \cdots \beta_{k-1}=0\) is rejected or not, and we can also examine individual \(\beta\)s.


Eldridge, Sandra M, Obioha C Ukoumunne, and John B Carlin. 2009. “The Intra-Cluster Correlation Coefficient in Cluster Randomized Trials: A Review of Definitions.” International Statistical Review 77 (3): 378–94.

Gao, Fei, Arul Earnest, David B. Matchar, Michael J. Campbell, and David Machin. 2015. “Sample Size Calculations for the Design of Cluster Randomized Trials: A Summary of Methodology.” Contemporary Clinical Trials 42: 41–50. https://doi.org/https://doi.org/10.1016/j.cct.2015.02.011.

Gatsonis, Constantine. 2017. Methods in Comparative Effectiveness Research. Boca Raton, FL: CRC Press LLC Chapman; Hall/CRC.

Hussey, Michael A, and James P Hughes. 2007. “Design and Analysis of Stepped Wedge Cluster Randomized Trials.” Contemporary Clinical Trials 28 (2): 182–91.

O’Dowd, Adrian. 2014. “Scientists Call for More Multi-Arm Clinical Trials to Speed up Approval of New Drugs.” British Medical Journal Publishing Group.

Parmar, Mahesh KB, James Carpenter, and Matthew R Sydes. 2014. “More Multiarm Randomised Trials of Superiority Are Needed.” The Lancet 384 (9940): 283–84.