Treatment assignment can be accomplished through the original data
generation process, using `defData`

and `genData`

.
However, the functions `trtAssign`

and
`trtObserve`

provide more options to generate treatment
assignment.

Treatment assignment can simulate how the treatment allocation is made in a randomized study. Assignment to treatment groups can be (close to) balanced (as would occur in a block randomized trial); this balancing can be done without or without strata. Alternatively, the assignment can be left to chance without blocking; in this case, balance across treatment groups is not guaranteed, particularly with small sample sizes.

First, create the data definition:

```
def <- defData(varname = "male", dist = "binary",
formula = .5 , id="cid")
def <- defData(def, varname = "over65", dist = "binary",
formula = "-1.7 + .8*male", link="logit")
def <- defData(def, varname = "baseDBP", dist = "normal",
formula = 70, variance = 40)
dtstudy <- genData(330, def)
```

*Balanced treatment assignment, stratified by gender and age
category (not blood pressure)*

```
study1 <- trtAssign(dtstudy, n = 3, balanced = TRUE, strata = c("male", "over65"),
grpName = "rxGrp")
study1
```

```
## Key: <cid>
## cid male over65 baseDBP rxGrp
## <int> <int> <int> <num> <int>
## 1: 1 1 1 64.9 2
## 2: 2 1 0 67.0 2
## 3: 3 0 1 75.0 3
## 4: 4 0 0 70.9 2
## 5: 5 1 0 68.6 1
## ---
## 326: 326 1 0 71.0 3
## 327: 327 0 0 64.1 2
## 328: 328 1 0 65.4 3
## 329: 329 1 0 69.1 3
## 330: 330 0 0 75.9 2
```

*Balanced treatment assignment (without stratification)*

*Random (unbalanced) treatment assignment*

*Comparison of three treatment assignment mechanisms*

`defData`

It is also possible to generate treatment assignments directly in the
`defData`

and `genData`

process. In this example,
randomization is stratified by *gender* and *age*, and the
outcome *y* is effected by both of these factors as well as the
treatment assignment variable *rx*.

```
def <- defData(varname = "male", dist = "binary",
formula = .5 , id="cid")
def <- defData(def, varname = "over65", dist = "binary",
formula = "-1.7 + .8*male", link="logit")
def <- defData(def, varname = "rx", dist = "trtAssign",
formula = "1;1", variance = "male;over65")
def <- defData(def, varname = "y", dist = "normal",
formula = "20 + 5*male + 10*over65 + 10*rx", variance = 40)
dtstudy <- genData(330, def)
dtstudy
```

```
## Key: <cid>
## cid male over65 rx y
## <int> <int> <int> <int> <num>
## 1: 1 1 0 0 26.9
## 2: 2 1 0 1 36.6
## 3: 3 1 0 1 44.6
## 4: 4 1 0 1 37.2
## 5: 5 1 0 0 31.2
## ---
## 326: 326 0 0 0 28.4
## 327: 327 0 0 0 25.1
## 328: 328 1 0 0 29.8
## 329: 329 0 0 1 28.2
## 330: 330 1 0 0 22.3
```

Here are the counts and average outcomes for each *gender*,
*age*, and *treatment* combination:

```
## Key: <male, over65, rx>
## male over65 rx n avg
## <int> <int> <int> <int> <num>
## 1: 0 0 0 70 19.6
## 2: 0 0 1 70 30.2
## 3: 0 1 0 12 29.2
## 4: 0 1 1 11 42.4
## 5: 1 0 0 59 24.2
## 6: 1 0 1 58 35.1
## 7: 1 1 0 25 35.3
## 8: 1 1 1 25 46.0
```

If exposure or treatment is observed (rather than randomly assigned),
use `trtObserved`

to generate groups. There may be any number
of possible exposure or treatment groups, and the probability of
exposure to a specific level can depend on covariates already in the
data set. In this case, there are three exposure groups that vary by
gender and age:

```
formula1 <- c("-2 + 2*male - .5*over65", "-1 + 2*male + .5*over65")
dtExp <- trtObserve(dtstudy, formulas = formula1, logit.link = TRUE, grpName = "exposure")
```

Here are the exposure distributions by gender and age:

Here is a second case of three exposures where the exposure is
independent of any covariates. Note that specifying the formula as
`c(.35, .45)`

is the same as specifying it is
`c(.35, .45, .20)`

. Also, when referring to probabilities,
the identity link is used:

Stepped-wedge designs are a special class of cluster randomized trials where each cluster is observed in both treatment arms (as opposed to the classic parallel design where only some of the clusters receive the treatment). This is a special case of a cross-over design, where the cross-over is only in one direction: control (or pre-intervention) to intervention.

In this example, the data generating process looks like this:

\[Y_{ict} = \beta_0 + b_c + \beta_1 * t + \beta_2*X_{ct} + e_{ict}\]

where \(Y_{ict}\) is the outcome for individual \(i\) in cluster \(c\) in time period \(t\), \(b_c\) is a cluster-specific effect, \(X_{ct}\) is the intervention indicator that has a value 1 during periods where the cluster is under the intervention, and \(e_{ict}\) is the individual-level effect. Both \(b_c\) and \(e_{ict}\) are normally distributed with mean 0 and variances \(\sigma^2_{b}\) and \(\sigma^2_{e}\), respectively. \(\beta_1\) is the time trend, and \(\beta_2\) is the intervention effect.

We need to define the cluster-level variables (i.e. the cluster effect and the cluster size) as well as the individual specific outcome. In this case each cluster will have 15 individuals per period, and \(\sigma^2_b = 0.20\). In addition, \(\sigma^2_e = 1.75\).

```
library(simstudy)
library(ggplot2)
defc <- defData(varname = "ceffect", formula = 0, variance = 0.20,
dist = "normal", id = "cluster")
defc <- defData(defc, "m", formula = 15, dist = "nonrandom")
defa <- defDataAdd(varname = "Y",
formula = "0 + ceffect + 0.1*period + trt*1.5",
variance = 1.75, dist = "normal")
```

In this case, there will be 30 clusters and 24 time periods. With 15 individuals per cluster per period, there will be 360 observations for each cluster, and 10,800 in total. (There is no reason the cluster sizes need to be deterministic, but I just did that to simplify things a bit.)

Cluster-level intervention assignment is done after generating the
cluster-level and time-period data. The call to
`trtStepWedge`

includes 3 key arguments that specify the
number of waves, the length of each wave, and the period during which
the first clusters begin the intervention.

`nWaves`

indicates how many clusters share the same
starting period for the intervention. In this case, we have 5 waves,
with 6 clusters each. `startPer`

is the first period of the
first wave. The earliest starting period is 0, the first period. Here,
the first wave starts the intervention during period 4.
`lenWaves`

indicates the length between starting points for
each wave. Here, a length of 4 means that the starting points will be 4,
8, 12, 16, and 20.

Once the treatment assignments are made, the individual records are created and the outcome data are generated in the last step.

```
set.seed(608477)
dc <- genData(30, defc)
dp <- addPeriods(dc, 24, "cluster", timevarName = "t")
dp <- trtStepWedge(dp, "cluster", nWaves = 5, lenWaves = 4,
startPer = 4, grpName = "trt")
dd <- genCluster(dp, cLevelVar = "timeID", "m", "id")
dd <- addColumns(defa, dd)
dd
```

```
## Key: <id>
## cluster period ceffect m timeID startTrt trt id Y
## <int> <int> <num> <num> <int> <num> <num> <int> <num>
## 1: 1 0 0.6278 15 1 4 0 1 1.524
## 2: 1 0 0.6278 15 1 4 0 2 0.986
## 3: 1 0 0.6278 15 1 4 0 3 -0.123
## 4: 1 0 0.6278 15 1 4 0 4 2.090
## 5: 1 0 0.6278 15 1 4 0 5 -2.340
## ---
## 10796: 30 23 -0.0983 15 720 20 1 10796 1.917
## 10797: 30 23 -0.0983 15 720 20 1 10797 5.921
## 10798: 30 23 -0.0983 15 720 20 1 10798 4.118
## 10799: 30 23 -0.0983 15 720 20 1 10799 4.569
## 10800: 30 23 -0.0983 15 720 20 1 10800 3.656
```

```
dSum <- dd[, .(Y = mean(Y)), keyby = .(cluster, period, trt, startTrt)]
ggplot(data = dSum,
aes(x = period, y = Y, group = interaction(cluster, trt))) +
geom_line(aes(color = factor(trt))) +
facet_grid(factor(startTrt, labels = c(1 : 5)) ~ .) +
scale_x_continuous(breaks = seq(0, 23, by = 4), name = "week") +
scale_color_manual(values = c("#b8cce4", "#4e81ba")) +
theme(panel.grid = element_blank(),
legend.position = "none")
```