This vignette illustrates the main functionalities of the
**projpred** package, which implements the projection
predictive variable selection for various regression models (see section
“Supported types of models” below for more
details on supported model types). What is special about the projection
predictive variable selection is that it not only performs a variable
selection, but also allows for valid post-selection inference.

The projection predictive variable selection is based on the ideas of
Goutis and Robert (1998) and
Dupuis and Robert (2003).
The methods implemented in **projpred** are described in
detail in Piironen, Paasiniemi, and Vehtari (2020), Catalina,
Bürkner, and Vehtari (2022), Weber and
Vehtari (2023), and Catalina, Bürkner, and Vehtari (2021).
A comparison to many other methods may also be found in Piironen and Vehtari (2017a). For details on how to cite
**projpred**, see the projpred
citation info on CRAN^{1}.

For this vignette, we use **projpred**’s `df_gaussian` data. It contains
100 observations of 20 continuous predictor variables `X1`, …, `X20`
(originally stored in a sub-matrix; we turn them into separate columns below)
and one continuous response variable `y`.

```
data("df_gaussian", package = "projpred")
dat_gauss <- data.frame(y = df_gaussian$y, df_gaussian$x)
```

First, we have to construct a reference model for the projection predictive variable selection. This model is considered as the best (“reference”) solution to the prediction task. The aim of the projection predictive variable selection is to find a subset of a set of candidate predictors which is as small as possible but achieves a predictive performance as close as possible to that of the reference model.

Usually (and this is also the case in this vignette), the reference model will
be an **rstanarm** or **brms** fit. To our knowledge, **rstanarm** and
**brms** are currently the only packages for which a `get_refmodel()` method
(which establishes the compatibility with **projpred**) exists. Creating a
reference model object via one of these `get_refmodel.stanreg()` or
`brms::get_refmodel.brmsfit()` methods (either implicitly by a call to a
top-level function such as `project()`, `varsel()`, or `cv_varsel()`, as done
below, or explicitly by a call to `get_refmodel()`) leads to a “typical”
reference model object. In that case, all candidate models are actual
*sub*models of the reference model. In general, however, this assumption is
not necessary for a projection predictive variable selection (see, e.g.,
Piironen, Paasiniemi, and Vehtari 2020). This is why “custom” (i.e.,
non-“typical”) reference model objects make it possible to avoid this
assumption (although the candidate models of a “custom” reference model object
will still be actual *sub*models of the full `formula` used by the search
procedure, which does not have to be the same as the reference model’s
`formula`, if the reference model possesses a `formula` at all). Such “custom”
reference model objects can be constructed via `init_refmodel()` (or
`get_refmodel.default()`), as shown in section “Examples” of the
`?init_refmodel` help^{2}.

Here, we use the **rstanarm** package to fit the reference model. If you want
to use the **brms** package, simply replace the **rstanarm** fit (of class
`stanreg`) in all the code below by your **brms** fit (of class `brmsfit`).
Only note that in case of a **brms** fit, we recommend specifying argument
`brms_seed` of `brms::get_refmodel.brmsfit()`.
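For illustration, a hypothetical **brms** analogue could look like the
following sketch (for brevity, it relies on **brms**’s default priors instead
of the regularized horseshoe prior set up below; `brm_fit` and `refm_obj` are
names introduced here only for illustration):

```
library(brms)
# Hypothetical brms analogue of the rstanarm reference model fitted below (for
# brevity, with brms's default priors instead of the regularized horseshoe):
brm_fit <- brm(
  y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10 + X11 + X12 + X13 +
    X14 + X15 + X16 + X17 + X18 + X19 + X20,
  family = gaussian(),
  data = dat_gauss,
  seed = 2052109, refresh = 0
)
# Setting `brms_seed` makes the draws-related preprocessing in
# brms::get_refmodel.brmsfit() reproducible:
refm_obj <- projpred::get_refmodel(brm_fit, brms_seed = 2052109)
```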

`library(rstanarm)`

For our **rstanarm** reference model, we use the Gaussian distribution as the
`family` for our response. With respect to the predictors, we only include the
linear main effects of all 20 predictor variables. Compared to the more
complex types of reference models supported by **projpred** (see section
“Supported types of models” below), this is a quite simple reference model
which is, however, sufficient to demonstrate the interplay of
**projpred**’s functions.

We use **rstanarm**’s default priors in our reference
model, except for the regression coefficients for which we use a
regularized horseshoe prior (Piironen and
Vehtari 2017c) with the hyperprior for its global shrinkage
parameter following Piironen and Vehtari (2017b) and Piironen and Vehtari (2017c). In R code, these are the
preparation steps for the regularized horseshoe prior:

```
# Number of regression coefficients:
( D <- sum(grepl("^X", names(dat_gauss))) )
```

`[1] 20`

```
# Prior guess for the number of relevant (i.e., non-zero) regression
# coefficients:
p0 <- 5
# Number of observations:
N <- nrow(dat_gauss)
# Hyperprior scale for tau, the global shrinkage parameter (note that for the
# Gaussian family, 'rstanarm' will automatically scale this by the residual
# standard deviation):
tau0 <- p0 / (D - p0) * 1 / sqrt(N)
```

We now fit the reference model to the data. To make this vignette build
faster, we use only 2 MCMC chains and 500 iterations per chain (with half of
them being discarded as warmup draws). In practice, 4 chains and 2000
iterations per chain are reasonable defaults. Furthermore, we make use of
**rstan**’s parallelization, which means running each chain on a separate CPU
core^{3}. If you run the following code yourself, you can either rely on an
automatic mechanism to detect the number of CPU cores (like the
`parallel::detectCores()` function shown below) or adapt `ncores` manually to
your system.

```
# Set this manually if desired:
ncores <- parallel::detectCores(logical = FALSE)
### Only for technical reasons in this vignette (you can omit this when running
### the code yourself):
ncores <- min(ncores, 2L)
###
options(mc.cores = ncores)
refm_fit <- stan_glm(
  y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10 + X11 + X12 + X13 +
    X14 + X15 + X16 + X17 + X18 + X19 + X20,
  family = gaussian(),
  data = dat_gauss,
  prior = hs(global_scale = tau0),
  ### Only for the sake of speed (not recommended in general):
  chains = 2, iter = 500,
  ###
  seed = 2052109, QR = TRUE, refresh = 0
)
```

Usually, we would now have to check the convergence diagnostics (see, e.g.,
`?posterior::diagnostics` and `?posterior::default_convergence_measures`).
However, due to the technical reasons for which we reduced `chains` and
`iter`, we skip this step here.
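For reference, such a check could look like the following sketch based on the
**posterior** package (`refm_draws` is a name introduced here only for
illustration):

```
library(posterior)
# Coerce the reference model's posterior draws (an iterations x chains x
# parameters array) into a draws object:
refm_draws <- as_draws_array(as.array(refm_fit))
# R-hat as well as bulk and tail effective sample sizes for all parameters:
summarise_draws(refm_draws, default_convergence_measures())
```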

Now, **projpred** comes into play.

`library(projpred)`

In **projpred**, the projection predictive variable
selection consists of a *search* part and an *evaluation*
part. The search part determines the solution path, i.e., the best
submodel for each submodel size (the size is the number of predictor
terms). The evaluation part determines the predictive performance of the
submodels along the solution path.

There are two functions for performing the variable selection: `varsel()` and
`cv_varsel()`. In contrast to `varsel()`, `cv_varsel()` performs a
cross-validation (CV) by running the search part with the training data of
each CV fold separately (an exception is `validate_search = FALSE`, see
`?cv_varsel` and below) and running the evaluation part on the corresponding
test set of each CV fold. Because of this CV, `cv_varsel()` is recommended
over `varsel()`. Thus, we use `cv_varsel()` here. Nonetheless, running
`varsel()` first can offer a rough idea of the performance of the submodels
(after projecting the reference model onto them); see the sketch below. A more
principled **projpred** workflow is work in progress.
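Such a rough preliminary `varsel()` run could look like the following sketch
(its results are overoptimistic because of the missing CV; `vs_prelim` is a
name introduced here only for illustration):

```
# Rough preliminary check without CV (not a replacement for cv_varsel()):
vs_prelim <- varsel(refm_fit, nterms_max = 9, seed = 411183)
plot(vs_prelim, stats = "mlpd", deltas = TRUE)
```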

Here, we use only some of the available arguments; see the documentation of
`cv_varsel()` for the full list of arguments. By default, `cv_varsel()` runs a
Pareto-smoothed importance sampling (PSIS, see Vehtari, Gelman, and Gabry
2017; Vehtari et al. 2022) leave-one-out (LOO) CV (see argument `cv_method`)
which includes the search in the CV (see argument `validate_search`). Here, we
set argument `validate_search` to `FALSE` to obtain rough preliminary results
and to make this vignette build faster. If possible (in terms of computation
time), we recommend using the default of `validate_search = TRUE` to avoid
overfitting in the selection of the submodel size. We also set
`nclusters_pred` to a low value of `20` only to speed up the building of the
vignette. By modifying argument `nterms_max`, we impose a limit on the
submodel size up to which the search is continued. Typically, one has to run
the variable selection with a large `nterms_max` first (the default value may
not even be large enough) and only after inspecting the results from this
first run is one able to set a reasonable `nterms_max` for subsequent runs.
The value we are using here (`9`) is based on such a first run (which is not
shown here, though).

```
cvvs <- cv_varsel(
  refm_fit,
  ### Only for the sake of speed (not recommended in general):
  validate_search = FALSE,
  nclusters_pred = 20,
  ###
  nterms_max = 9,
  seed = 411183
)
```

The first step after running `varsel()` or `cv_varsel()` should be the
decision for a final submodel size. This should be the first step (in
particular, before inspecting the solution path) in order to avoid a
user-induced selection bias (which could occur if the user made the submodel
size decision dependent on the solution path). To decide for a submodel size,
there are several performance statistics we can plot as a function of the
submodel size. Here, we use the mean log predictive density (MLPD; see
argument `stats` of `summary.vsel()` for details) and the root mean squared
error (RMSE). By default, the performance statistics are plotted on their
original scale, but with `deltas = TRUE`, they are calculated as differences
from a baseline model (which is the reference model by default, at least in
the most common cases). Since the differences are usually of more interest
(with regard to the submodel size decision), we directly plot with
`deltas = TRUE` here (note that because of `validate_search = FALSE`, this
result is slightly optimistic, and the plot would look different with
`validate_search = TRUE`):

`plot(cvvs, stats = c("mlpd", "rmse"), deltas = TRUE, seed = 54548)`

Based on that plot (see `?plot.vsel` for a description), we would decide for a
submodel size of 6 because that’s the point where the performance measures
level off and are close enough to the reference model’s performance (note that
since the plot is affected by `validate_search = FALSE`, this manual decision
based on the plot is affected, too):

`modsize_decided <- 6`

The `suggest_size()` function offered by **projpred** may help in the decision
for a submodel size, but this is a rather heuristic method and needs to be
interpreted with caution (see `?suggest_size`):

`suggest_size(cvvs, stat = "mlpd")`

`[1] 6`

```
# We are using the same seed as in the plot() call above to ensure that the
# bootstrap intervals are exactly the same:
suggest_size(cvvs, stat = "rmse", seed = 54548)
```

`[1] 6`

With both performance statistics, we would get the same final submodel size
(`6`) as by our manual decision (`suggest_size()` is also affected by
`validate_search = FALSE`).

Only now, after we have made a decision for the submodel size, we inspect
further results from the variable selection and, in particular, the solution
path. For example, we could simply `print()` the resulting `cvvs` object, but
to create the summary table matching the predictive performance plot above
(and to adjust the minimum number of printed significant digits), we slightly
adapt the printing here:

```
print(summary(cvvs, stats = c("mlpd", "rmse"), deltas = TRUE, seed = 54548),
      digits = 1)
```

```
Family: gaussian
Link function: identity
Formula: y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10 + X11 +
X12 + X13 + X14 + X15 + X16 + X17 + X18 + X19 + X20
Observations: 100
Projection method: traditional
CV method: PSIS-LOO CV with search not included (i.e., a full-data search only)
Search method: L1
Maximum submodel size for the search: 9
Number of projected draws in the search: 1 (from clustered projection)
Number of projected draws in the performance evaluation: 20 (from clustered projection)
Performance evaluation summary with `deltas = TRUE`:
size solution_terms mlpd.loo mlpd.se rmse.loo rmse.se
0 <NA> -1.021 0.08 1.886 0.16
1 X1 -0.835 0.08 1.389 0.15
2 X14 -0.580 0.07 0.842 0.12
3 X5 -0.494 0.08 0.686 0.12
4 X20 -0.306 0.06 0.388 0.09
5 X6 -0.217 0.06 0.264 0.08
6 X3 0.005 0.05 -0.003 0.05
7 X8 0.089 0.03 -0.091 0.03
8 X11 0.117 0.02 -0.121 0.02
9 X10 0.121 0.02 -0.123 0.02
```

The solution path can be seen in the `print()` output (column
`solution_terms`), but it is also accessible through the `solution_terms()`
function:

`( soltrms <- solution_terms(cvvs) )`

`[1] "X1" "X14" "X5" "X20" "X6" "X3" "X8" "X11" "X10"`

Combining the decided submodel size of 6 with the solution path leads to the following terms (as well as the intercept) as the predictor terms of the final submodel:

`( soltrms_final <- head(soltrms, modsize_decided) )`

`[1] "X1" "X14" "X5" "X20" "X6" "X3" `

The `project()` function returns an object of class `projection` which forms
the basis for convenient post-selection inference. With the following code,
`project()` will project the reference model onto the final submodel once
again^{4}:

`prj <- project(refm_fit, solution_terms = soltrms_final)`

For more accurate results, we could have increased argument `ndraws` of
`project()` (up to the number of posterior draws in the reference model).
However, this increases the runtime, which we don’t want in this vignette.
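For instance, using all 500 posterior draws of our reference model (2 chains
with 250 post-warmup iterations each) would look like the following sketch
(`prj_all` is a name introduced here only for illustration):

```
# More accurate, but slower: project all 500 posterior draws of the reference
# model instead of the default number of draws:
prj_all <- project(refm_fit, solution_terms = soltrms_final, ndraws = 500)
```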

Next, we create a matrix containing the projected posterior draws stored in
the depths of `project()`’s output:

`prj_mat <- as.matrix(prj)`

This matrix is all we need for post-selection inference. It can be used like
any matrix of draws from MCMC procedures, except that it doesn’t reflect a
typical posterior distribution, but rather a projected posterior distribution,
i.e., the distribution arising from the deterministic projection of the
reference model’s posterior distribution onto the parameter space of the final
submodel^{5}. Beware that in case of clustered projection (i.e., a non-`NULL`
argument `nclusters` in the `project()` call), the weights of the clusters
need to be taken into account when performing post-selection (or, more
generally, post-projection) inference.

The **posterior** package provides a general way to deal with posterior
distributions, so it can also be applied to our projected posterior. For
example, we can use it to calculate summary statistics for the marginals of
the projected posterior:

```
library(posterior)
prj_drws <- as_draws_matrix(prj_mat)
prj_smmry <- summarize_draws(
  prj_drws,
  "median", "mad", function(x) quantile(x, probs = c(0.025, 0.975))
)
# Coerce to a `data.frame` because pkgdown versions > 1.6.1 don't print the
# tibble correctly:
prj_smmry <- as.data.frame(prj_smmry)
print(prj_smmry, digits = 1)
```

```
variable median mad 2.5% 97.5%
1 (Intercept) 0.05 0.10 -0.2 0.2
2 X1 1.38 0.09 1.2 1.6
3 X14 -1.12 0.09 -1.3 -0.9
4 X5 -0.91 0.11 -1.1 -0.7
5 X20 -1.12 0.11 -1.3 -0.9
6 X6 0.54 0.10 0.3 0.7
7 X3 0.78 0.10 0.6 1.0
8 sigma 1.13 0.09 1.0 1.3
```

A visualization of the projected posterior can be achieved with the
**bayesplot** package, for example using its `mcmc_intervals()` function:

```
library(bayesplot)
bayesplot_theme_set(ggplot2::theme_bw())
mcmc_intervals(prj_mat) +
  ggplot2::coord_cartesian(xlim = c(-1.5, 1.6))
```

Note that we only visualize the *1-dimensional* marginals of
the projected posterior here. To gain a more complete picture, we would
have to visualize at least some *2-dimensional* marginals of the
projected posterior (i.e., marginals for pairs of parameters).
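For instance, the joint projected posterior of an (arbitrarily chosen) pair of
coefficients could be inspected with **bayesplot**’s `mcmc_scatter()`
function, as in this sketch:

```
# Joint projected posterior of the coefficients for X1 and X14 (an arbitrarily
# chosen pair of predictors):
mcmc_scatter(prj_mat, pars = c("X1", "X14"))
```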

For comparison, consider the marginal posteriors of the corresponding parameters in the reference model:

```
refm_mat <- as.matrix(refm_fit)
mcmc_intervals(refm_mat, pars = colnames(prj_mat)) +
  ggplot2::coord_cartesian(xlim = c(-1.5, 1.6))
```

Here, the reference model’s marginal posteriors differ only slightly from the marginals of the projected posterior. This does not necessarily have to be the case.

Predictions from the final submodel can be made by `proj_linpred()` and
`proj_predict()`.

We start with `proj_linpred()`. For example, suppose we have the following new
observations:

```
( dat_gauss_new <- setNames(
  as.data.frame(replicate(length(soltrms_final), c(-1, 0, 1))),
  soltrms_final
) )
```

```
X1 X14 X5 X20 X6 X3
1 -1 -1 -1 -1 -1 -1
2 0 0 0 0 0 0
3 1 1 1 1 1 1
```

Then `proj_linpred()` can calculate the linear predictors^{6} for all new
observations from `dat_gauss_new`. Depending on argument `integrated`, these
linear predictors can be averaged across the projected draws (within each new
observation). For instance, the following computes the expected values of the
new observations’ predictive distributions^{7}:

```
prj_linpred <- proj_linpred(prj, newdata = dat_gauss_new, integrated = TRUE)
cbind(dat_gauss_new, linpred = as.vector(prj_linpred$pred))
```

```
X1 X14 X5 X20 X6 X3 linpred
1 -1 -1 -1 -1 -1 -1 0.50053930
2 0 0 0 0 0 0 0.04160798
3 1 1 1 1 1 1 -0.41732334
```

If `dat_gauss_new` also contained response values (i.e., `y` values in this
example), then `proj_linpred()` would also evaluate the log predictive density
at these values (conditional on each of the projected parameter draws if
`integrated = FALSE` and integrated over the projected parameter draws if
`integrated = TRUE`).
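As a sketch (with made-up response values, only for illustration;
`dat_gauss_new_y` and `prj_linpred_y` are names introduced here):

```
# Hypothetical response values attached to the new observations:
dat_gauss_new_y <- cbind(dat_gauss_new, y = c(0.5, 0, -0.5))
prj_linpred_y <- proj_linpred(prj, newdata = dat_gauss_new_y, integrated = TRUE)
# Log predictive density values, integrated over the projected draws:
prj_linpred_y$lpd
```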

With `proj_predict()`, we can obtain draws from predictive distributions based
on the final submodel. In contrast to
`proj_linpred(<...>, integrated = FALSE)`, this encompasses not only the
uncertainty arising from parameter estimation, but also the uncertainty
arising from the observation (or “sampling”) model for the response^{8}. This
is useful for what is usually termed a posterior predictive check (PPC), but
would have to be termed something like a posterior-projection predictive check
(PPPC) here:

```
prj_predict <- proj_predict(prj, .seed = 762805)
# Using the 'bayesplot' package:
ppc_dens_overlay(y = dat_gauss$y, yrep = prj_predict)
```

This PPPC shows that our final projection is able to generate predictions similar to the observed response values, which indicates that this model is reasonable, at least in this regard.

In principle, the projection predictive variable selection requires only minimal information about the form of the reference model. Although many aspects of the reference model coincide with those of the submodels if a “typical” reference model object is used, this does not need to be the case if a “custom” reference model object is used (see section “Reference model” above for the definition of “typical” and “custom” reference model objects). This is why, in general, the following remarks refer to the submodels and not to the reference model.

In the following and throughout **projpred**’s
documentation, the term “traditional projection” is used whenever the
projection type is neither “augmented-data” nor “latent” (see below for
a description of these).

Apart from the `gaussian()` response family used in this vignette,
**projpred**’s traditional projection also supports the `binomial()`^{9} and
the `poisson()` family.

The families supported by **projpred**’s augmented-data projection (Weber and
Vehtari 2023) are `binomial()`^{10} ^{11}, `brms::cumulative()`,
`rstanarm::stan_polr()` fits, and `brms::categorical()`^{12} ^{13}. See
`?extend_family` (which is called by `init_refmodel()`) for an explanation of
how to apply the augmented-data projection to “custom” reference models. For
“typical” reference models (i.e., those created by `get_refmodel.stanreg()` or
`brms::get_refmodel.brmsfit()`), the augmented-data projection is applied
automatically if the family is supported by the augmented-data projection and
is neither `binomial()` nor `brms::bernoulli()`. For applying the
augmented-data projection to the `binomial()` (or `brms::bernoulli()`) family,
see `?extend_family` as well as `?augdat_link_binom` and
`?augdat_ilink_binom`. Finally, we note that there are some restrictions with
respect to the augmented-data projection; **projpred** will throw an
informative error if a requested feature is currently not supported for the
augmented-data projection.

The latent projection (Catalina, Bürkner, and Vehtari 2021) is a quite general
principle for extending **projpred**’s traditional projection to more response
families. The latent projection is applied when setting argument `latent` of
`extend_family()` (which is called by `init_refmodel()`) to `TRUE`. The
families for which full latent-projection functionality (in particular,
`resp_oscale = TRUE`, i.e., post-processing on the original response scale) is
currently available are `binomial()`^{14} ^{15}, `poisson()`,
`brms::cumulative()`, and `rstanarm::stan_polr()` fits^{16}. For all other
families, you can try to use the latent projection (by setting
`latent = TRUE`) and **projpred** should tell you if any features are not
available and how to make them available. More details concerning the latent
projection are given in the corresponding latent-projection vignette. Note
that there are some restrictions with respect to the latent projection;
**projpred** will throw an informative error if a requested feature is
currently not supported for the latent projection.

On the side of the predictors, **projpred** not only supports linear main
effects as shown in this vignette, but also interactions, multilevel^{17}
terms, and (as an experimental feature) additive^{18} terms.

Transferring this vignette (which employs a “typical” reference
model) to such more complex problems is straightforward: Basically, only
the code for fitting the reference model via **rstanarm**
or **brms** needs to be adapted. The
**projpred** code stays almost the same. Only note that in
case of multilevel or additive reference models,
some **projpred** functions then have slightly different
options for a few arguments. See the documentation for details.

For example, to apply **projpred** to the `VerbAgg` dataset from the
**lme4** package, a corresponding multilevel reference model for the binary
response `r2` could be created by the following code:

```
data("VerbAgg", package = "lme4")
refm_fit <- stan_glmer(
  r2 ~ btype + situ + mode + (btype + situ + mode | id),
  family = binomial(),
  data = VerbAgg,
  seed = 82616169, QR = TRUE, refresh = 0
)
```

As an example for an additive (non-multilevel) reference model, consider the
`lasrosas.corn` dataset from the **agridat** package. A corresponding
reference model for the continuous response `yield` could be created by the
following code (note that `pp_check(refm_fit)` gives a bad PPC in this case,
so there’s still room for improvement):

```
data("lasrosas.corn", package = "agridat")
# Convert `year` to a `factor` (this could also be solved by using
# `factor(year)` in the formula, but we avoid that here to put more emphasis on
# the demonstration of the smooth term):
lasrosas.corn$year <- as.factor(lasrosas.corn$year)
refm_fit <- stan_gamm4(
  yield ~ year + topo + t2(nitro, bv),
  family = gaussian(),
  data = lasrosas.corn,
  seed = 4919670, QR = TRUE, refresh = 0
)
```

As an example for an additive multilevel reference model, consider the
`gumpertz.pepper` dataset from the **agridat** package. A corresponding
reference model for the binary response `disease` could be created by the
following code:

```
data("gumpertz.pepper", package = "agridat")
refm_fit <- stan_gamm4(
  disease ~ field + leaf + s(water),
  random = ~ (1 | row) + (1 | quadrat),
  family = binomial(),
  data = gumpertz.pepper,
  seed = 14209013, QR = TRUE, refresh = 0
)
```

In case of multilevel models, **projpred** has two global options that may be
relevant for users: `projpred.mlvl_pred_new` and `projpred.mlvl_proj_ref_new`.
These are explained in detail in the general package documentation (available
online or by typing `` ?`projpred-package` ``).
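Since these are ordinary R options, they can be set via `options()`, as in the
following sketch (see the documentation mentioned above for the exact meaning
and the default of this option before changing it):

```
# Example: set the global option governing whether (multilevel) predictions
# are treated as concerning new group levels (see `?`projpred-package``):
options(projpred.mlvl_pred_new = TRUE)
```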

Sometimes, the ordering of the predictor terms in the solution path makes sense, but with increasing submodel size, the performance measures of the submodels do not approach that of the reference model, so that the submodels exhibit a predictive performance that stays worse than the reference model’s. Different reasons can explain this behavior (the following list might not be exhaustive):

- The reference model’s posterior may be so wide that the default
  `ndraws_pred` could be too small. Usually, this comes in combination with a
  comparatively small difference in predictive performance. Increasing
  `ndraws_pred` should help, but it also increases the computational cost.
  Re-fitting the reference model and thereby ensuring a narrower posterior
  (usually by employing a stronger sparsifying prior) should have a similar
  effect.
- For non-Gaussian models, the discrepancy may be due to the fact that the
  penalized iteratively reweighted least squares (PIRLS) algorithm might have
  convergence issues (Catalina, Bürkner, and Vehtari 2021). In this case, the
  latent-space approach by Catalina, Bürkner, and Vehtari (2021) might help,
  see also the latent-projection vignette linked above.

If you are using `varsel()`, then the lack of CV in `varsel()` may lead to
overconfident and overfitted results. In this case, try running `cv_varsel()`
instead of `varsel()` (which you should do in any case for your final
results).

For multilevel binomial models, the traditional projection may not work
properly and may give suboptimal results, see #353 on GitHub (the underlying
issue is described in **lme4** issue #682). By suboptimal results, we mean
that in such cases, the relevance of the group-level terms can be
underestimated. According to the simulation-based case study from #353, the
latent projection should be considered as a currently available remedy.

For multilevel Poisson models, the traditional projection may take very long, see #353. According to the simulation-based case study from #353, the latent projection should be considered as a currently available remedy.

Finally, as illustrated in the Poisson example of the latent-projection vignette, the latent projection can be beneficial for non-multilevel models with a (non-Gaussian) family that is also supported by the traditional projection, at least in case of the Poisson family and L1 search.

For multilevel models, the augmented-data projection seems to suffer from the
same issue as the traditional projection for the binomial family (see above),
i.e., it may not work properly and may give suboptimal results, see #353 (the
underlying issue is probably similar to the one described in **lme4** issue
#682). By suboptimal results, we mean that in such cases, the relevance of the
group-level terms can be underestimated. According to the simulation-based
case study from #353, the latent projection should be considered as a remedy
in such cases.

Catalina, Alejandro, Paul-Christian Bürkner, and Aki Vehtari. 2022.
“Projection Predictive Inference for Generalized Linear and
Additive Multilevel Models.” In *Proceedings of
The 25th International Conference on
Artificial Intelligence and Statistics*,
edited by Gustau Camps-Valls, Francisco J. R. Ruiz, and Isabel Valera,
151:4446–61. Proceedings of Machine Learning Research.
PMLR. https://proceedings.mlr.press/v151/catalina22a.html.

Catalina, Alejandro, Paul Bürkner, and Aki Vehtari. 2021. “Latent
Space Projection Predictive Inference.” arXiv. https://doi.org/10.48550/arXiv.2109.04702.

Dupuis, Jérome A., and Christian P. Robert. 2003. “Variable
Selection in Qualitative Models via an Entropic Explanatory
Power.” *Journal of Statistical Planning and Inference*
111 (1–2): 77–94. https://doi.org/10.1016/S0378-3758(02)00286-0.

Goutis, Constantinos, and Christian P. Robert. 1998. “Model Choice
in Generalised Linear Models: A Bayesian Approach via
Kullback-Leibler Projections.” *Biometrika*
85 (1): 29–37.

Piironen, Juho, Markus Paasiniemi, and Aki Vehtari. 2020.
“Projective Inference in High-Dimensional Problems:
Prediction and Feature Selection.” *Electronic
Journal of Statistics* 14 (1): 2155–97. https://doi.org/10.1214/20-EJS1711.

Piironen, Juho, and Aki Vehtari. 2017a. “Comparison of
Bayesian Predictive Methods for Model Selection.”
*Statistics and Computing* 27 (3): 711–35. https://doi.org/10.1007/s11222-016-9649-y.

———. 2017b. “On the Hyperprior Choice for the Global Shrinkage
Parameter in the Horseshoe Prior.” In *Proceedings of the 20th
International Conference on Artificial
Intelligence and Statistics*, edited by Aarti
Singh and Jerry Zhu, 54:905–13. Proceedings of Machine Learning
Research. PMLR. https://proceedings.mlr.press/v54/piironen17a.html.

———. 2017c. “Sparsity Information and Regularization in the
Horseshoe and Other Shrinkage Priors.” *Electronic Journal of
Statistics* 11 (2): 5018–51. https://doi.org/10.1214/17-EJS1337SI.

Vehtari, Aki, Andrew Gelman, and Jonah Gabry. 2017. “Practical
Bayesian Model Evaluation Using Leave-One-Out
Cross-Validation and WAIC.” *Statistics and
Computing* 27 (5): 1413–32. https://doi.org/10.1007/s11222-016-9696-4.

Vehtari, Aki, Daniel Simpson, Andrew Gelman, Yuling Yao, and Jonah
Gabry. 2022. “Pareto Smoothed Importance Sampling.”
arXiv. https://doi.org/10.48550/arXiv.1507.02646.

Weber, Frank, and Aki Vehtari. 2023. “Projection Predictive
Variable Selection for Discrete Response Families with Finite
Support.” arXiv. https://doi.org/10.48550/arXiv.2301.01660.

1. The citation information can be accessed offline by typing
   `print(citation("projpred"), bibtex = TRUE)` within R.↩︎

2. We will cover custom reference models more deeply in a future vignette.↩︎

3. More generally, the number of chains is split up as evenly as possible
   among the number of CPU cores.↩︎

4. During the forward search, the reference model has already been projected
   onto all candidate models (this was where arguments `ndraws` and
   `nclusters` of `cv_varsel()` came into play). For the evaluation of the
   submodels along the solution path, the reference model has been projected
   onto these submodels again (this was where arguments `ndraws_pred` and
   `nclusters_pred` of `cv_varsel()` came into play). In principle, one could
   use the re-projections from the evaluation part for post-selection
   inference, but due to a deficiency in the current implementation (see
   GitHub issue #168), we currently have to project once again.↩︎

5. In general, this implies that projected regression coefficients do not
   reflect isolated effects of the predictors. For example, especially in case
   of highly correlated predictors, it is possible that projected regression
   coefficients “absorb” effects from predictors that have been excluded in
   the projection.↩︎

6. `proj_linpred()` can also transform the linear predictor to response
   scale, but here, this is the same as the linear predictor scale (because of
   the identity link function).↩︎

7. Beware that this statement is correct here because of the Gaussian family
   with the identity link function. For other families (which usually come in
   combination with a different link function), one would typically have to
   use `transform = TRUE` in order to make this statement correct.↩︎

8. In case of the Gaussian family we are using here, the uncertainty arising
   from the observation model is the uncertainty due to the residual standard
   deviation.↩︎

9. Via `brms::get_refmodel.brmsfit()`, the `brms::bernoulli()` family is
   supported as well.↩︎

10. Currently, the augmented-data support for the `binomial()` family does not
    include binomial distributions with more than one trial. In such a case, a
    workaround is to de-aggregate the Bernoulli trials which belong to the
    same (aggregated) observation, i.e., to use a “long” dataset.↩︎

11. Like the traditional projection, the augmented-data projection also
    supports the `brms::bernoulli()` family via
    `brms::get_refmodel.brmsfit()`.↩︎

12. For the augmented-data projection based on a “typical” **brms** reference
    model, **brms** version 2.17.0 or later is needed.↩︎

13. More **brms** families might be supported in the future.↩︎

14. Currently, the latent-projection support for the `binomial()` family does
    not include binomial distributions with more than one trial. In such a
    case, a workaround is to de-aggregate the Bernoulli trials which belong to
    the same (aggregated) observation, i.e., to use a “long” dataset.↩︎

15. Like the traditional projection, the latent projection also supports the
    `brms::bernoulli()` family via `brms::get_refmodel.brmsfit()`.↩︎

16. For the latent projection based on a “typical” **brms** reference model,
    **brms** version 2.19.0 or later is needed.↩︎

17. Multilevel models are also known as *hierarchical* models or models with
    *partially pooled*, *group-level*, or, in frequentist terms, *random*
    effects.↩︎

18. Additive terms are also known as *smooth* terms.↩︎