If you read this from a place other than https://mc-stan.org/projpred/news/index.html, please consider switching to that website since it features better formatting and cross-linking.
Setting the new global option projpred.extra_verbose to TRUE will print out which submodel projpred is currently projecting onto. Furthermore, if method = "forward" and verbose = TRUE in varsel() or cv_varsel(), this new option will also make projpred print out which submodel has been selected at those steps of the forward search for which a percentage is printed (the percentage refers to the maximum submodel size that the search is run up to). In general, however, we cannot recommend setting this new global option to TRUE for cv_varsel() with cv_method = "LOO" and validate_search = TRUE or for cv_varsel() with cv_method = "kfold" (simply due to the amount of information that will be printed, but also due to the progress bar which will not work anymore as intended). (GitHub: #363; thanks to @jtimonen)
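As a rough sketch of how this option could be switched on for the current R session (assuming it is set through base R's options() mechanism, as is usual for global options; the varsel() call in the comment uses a hypothetical reference model object and is not run here):

```r
# Remember the previous setting so it can be restored afterwards.
old_opts <- options(projpred.extra_verbose = TRUE)

# Check that the option is now set for this session.
isTRUE(getOption("projpred.extra_verbose"))

# A forward search would then be run as usual, for example
# (hypothetical reference model object `refm`, not created here):
# vs <- varsel(refm, method = "forward", verbose = TRUE)

# Restore the previous setting once the extra output is no longer wanted.
options(old_opts)
```

Restoring the previous value keeps the extra output from leaking into later cv_varsel() runs, for which the option is not recommended.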
Enhanced verbose output. In particular, varsel() is now more verbose, similarly to how cv_varsel() has already been for a long time. The verbose output for cv_varsel() has also been updated, with the aim to give users a better understanding of the methodology behind projpred. (GitHub: #382)
Slightly improved the calculation of predictive variances to make them less prone to numerical inaccuracies. (GitHub: #199)
Improved computational efficiency by avoiding an unnecessary final full-data performance evaluation (including costly re-projections if refit_prj = TRUE, which is the default for non-datafit reference models) in cv_varsel() with validate_search = TRUE or cv_method = "kfold". (GitHub: #385)
Reduced dependencies. (GitHub: #388)
Argument digits of print.vselsummary(), which used to be passed to an internal round() call, was removed. Instead, digits can now be passed to print.data.frame() via ..., thereby determining the minimum number of significant digits to be printed. (GitHub: #389)
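To illustrate what passing digits through to print.data.frame() means, here is a minimal base R sketch. A made-up data frame stands in for the summary table; the column names and values are purely illustrative and are not actual projpred output.

```r
# Made-up stand-in for a summary table; not real projpred output.
tab <- data.frame(
  size = 0:2,
  elpd = c(-123.456789, -98.7654321, -97.1234567)
)

# `digits` is the minimum number of significant digits to print,
# not a number of decimal places as in round().
print(tab, digits = 3)
print(tab, digits = 6)

# With a vsel object `vs` (hypothetical here), the analogous call would be:
# print(summary(vs), digits = 3)
```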
Although bad practice (in general), a reference model lacking an intercept can now be used within projpred. However, it will always be projected onto submodels which include an intercept. The reason is that even if the true intercept in the reference model is zero, this does not need to hold for the submodels. An informational message mentioning the projection onto intercept-including submodels is thrown when projpred encounters a reference model lacking an intercept. (GitHub: #96, #391)
In case of non-predictor arguments of s() or t2(), projpred now throws an error. (This had already been documented before, but a suitable error message was missing.) (GitHub: #393, based on #156 and #269)
In case of the brms::categorical() family (supported since version 2.4.0), projpred now strips underscores from response category names in as.matrix.projection() output, as done by brms. (GitHub: #394)
L1 search now throws a warning if an interaction term is selected before all involved main effects have been selected. (GitHub: #395)
Documented that in multilevel (group-level) terms, function calls on the right-hand side of the | character (e.g., (1 | gr(group_variable)), which is possible in brms) are currently not allowed in projpred. A corresponding error message has also been added. (GitHub: #319)
Due to internal refactoring:
project()’s output elements submodl and weights have been renamed to outdmin and wdraws_prj, respectively.
varsel()’s and cv_varsel()’s output element d_test has been replaced with new output elements type_test and y_wobs_test.
Apart from project()’s output element wdraws_prj, these elements are not meant to be accessed manually, so changes are mentioned here only for the sake of completeness. Output element wdraws_prj of project() is only needed if project() was used for a clustered projection, which is not the default (and discouraged in most applied cases, at least with a small number of clusters). Thus, these renamings are breaking changes only in very rare cases.
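For code that has to handle projection objects from both before and after this renaming, a small accessor along the following lines might help. This is only a sketch: the helper get_prj_weights() is hypothetical (it is not part of projpred), and plain lists stand in for the objects that project() would return.

```r
# Hypothetical helper (not part of projpred): return the weights of the
# projected draws under the new element name, falling back to the old one.
get_prj_weights <- function(prj) {
  if (!is.null(prj[["wdraws_prj"]])) {
    prj[["wdraws_prj"]]
  } else {
    prj[["weights"]]
  }
}

# Plain lists standing in for output of project() (illustration only).
prj_new <- list(wdraws_prj = c(0.5, 0.3, 0.2))
prj_old <- list(weights = c(0.5, 0.3, 0.2))

identical(get_prj_weights(prj_new), get_prj_weights(prj_old))
```

Using [[ rather than $ avoids partial name matching when looking up the element.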
print.vselsummary() now also prints K in case of K-fold CV.
The print.vselsummary() output has been slightly improved, e.g., by adding a remark explaining what “search included” or “search not included” means.
print.vselsummary() now also prints whether deltas = TRUE or deltas = FALSE was used.
Output element pct_solution_terms_cv has now also been added to vsel objects returned by varsel(), but in that case, it is simply NULL.
This (pct_solution_terms_cv being NULL) is now also the case if validate_search = FALSE was used in cv_varsel().
Minor enhancements in the documentation.
Enhancements in the vignettes. In particular, section “Troubleshooting” of the main vignette has been revised.
If proj_predict() is used with observation weights that are not all equal to 1, a warning is now thrown. (GitHub: starts to address #402)
predict.refmodel() to require newdata to contain the response variable in case of a brms reference model. This is similar to paul-buerkner/brms#1457, but concerns predict.refmodel() (paul-buerkner/brms#1457 referred to predictions from the submodels). In order to make this predict.refmodel() fix work, brms version 2.19.0 or later is needed. (GitHub: #381)
p_type of project() to be incorrect in case of refit_prj = FALSE, !is.null(nclusters), and an object of class vsel that was created with a non-clustered (thinned) projection during the search phase. The fix comes with a slightly different behavior of proj_predict() for datafits: It will not draw nresample_clusters times from the posterior-projection predictive distribution (which is based on the same single projected draw), but only once. (GitHub: #211, #401)
refit_prj = FALSE after an L1 search), a new dataset containing a character predictor variable with only a single unique value (or a new dataset containing a factor predictor variable with a single level) used to cause an error. The case of a character (not factor) predictor variable with only a single unique value occurred, e.g., during the performance evaluation in a LOO CV if a character predictor got selected into a fold’s solution path. The character issue existed from version 2.1.0 on (in earlier versions, however, there were other issues which caused character predictors to throw an error). Now, all issues with respect to character predictor variables should be resolved. The issue with single-level factor predictor variables is resolved now as well. (GitHub: #403)
refit_prj = FALSE
after an L1 search), a new dataset containing a factor predictor with re-ordered levels (compared to this same factor in the original dataset) used to lead to incorrect predictions. This bug existed at least from version 2.0.2 on (possibly even in earlier versions), but has been resolved now. (GitHub: #403)
factor. This issue existed at least from version 2.0.2 on (possibly even in earlier versions), but should have only affected rstanarm reference model fits (brms reference model fits were only affected in case of a brms::brm() call with drop_unused_levels = FALSE, which is not the default). (GitHub: #403)
refit_prj = FALSE (which is the default only for datafits, not for the reference model objects of class refmodel that are usually employed in practice) to lead to incorrect predictions from the L1-searched submodels (which are L1-penalized GLMs) if the solution path had a main effect ranked after an interaction term. This bug existed at least from version 2.0.2 on (possibly even in earlier versions). The mentioned submodel predictions did not only affect the performance evaluation, but also the projected dispersion parameter and the returned Kullback-Leibler divergence (and the corresponding cross-entropy). (GitHub: #403)
resp_oscale = TRUE
default in summary.vsel()) is that varsel() and cv_varsel() no longer call suggest_size() internally at the end. Thus, print()-ing an object of class vsel no longer includes the suggested projection size in the output (the stat for this suggested size was fixed to "elpd" anyway, a fact that many users were probably not aware of). (GitHub: #372)
projpred.mlvl_pred_new and projpred.mlvl_proj_ref_new. These are explained in detail in the general package documentation (available online or by typing ?`projpred-package`). (GitHub: #379)
family (see init_refmodel()) has a non-identity link function: After clustering the reference model’s posterior draws, we need to aggregate (within a given cluster) the reference model’s fitted values which already take the offsets into account instead of taking the offsets into account after aggregating the fitted values which do not take the offsets into account. This fix should affect results only in a very slight manner. Due to projpred’s internal adjustment for numerical stability when averaging a quantity across the draws within a given cluster, this also changes the projected residual standard deviations in Gaussian models in the order of 1e-10. (GitHub: #374)
plot.vsel()
and summary.vsel(), the default of alpha = 0.32 is replaced by alpha = 2 * pnorm(-1) (= 1 - diff(pnorm(c(-1, 1))), which is only approximately 0.32) so that now, a normal-approximation confidence interval with default alpha stretches by exactly one standard error on either side of the point estimate. Typically, this changes results only slightly. In some cases, however, the new default may lead to a different suggested size, explaining why this is regarded as a major change. (GitHub: #371)
ggplot2::aes_string() is not used anymore, thereby avoiding an occasional soft-deprecation warning thrown by ggplot2 3.4.0. (GitHub: #367)
ce of project(). The reason for this change is that the former KL divergence assumed the reference model’s family to be the same as the submodel’s family, which does not need to be the case for custom reference models. This should not be a user-facing change as users are discouraged from making use of specific output elements (like the former element kl of objects of class projection or vsel) directly. (GitHub: #369)
family
of init_refmodel() and get_refmodel.default()).
get_refmodel() and init_refmodel() (thereby also distinguishing more clearly between “typical” and “custom” reference model objects) in (i) the description and several arguments of get_refmodel() and init_refmodel(), (ii) sections “Reference model” and “Supported types of models” of the vignette. (GitHub: #357, #359, #364, #365, #366)
validate_search = FALSE case of cv_varsel().
search_terms (at least in some instances), also affecting the output of solution_terms(<vsel_object>) in those cases. (GitHub: #360; thanks to @sor16)
validate_search = FALSE case of cv_varsel(). This bug was introduced in v2.2.0 (and existed up to and including v2.2.1).
cv_varsel()
with cv_method = "LOO" (more precisely, only the LOO posterior predictive expected values <vsel_object>$summaries$ref$mu were affected, not the (pointwise) LOO log posterior predictive density values <vsel_object>$summaries$ref$lppd). (GitHub: #186 (partly), #356)
cv_varsel() with custom search_terms (in some instances). (GitHub: #345, #360; thanks to @sor16)
stats of summary.vsel()), the bootstrapping results are now also used for inferring the lower and upper confidence interval bounds. (GitHub: #318, #347; thanks to @awd97 and @VisionResearchBlog)
datafits, offsets are not supported anymore. (GitHub: #186 (partly), #351)
datafits (and other, unlikely, cases where nclusters == S and S <= 20, with S denoting the number of draws in the reference model).
datafits). (GitHub: #350)
validate_search = FALSE
case of cv_varsel() (with cv_method = "LOO"), the PSIS weights are now calculated based on the reference model (they used to be calculated based on the submodels, which is incorrect). (GitHub: #325)
"mse", "rmse", "acc" (= "pctcorr"), and "auc" (i.e., all performance statistics except for "elpd" and "mlpd").
plot.vsel() and suggest_size() gain a new argument thres_elpd. By default, this argument doesn’t have any impact, but a non-NA value can be used for a customized model size selection rule (see ?suggest_size for details). (GitHub: #335)
suggest_size() heuristic).
seed and .seed are now allowed to be NA for not calling set.seed() internally at all.
d_test
of varsel() is not considered as an internal feature anymore. This was possible after fixing a bug for d_test (see below). (GitHub: #341)
<vsel_object>$summaries and <vsel_object>$d_test now corresponds to the order of the observations in the original dataset if <vsel_object> was created by a call to cv_varsel([...], cv_method = "kfold") (formerly, in that case, the observations in those sub-elements were ordered by fold). Thereby, the order of the observations in those sub-elements now always corresponds to the order of the observations in the original dataset, except if <vsel_object> was created by a call to varsel([...], d_test = <non-NULL_d_test_object>), in which case the order of the observations in those sub-elements corresponds to the order of the observations in <non-NULL_d_test_object>. (GitHub: #341)
search_terms caused the R session to crash).
validate_search = FALSE bug described above in “Major changes”: The PSIS weights are now calculated based on the reference model (they used to be calculated based on the submodels, which is incorrect). (GitHub: #325)
\mbox{} commands displayed incorrectly in the HTML help from R version 4.2.0 on. (GitHub: #326)
plot.vsel() now draws the dashed red horizontal line for the reference model (and, if present, the dotted black horizontal line for the baseline model) first (i.e., before the submodel-specific graphical elements), to avoid overplotting.
d_test
of varsel(): Not only the predictive performance of the reference model needs to be evaluated on the test data, but also the predictive performance of the submodels. (GitHub: #341)
cv_varsel() with LOO CV and validate_search = FALSE instead of K-fold CV. (GitHub: #305)
search_terms of varsel() and cv_varsel(). (GitHub: #155, #308)
NULL) search_terms, method = NULL is internally changed to method = "forward" and method = "L1" throws a warning. This is done because search_terms only takes effect in case of a forward search. (GitHub: #155, #308)
search_terms. This is necessary to prevent a bug described below. (GitHub: #308)
PIRLS loop resulted in NaN value errors automatically. (GitHub: #314)
b of projpred:::bootstrap() to B.
search_terms vector which excluded the intercept in conjunction with refit_prj = FALSE (the latter in project(), varsel(), or cv_varsel()) led to incorrect submodels being fetched from the search or to an error while doing so. This has been fixed now by internally forcing the inclusion of the intercept in search_terms. (GitHub: #308)
solution_terms of project() to fix a test failure in R versions >= 4.2.
cv_varsel()
with nloo < n where n denotes the number of observations. (GitHub: #94, #252, commit feea39e)
validate_search = FALSE in cv_varsel().
nclusters (= 1) and nclusters_pred (= 5) of varsel() and cv_varsel() were set internally (the user-visible defaults were NULL). Now, nclusters and ndraws_pred (note the ndraws_pred, not nclusters_pred) have non-NULL user-visible defaults of 20 and 400, respectively. In general, this increases the runtime of these functions a lot. With respect to cv_varsel(), the new vignette (see vignettes) mentions two ways to quickly obtain some rough preliminary results which in general should not be used as final results, though: (i) varsel() and (ii) cv_varsel() with validate_search = FALSE (which only takes effect for cv_method = "LOO"). (GitHub: #291 and several commits beforehand, in particular bbd0f0a, babe031, 4ef95d3, and ce7d1e0)
proj_linpred()
and proj_predict(), arguments nterms, ndraws, and seed have been removed to allow the user to pass them to project(). New arguments filter_nterms, nresample_clusters, and .seed have been introduced (see the documentation for details). (GitHub: #92, #135)
proj_linpred(), dimensions are not dropped anymore (i.e., output elements pred and lpd are always S x N matrices now). (GitHub: #143)
integrated = TRUE, proj_linpred() now averages the LPD (across the projected posterior draws) instead of taking the LPD at the averaged linear predictors. (GitHub: #143)
newdata does not contain the response variable, proj_linpred() now returns NULL for output element lpd. (GitHub: #143)
stanreg (from package rstanarm) with offsets to have these offsets specified via an offset() term in the model formula (and not via argument offset).
NULL to a user-visible value (and NULL is not allowed anymore).
data of get_refmodel.stanreg() has been removed. (GitHub: #219)
div_minimizer
of init_refmodel() now always needs to return a list of submodels (see the documentation for details). Correspondingly, the function passed to argument proj_predfun of init_refmodel() can now always expect a list as input for argument fits (see the documentation for details). (GitHub: #230)
proj_predfun of init_refmodel() now always needs to return a matrix (see the documentation for details). (GitHub: #230)
?`projpred-package`. (GitHub: #235)
Student_t() family is regarded as experimental. Therefore, a corresponding warning is thrown when creating the reference model. (GitHub: #233, #252)
Gamma() family is regarded as experimental. Therefore, a corresponding warning is thrown when creating the reference model. (GitHub: paul-buerkner/brms#1255, #240, #252)
init_refmodel() in case of argument dis being NULL (the default) was dangerous for custom reference models with a family having a dispersion parameter (in that case, dis values of all-zeros were used silently). The new behavior now requires a non-NULL argument dis in that case. (GitHub: #254)
cv_search has been renamed to refit_prj. (GitHub: #154, #265)
as.matrix.projection()
has gained a new argument nm_scheme which allows choosing the naming scheme for the column names of the returned matrix. The default ("auto") follows the naming scheme of the reference model fit (and uses the "rstanarm" naming scheme if the reference model fit is of an unknown class). (GitHub: #82, #279)
seed (and .seed) arguments now have a default of sample.int(.Machine$integer.max, 1) instead of NULL. Furthermore, the value supplied to these arguments is now used to generate new seeds internally on-the-fly. In many cases, this will change results compared to older projpred versions. Also note that now, the internal seeds are never fixed to a specific value if seed (and .seed) arguments are set to NULL. (GitHub: #84, #286)
as.matrix.projection() method now also returns the estimated group-level effects themselves. (GitHub: #75)
as.matrix.projection() method now returns the variance components (population SD(s) and population correlation(s)) instead of the empirical SD(s) of the group-level effects. (GitHub: #74)
README file. (GitHub: #245)
nclusters_pred was removed. (GitHub: commit 5062f2f)
project(): Warn if elements of solution_terms are not found in the reference model (and therefore ignored). (GitHub: #140)
get_refmodel.default() now passes arguments via the ellipsis (...) to init_refmodel(). (GitHub: #153, commit dd3716e)
init_refmodel(): The default (NULL) for argument extract_model_data has been removed as it wasn’t meaningful anyway. (GitHub: #219)
folds
of init_refmodel() has been removed as it was effectively unused. (GitHub: #220)
solution_terms(). This allowed the introduction of a solution_terms.projection() method. (GitHub: #223)
predict.refmodel() now uses a default of newdata = NULL. (GitHub: #223)
weights of init_refmodel()’s argument proj_predfun has been removed. (GitHub: #163, #224)
div_minimizer functions have been unified into a single div_minimizer which chooses an appropriate submodel fitter based on the formula of the submodel, not based on that of the reference model. Furthermore, the automatic handling of errors in the submodel fitters has been improved. (GitHub: #230)
plot.vsel(). (GitHub: #234, #270)
cvfun for stanreg fits will now always use inner parallelization in rstanarm::kfold.stanreg() (i.e., across chains, not across CV folds), with getOption("mc.cores", 1) cores. We do so on all systems (not only Windows). (GitHub: #249)
fit of init_refmodel()’s argument proj_predfun was renamed to fits. This is a non-breaking change since all calls to proj_predfun in projpred have that argument unnamed. However, this cannot be guaranteed in the future, so we strongly encourage users with a custom proj_predfun to rename argument fit to fits. (GitHub: #263)
init_refmodel() has gained argument cvrefbuilder which may be a custom function for constructing the K reference models in a K-fold CV. (GitHub: #271)
project(), varsel(), and cv_varsel() to the divergence minimizer. (GitHub: #278)
init_refmodel(), any contrasts attributes of the dataset’s columns are silently removed. (GitHub: #284)
NAs in data supplied to newdata arguments now trigger an error. (GitHub: #285)
as.matrix.projection()
(causing incorrect column names for the returned matrix). (GitHub: #72, #73)
vsel object. (GitHub: #79, #80)
varsel(). (GitHub: #90)
nloo of cv_varsel(). (GitHub: #93)
cv_varsel(), causing an error in case of !validate_search && cv_method != "LOO". (GitHub: #95)
proj_linpred() to raise an error if argument newdata was NULL. (GitHub: #97)
lpd in proj_linpred() (for integrated = TRUE as well as for integrated = FALSE). (GitHub: #105)
proj_linpred()’s calculation of output element lpd (for integrated = TRUE). (GitHub: #106, #112)
proj_linpred()’s output elements pred and lpd (for integrated = FALSE): Now, they are both S x N matrices, with S denoting the number of (possibly clustered) posterior draws and N denoting the number of observations. (GitHub: #107, #112)
proj_predict()’s output matrix to be transposed in case of nrow(newdata) == 1. (GitHub: #112)
proj_linpred(). (GitHub: #114)
varsel()/make_formula
to fail with multidimensional interaction terms. (GitHub: #102, #103)
cv_varsel() for models with a single predictor. (GitHub: #115)
nterms of proj_linpred() and proj_predict(). (GitHub: #110)
as.matrix.projection() in case of 1 (clustered) draw after projection. (GitHub: #130)
subfit, make the column names of as.matrix.projection()’s output matrix consistent with other classes of submodels. (GitHub: #132)
nterms_max of plot.vsel() if there is just the intercept-only submodel. (GitHub: #138)
search_path in, e.g., varsel()’s output. (GitHub: #140)
unused argument) when initializing the K reference models in a K-fold CV with CV fits not of class brmsfit or stanreg. (GitHub: #140)
get_refmodel.default(), remove old defunct arguments fetch_data, wobs, and offset. (GitHub: #140)
get_refmodel.stanreg(). (GitHub: #142, #184)
extract_model_data()’s argument extract_y in get_refmodel.default(). (GitHub: #153, commit 39fece8)
extract_model_data() in K-fold CV. (GitHub: #153, commit 4f32195)
proj_predfun() for GLMMs. (GitHub: #174)
proj_predfun() for datafits. (GitHub: #177)
summary.vsel()$selection
for objects of class vsel created by varsel(). (GitHub: #179)
search_terms are not consecutive in size. (GitHub: commit 34e24de)
cv_varsel()$pct_solution_terms_cv. (GitHub: #188, commit e529ec1)
glm_elnet() (the workhorse for L1 search), causing the grid for lambda to be constructed without taking observation weights into account. (GitHub: #198; note that the second part of #198 did not have any consequences for users)
print.vsel() causing argument digits to be ignored. (GitHub: #222)
cv_search in varsel() and cv_varsel() to be TRUE for datafits, although it should be FALSE in that case. (GitHub: #223)
Error: Levels '<...>' of grouping factor '<...>' cannot be found in the fitted model. Consider setting argument 'allow_new_levels' to TRUE.) when predicting from submodels which are GLMMs for newdata containing new levels for grouping factors. (GitHub: #223)
predict.refmodel(): Fix a bug for integer ynew. (GitHub: #223)
predict.refmodel(): Fix input checks for offsetnew and weightsnew. (GitHub: #223)
extract_model_data(), the weights and offsets are now checked if they are of length 0 (and if yes, then they are set to vectors of ones and zeros, respectively). This is important for extract_model_data() functions which return weights and offsets of length 0 (see, e.g., brms version <= 2.16.1). (GitHub: #223)
var (the predictive variances) and regul (amount of ridge regularization) to the internal submodel fitter for GLMs. (GitHub: #230)
NAs, an appropriate error is now thrown. Previously, the reference model was created successfully, but this caused opaque errors in downstream code such as project(). (GitHub: #274)
We have fully rewritten the internals in several ways. Most importantly, we now leverage maximum likelihood estimation to third parties depending on the reference model’s family. This allows a lot of flexibility and extensibility for various models. Functionality wise, the major updates since the last release are:
search_terms that allows the user to specify custom unit building blocks of the projections. New vignette coming up.
Better validation of function arguments.
Added print methods for vsel and cvsel objects. Added AUC statistics for binomial family. A few additional minor patches.
Removed the dependency on the rngtools package.
This version contains only a few patches, no new features to the user.
stan_glm(log(y) ~ log(x), ...), that is, it did not allow transformation for y.
refmodel-objects using the generic get_refmodel-function, and all the functions use only this object. This makes it much easier to use projpred with other reference models by writing them a new get_refmodel-function. The syntax is now changed so that varsel and cv_varsel both return an object that has similar structure always, and the reference model is stored into this object.
plot/summary. Now it is possible to compare also to the best submodel found, not only to the reference model.
nloo = n by default in cv_varsel. regul=1e-4 now by default in all functions.
cv_search argument for the main functions (varsel, cv_varsel, project and the prediction functions). Now it is possible to make predictions also with those parameter estimates that were computed during the L1-penalized search. This change also allows the user to compute the Lasso-solution by providing the observed data as the ‘reference fit’ for init_refmodel.
An example will be added to the vignette.
Until this version, we did not keep record of the changes between different versions. Started to do this from version 0.9.0 onwards.