Penalised regression with multiple sources of prior effects


Armin Rauschenberger$~^{1,a}$

Zied Landoulsi$~^{1}$

Mark A. van de Wiel$~^{2,b}$

Enrico Glaab$~^{1,b}$

\(^1\)Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg.

\(^2\)Department of Epidemiology and Data Science (EDS), Amsterdam University Medical Centers (Amsterdam UMC), Amsterdam, The Netherlands.

\(^{a}\)To whom correspondence should be addressed.

\(^{b}\)Mark A. van de Wiel and Enrico Glaab share senior authorship.


In many high-dimensional prediction or classification tasks, complementary data on the features (co-data) are available, e.g. prior biological knowledge on (epi)genetic markers. Here we consider tasks with numerical prior information that provides insight into the importance (weight) and the direction (sign) of the feature effects, e.g. regression coefficients from previous studies. We propose an approach for integrating multiple sources of such prior information into penalised regression. If suitable co-data are available, this improves the predictive performance, as shown by simulation and application. The proposed method is implemented in the R package `transreg` (https://github.com/lcsb-bds/transreg).
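To make the general idea concrete, the following toy sketch shows one simple way in which several sources of prior effects could be weighted against each other. It is not the algorithm implemented in the R package and does not use its interface: each source of prior coefficients is turned into a single linear predictor, and a ridge regression on these predictors estimates one weight per source. All names (`prior_good`, `prior_noise`, `ridge`, `solve`), the simulated data and the penalty value are assumptions made for illustration, and the sketch is written in plain Python rather than R.

```python
import random

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def ridge(Z, y, lam):
    """Ridge regression without intercept: minimise ||y - Z w||^2 + lam * ||w||^2."""
    k = len(Z[0])
    A = [[sum(row[i] * row[j] for row in Z) + (lam if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    b = [sum(row[i] * yi for row, yi in zip(Z, y)) for i in range(k)]
    return solve(A, b)

# Simulated high-dimensional data (assumption): n samples, p features, 20 true effects.
random.seed(1)
n, p = 50, 200
beta_true = [random.gauss(0, 1) if j < 20 else 0.0 for j in range(p)]
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [dot(row, beta_true) + random.gauss(0, 1) for row in X]

# Two hypothetical sources of prior effects:
# one informative (rescaled and noisy), one unrelated to the true effects.
prior_good = [0.5 * b + random.gauss(0, 0.1) for b in beta_true]
prior_noise = [random.gauss(0, 1) for _ in range(p)]
priors = [prior_good, prior_noise]

# One meta-feature per source: the linear predictor implied by its prior effects.
Z = [[dot(row, prior) for prior in priors] for row in X]

# One weight per source; lam = 1.0 is an arbitrary choice, not a tuned value.
w = ridge(Z, y, lam=1.0)

# Combined coefficients: weighted sum of the prior effects.
beta_hat = [sum(w_k * prior[j] for w_k, prior in zip(w, priors)) for j in range(p)]
print("source weights:", w)
```

In this toy setting the informative source should receive a clearly larger weight than the uninformative one, which mirrors the statement above that prior information helps when suitable co-data are available. A practical analysis would additionally tune the penalty by cross-validation and allow the features themselves to contribute alongside the prior effects.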

Full article (open access)

Bioinformatics (in press), doi: 10.1093/bioinformatics/btad680

arXiv: 2212.08581 (pdf)

ORBi: 10993/53227 (pdf)