If you are distributing binary R packages (or any other binary) for Linux, it is important that you check and declare the run-time dependencies for your binaries. This can easily be automated, and prevents many problems and conflicts. Currently RSPM leaves the client guessing which system libraries the binaries are linked to, which results in users installing unnecessary build-time dependencies, sometimes even the wrong ones.
Once you distinguish between build-time and run-time system libraries in Linux distributions, the solution is obvious, and the system will become much simpler and more robust.
This is not a hack: Linux package managers were designed to determine
dependencies between system libraries automatically. You should use the
same tools when providing binaries for R packages, even if they are not
distributed in a native system package format.
In a nutshell: after you have successfully built an R package on your
Linux server, run ldd on the package's shared object (.so) file
to list the shared libraries it links to. The operating system package
manager (e.g. dpkg) can tell you which
system package each file belongs to. Simply add this information to the
DESCRIPTION file of the binary package that you are shipping. That’s it!
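The steps above can be sketched in shell roughly as follows. This is a minimal sketch, assuming a Debian or Ubuntu system where dpkg is available. The function names ldd_paths and owning_packages are made up for illustration, and the path in the usage comment is hypothetical.

```shell
#!/bin/sh
# Extract the resolved file paths from `ldd` output read on stdin.
# A typical line looks like:
#   libssl.so.1.1 => /usr/lib/x86_64-linux-gnu/libssl.so.1.1 (0x...)
# Entries without a "=> /path" part (linux-vdso, the dynamic loader,
# "not found" entries) are skipped.
ldd_paths() {
  awk '$2 == "=>" && $3 ~ /^\// { print $3 }'
}

# Map each shared library path read on stdin to the system package that
# owns it. `dpkg -S` prints "package[:arch]: /path"; only the package
# name is kept, and duplicates are removed.
owning_packages() {
  while read -r lib; do
    dpkg -S "$lib" 2>/dev/null | cut -d: -f1
  done | sort -u
}

# Usage (hypothetical location of an installed package's shared object):
# ldd /usr/local/lib/R/site-library/openssl/libs/openssl.so | ldd_paths | owning_packages
```

On systems where /lib is a symbolic link to /usr/lib, the path printed by ldd may not be the path that dpkg has on record, so the lookup may need the path to be normalised first.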
To make it even easier: the maketools package has an
example function that shows the system dependencies for installed R
packages on Linux. For example, let’s have a look at the dependencies of
the sf CRAN package. On Ubuntu 20.04 we see:
```
> maketools::package_sysdeps("sf")
                shlib      package     headers source              version
1   libproj.so.15.3.1    libproj15 libproj-dev   proj              6.3.1-1
2   libgdal.so.26.0.4    libgdal26 libgdal-dev   gdal   3.0.4+dfsg-1build3
3 libgeos_c.so.1.13.1 libgeos-c1v5 libgeos-dev   geos        3.8.0-1build1
4 libstdc++.so.6.0.28   libstdc++6        <NA>    gcc 10-20200411-0ubuntu1
```
And on Fedora 32 we get:
```
> maketools::package_sysdeps("sf")
                shlib   package    headers source version
1   libproj.so.15.3.2      proj proj-devel   proj   6.3.2
2   libgdal.so.26.0.4 gdal-libs gdal-devel   gdal   3.0.4
3 libgeos_c.so.1.13.3      geos geos-devel   geos   3.8.1
4 libstdc++.so.6.0.28 libstdc++       <NA>    gcc  10.2.1
```
The first column, shlib, tells you which shared libraries
the R package is linked to, i.e. the filenames of the .so
files. The second column, package, shows which system package each file
belongs to. This is the (only) relevant piece of information when you are
distributing the binary, because these are exactly the system packages
the client needs to have installed for the binary R package to work.
Nothing more, nothing less!
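The two tables differ mainly in the package column, because the file-to-package lookup is answered by each distribution's own package manager. On an rpm-based system such as Fedora, the lookup could be sketched with rpm -qf, which reports the package that owns a file. This is only an illustration: the function name rpm_owners is made up, and it assumes one library path per line on stdin.

```shell
#!/bin/sh
# Print the name of the rpm package that owns each file path read on
# stdin, one name per line, without duplicates.
# `rpm -qf` queries the owner of a file; the query format keeps only
# the package name. Paths that no package owns are silently skipped.
rpm_owners() {
  while read -r lib; do
    pkg=$(rpm -qf --queryformat '%{NAME}\n' "$lib" 2>/dev/null) &&
      printf '%s\n' "$pkg"
  done | sort -u
}

# Usage (hypothetical library paths):
# printf '%s\n' /usr/lib64/libproj.so.15 /usr/lib64/libgdal.so.26 | rpm_owners
```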
A simple way to build R binary packages is on a server or container
that has all build-time libraries pre-installed (the per-package
build-time dependencies are really not relevant). For example, you can
use the cranlike
docker images for the latest versions of Debian and Ubuntu:
```
docker run -it cran/ubuntu
```
After building and installing an R package, you can check the package's run-time dependencies, for example:
```
> install.packages("openssl")
## ...
## ...
## ** checking absolute paths in shared objects and dynamic libraries
## ** testing if installed package can be loaded from final location
## ** testing if installed package keeps a record of temporary installation path
## * DONE (openssl)
> maketools::package_sysdeps("openssl")
             shlib   package    headers  source         version
1    libssl.so.1.1 libssl1.1 libssl-dev openssl 1.1.1f-1ubuntu2
2 libcrypto.so.1.1 libssl1.1 libssl-dev openssl 1.1.1f-1ubuntu2
```
For every R binary package you distribute, you should provide, at a
minimum, the information from the package column. The best
way would be to add this to the DESCRIPTION file of the binary R
package, and ideally also expose it in the PACKAGES
repository index. Clients can then look up the system
dependencies required for this binary R package, 100% reliably, without
guessing or conflicts.
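As a rough illustration of recording the package column in a DESCRIPTION file, the sketch below de-duplicates a list of system package names and appends them as one extra field. The field name X-SysDeps-Runtime and both function names are hypothetical; no such field is an established R or RSPM convention. The sketch also assumes that the DESCRIPTION file ends with a newline.

```shell
#!/bin/sh
# Read system package names on stdin (one per line), remove duplicates
# and join them into a single ", "-separated string.
join_packages() {
  sort -u | paste -sd, - | sed 's/,/, /g'
}

# Append a "Field: value" line to a DESCRIPTION file.
# "X-SysDeps-Runtime" is a hypothetical field name used for illustration.
add_sysdeps_field() {
  description_file=$1
  packages=$2
  printf 'X-SysDeps-Runtime: %s\n' "$packages" >> "$description_file"
}

# Usage (hypothetical path to the DESCRIPTION of an installed binary package):
# deps=$(printf '%s\n' libssl1.1 libssl1.1 | join_packages)
# add_sysdeps_field /path/to/openssl/DESCRIPTION "$deps"
```

Joining the names into a single line keeps the value in one field, so a repository index built from the DESCRIPTION fields could carry it along unchanged.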