The `iml` package can now handle bigger datasets. Earlier problems with exploding memory have been fixed for `FeatureEffect`, `FeatureImp` and `Interaction`. It’s also possible now to compute `FeatureImp` and `Interaction` in parallel. This document describes how.

First we load some data, fit a random forest and create a Predictor object.

```
set.seed(42)
library("iml")
library("randomForest")
data("Boston", package = "MASS")
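# randomForest's argument for the number of trees is `ntree`
rf <- randomForest(medv ~ ., data = Boston, ntree = 10)
# Feature matrix: all columns except the target `medv`
X <- Boston[which(names(Boston) != "medv")]
predictor <- Predictor$new(rf, data = X, y = Boston$medv)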
```

Parallelization is supported via the {future} package. All you need to do is to choose a parallel backend via `future::plan()`.

```
library("future")
library("future.callr")
# Evaluates futures in 2 background R sessions started via callr
plan("callr", workers = 2)
```
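As a rough illustration of how `future::plan()` switches backends, the sketch below uses the `multisession` backend that ships with {future} instead of `callr`. That choice is an assumption made here to avoid the extra {future.callr} dependency, and the variable name `n_workers` is purely illustrative.

```r
library("future")

# Number of cores that future considers available on this machine
availableCores()

# Start 2 background R sessions; from now on futures are resolved there
plan(multisession, workers = 2)
n_workers <- nbrOfWorkers()
n_workers

# Switch back to sequential processing, which also shuts the workers down
plan(sequential)
```

Any code that relies on {future} picks up whichever plan is active at the time it runs, so the backend can be changed without touching the modelling code.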

Now we can easily compute feature importance in parallel. This means that the computation per feature is distributed among the 2 workers specified earlier.

```
imp <- FeatureImp$new(predictor, loss = "mae")
library("ggplot2")
plot(imp)
```
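To make the idea of distributing the work per feature more concrete, here is a minimal sketch of a permutation importance loop parallelized with {future.apply}. It is not `iml`'s actual implementation: the linear model on `mtcars`, the helper `mae()` and the names `features`, `base_error` and `perm_importance` are all assumptions chosen for illustration, as is the `multisession` backend.

```r
library("future")
library("future.apply")

set.seed(42)
plan(multisession, workers = 2)

# Hypothetical toy setup: a linear model predicting mpg from all other columns
fit <- lm(mpg ~ ., data = mtcars)
features <- setdiff(names(mtcars), "mpg")
mae <- function(actual, predicted) mean(abs(actual - predicted))
base_error <- mae(mtcars$mpg, predict(fit, mtcars))

# One future per feature: shuffle that column and measure how much the error grows
perm_importance <- future_lapply(features, function(feature) {
  shuffled <- mtcars
  shuffled[[feature]] <- sample(shuffled[[feature]])
  mae(mtcars$mpg, predict(fit, shuffled)) / base_error
}, future.seed = TRUE)
names(perm_importance) <- features

# Error ratios, largest first
sort(unlist(perm_importance), decreasing = TRUE)

plan(sequential)
```

Each feature is handled independently of the others, which is why this kind of computation splits cleanly across workers. `future.seed = TRUE` gives every worker its own reproducible random number stream for the shuffling.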

That wasn’t very impressive. Let’s see how much speed-up we actually get from parallelization.

```
bench::system_time({
  plan(sequential)
  FeatureImp$new(predictor, loss = "mae")
})
#>  process     real
#>    2.74s     3.3s
bench::system_time({
  plan("callr", workers = 2)
  FeatureImp$new(predictor, loss = "mae")
})
#>  process     real
#> 496.67ms    7.99s
```

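In this run the parallel version was actually slower in wall-clock time (7.99s vs. 3.3s in the `real` column; `process` only counts CPU time of the main R session, so it is not comparable once the work moves to the workers). With so little work per feature, the overhead of starting the workers and transferring data presumably outweighs the gain. Parallelization is more useful when the model uses a lot of features or when the feature importance computation is repeated more often to get more stable results.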

```
bench::system_time({
  plan(sequential)
  FeatureImp$new(predictor, loss = "mae", n.repetitions = 20)
})
#>  process     real
#>    9.26s   10.66s
bench::system_time({
  plan("callr", workers = 2)
  FeatureImp$new(predictor, loss = "mae", n.repetitions = 20)
})
#>  process     real
#> 557.79ms    9.41s
```

Here the parallel computation of the feature importance is somewhat faster than the sequential one in wall-clock time (9.41s vs. 10.66s real).

The parallelization also speeds up the computation of the interaction statistics:

```
bench::system_time({
  plan(sequential)
  Interaction$new(predictor)
})
#>  process     real
#>      20s    21.9s
bench::system_time({
  plan("callr", workers = 2)
  Interaction$new(predictor)
})
#>  process     real
#>  680.3ms    15.7s
```

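`FeatureEffects` can be computed in parallel in the same way, although in this run the parallel version was slower in wall-clock time (6.69s vs. 1.23s real):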

```
bench::system_time({
  plan(sequential)
  FeatureEffects$new(predictor)
})
#>  process     real
#>    1.03s    1.23s
bench::system_time({
  plan("callr", workers = 2)
  FeatureEffects$new(predictor)
})
#>  process     real
#>     1.1s    6.69s
```
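To compare the runs at a glance, the following sketch collects the wall-clock (`real`) timings reported in this document and computes the speed-up as sequential time divided by parallel time. The data frame `timings` and its column names are illustrative only, and the numbers come from the single runs shown here, so they will differ on other machines.

```r
# Wall-clock ("real") timings in seconds, copied from the benchmark output above
timings <- data.frame(
  task = c("FeatureImp", "FeatureImp (n.repetitions = 20)",
           "Interaction", "FeatureEffects"),
  sequential = c(3.3, 10.66, 21.9, 1.23),
  parallel = c(7.99, 9.41, 15.7, 6.69)
)

# A speed-up above 1 means the parallel run finished sooner
timings$speedup <- round(timings$sequential / timings$parallel, 2)
timings
```

Under this measure, only the longer-running tasks (the repeated feature importance and the interaction statistics) benefited from the two workers in these runs.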