Introduction to Random Forests

(via bootstrap aggregation, in ten minutes or less)

Zak Varty

Classification and Regression Trees - Recap

  • Partition the predictor space and predict the modal (classification) or mean (regression) outcome in each subregion.
  • Make sequential, conditional “cuts” perpendicular to the predictor axes.
  • Continue splitting until a stopping criterion is met,
    • e.g. the change in residual deviance is small (see the sketch after this list).
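To make this concrete, here is a minimal sketch of fitting a single classification tree, assuming scikit-learn is available; the toy data, the feature names, and the `min_impurity_decrease` stopping rule are illustrative choices, not part of the recap itself.

```python
# A minimal CART sketch, assuming scikit-learn; data and names are illustrative.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(seed=1)
X = rng.uniform(size=(100, 2))               # two continuous predictors
y = (X[:, 0] + X[:, 1] > 1).astype(int)      # class depends on both predictors

# min_impurity_decrease acts as the stopping criterion:
# stop splitting when the reduction in impurity is small.
tree = DecisionTreeClassifier(min_impurity_decrease=0.01).fit(X, y)

# Each printed rule is one cut perpendicular to a predictor axis.
print(export_text(tree, feature_names=["x1", "x2"]))
```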

Classification and Regression Trees - Recap

  • Easily interpretable model.
  • Bias-variance trade-off:
    • decision boundaries bisect the gaps between (continuous) data points, so they are very sensitive to individual data values, as the sketch below demonstrates.
    • This is a low-bias, high-variance model, but we want one that generalises well.
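One way to see this sensitivity is to nudge a single training point and watch the fitted split move. A small sketch, again assuming scikit-learn; the toy data and the size of the nudge are arbitrary.

```python
# Sensitivity of a tree's split to one data point, assuming scikit-learn.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(seed=2)
X = rng.uniform(size=(30, 1))
y = (X[:, 0] > 0.5).astype(int)

tree_a = DecisionTreeClassifier(max_depth=1).fit(X, y)

# Move the single class-1 point nearest the boundary slightly to the right;
# the fitted threshold (the midpoint of the gap) moves with it.
X2 = X.copy()
i = np.where(y == 1)[0][np.argmin(X[y == 1, 0])]
X2[i, 0] += 0.05
tree_b = DecisionTreeClassifier(max_depth=1).fit(X2, y)

# The two root-node thresholds differ, even though only one point moved.
print(tree_a.tree_.threshold[0], tree_b.tree_.threshold[0])
```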

Solution 1: Bagging

If \(Y_1, \ldots, Y_n\) are independent random variables, each with variance \(\sigma^2\), then

\[ \text{Var}(\bar Y) = \text{Var}\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{\sigma^2}{n}.\]

This suggests that we could reduce the sensitivity of our tree-based predictions to the particular values in our dataset by:

  • Taking many training sets
  • Fitting a decision tree to each
  • Averaging the predictions made by each tree.

This is our first example of an ensemble model; ensembles of this kind are common in weather and hazard modelling.
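The variance-reduction result is easy to check by simulation; a quick numpy sketch, where the values of \(\sigma^2\), \(n\) and the number of replications are illustrative:

```python
# Checking Var(Ybar) = sigma^2 / n by simulation; all settings illustrative.
import numpy as np

rng = np.random.default_rng(seed=3)
sigma2, n, reps = 4.0, 25, 100_000

Y = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(reps, n))
means = Y.mean(axis=1)

print(means.var())   # roughly sigma2 / n = 0.16
```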

Bagging

Issue: we only have one set of training data.

Bootstrap Aggregation

  1. Re-sample our original data \(x\) with replacement, \(B\) times, giving \(\{x^1, x^2, \ldots, x^B\}\).
  2. Fit tree \(\hat f_b(x)\) using \(x^b\) for \(b\) in \(1,\ldots,B\).
  3. Take our ensemble prediction as the mean (mode) of all regression (classification) tree predictions.

\[ \hat f(x) = \frac{1}{B}\sum_{b=1}^{B}\hat f_b(x).\]

Claim: This has a stabilising effect, reducing sensitivity to the particular data values we observe.
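A minimal sketch of steps 1-3 for classification trees, assuming scikit-learn; \(B\), the toy data, and the `bagged_predict` helper are illustrative choices, not fixed parts of the algorithm.

```python
# Bootstrap aggregation of classification trees, assuming scikit-learn.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(seed=4)
X = rng.uniform(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 1).astype(int)

B = 100
trees = []
for b in range(B):
    idx = rng.integers(0, len(X), size=len(X))               # 1. resample with replacement
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))  # 2. fit f_b to x^b

def bagged_predict(x_new):
    """3. Ensemble prediction: majority vote over the B classification trees."""
    votes = np.stack([t.predict(x_new) for t in trees])
    return (votes.mean(axis=0) > 0.5).astype(int)

print(bagged_predict(np.array([[0.4, 0.4], [0.7, 0.7]])))
```

scikit-learn also packages this loop directly as `sklearn.ensemble.BaggingClassifier`, with decision trees as the default base estimator.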

Bagging - Example (Bootstrap 1)

Fit a classification tree \(\hat f_1(x)\) to the first bootstrapped dataset.
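In code, this single step might look as follows, a sketch under the same scikit-learn assumption with illustrative data:

```python
# Draw bootstrap sample 1 and fit the first tree, f_hat_1 (illustrative data).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(seed=5)
X = rng.uniform(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 1).astype(int)

idx1 = rng.integers(0, len(X), size=len(X))   # bootstrap sample x^1
f_hat_1 = DecisionTreeClassifier().fit(X[idx1], y[idx1])
```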