Introduction to Random Forests
(via bootstrap aggregation, in ten minutes or less)
Classification and Regression Trees - Recap
- Partition predictor space and predict modal/mean outcome in each subregion.
- Sequential, conditional “cuts” perpendicular to predictor axes
- Continue splitting until stopping criteria met
- e.g. when the change in residual deviance is small (see the code sketch below).
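As a minimal illustration (an addition, not part of the original slides), the sketch below fits such a tree with scikit-learn; the iris data and the `max_depth` / `min_impurity_decrease` values are arbitrary choices standing in for the stopping criteria above.

```python
# Illustrative sketch: fit a classification tree whose axis-aligned splits
# partition predictor space, stopping when further splits gain too little.
# Dataset and hyperparameter values are arbitrary choices for demonstration.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(
    max_depth=3,                 # hard cap on the number of sequential cuts
    min_impurity_decrease=0.01,  # stop splitting when the gain is small
    random_state=0,
)
tree.fit(X, y)

# Each leaf predicts the modal class of the training points falling in it.
print(export_text(tree, feature_names=load_iris().feature_names))
```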
Classification and Regression Trees - Recap
- Easily interpretable model
- Bias-variance trade-off:
- decision boundaries are placed midway between (continuous) data values, so are very sensitive to individual observations (see the simulation sketch below).
- This is a low-bias, high-variance model, but we want one that generalises well.
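To make the variance point concrete, here is a small illustrative simulation (an addition, with an arbitrary data-generating process and settings): refit an unpruned regression tree on many freshly drawn training sets and watch how much its prediction at a single fixed point varies.

```python
# Illustrative sketch: the prediction of a deep (unpruned) tree at a fixed
# point changes noticeably when the training data change slightly.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def sample_training_set(n=100):
    x = rng.uniform(0, 1, size=(n, 1))
    y = np.sin(4 * x[:, 0]) + rng.normal(scale=0.3, size=n)
    return x, y

x_test = np.array([[0.5]])
preds = []
for _ in range(200):
    X, y = sample_training_set()
    tree = DecisionTreeRegressor()   # grown to purity: low bias, high variance
    preds.append(tree.fit(X, y).predict(x_test)[0])

print("variance of single-tree prediction:", np.var(preds))
```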
Solution 1: Bagging
If \(Y_1, \ldots,Y_n\) are independent RVs with variance \(\sigma^2\) then
\[ \text{Var}(\bar Y) = \text{Var}\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{\sigma^2}{n}.\]
This suggests that we could reduce the sensitivity of our tree-based predictions to the particular values in our dataset by:
- Taking many training sets
- Fitting a decision tree to each
- Averaging the predictions made by each tree.
This is our first example of an ensemble model; ensembles of this kind are common in weather and hazard modelling.
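A quick numerical sanity check of the \(\sigma^2/n\) identity above (an illustrative addition; the choices \(n = 25\), \(\sigma = 2\) and the number of replications are arbitrary):

```python
# Illustrative check: averaging n independent variables with variance sigma^2
# gives a sample mean whose variance is close to sigma^2 / n.
import numpy as np

rng = np.random.default_rng(1)
n, sigma, reps = 25, 2.0, 100_000

Y = rng.normal(scale=sigma, size=(reps, n))   # reps draws of (Y_1, ..., Y_n)
print(np.var(Y.mean(axis=1)))                 # ~ sigma**2 / n = 0.16
```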
Bagging
Issue: we only have one set of training data.
Bootstrap Aggregation
- Re-sample with replacement from our original data \(x\), \(B\) times: \(\{x^1, x^2, \ldots, x^B\}\).
- Fit tree \(\hat f_b(x)\) using \(x^b\) for \(b\) in \(1,\ldots,B\).
- Take the ensemble prediction to be the mean (for regression trees) or mode (for classification trees) of the \(B\) individual predictions (see the code sketch below).
\[ \hat f(x) = \frac{1}{B}\sum_{b=1}^{B}\hat f_b(x).\]
Claim: This has a stabilising effect, reducing sensitivity to the particular data values we observe.
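A minimal sketch of the procedure above, assuming scikit-learn trees and an arbitrary dataset and choice of \(B\); in practice `sklearn.ensemble.BaggingClassifier` wraps this loop, but the resample-fit-vote steps are spelled out here for clarity.

```python
# Illustrative sketch of bootstrap aggregation for classification:
# resample the data with replacement B times, fit a tree to each resample,
# and take the modal prediction across the B trees.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n, B = len(y), 50

trees = []
for b in range(B):
    idx = rng.integers(0, n, size=n)           # resample with replacement: x^b
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Ensemble prediction: majority vote (the mode) over the B trees.
votes = np.stack([t.predict(X) for t in trees])          # shape (B, n)
ensemble_pred = (votes.mean(axis=0) > 0.5).astype(int)   # binary labels: vote

print("training accuracy of bagged ensemble:", (ensemble_pred == y).mean())
```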
Bagging - Example (Bootstrap 1)
Fit a classification tree \(\hat f_1(x)\) to the first bootstrapped data set
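In code, fitting \(\hat f_1\) to the first bootstrap resample might look like the following (illustrative only; the dataset and `max_depth` are arbitrary, and the printed splits stand in for the plot shown on the slide):

```python
# Illustrative: fit the first ensemble member, f_hat_1, to bootstrap sample 1.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
X, y = load_iris(return_X_y=True)

idx1 = rng.integers(0, len(y), size=len(y))   # bootstrap resample x^1
f_hat_1 = DecisionTreeClassifier(max_depth=2).fit(X[idx1], y[idx1])

# Each resample gives a (slightly) different set of splits.
print(export_text(f_hat_1, feature_names=load_iris().feature_names))
```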