Gradient Boosting

Algorithm

  1. Initialize the model with a constant value $F_0$: $$F_0(x) = \arg\min_{\gamma} \sum_{i=1}^n L(y_i, \gamma)$$ *initialize the base model (for regression, $F_0$ is usually $mean(y)$)*
  2. For $m = 1$ to $M$:
    1. Compute the pseudo-residuals: $$r_{im} = -\Biggl[ \frac{\partial L(y_i, F(x_i))}{\partial F(x_i)} \Biggr]_{F(x) = F_{m-1}(x)}$$ for $i = 1, \ldots, n$.
    2. Fit a weak learner $h_m(x)$ to the pseudo-residuals, i.e. train it using $r_{im}$ as the target. *(usually the weak learner is a decision tree, giving the **gradient boosted decision tree**, GBDT)*
    3. Compute $\gamma_{m}$ by solving the one-dimensional line search $$\gamma_{m} = \arg\min_{\gamma} \sum_{i=1}^n L(y_i, F_{m-1}(x_i) + \gamma\, h_m(x_i))$$ *basically find the step size that minimizes the loss of the updated model (for squared-error loss with trees, the optimal per-leaf value is the mean of the residuals in that leaf)*
    4. Update $F_m(x) = F_{m-1}(x) + \epsilon\, \gamma_{m}\, h_m(x)$ *$\epsilon$ is the learning rate; see the worked squared-error case and the code sketch below*
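
For squared-error loss the abstract steps above collapse into something very concrete, which is where the $mean(y)$ remarks come from: $$L(y, F) = \tfrac{1}{2}(y - F)^2 \quad\Rightarrow\quad -\frac{\partial L}{\partial F(x)} = y - F(x)$$ so the pseudo-residuals are just the ordinary residuals $r_{im} = y_i - F_{m-1}(x_i)$, and the initial model $$F_0(x) = \arg\min_{\gamma} \sum_{i=1}^n \tfrac{1}{2}(y_i - \gamma)^2 = \bar{y}$$ is the mean of the targets.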

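A minimal from-scratch sketch of the loop above, assuming squared-error loss and shallow decision trees as weak learners. The function names `gradient_boost_fit` / `gradient_boost_predict` and the hyperparameter defaults are illustrative, not from any library:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, M=100, learning_rate=0.1, max_depth=2):
    # Step 1: F_0 is the constant minimizing squared error -> mean(y).
    F0 = y.mean()
    F = np.full(len(y), F0)
    trees = []
    for _ in range(M):
        # Step 2.1: pseudo-residuals = negative gradient of the loss,
        # which for squared error is just y - F_{m-1}(x).
        r = y - F
        # Step 2.2: fit a weak learner (shallow tree) to the residuals.
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, r)
        # Steps 2.3-2.4: for squared error the tree's leaf means already
        # solve the line search, so just shrink by epsilon and update F.
        F = F + learning_rate * tree.predict(X)
        trees.append(tree)
    return F0, trees

def gradient_boost_predict(X, F0, trees, learning_rate=0.1):
    # Sum the shrunken tree predictions on top of the constant F_0.
    F = np.full(X.shape[0], F0)
    for tree in trees:
        F = F + learning_rate * tree.predict(X)
    return F

# Usage on toy data: fit a noisy sine and check the training error is small.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)
F0, trees = gradient_boost_fit(X, y)
print(np.mean((y - gradient_boost_predict(X, F0, trees)) ** 2))
```
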
References

  1. https://en.wikipedia.org/wiki/Gradient_boosting
  2. https://explained.ai/gradient-boosting/
  3. https://youtu.be/2xudPOBz-vs