Decision Tree


Greedy, Top-Down, Recursive Partitioning

CART Algorithm

CART constructs binary trees by choosing, at each node, the feature and threshold that yield the largest information gain (usually measured with Gini impurity or entropy).

  1. get the list of split candidates
  2. for each split candidate, calculate the purity of the split
  3. select the best split and check whether a stopping criterion is met
  4. if not, repeat from step 1 on each child node (see the sketch below)
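As a concrete illustration of these steps, here is a minimal sketch of a CART-style classifier in Python, assuming binary splits and the Gini criterion; the names `gini`, `best_split`, and `build_tree` are illustrative, not from any library:

```python
import numpy as np

def gini(y):
    """Gini impurity of a label array: 1 - sum_k p_k^2."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Steps 1-3: enumerate candidate (feature, threshold) pairs and
    pick the one with the lowest weighted Gini impurity."""
    n, d = X.shape
    best = (None, None, gini(y))  # (feature, threshold, weighted impurity)
    for j in range(d):
        # midpoints between consecutive sorted values are the usual candidates
        values = np.unique(X[:, j])
        for t in (values[:-1] + values[1:]) / 2:
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            w = (len(left) * gini(left) + len(right) * gini(right)) / n
            if w < best[2]:
                best = (j, t, w)
    return best

def build_tree(X, y, depth=0, max_depth=3, min_samples=2):
    """Step 4: recurse on each child until a stopping criterion is met."""
    feature, threshold, _ = best_split(X, y)
    # stopping criteria: pure node, no improving split, or depth/size limits
    if feature is None or depth >= max_depth or len(y) < min_samples:
        values, counts = np.unique(y, return_counts=True)
        return {"leaf": values[np.argmax(counts)]}  # majority-class prediction
    mask = X[:, feature] <= threshold
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree(X[mask], y[mask], depth + 1, max_depth, min_samples),
        "right": build_tree(X[~mask], y[~mask], depth + 1, max_depth, min_samples),
    }

# toy usage: two well-separated clusters
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0]])
y = np.array([0, 0, 0, 1, 1])
print(build_tree(X, y))  # splits at 6.5, giving two pure leaves
```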

Classification

if the target is categorical with $K$ classes, then for node $m$ the prediction for class $k$ is the fraction of the node's training points in that class: $$p_{mk} = \frac{1}{n_m} \sum_{y\in Node_m} I(y=k)$$ usually, we use Gini Impurity to measure the purity of each node or split candidate: $$G(Node_m) = 1-\sum_{k=1}^{K} p_{mk}^2$$ for binary classification this is simply $1 - (p_{yes}^2 + p_{no}^2)$

if all the data in a single node belong to the same class, the Gini Impurity is 0 (a pure node)
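A quick sanity check of the Gini formula (the helper name `gini_impurity` is illustrative): a pure node has impurity 0, and a 50/50 binary node has the maximum of 0.5.

```python
from collections import Counter

def gini_impurity(labels):
    # 1 - sum over classes of (class fraction)^2
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini_impurity(["yes", "yes", "yes"]))  # 0.0  (pure node)
print(gini_impurity(["yes", "no"]))          # 0.5  (worst case for 2 classes)
print(gini_impurity(["yes", "yes", "no"]))   # 1 - (2/3)^2 - (1/3)^2 ≈ 0.444
```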

Regression

if the target is continuous (regression), then for node $m$ the prediction is the average of all the target values in the node: $$\bar{y}_m = \frac{1}{n_m} \sum_{y\in Node_m} y$$ usually, we use the **MSE** (mean squared error around this average; the RMSE is just its square root) to measure the purity of each node or split candidate: $$MSE(Node_m) = \frac{1}{n_m} \sum_{y\in Node_m} (y - \bar{y}_m)^2$$
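A minimal sketch of the regression case, assuming the MSE criterion above (`node_mse` is an illustrative name):

```python
import numpy as np

def node_mse(y):
    prediction = np.mean(y)              # the node's prediction: mean target
    return np.mean((y - prediction) ** 2)

y_node = np.array([3.0, 4.0, 5.0])
print(np.mean(y_node))   # prediction for this node: 4.0
print(node_mse(y_node))  # impurity: (1 + 0 + 1) / 3 ≈ 0.667
```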

Note

for split candidates, we usually use a weighted average of the purity measure across the children to decide the best split when training a Decision Tree, with the number of data points in each child as the weight

for example, if a split sends 3 data points to the left node and 2 data points to the right node,

the split purity is:

$$\frac{3}{5}\cdot purity_{left} + \frac{2}{5}\cdot purity_{right}$$
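The same weighted-average rule as code, reusing the 3-left / 2-right example (`weighted_split_impurity` is an illustrative name):

```python
def weighted_split_impurity(left_impurity, n_left, right_impurity, n_right):
    # (n_left/n) * impurity(left) + (n_right/n) * impurity(right)
    n = n_left + n_right
    return (n_left / n) * left_impurity + (n_right / n) * right_impurity

# 3 points on the left (Gini ≈ 0.444), 2 on the right (Gini = 0.5):
print(weighted_split_impurity(0.444, 3, 0.5, 2))  # 3/5 * 0.444 + 2/5 * 0.5 = 0.4664
```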


References

  1. StatQuest: https://youtube.com/playlist?list=PLBq2sVJiEBvA9rPo3IEQsJNI4IJbn81tB
  2. Sklearn: https://scikit-learn.org/stable/modules/tree.html
  3. Wikipedia: https://en.wikipedia.org/wiki/Decision_tree_learning
  4. CS229: https://www.youtube.com/watch?v=wr9gUr-eWdA