Decision Tree


Greedy, Top-Down, Recursive Partitioning

CART Algorithm

CART constructs binary trees by choosing, at each node, the feature and threshold that yield the largest information gain (usually measured with Gini impurity or entropy).

  1. get the list of split candidates
  2. for each split candidate, calculate the purity of the split
  3. select the best split and check whether a stopping criterion is met
  4. if not, repeat from step 1 on each child node (see the sketch below)
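As a concrete illustration of these steps, here is a minimal sketch of a CART-style classifier in Python, assuming binary splits and the Gini criterion; the names `gini`, `best_split`, and `build_tree` are illustrative, not from any library:

```python
import numpy as np

def gini(y):
    """Gini impurity of a label array: 1 - sum_k p_k^2."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Steps 1-3: enumerate candidate (feature, threshold) pairs and
    pick the one with the lowest weighted Gini impurity."""
    n, d = X.shape
    best = (None, None, gini(y))  # (feature, threshold, weighted impurity)
    for j in range(d):
        # midpoints between consecutive sorted values are the usual candidates
        values = np.unique(X[:, j])
        for t in (values[:-1] + values[1:]) / 2:
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            w = (len(left) * gini(left) + len(right) * gini(right)) / n
            if w < best[2]:
                best = (j, t, w)
    return best

def build_tree(X, y, depth=0, max_depth=3, min_samples=2):
    """Step 4: recurse on each child until a stopping criterion is met."""
    feature, threshold, _ = best_split(X, y)
    # stopping criteria: pure node, no improving split, or depth/size limits
    if feature is None or depth >= max_depth or len(y) < min_samples:
        values, counts = np.unique(y, return_counts=True)
        return {"leaf": values[np.argmax(counts)]}  # majority-class prediction
    mask = X[:, feature] <= threshold
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree(X[mask], y[mask], depth + 1, max_depth, min_samples),
        "right": build_tree(X[~mask], y[~mask], depth + 1, max_depth, min_samples),
    }

# toy usage: two well-separated clusters
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0]])
y = np.array([0, 0, 0, 1, 1])
print(build_tree(X, y))  # splits at 6.5, giving two pure leaves
```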

Classification

if the target is categorical with $K$ classes, then for node $m$ the prediction for class $k$ is the fraction of the node's training points in that class: $$p_{mk} = \frac{1}{n_m} \sum_{y\in Node_m} I(y=k)$$ usually, we use Gini Impurity to measure the purity of each node or split candidate: $$G(Node_m) = 1-\sum_{k=1}^{K} p_{mk}^2$$ for binary classification this is simply $1 - (p_{yes}^2 + p_{no}^2)$

if all the data in a single node belong to the same class, the Gini Impurity is 0 (a pure node)
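A quick sanity check of the Gini formula (the helper name `gini_impurity` is illustrative): a pure node has impurity 0, and a 50/50 binary node has the maximum of 0.5.

```python
from collections import Counter

def gini_impurity(labels):
    # 1 - sum over classes of (class fraction)^2
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini_impurity(["yes", "yes", "yes"]))  # 0.0  (pure node)
print(gini_impurity(["yes", "no"]))          # 0.5  (worst case for 2 classes)
print(gini_impurity(["yes", "yes", "no"]))   # 1 - (2/3)^2 - (1/3)^2 ≈ 0.444
```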

Regression

if the target is continuous (regression), then for node $m$ the prediction is the average of all the target values in the node: $$\bar{y}_m = \frac{1}{n_m} \sum_{y\in Node_m} y$$ usually, we use the **MSE** (mean squared error around this average; the RMSE is just its square root) to measure the purity of each node or split candidate: $$MSE(Node_m) = \frac{1}{n_m} \sum_{y\in Node_m} (y - \bar{y}_m)^2$$
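A minimal sketch of the regression case, assuming the MSE criterion above (`node_mse` is an illustrative name):

```python
import numpy as np

def node_mse(y):
    prediction = np.mean(y)              # the node's prediction: mean target
    return np.mean((y - prediction) ** 2)

y_node = np.array([3.0, 4.0, 5.0])
print(np.mean(y_node))   # prediction for this node: 4.0
print(node_mse(y_node))  # impurity: (1 + 0 + 1) / 3 ≈ 0.667
```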

Note

for split candidates, we usually use a weighted average of the purity measure across the children to decide the best split when training a Decision Tree, with the number of data points in each child as the weight

for example, if a split sends 3 data points to the left node and 2 data points to the right node,

the split purity is:

$$\frac{3}{5}\cdot purity_{left} + \frac{2}{5}\cdot purity_{right}$$
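The same weighted-average rule as code, reusing the 3-left / 2-right example (`weighted_split_impurity` is an illustrative name):

```python
def weighted_split_impurity(left_impurity, n_left, right_impurity, n_right):
    # (n_left/n) * impurity(left) + (n_right/n) * impurity(right)
    n = n_left + n_right
    return (n_left / n) * left_impurity + (n_right / n) * right_impurity

# 3 points on the left (Gini ≈ 0.444), 2 on the right (Gini = 0.5):
print(weighted_split_impurity(0.444, 3, 0.5, 2))  # 3/5 * 0.444 + 2/5 * 0.5 = 0.4664
```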


References

  1. StatQuest: https://youtube.com/playlist?list=PLBq2sVJiEBvA9rPo3IEQsJNI4IJbn81tB
  2. Sklearn: https://scikit-learn.org/stable/modules/tree.html
  3. Wikipedia: https://en.wikipedia.org/wiki/Decision_tree_learning
  4. CS229: https://www.youtube.com/watch?v=wr9gUr-eWdA