Greedy, Top Down, Recursive Partitioning
CART Algorithm
CART constructs binary trees by choosing, at each node, the feature and threshold that yield the largest information gain (usually measured with Gini impurity or entropy).
- get the list of split candidates
- for each split candidate, calculate the purity of the split
- select the best split, and check if a stopping criterion is met
- if not, repeat from step 1 on each child node (see the sketch below)
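a minimal sketch of this greedy, top-down loop, assuming numeric features in a NumPy array and using midpoints between consecutive sorted unique feature values as split candidates; the helper names `candidate_thresholds`, `best_split`, and `grow_tree` are illustrative, not from any library, and `impurity` is whichever purity measure fits the task (Gini for classification, RMSE for regression, both defined below):

```python
import numpy as np

def candidate_thresholds(x):
    """Split candidates: midpoints between consecutive sorted unique values."""
    v = np.unique(x)
    return (v[:-1] + v[1:]) / 2

def best_split(X, y, impurity):
    """Greedy search over every (feature, threshold) pair, keeping the one
    with the lowest weighted average child impurity (see the Note below)."""
    n, d = X.shape
    best = None  # (weighted_impurity, feature_index, threshold)
    for j in range(d):
        for t in candidate_thresholds(X[:, j]):
            left = X[:, j] <= t
            score = (left.sum() / n) * impurity(y[left]) \
                  + ((~left).sum() / n) * impurity(y[~left])
            if best is None or score < best[0]:
                best = (score, j, t)
    return best

def grow_tree(X, y, impurity, depth=0, max_depth=3):
    """Top-down recursion: find the best split, then repeat on each child."""
    if depth == max_depth or len(np.unique(y)) == 1:  # stopping criteria
        return {"leaf": True, "targets": y}
    split = best_split(X, y, impurity)
    if split is None:  # no usable split candidates left
        return {"leaf": True, "targets": y}
    _, j, t = split
    mask = X[:, j] <= t
    return {"leaf": False, "feature": j, "threshold": t,
            "left": grow_tree(X[mask], y[mask], impurity, depth + 1, max_depth),
            "right": grow_tree(X[~mask], y[~mask], impurity, depth + 1, max_depth)}
```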
Classification
if the target is classification with $k$ classes, then for node $m$ the predicted probability of class $k$ is the fraction of samples in the node that belong to class $k$: $$p_{mk} = \frac{1}{n_m} \sum_{y\in Node_m} I(y=k)$$ usually, we use Gini impurity to measure the purity of each node or split candidate: $$Gini(Node_m) = 1-\sum_{i=1}^{k} p_{mi}^2$$ for binary classification this reduces to $1 - (p_{yes}^2 + p_{no}^2)$
if all the data in a single node belong to the same class, the Gini impurity is 0 (a pure node)
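as a quick sanity check, a minimal NumPy sketch of the Gini impurity above (`gini` is just an illustrative helper name):

```python
import numpy as np

def gini(y):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini(np.array([1, 1, 1])))     # 0.0 -> pure node, all one class
print(gini(np.array([0, 0, 1, 1])))  # 0.5 -> 1 - (0.5^2 + 0.5^2)
```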
Regression
if the target is a regression, then for node $m$ the prediction is the average of all the target values in the node: $$\bar{y}_m = \frac{1}{n_m} \sum_{y\in Node_m} y$$ usually, we use **RMSE** to measure the purity of each node or split candidate: $$RMSE(Node_m) = \sqrt{\frac{1}{n_m} \sum_{y\in Node_m} (y-\bar{y}_m)^2}$$
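a small sketch of the node prediction and RMSE above, assuming the node's targets sit in a NumPy array (`node_prediction` and `rmse` are illustrative names):

```python
import numpy as np

def node_prediction(y):
    """Regression prediction for a node: the mean of its targets."""
    return y.mean()

def rmse(y):
    """RMSE of a node's targets against the node's own mean prediction."""
    return np.sqrt(np.mean((y - y.mean()) ** 2))

y = np.array([1.0, 2.0, 3.0, 10.0])
print(node_prediction(y))  # 4.0
print(rmse(y))             # ~3.54; it is 0.0 only when all targets are equal
```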
Note
for split candidates, we usually use the weighted average of the children's purity measure to decide the best split when training a Decision Tree, using the number of samples in each child as the weight
for example, if a split puts 3 samples in the left node and 2 samples in the right node,
the split purity is:
$$\frac{3}{5} \cdot purity_{left} + \frac{2}{5} \cdot purity_{right}$$
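a minimal sketch of this weighted average, reusing the Gini impurity from the classification section and reproducing the 3-left / 2-right example (helper names are illustrative):

```python
import numpy as np

def gini(y):
    """Gini impurity, as defined in the classification section."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_purity(y_left, y_right, impurity):
    """Weighted average of child impurities, weighted by sample counts."""
    n = len(y_left) + len(y_right)
    return (len(y_left) / n) * impurity(y_left) \
         + (len(y_right) / n) * impurity(y_right)

y_left = np.array([0, 0, 1])  # 3 samples, Gini = 1 - (2/3)^2 - (1/3)^2 ~ 0.444
y_right = np.array([1, 1])    # 2 samples, Gini = 0 (pure)
print(split_purity(y_left, y_right, gini))  # 3/5 * 0.444 + 2/5 * 0 ~ 0.267
```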