Tree-Based Imputation

April 9, 2023 One-minute read

machine-learning • data-imputation

Tree-based imputation fits a decision tree on the rows where a variable is observed, using the other variables as predictors, and then uses the fitted tree to predict that variable's missing values. Here are some pros and cons of tree-based imputation:
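To make the fit-then-predict idea concrete, here is a minimal sketch in plain Python. It assumes a single numeric column with missing values (marked as None) and fully observed predictors. The helper names (fit_tree, predict, impute_column) and the toy data are made up for illustration; in practice a library implementation such as scikit-learn's DecisionTreeRegressor would normally replace the hand-written tree.

```python
from statistics import mean

def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def fit_tree(rows, targets, depth=0, max_depth=3, min_leaf=2):
    """Fit a small regression tree; returns a nested dict, or a float for a leaf."""
    # Stop when the node is deep enough, too small to split, or already constant.
    if depth >= max_depth or len(rows) < 2 * min_leaf or len(set(targets)) == 1:
        return mean(targets)
    best = None  # (error, feature index, threshold)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, targets) if r[j] <= t]
            right = [y for r, y in zip(rows, targets) if r[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, j, t)
    if best is None:
        return mean(targets)
    _, j, t = best
    left = [(r, y) for r, y in zip(rows, targets) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, targets) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": fit_tree([r for r, _ in left], [y for _, y in left],
                         depth + 1, max_depth, min_leaf),
        "right": fit_tree([r for r, _ in right], [y for _, y in right],
                          depth + 1, max_depth, min_leaf),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf mean."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

def impute_column(data, target_col):
    """Return a copy of data with None entries in target_col filled by the tree."""
    feature_cols = [c for c in range(len(data[0])) if c != target_col]
    observed = [r for r in data if r[target_col] is not None]
    tree = fit_tree([[r[c] for c in feature_cols] for r in observed],
                    [r[target_col] for r in observed])
    filled = []
    for r in data:
        r = list(r)
        if r[target_col] is None:
            r[target_col] = predict(tree, [r[c] for c in feature_cols])
        filled.append(r)
    return filled

# Toy data (hypothetical): column 1 jumps from about 10 to about 50 as column 0 grows.
data = [
    [1.0, 10.0], [2.0, 11.0], [3.0, None], [4.0, 9.0],
    [6.0, 50.0], [7.0, None], [8.0, 52.0], [9.0, 48.0],
]
filled = impute_column(data, target_col=1)
print(filled[2], filled[5])  # [3.0, 10.0] [7.0, 50.0]
```

The step-shaped relationship in the toy data is the kind of pattern a single straight-line fit would miss, while one split of the tree captures it.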

Pros:

It can handle non-linear relationships between the variables and non-monotonic missing data patterns.
It can impute both continuous and categorical variables.
It can be less prone to bias than simple methods such as mean imputation, because each imputed value is conditioned on the other variables.
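The categorical case works the same way, except that the tree is a classifier and each leaf predicts a label rather than a mean. The sketch below is only illustrative: it assumes numeric predictors with no missing values, uses Gini impurity to choose splits, and the column meanings (age, income, plan) and function names are invented for the example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_classifier(rows, labels, depth=0, max_depth=3):
    """Fit a small classification tree; leaves hold the majority label."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    best = None  # (weighted impurity, feature index, threshold)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:  # all rows share the same feature values
        return majority
    _, j, t = best
    go_left = [r[j] <= t for r in rows]
    return {
        "feature": j,
        "threshold": t,
        "left": fit_classifier([r for r, g in zip(rows, go_left) if g],
                               [y for y, g in zip(labels, go_left) if g],
                               depth + 1, max_depth),
        "right": fit_classifier([r for r, g in zip(rows, go_left) if not g],
                                [y for y, g in zip(labels, go_left) if not g],
                                depth + 1, max_depth),
    }

def classify(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

# Hypothetical rows: [age, income in thousands, plan]; plan is missing in the last two.
data = [
    [25, 30, "basic"], [30, 35, "basic"], [28, 90, "premium"],
    [45, 95, "premium"], [50, 32, "basic"], [40, 88, None], [33, 31, None],
]
observed = [r for r in data if r[2] is not None]
tree = fit_classifier([r[:2] for r in observed], [r[2] for r in observed])
filled = [r if r[2] is not None else r[:2] + [classify(tree, r[:2])] for r in data]
print(filled[5], filled[6])  # [40, 88, 'premium'] [33, 31, 'basic']
```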

Cons:

It can be computationally intensive, especially when dealing with large datasets.
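It can overfit the data and lead to inaccurate imputations if the decision tree model is not properly tuned, for example by limiting the tree depth or the minimum number of samples per leaf.
It may not perform well when a large share of the values is missing, because few observed rows remain to train the tree.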

When to use: Tree-based imputation can be a useful method when there are non-linear relationships between the variables and non-monotonic missing data patterns. It can be particularly useful when dealing with complex datasets where other imputation methods may not perform well. However, it should be used with caution, and the decision tree model should be properly tuned and validated to avoid overfitting and inaccurate imputations.
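One common way to validate an imputer is to hide some values that are actually known, impute them, and measure the error against the truth. The sketch below shows that idea under a few assumptions: the target column is numeric, missing values are None, and the imputer is any function taking (data, target_col) and returning a filled copy. The names masked_rmse and mean_impute are hypothetical, and the mean imputer is only a placeholder baseline; a tree-based imputer with different depth or leaf-size settings could be passed in the same way to compare configurations.

```python
import math
import random

def masked_rmse(impute_fn, data, target_col, n_mask, seed=0):
    """Hide n_mask observed values in target_col, impute them, return the RMSE."""
    rng = random.Random(seed)
    observed_idx = [i for i, r in enumerate(data) if r[target_col] is not None]
    mask_idx = rng.sample(observed_idx, n_mask)
    # Work on a copy so the original data is left untouched.
    masked = [list(r) for r in data]
    for i in mask_idx:
        masked[i][target_col] = None
    filled = impute_fn(masked, target_col)
    errors = [(filled[i][target_col] - data[i][target_col]) ** 2 for i in mask_idx]
    return math.sqrt(sum(errors) / len(errors))

def mean_impute(data, target_col):
    """Placeholder baseline: fill missing entries with the column mean."""
    known = [r[target_col] for r in data if r[target_col] is not None]
    fill = sum(known) / len(known)
    filled = []
    for r in data:
        r = list(r)
        if r[target_col] is None:
            r[target_col] = fill
        filled.append(r)
    return filled

# Hypothetical data: column 1 has one genuinely missing value.
data = [
    [1.0, 10.0], [2.0, 12.0], [3.0, None],
    [4.0, 9.0], [6.0, 50.0], [8.0, 52.0],
]
score = masked_rmse(mean_impute, data, target_col=1, n_mask=2)
print(f"masked RMSE of the baseline: {score:.2f}")
```

A lower masked error for one setting than another is evidence, not proof, that it imputes better; repeating the procedure with several seeds gives a steadier comparison.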
