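Active learning is a strategy for sampling data intelligently. Instead of labeling points at random, it selects the most informative unlabeled points and sends only those for labeling. It is most useful when the labeling budget is limited, when the dataset is imbalanced, and when adding randomly sampled labels no longer improves accuracy.
The process is iterative. A model is trained on the currently labeled data and used to pick the most informative (often the most uncertain) unlabeled points. Those points are labeled and added to the training set, and the cycle repeats until the budget is spent or performance stops improving.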
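The train, query, label, retrain loop can be sketched in plain Python. This is a toy illustration under stated assumptions, not a production implementation. The `oracle` function is a hypothetical stand-in for a human labeler, and the true rule `x > 0.3` is invented for the demo. The 1-D "model" is also hypothetical: it places its boundary halfway between the largest negative and the smallest positive labeled example. Querying the point closest to the boundary is uncertainty sampling. In this 1-D setting it behaves like a binary search for the true threshold.

```python
import random

def oracle(x):
    """Hypothetical human labeler; the true rule (unknown to the model) is x > 0.3."""
    return 1 if x > 0.3 else 0

def fit_threshold(labeled):
    """Toy 1-D classifier: boundary halfway between the largest 0 and the smallest 1."""
    max_neg = max(x for x, y in labeled if y == 0)
    min_pos = min(x for x, y in labeled if y == 1)
    return (max_neg + min_pos) / 2

def active_learning_loop(pool, labeled, budget):
    """Pool-based loop: train, query the most uncertain point, label it, repeat."""
    pool, labeled = list(pool), list(labeled)
    for _ in range(budget):
        threshold = fit_threshold(labeled)                   # 1. train on current labels
        query = min(pool, key=lambda x: abs(x - threshold))  # 2. most uncertain = closest to boundary
        pool.remove(query)
        labeled.append((query, oracle(query)))               # 3. ask the oracle for a label
    return fit_threshold(labeled), labeled

if __name__ == "__main__":
    random.seed(0)
    pool = [random.uniform(-1, 1) for _ in range(200)]  # unlabeled pool
    seed = [(-1.0, 0), (1.0, 1)]                        # one labeled example per class to start
    threshold, labeled = active_learning_loop(pool, seed, budget=10)
    print(f"learned boundary ~ {threshold:.3f} after {len(labeled)} labels")
```

Random sampling spends most of its budget on points far from the boundary. The uncertainty-driven query shrinks the interval that contains the true threshold with almost every label.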
Intelligent sampling techniques in active learning:
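- Margin sampling: Selects the points closest to the decision boundary, i.e. those with the smallest margin. For probabilistic classifiers, this is the smallest gap between the two most likely classes.
- Cluster-based sampling: Selects diverse, representative points by clustering the feature space and sampling across the clusters.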
- Query-by-committee: Trains multiple models and selects points with the highest disagreement among them.
- Region-based sampling: Divides the input space into regions and runs active learning algorithms in each region.
- etc.
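Two of the strategies above, margin sampling and query-by-committee, can be sketched as scoring functions over model outputs. This is a minimal illustration under stated assumptions. The function names are hypothetical. `probas` is assumed to be one list of class probabilities per unlabeled point. `committee_votes` is assumed to be one list of hard label predictions per point, with one entry per committee member. Disagreement is measured with vote entropy, which is one common choice among several.

```python
import math
from collections import Counter

def margin_scores(probas):
    """Margin sampling: top-1 minus top-2 class probability (smaller = more uncertain)."""
    scores = []
    for p in probas:
        top1, top2 = sorted(p, reverse=True)[:2]
        scores.append(top1 - top2)
    return scores

def margin_query(probas, k):
    """Indices of the k points with the smallest margin."""
    scores = margin_scores(probas)
    return sorted(range(len(scores)), key=lambda i: scores[i])[:k]

def vote_entropy(votes):
    """Query-by-committee disagreement: entropy of the committee's votes for one point."""
    counts = Counter(votes)
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def qbc_query(committee_votes, k):
    """Indices of the k points on which the committee disagrees most."""
    scores = [vote_entropy(v) for v in committee_votes]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```

For example, with `probas = [[0.9, 0.05, 0.05], [0.4, 0.35, 0.25], [0.5, 0.5, 0.0]]`, `margin_query(probas, 2)` returns `[2, 1]`. The third point is a tie between two classes, and the first is a confident prediction. With `votes = [[0, 0, 0], [0, 1, 2], [1, 1, 0]]`, `qbc_query(votes, 1)` returns `[1]`, the point where all three members disagree.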