Feature Engineering


Preprocessing

  1. Data Cleansing: Eliminating or correcting erroneous data by identifying and handling inconsistencies and outliers.

  2. Scaling and Normalizing: Numerical features often need to be scaled or normalized so that models can learn from them effectively. Scaling adjusts the overall range of a feature, while normalization maps values into a fixed range, such as 0 to 1 (see the sketch after this list).

  3. Dimensionality Reduction: Reducing the number of features by creating a lower-dimensional representation of the data. This helps remove noise, improve model performance, and aid interpretability.

  4. Feature Construction: Creating new features from existing ones using various techniques.
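
A minimal sketch of step 2 (scaling vs. normalizing) with scikit-learn; the tiny feature matrix is hypothetical:

    # Standardization vs. min-max normalization with scikit-learn.
    # The feature matrix below is hypothetical.
    import numpy as np
    from sklearn.preprocessing import StandardScaler, MinMaxScaler

    X = np.array([[1.0, 200.0],
                  [2.0, 300.0],
                  [3.0, 400.0]])

    # Scaling to zero mean and unit variance per feature.
    print(StandardScaler().fit_transform(X))

    # Normalizing each feature into the [0, 1] range.
    print(MinMaxScaler().fit_transform(X))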

Feature Engineering Techniques

Some commonly used techniques:

  1. Feature Scaling: Scaling numerical features ensures they are within a similar range, making it easier for models to learn from them.

  2. Bucketizing/Binning: Grouping numerical data into buckets or bins converts continuous values into discrete categories. This is useful when encoding numerical data as categories, enabling models to learn patterns based on these groups (see the binning sketch after this list).

  3. Dimensionality Reduction: Techniques like Principal Component Analysis (PCA), t-SNE, and UMAP reduce the number of features while preserving important information (see the PCA sketch after this list).

  4. Feature Crosses: Combining multiple features to create new ones. This allows non-linear relationships to be encoded, or the same information to be expressed in fewer features.

  5. Encoding Features: Transforming categorical variables into numerical representations. Techniques like one-hot encoding and categorical embeddings convert categorical data into a format suitable for models (see the encoding sketch after this list).
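
A minimal sketch of bucketizing/binning with pandas; the ages and bin edges are hypothetical:

    # Convert a continuous feature into discrete buckets with pd.cut.
    # The values, bin edges, and labels are hypothetical.
    import pandas as pd

    ages = pd.Series([3, 17, 25, 42, 68])
    age_bucket = pd.cut(ages,
                        bins=[0, 18, 35, 65, 120],
                        labels=["child", "young_adult", "adult", "senior"])
    print(age_bucket)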
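
A minimal sketch of dimensionality reduction with PCA in scikit-learn; the random data is only illustrative:

    # Project 10 features down to 2 principal components.
    # The synthetic data is hypothetical.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 10))

    pca = PCA(n_components=2)
    X_2d = pca.fit_transform(X)

    print(X_2d.shape)                     # (100, 2)
    print(pca.explained_variance_ratio_)  # variance kept per component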
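
And a minimal sketch of one-hot encoding with scikit-learn; the category values are hypothetical:

    # One-hot encode a categorical column: one indicator column per category.
    # The color values are hypothetical.
    from sklearn.preprocessing import OneHotEncoder

    colors = [["red"], ["green"], ["blue"], ["green"]]

    enc = OneHotEncoder()
    one_hot = enc.fit_transform(colors).toarray()

    print(enc.categories_)  # learned category order
    print(one_hot)          # one indicator column per category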

Feature Crosses

Guiding Principles:

  1. Expert-selected interactions should be the first to be explored.
  2. The interaction hierarchy principle states that the higher the degree of an interaction, the less likely it is to explain variation in the response. (Figure omitted; source: https://bookdown.org/max/FES/interactions-guiding-principles.html. In the figure, the three-way interaction term is greyed out due to the effect sparsity principle.)
  3. The effect sparsity principle contends that only a fraction of the possible effects truly explains a significant amount of response variation.
  4. The heredity principle, based on principles of genetic heredity, asserts that interaction terms may only be considered if the lower-order terms preceding the interaction are effective at explaining response variation (see the sketch after this list).
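
As a sketch under these principles, candidate two-way interactions can be enumerated with scikit-learn's PolynomialFeatures; restricting to degree 2 follows the interaction hierarchy principle, and the feature matrix is hypothetical:

    # Enumerate pairwise interaction candidates, omitting squared terms.
    # The feature matrix is hypothetical.
    import numpy as np
    from sklearn.preprocessing import PolynomialFeatures

    X = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

    # interaction_only=True emits products of distinct features only.
    poly = PolynomialFeatures(degree=2, interaction_only=True,
                              include_bias=False)
    X_inter = poly.fit_transform(X)

    print(poly.get_feature_names_out())  # ['x0', 'x1', 'x2', 'x0 x1', ...]
    print(X_inter)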

Key points:

  • Feature crosses combine multiple features to create a new feature that captures the relationship between them.
  • They can be used to encode non-linear patterns in the feature space, allowing models to capture more complex interactions.
  • Feature crosses can also reduce the number of features by encoding the same information in a single feature.
  • They can be powerful in capturing interactions and improving the predictive capabilities of models.

Example:

  1. Area = Width x Height
  2. Hour of week = combination of “hour of day” and “day of week”
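
A minimal sketch of both crosses with pandas; the frame and column names are hypothetical:

    # Build the two crossed features from the examples above.
    # The data and column names are hypothetical.
    import pandas as pd

    df = pd.DataFrame({
        "width": [2.0, 3.0],
        "height": [4.0, 5.0],
        "day_of_week": [0, 6],  # 0 = Monday
        "hour_of_day": [9, 23],
    })

    df["area"] = df["width"] * df["height"]
    df["hour_of_week"] = df["day_of_week"] * 24 + df["hour_of_day"]
    print(df)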

References

  1. https://www.coursera.org/learn/machine-learning-data-lifecycle-in-production/home/week/2
  2. https://bookdown.org/max/FES/interactions-guiding-principles.html