Sensitivity analysis helps understand a model’s behavior by examining the impact of each feature on predictions.
It involves changing a single feature while keeping others constant and observing the resulting model outputs. The magnitude of change in predictions indicates the feature’s influence.
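The one-feature-at-a-time procedure described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: `toy_predict`, its weights, and the perturbation sizes are hypothetical stand-ins for a real model and real feature scales.

```python
import numpy as np

def one_at_a_time_sensitivity(predict, x, deltas):
    """Perturb each feature of x in turn and record the change in prediction."""
    x = np.asarray(x, dtype=float)
    baseline = predict(x)
    effects = np.zeros(x.shape[0])
    for i, delta in enumerate(deltas):
        perturbed = x.copy()
        perturbed[i] += delta  # change only feature i, keep the others constant
        effects[i] = predict(perturbed) - baseline
    return effects

# Hypothetical toy model (an assumption for illustration): a fixed linear function.
weights = np.array([2.0, -0.5, 0.0])

def toy_predict(x):
    return float(weights @ x)

effects = one_at_a_time_sensitivity(toy_predict, [1.0, 1.0, 1.0], [1.0, 1.0, 1.0])
print(effects)
```

A larger absolute value in `effects` suggests a more influential feature at this particular input; for a nonlinear model the result can differ from one input point to another.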
Techniques for sensitivity analysis include:
- Random attacks: Generating random input data to test model outputs and uncover unexpected bugs.
- Partial dependence plots: Show the marginal effect of one or two features on the model’s predictions.
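As a rough sketch of what a one-feature partial dependence curve computes, the snippet below forces one feature to each value on a grid for every row and averages the predictions. The model `toy_predict`, the data `X`, and the grid are hypothetical; a real workflow would typically rely on a plotting library rather than this hand-rolled version.

```python
import numpy as np

def partial_dependence_1d(predict, X, feature, grid):
    """Average prediction over the dataset with one feature forced to each grid value."""
    X = np.asarray(X, dtype=float)
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value  # overwrite the chosen feature in every row
        averages.append(predict(X_mod).mean())
    return np.array(averages)

# Hypothetical model (an assumption for illustration): y = 3*x0 + x1**2
def toy_predict(X):
    return 3.0 * X[:, 0] + X[:, 1] ** 2

X = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]])
grid = np.array([0.0, 1.0, 2.0])
pd_values = partial_dependence_1d(toy_predict, X, feature=0, grid=grid)
print(pd_values)
```

Plotting `pd_values` against `grid` gives the partial dependence curve for feature 0. In this toy case the curve is a straight line with slope 3, reflecting the linear contribution of that feature.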
Adversarial Attacks and Vulnerability
Adversarial attacks can fool ML models into misclassifying data by making small, targeted changes to the input.
Security and privacy harms from machine learning can be informational or behavioral.
- Informational harms: Leakage of information, including membership inference (was an individual’s data used for training?), model inversion (reconstructing the training data), and model extraction (replicating the model from its predictions).
- Behavioral harms: Manipulation of the model’s behavior, including model poisoning (inserting malicious data into the training data) and evasion attacks (intentionally crafting input data so that the model misclassifies it).
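To make the idea of an evasion attack concrete, here is a minimal sketch using the fast gradient sign method (FGSM) against a logistic regression classifier. FGSM is chosen here only as one well-known example of a targeted perturbation; the weights, input, and `eps` are hypothetical values picked for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_logistic(x, y, w, b, eps):
    """Shift x by eps in the direction that increases the cross-entropy loss."""
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w  # gradient of the loss with respect to the input
    return x + eps * np.sign(grad_x)

# Hypothetical model parameters and input (assumptions for illustration).
w = np.array([1.0, -2.0, 0.5])
b = 0.0
x = np.array([0.5, -0.25, 0.5])
y = 1  # true label

x_adv = fgsm_logistic(x, y, w, b, eps=0.4)
p_clean = sigmoid(w @ x + b)
p_adv = sigmoid(w @ x_adv + b)
print(p_clean > 0.5, p_adv > 0.5)
```

Each feature moves by at most `eps`, yet the predicted class flips because every coordinate is pushed in the direction that hurts the model most.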
Hardening Models against Adversarial Attacks
- Adversarial training: Hardens models by incorporating adversarial examples, such as adversarial images generated with tools like CleverHans, into the training data.
- Defensive distillation: Reduces vulnerability to attacks by using knowledge distillation to train a second model with the same network architecture on the softened output probabilities of the first.
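Distillation relies on soft labels, which are usually produced by a softmax with a raised temperature. The sketch below shows only that step, under the assumption of hypothetical teacher logits and an arbitrarily chosen temperature of 20; it does not train either network.

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits divided by a temperature; higher values give softer outputs."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical logits from the first (teacher) model for one input.
teacher_logits = np.array([4.0, 1.0, 0.0])

hard = softmax_with_temperature(teacher_logits, temperature=1.0)
soft = softmax_with_temperature(teacher_logits, temperature=20.0)
print(hard, soft)
```

The distilled model would then be trained on targets like `soft` instead of one-hot labels. The softer targets keep the same predicted class but spread probability mass over the other classes.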
Tools related to adversarial attacks:
- cleverhans: for benchmarking model vulnerability to adversarial examples.
- foolbox: for running adversarial attacks against machine learning models.