Logistic Regression


A statistical model that models the probability of an event by making the log-odds of the event a linear combination of one or more independent variables. Since logistic regression fits a model of the form $p(y|x)$ directly, it is called a discriminative approach.

$$ p(y|x,w) = \mathrm{Ber}(y \mid \mathrm{sigm}(w^Tx))$$ where $\mathrm{sigm}$ is the sigmoid (logistic) function: $$\mathrm{sigm}(x) = \frac{1}{1 + e^{-x}}$$
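A minimal sketch of this model in Python (the weights and input below are made-up values, purely for illustration):

```python
import numpy as np

def sigm(z):
    """The sigmoid (logistic) function."""
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical weights and input, just to show the predictive probability
w = np.array([0.5, -1.2])
x = np.array([2.0, 1.0])

p_y1 = sigm(w @ x)   # p(y=1 | x, w) under the Bernoulli model
print(p_y1)          # ~0.45, since w^T x = -0.2
```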

model fitting

MLE

the negative log-likelihood for logistic regression is given by $$NLL(w) = -\sum_{i=1}^N \left[ y_i \log(\mu_i) + (1-y_i)\log(1-\mu_i) \right]$$ where $\mu_i = \mathrm{sigm}(w^Tx_i)$. this is also called the **cross entropy** error function.
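A minimal sketch of computing this NLL and minimizing it by gradient descent on synthetic data (the dataset, learning rate, and iteration count are arbitrary illustrative choices; the gradient $X^T(\mu - y)$ follows from differentiating the NLL above):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigm(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(w, X, y):
    # cross entropy error; eps keeps the logs away from log(0)
    mu = sigm(X @ w)
    eps = 1e-12
    return -np.sum(y * np.log(mu + eps) + (1 - y) * np.log(1 - mu + eps))

# synthetic data: 2 features plus a constant column for the intercept
N = 200
X = np.hstack([rng.normal(size=(N, 2)), np.ones((N, 1))])
w_true = np.array([2.0, -1.0, 0.5])
y = (rng.random(N) < sigm(X @ w_true)).astype(float)

# gradient descent on the (averaged) NLL; the gradient is X^T (mu - y)
w = np.zeros(3)
for _ in range(2000):
    mu = sigm(X @ w)
    w -= 0.1 * X.T @ (mu - y) / N

print(nll(w, X, y), w)   # fitted w should land near w_true
```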

we can introduce a penalty term if we want to penalize $w$ and make it smaller; a smaller $w$ gives us a simpler hypothesis that is less prone to overfitting (this is called regularization)

example of NLL with L2 regularization: $$NLL_{reg}(w) = -\sum_{i=1}^N \left[ y_i \log(\mu_i) + (1-y_i)\log(1-\mu_i) \right] + \lambda w^Tw$$
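Continuing the sketch above (reusing `sigm`, `nll`, `X`, `y`, and `N`), the penalty adds $\lambda w^Tw$ to the objective and $2\lambda w$ to its gradient. Here $\lambda = 0.1$ is an arbitrary choice, and the intercept column is penalized too, matching the formula, though in practice it is often left unpenalized:

```python
lam = 0.1   # regularization strength (arbitrary illustrative value)

def nll_l2(w, X, y, lam):
    # regularized objective: cross entropy plus lambda * w^T w
    return nll(w, X, y) + lam * w @ w

# gradient descent; the penalty contributes an extra 2 * lam * w
w = np.zeros(3)
for _ in range(2000):
    mu = sigm(X @ w)
    w -= 0.1 * (X.T @ (mu - y) + 2 * lam * w) / N

print(nll_l2(w, X, y, lam), w)   # coefficients shrink toward zero as lam grows
```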

References

https://en.wikipedia.org/wiki/Logistic_regression