Recurrent Neural Network


Vanilla RNN:

$$h_t = F(h_{t-1}, x_t)$$
$$h_t = \text{activation}(W_h\ h_{t-1} + W_x\ x_t)$$
$$y_t = W_y\ h_t$$

(figure: unrolled RNN; image source: https://youtu.be/6niqTuYFZLQ)
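To make the recurrence concrete, here is a minimal NumPy sketch of one RNN step (my own illustration, not from the video); tanh as the activation and the toy dimensions are assumptions.

```python
import numpy as np

def rnn_step(h_prev, x_t, W_h, W_x, W_y):
    """One vanilla RNN step: h_t = tanh(W_h h_{t-1} + W_x x_t), y_t = W_y h_t."""
    h_t = np.tanh(W_h @ h_prev + W_x @ x_t)
    y_t = W_y @ h_t
    return h_t, y_t

# Assumed toy sizes: hidden 4, input 3, output 2.
rng = np.random.default_rng(0)
W_h, W_x, W_y = rng.normal(size=(4, 4)), rng.normal(size=(4, 3)), rng.normal(size=(2, 4))

# The same weights are reused at every time step; only h carries history.
h = np.zeros(4)
for x_t in rng.normal(size=(5, 3)):  # a sequence of 5 input vectors
    h, y = rnn_step(h, x_t, W_h, W_x, W_y)
```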

Problems:

  1. Hard to retain information over long sequences: by the time the output depends on an early input, that input has faded from the hidden state.
  2. Vanishing and exploding gradients during backpropagation through time (see the sketch after this list).
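A toy illustration of problem 2 (my own sketch, not from the sources): during backpropagation through time the gradient is multiplied by $W_h^T$ once per step, so over many steps its norm shrinks or blows up geometrically depending on the spectral radius of $W_h$.

```python
import numpy as np

# Gradients flowing back through T steps get multiplied by W_h^T each step
# (ignoring the tanh derivative, which can only shrink them further).
def grad_norm_after(T, scale):
    W_h = scale * np.eye(4)   # toy recurrent matrix with spectral radius = scale
    g = np.ones(4)            # gradient arriving at the last time step
    for _ in range(T):
        g = W_h.T @ g
    return np.linalg.norm(g)

print(grad_norm_after(50, 0.9))  # ~1e-2: vanishes
print(grad_norm_after(50, 1.1))  # ~2e+2: explodes
```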

LSTM:

(figure: LSTM cell; image from https://colah.github.io/posts/2015-08-Understanding-LSTMs/)

Forget gate:

$$f_t = \sigma(W_f\ [h_{t-1}, x_t] + b_f)$$

Since it uses a sigmoid function, the output lies between 0 and 1 and decides how much of the long-term memory (the cell state $C_{t-1}$) to keep: an output of 0 means forget it entirely, 1 means keep it all.

Input gate:

$$i_t = \sigma(W_i\ [h_{t-1}, x_t] + b_i)$$ $$\tilde{C}_t = \tanh(W_C\ [h_{t-1}, x_t] + b_C)$$

tanh: produces the candidate values $\tilde{C}_t$ used to update the cell state from $C_{t-1}$ to $C_t$. sigmoid: decides how much of that candidate actually gets written into the cell state: $$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$

Output gate:

$$o_t = \sigma(W_o\ [h_{t-1}, x_t] + b_o)$$ $$h_t = o_t * \tanh(C_t)$$

sigmoid: decides which parts of the cell state to output. tanh: squashes the cell state to the range -1 to 1 before it is emitted as the new hidden state $h_t$.
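Putting the three gates together, here is a minimal NumPy sketch of one LSTM step in the formulation above (my own illustration, not from the cited post); concatenating $h_{t-1}$ with $x_t$ and the toy sizes are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(h_prev, C_prev, x_t, W_f, W_i, W_C, W_o, b_f, b_i, b_C, b_o):
    """One LSTM step over the concatenated input [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ z + b_f)        # forget gate: how much of C_{t-1} to keep
    i_t = sigmoid(W_i @ z + b_i)        # input gate: how much candidate to write
    C_tilde = np.tanh(W_C @ z + b_C)    # candidate cell update
    C_t = f_t * C_prev + i_t * C_tilde  # new cell state (long-term memory)
    o_t = sigmoid(W_o @ z + b_o)        # output gate: which parts of C_t to expose
    h_t = o_t * np.tanh(C_t)            # new hidden state (short-term memory)
    return h_t, C_t

# Assumed toy sizes: hidden 4, input 3, so each W is (4, 7) and each b is (4,).
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(4, 7)) for _ in range(4)]
bs = [np.zeros(4) for _ in range(4)]
h, C = np.zeros(4), np.zeros(4)
h, C = lstm_step(h, C, rng.normal(size=3), *Ws, *bs)
```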


References

  1. https://youtu.be/6niqTuYFZLQ
  2. https://colah.github.io/posts/2015-08-Understanding-LSTMs/