Lstm
LSTM
Long Short-Term Memory networks
→ Type of recurrent-neural-network that solves vanishing gradient problem → Used in NLP for sequential data processing → Contains memory cells with gates (forget, input, output)
Solves vanishing-gradient and exploding-gradient in rnn
Uses gates to control information flow:
- Forget gate
- Input gate
- Output gate
Before transformer architectures dominated NLP
*References
- Understanding LSTM Networks
- LSTM Original Paper - Hochreiter & Schmidhuber
- gru
- attention-is-all-you-need
- Understanding LSTM Networks
- LSTM in TensorFlow
#ml-notes