
Vanishing Gradient


Gradients shrink toward zero as they are backpropagated through many layers → early layers get almost no weight updates and stop learning

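Common in deep networks and in RNNs unrolled over long sequences → gradients (not the weights) shrink exponentially with depth / number of time steps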
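
A minimal sketch of the exponential shrinkage, assuming a toy scalar chain h_t = sigmoid(w * h_{t-1}) in place of a real network. The function name `gradient_through_chain` and the values of `w` and `x` are hypothetical, chosen only for illustration. Since the sigmoid derivative is at most 0.25, with w = 1 the gradient after T steps is bounded by 0.25^T:

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gradient_through_chain(depth, w=1.0, x=0.5):
    """Return d h_depth / d h_0 for the scalar chain h_t = sigmoid(w * h_{t-1})."""
    h = x
    grad = 1.0
    for _ in range(depth):
        a = sigmoid(w * h)
        # Chain rule: multiply by the local derivative of this step,
        # d sigmoid(w * h) / d h = w * a * (1 - a), which is at most 0.25 * |w|.
        grad *= w * a * (1.0 - a)
        h = a
    return grad


if __name__ == "__main__":
    # Print the gradient magnitude for increasing depth.
    for depth in (1, 5, 10, 20, 50):
        print(depth, gradient_through_chain(depth))
```

Each extra step multiplies the gradient by another factor below 0.25, so the printed values drop by orders of magnitude as depth grows.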

Solutions:

  • LSTM gates
  • GRU architecture
  • ReLU activation
  • Residual connections
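
A rough sketch of why two of the fixes above help, again on a toy scalar chain rather than a real network. The helper names (`chain_gradient`, `sigmoid_step`, `relu_step`, `residual_sigmoid_step`) are hypothetical, and LSTM/GRU gating is not modelled here. It compares the product of local derivatives for a plain sigmoid step, a ReLU step, and a sigmoid step wrapped in a residual (skip) connection:

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def chain_gradient(depth, step, x=0.5):
    """Multiply the local derivatives of `step` over `depth` applications."""
    h, grad = x, 1.0
    for _ in range(depth):
        h, local = step(h)
        grad *= local
    return grad


def sigmoid_step(h):
    # h_next = sigmoid(h); local derivative is a * (1 - a) <= 0.25
    a = sigmoid(h)
    return a, a * (1.0 - a)


def relu_step(h):
    # h_next = max(0, h); local derivative is 1 for positive inputs, else 0
    return (h, 1.0) if h > 0 else (0.0, 0.0)


def residual_sigmoid_step(h):
    # h_next = h + sigmoid(h); the skip path adds 1 to the local derivative
    a = sigmoid(h)
    return h + a, 1.0 + a * (1.0 - a)


if __name__ == "__main__":
    for name, step in [("sigmoid", sigmoid_step),
                       ("relu", relu_step),
                       ("residual", residual_sigmoid_step)]:
        print(name, chain_gradient(30, step))
```

In this toy setup the ReLU factor is exactly 1 on the active path and the residual factor never drops below 1, so neither product decays the way the plain sigmoid chain does. The flip side is that a residual factor above 1 can let gradients grow instead, which is a separate issue.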


#ml-notes
