Hi all,
I want to understand the mechanisms of back-propagation in RNNs and LSTMs. I was working through the optional section of week 1's first assignment, but I would like to go deeper into the theoretical derivations for back-prop.
Specifically, I want to know:
- How da_next is computed.
- Why the following extra term is present in the back-prop derivative equations for the LSTM forget and input gates (see the formula I've copied below):
  $da_{next} * c_{prev}$ (in reference to the notebook formulae)
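For reference, this is the forget-gate formula as I understand it from the notebook (quoting from memory, so the exact notation may be slightly off); the part I'm asking about is the $c_{prev} * da_{next}$ piece of the second term inside the parentheses:

$$d\gamma_f^{\langle t \rangle} = \left( dc_{next} * c_{prev} + \Gamma_o^{\langle t \rangle} * \left(1 - \tanh^2(c_{next})\right) * c_{prev} * da_{next} \right) * \Gamma_f^{\langle t \rangle} * \left(1 - \Gamma_f^{\langle t \rangle}\right)$$

The update/input gate formula has the same structure, with the candidate $\tilde{c}^{\langle t \rangle}$ in place of $c_{prev}$.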
These are the LSTM equations:
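(Writing them out in the course's notation, as I understand them, for anyone who doesn't have the notebook open:)

$$\Gamma_f^{\langle t \rangle} = \sigma\!\left(W_f \left[a^{\langle t-1 \rangle}, x^{\langle t \rangle}\right] + b_f\right)$$
$$\Gamma_u^{\langle t \rangle} = \sigma\!\left(W_u \left[a^{\langle t-1 \rangle}, x^{\langle t \rangle}\right] + b_u\right)$$
$$\tilde{c}^{\langle t \rangle} = \tanh\!\left(W_c \left[a^{\langle t-1 \rangle}, x^{\langle t \rangle}\right] + b_c\right)$$
$$c^{\langle t \rangle} = \Gamma_f^{\langle t \rangle} * c^{\langle t-1 \rangle} + \Gamma_u^{\langle t \rangle} * \tilde{c}^{\langle t \rangle}$$
$$\Gamma_o^{\langle t \rangle} = \sigma\!\left(W_o \left[a^{\langle t-1 \rangle}, x^{\langle t \rangle}\right] + b_o\right)$$
$$a^{\langle t \rangle} = \Gamma_o^{\langle t \rangle} * \tanh\!\left(c^{\langle t \rangle}\right)$$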