Additional resources: back-propagation in sequence models

Hi all,
I want to understand the mechanics of back-propagation in RNNs and LSTMs. I was working on the optional section of Week 1's first assignment, but I wish to go deeper into the theoretical derivations for back-prop.
Specifically, I want to know

  1. How da_next is computed.
  2. Why the following extra term is present in the back-prop derivative equations for the LSTM forget and input gates (I sketch my attempted reasoning below):
    $dc_{next} * c_{prev}$ (In reference to the notebook formulae)
    This is the LSTM equation:
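
For context (and separate from the notebook's own formulae), here is my attempted chain-rule sketch, assuming the standard cell-state and hidden-state updates; please correct me if I am misreading the notebook:

$$
c^{\langle t \rangle} = \Gamma_f^{\langle t \rangle} * c^{\langle t-1 \rangle} + \Gamma_u^{\langle t \rangle} * \tilde{c}^{\langle t \rangle},
\qquad
a^{\langle t \rangle} = \Gamma_o^{\langle t \rangle} * \tanh\left(c^{\langle t \rangle}\right)
$$

Since $\partial c^{\langle t \rangle} / \partial \Gamma_f^{\langle t \rangle} = c^{\langle t-1 \rangle}$, and the loss reaches $c^{\langle t \rangle}$ both through the next cell state (the incoming $dc_{next}$) and through $a^{\langle t \rangle}$ (the incoming $da_{next}$), the gradient with respect to the forget-gate activation would be

$$
d\Gamma_f^{\langle t \rangle}
= dc_{next} * c_{prev}
+ da_{next} * \Gamma_o^{\langle t \rangle} * \left(1 - \tanh^2\left(c^{\langle t \rangle}\right)\right) * c_{prev},
$$

which is where I think the extra $dc_{next} * c_{prev}$ term comes from (the update gate would pick up an analogous $dc_{next} * \tilde{c}^{\langle t \rangle}$ term). For question 1, my guess is that $da_{next}$ is not derived inside the cell at all: it is the gradient of the loss with respect to $a^{\langle t \rangle}$, i.e. the gradient from the prediction at step $t$ plus whatever is passed back from step $t+1$. Is that correct?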

Sorry, I'm not able to help with how the back-propagation works. You might need to search the internet for answers on this.

Hi, I am looking into this too. Do you have an answer for this yet?

Hello @Krishna_Prasanna and @jongchayong,

I haven't had time to write about LSTMs yet, but maybe my derivation of fully-connected layers can inspire you to attempt the derivation yourselves.
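
In the meantime, here is a small, hypothetical NumPy check (my own sketch, not the notebook's code) of the single fact behind that extra term: with the standard update $c^{\langle t \rangle} = \Gamma_f * c^{\langle t-1 \rangle} + \Gamma_u * \tilde{c}^{\langle t \rangle}$, the derivative of the new cell state with respect to the forget gate is just $c_{prev}$.

```python
import numpy as np

# Hypothetical toy check (not the assignment code), assuming the standard
# cell-state update c<t> = Gamma_f * c<t-1> + Gamma_u * c_tilde<t>.
# It verifies numerically that d c_next / d Gamma_f equals c_prev elementwise.
np.random.seed(0)
n_a = 4                                                  # hidden size (arbitrary)
c_prev  = np.random.randn(n_a)                           # previous cell state c<t-1>
c_tilde = np.random.randn(n_a)                           # candidate cell state
gamma_f = 1.0 / (1.0 + np.exp(-np.random.randn(n_a)))    # forget gate value (sigmoid output)
gamma_u = 1.0 / (1.0 + np.exp(-np.random.randn(n_a)))    # update (input) gate value

def cell_state(gf):
    """Cell-state update with the forget gate treated as the free variable."""
    return gf * c_prev + gamma_u * c_tilde

eps = 1e-6
for i in range(n_a):
    bump = np.zeros(n_a)
    bump[i] = eps
    # Central finite difference of c_next[i] with respect to Gamma_f[i]
    numeric = (cell_state(gamma_f + bump)[i] - cell_state(gamma_f - bump)[i]) / (2 * eps)
    print(f"dc_next/dGamma_f[{i}] ~ {numeric:.6f}   c_prev[{i}] = {c_prev[i]:.6f}")
```

Once that elementwise fact checks out, the full forget-gate gradient follows by multiplying $c_{prev}$ by the total gradient arriving at $c^{\langle t \rangle}$, which is the sum of the path through the next cell state and the path through $a^{\langle t \rangle}$.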