Derivation of Backpropagation in RNNs

Can anybody help me with the derivation of these variables in backpropagation?

Hey there @mahesh-mantri,

I assume you want an explanation of each of them (a short derivation sketch follows the list):

  • dtanh → Gradient of the loss w.r.t. the input of the tanh activation, i.e. the incoming gradient da scaled elementwise by the derivative of tanh.

  • dW_{ax} → Gradient of the loss w.r.t. the weight matrix connecting the input to the current hidden state.

  • dW_{aa} → Gradient of the loss w.r.t. the weight matrix connecting the previous hidden state to the current hidden state.

  • db_a → The gradient of the loss w.r.t. the bias term in the hidden state update.

  • dx^{(t)} → The gradient of the loss w.r.t. the input at time step t.

  • da_{prev} → Gradient of the loss w.r.t. the previous hidden state.
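Since the actual question is about the math, here is a short derivation sketch. It assumes the standard vanilla RNN cell $a^{(t)} = \tanh\big(z^{(t)}\big)$ with pre-activation $z^{(t)} = W_{ax}\,x^{(t)} + W_{aa}\,a^{(t-1)} + b_a$, and writes $da^{(t)}$ for the gradient of the loss $\mathcal{L}$ flowing into the cell:

$$
\begin{aligned}
\text{dtanh} &= \frac{\partial \mathcal{L}}{\partial z^{(t)}} = da^{(t)} \odot \big(1 - (a^{(t)})^{2}\big) \quad \text{(since } \tanh'(z) = 1 - \tanh^{2}(z)\text{)}\\
dW_{ax} &= \frac{\partial \mathcal{L}}{\partial W_{ax}} = \text{dtanh}\,\big(x^{(t)}\big)^{\top}\\
dW_{aa} &= \frac{\partial \mathcal{L}}{\partial W_{aa}} = \text{dtanh}\,\big(a^{(t-1)}\big)^{\top}\\
db_{a} &= \frac{\partial \mathcal{L}}{\partial b_{a}} = \textstyle\sum_{\text{batch}} \text{dtanh}\\
dx^{(t)} &= \frac{\partial \mathcal{L}}{\partial x^{(t)}} = W_{ax}^{\top}\,\text{dtanh}\\
da_{prev} &= \frac{\partial \mathcal{L}}{\partial a^{(t-1)}} = W_{aa}^{\top}\,\text{dtanh}
\end{aligned}
$$

Every line is just the chain rule: $z^{(t)}$ is linear in each of $W_{ax}$, $W_{aa}$, $b_a$, $x^{(t)}$ and $a^{(t-1)}$, so once you have dtanh, each gradient is dtanh multiplied by (the transpose of) the remaining factor, and $db_a$ sums dtanh over the batch dimension because the bias is broadcast to every example.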

Feel free to ask if you need further assistance!

Hello @Alireza_Saei, I was asking how we can derive these formulae; I want to understand the math behind them.

These are some basic derivatives. You can find the formulas almost everywhere! Try to understand them yourself, but if you ever feel stuck, feel free to ask, and I can explain them!

This first requires that you understand calculus. If you do, then the derivatives follow directly from the chain rule.
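If seeing the formulas run helps, here is a minimal NumPy sketch (my own shapes, variable names, and toy loss, not the assignment's exact code) that implements the backward step for one cell and checks one entry of $dW_{ax}$ against a finite-difference estimate:

```python
import numpy as np

np.random.seed(0)

# Assumed toy sizes: n_a hidden units, n_x input features, m examples.
n_a, n_x, m = 5, 3, 4
Wax = np.random.randn(n_a, n_x)
Waa = np.random.randn(n_a, n_a)
ba  = np.random.randn(n_a, 1)
xt     = np.random.randn(n_x, m)
a_prev = np.random.randn(n_a, m)

def forward(Wax, Waa, ba, xt, a_prev):
    """One vanilla RNN cell: a_t = tanh(Wax xt + Waa a_prev + ba)."""
    return np.tanh(Wax @ xt + Waa @ a_prev + ba)

# Hypothetical loss: simply the sum of the hidden state, so dL/da_t = 1 everywhere.
a_t = forward(Wax, Waa, ba, xt, a_prev)
da  = np.ones_like(a_t)

# Backward pass, following the formulas discussed above.
dtanh   = da * (1 - a_t ** 2)               # dL/dz, using tanh'(z) = 1 - tanh(z)^2
dWax    = dtanh @ xt.T                      # gradient w.r.t. input-to-hidden weights
dWaa    = dtanh @ a_prev.T                  # gradient w.r.t. hidden-to-hidden weights
dba     = dtanh.sum(axis=1, keepdims=True)  # bias is broadcast over the batch, so sum
dxt     = Wax.T @ dtanh                     # gradient w.r.t. the input at time t
da_prev = Waa.T @ dtanh                     # gradient w.r.t. the previous hidden state

# Finite-difference check of one entry of dWax.
eps, i, j = 1e-6, 1, 2
Wax_plus, Wax_minus = Wax.copy(), Wax.copy()
Wax_plus[i, j]  += eps
Wax_minus[i, j] -= eps
numeric = (forward(Wax_plus, Waa, ba, xt, a_prev).sum()
           - forward(Wax_minus, Waa, ba, xt, a_prev).sum()) / (2 * eps)
print("analytic:", dWax[i, j], " numeric:", numeric)
```

If the analytic and numeric values agree to several decimal places, the derivation (and the implementation) is consistent.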