Why do we need to use (da[:,:,t]+da_prevt) instead of only (da_prevt) in the for loop in the rnn_backward block? I just couldn’t get the intuition for why “adding gradients” would work there.
I also have the same question. Can anyone help with this?
Hi.
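da_prevt is the gradient with respect to the hidden state that comes back from the later RNN cells.
da[:,:,t] is the gradient with respect to the same hidden state that comes from the softmax and dense layers related to y at time step t. It is calculated elsewhere and just passed to our function.
The hidden state at step t feeds both of those paths (the output at step t and the next cell), so by the chain rule its total gradient is the sum of the two contributions. Using only da_prevt would drop the part of the loss that comes from the output at step t.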
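If it helps, here is a minimal sketch of the idea with a two-step toy RNN. This is not the assignment code: the squared-error loss, the names Wya, loss_from_a1 and da_from_output, and the sizes are my own assumptions for illustration. It computes the gradient of the loss with respect to the first hidden state along each path separately and checks the sum against a numerical gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
n_a = 3
Waa = rng.standard_normal((n_a, n_a))  # hidden-to-hidden weights
Wya = rng.standard_normal((1, n_a))    # hidden-to-output weights (assumed toy output layer)

def loss_from_a1(a1):
    """Loss of a 2-step toy RNN as a function of the first hidden state a1."""
    y1 = Wya @ a1             # path 1: a1 -> y1 -> loss
    a2 = np.tanh(Waa @ a1)    # path 2: a1 -> a2 -> y2 -> loss
    y2 = Wya @ a2
    return (0.5 * y1**2 + 0.5 * y2**2).item()

a1 = rng.standard_normal((n_a, 1))

# Forward pass, keeping the intermediate values
y1 = Wya @ a1
a2 = np.tanh(Waa @ a1)
y2 = Wya @ a2

# Path 1: gradient from the output at this step (plays the role of da[:, :, t])
da_from_output = Wya.T @ y1

# Path 2: gradient handed back by the later cell (plays the role of da_prevt)
da2 = Wya.T @ y2
da_prevt = Waa.T @ ((1 - a2**2) * da2)

# Total gradient for a1 is the sum of both paths
da_total = da_from_output + da_prevt

# Numerical gradient by central differences, for comparison
eps = 1e-6
da_numeric = np.zeros_like(a1)
for i in range(n_a):
    step = np.zeros_like(a1)
    step[i] = eps
    da_numeric[i] = (loss_from_a1(a1 + step) - loss_from_a1(a1 - step)) / (2 * eps)

print("sum of both paths matches:", np.allclose(da_total, da_numeric, atol=1e-5))
print("da_prevt alone matches:   ", np.allclose(da_prevt, da_numeric, atol=1e-5))
```

Only the sum agrees with the numerical gradient, because the loss depends on a1 through both y1 and a2.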
I hope it makes sense.
Thanks, it makes sense to me now.