I have 3 questions.
- During backpropagation through the RNN, dx_t is computed and also written into dx. But isn't x the input data from the dataset? Why are we computing dx at all, given that we are already doing backprop for Wax, the weight matrix that connects x and a? (See the first sketch after this list.)
- During backprop, in the following line of code:
  gradients = rnn_cell_backward(da[:,:,t] + da_prevt, caches[t])
  why are we adding da[:,:,t] to da_prevt? (See the first sketch after this list.)
- Also, an RNN uses the same parameter weights at every time step, right (if I'm not wrong)? Then why do we store the parameters in cache and append them to caches at every step? Ultimately, all the parameter values in caches are the same, right? Why don't we simply pass them once instead of storing them for every timestep? (See the second sketch below.)
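
For context, here is a minimal, self-contained sketch of the kind of backward loop that quoted line sits in, assuming a plain tanh cell a_next = tanh(Wax·xt + Waa·a_prev + ba). The exact function signatures, cache layout, and gradient-dict keys here are my own assumptions and may differ from the actual notebook:

```python
import numpy as np

def rnn_cell_backward(da_next, cache):
    # cache holds what the forward cell saved: (a_next, a_prev, xt, parameters)
    a_next, a_prev, xt, parameters = cache
    Wax, Waa = parameters["Wax"], parameters["Waa"]

    dtanh = (1 - a_next ** 2) * da_next          # backprop through tanh
    dxt = Wax.T @ dtanh                          # gradient w.r.t. the input at this step
    dWax = dtanh @ xt.T                          # gradient w.r.t. the shared input weights
    da_prev = Waa.T @ dtanh                      # gradient sent back to step t-1
    dWaa = dtanh @ a_prev.T                      # gradient w.r.t. the shared recurrent weights
    dba = np.sum(dtanh, axis=1, keepdims=True)   # gradient w.r.t. the bias

    return {"dxt": dxt, "da_prev": da_prev, "dWax": dWax, "dWaa": dWaa, "dba": dba}

def rnn_backward(da, caches):
    # da has shape (n_a, m, T_x): gradient coming from the cost at every time step
    n_a, m, T_x = da.shape
    a_next, a_prev, xt, parameters = caches[0]
    n_x = xt.shape[0]

    dx = np.zeros((n_x, m, T_x))
    dWax = np.zeros_like(parameters["Wax"])
    dWaa = np.zeros_like(parameters["Waa"])
    dba = np.zeros_like(parameters["ba"])
    da_prevt = np.zeros((n_a, m))

    for t in reversed(range(T_x)):
        # two gradient paths reach a_t: da[:,:,t] from the cost at step t,
        # and da_prevt flowing back from step t+1; they are summed here
        grads = rnn_cell_backward(da[:, :, t] + da_prevt, caches[t])
        dx[:, :, t] = grads["dxt"]   # per-step input gradient is stored per t
        da_prevt = grads["da_prev"]
        dWax += grads["dWax"]        # shared weights: gradients accumulate over time
        dWaa += grads["dWaa"]
        dba += grads["dba"]

    return {"dx": dx, "da0": da_prevt, "dWax": dWax, "dWaa": dWaa, "dba": dba}
```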
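
And here is the matching (again assumed) forward pass, trimmed to show what gets appended to caches at each time step; note that in this sketch the parameters entry stored in each cache is just a reference to the same dict:

```python
import numpy as np

def rnn_cell_forward(xt, a_prev, parameters):
    Wax, Waa, ba = parameters["Wax"], parameters["Waa"], parameters["ba"]
    a_next = np.tanh(Wax @ xt + Waa @ a_prev + ba)
    cache = (a_next, a_prev, xt, parameters)   # parameters stored by reference,
    return a_next, cache                       # the same dict object at every step

def rnn_forward(x, a0, parameters):
    # minimal forward loop, trimmed to show only what gets cached
    n_x, m, T_x = x.shape
    n_a = a0.shape[0]
    a = np.zeros((n_a, m, T_x))
    a_prev = a0
    caches = []
    for t in range(T_x):
        a_prev, cache = rnn_cell_forward(x[:, :, t], a_prev, parameters)
        a[:, :, t] = a_prev
        caches.append(cache)   # one cache per time step, each holding the same parameters dict
    return a, caches
```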