RNN Assignment-1 Week 1

I have 3 questions.

  1. During backpropagation of the RNN, dx_t is computed and also accumulated into dx. But isn't x input data from the dataset? Why are we computing dx at all, given that we are already backpropagating through Wax, the weight matrix applied to x? (The backward step in question is sketched below for reference.)

  2. During backprop, in the following line of code:
    gradients = rnn_cell_backward(da[:,:,t] + da_prevt, caches[t])
    why are we adding da[:,:,t] to da_prevt?

  3. Also, in an RNN we use the same parameter weights at every time step, if I'm not wrong? Then why are we storing the parameters in cache and appending it to caches? Ultimately, all the parameter values in caches are the same, right? Why don't we simply pass them once instead of storing them at every timestep?
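For reference, the backward step these questions are about looks roughly like this. This is a sketch reconstructed from the assignment's conventions, not the official solution, so the exact names and cache layout may differ slightly:

```python
import numpy as np

def rnn_cell_backward(da_next, cache):
    # Sketch of the per-timestep backward pass; the cache layout is
    # assumed to be (a_next, a_prev, xt, parameters) as in the notebook.
    (a_next, a_prev, xt, parameters) = cache
    Wax = parameters["Wax"]
    Waa = parameters["Waa"]

    # Backprop through the tanh activation: d/dz tanh(z) = 1 - tanh(z)^2
    dtanh = (1 - a_next ** 2) * da_next

    # Gradient w.r.t. the input x_t. It is computed so the gradient can
    # keep flowing into whatever produced x_t (e.g. an embedding layer
    # or a lower layer in a stacked RNN). If x_t is raw data from the
    # dataset, dxt is simply never used to update anything.
    dxt = np.dot(Wax.T, dtanh)
    dWax = np.dot(dtanh, xt.T)

    # Gradients w.r.t. the previous hidden state and the recurrent weights
    da_prev = np.dot(Waa.T, dtanh)
    dWaa = np.dot(dtanh, a_prev.T)
    dba = np.sum(dtanh, axis=1, keepdims=True)

    return {"dxt": dxt, "da_prev": da_prev,
            "dWax": dWax, "dWaa": dWaa, "dba": dba}
```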


Do you still need help with this issue?

For your point 3), it is true that there is only one set of weights, used at every time step. But the input data at every time step is different, which produces different gradients at every time step on every iteration. All of those different gradients get applied to the same shared weights, but each timestep's contribution to backprop is different. That's why we need to keep track of each timestep's results separately: the cache for step t stores the activations and inputs from that step, which backprop needs in order to compute that step's gradients.
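To make that concrete, here is a minimal sketch of the accumulation in the backward loop (not the notebook's exact code; it reuses the rnn_cell_backward sketched above). It also shows why da[:,:,t] and da_prevt are added in your point 2): a_t feeds both the output y_t and the next hidden state a_{t+1}, so by the chain rule its total gradient is the sum of the gradients arriving along both paths:

```python
import numpy as np

def rnn_backward_sketch(da, caches):
    # da: (n_a, m, T_x) gradients of the loss w.r.t. each hidden state,
    #     flowing in from the output layer at every timestep.
    # caches: list of the per-timestep tuples saved on the forward pass.
    n_a, m, T_x = da.shape
    n_x = caches[0][2].shape[0]       # xt is the third entry of each cache

    dWax = np.zeros((n_a, n_x))       # one gradient for the one shared Wax
    da_prevt = np.zeros((n_a, m))     # nothing flows in beyond the last step

    for t in reversed(range(T_x)):
        # Two gradient paths meet at a_t: da[:,:,t] comes from y_t,
        # da_prevt comes from timestep t+1. Chain rule: add them.
        gradients = rnn_cell_backward(da[:, :, t] + da_prevt, caches[t])
        da_prevt = gradients["da_prev"]

        # Each timestep saw a different xt, so gradients["dWax"] differs
        # per t; all of them accumulate into the single shared dWax.
        dWax += gradients["dWax"]

    return dWax
```

Even though Wax is shared, every pass through the loop adds a different dWax contribution, which is exactly why the per-timestep caches are kept.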