I have a question

when doing rnn_backward(da, caches) function

da is given,

while gradients = rnn_cell_backward(da[:,:,t], caches[t])

dxt, da_prevt, dWaxt, dWaat, dbat = gradients["dxt"], gradients["da_prev"], gradients["dWax"], gradients["dWaa"], gradients["dba"]

da_prevt is calculated by rnn_cell_backward function

why is da_prevt != da[:,:,t-1]?

i am so confused

I’m not sure I understand your question, but my take is that the point is that backward propagation goes backwards: we start from the cost and then propagate the gradients in the opposite direction from forward propagation. In an RNN, things are a bit more complicated because at each timestep we project “forwards” in two directions: towards the \hat{y}^{<t>} of the current timestep (at least in the case of a “many to many” RNN) and also to the updated hidden state a^{<t>} that will be input to the next timestep. So when we go backwards, we get gradients from both of those directions.
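To make the “two directions” point concrete, here is a toy sketch (all shapes and values are made up for illustration, not the course’s actual numbers): during backprop, the gradient reaching a hidden state a^{<t>} is the sum of what flows back from \hat{y}^{<t>} and what flows back from a^{<t+1>}.

```python
import numpy as np

# Toy illustration: the gradient w.r.t. a^<t> is the SUM of the gradient
# arriving via the output yhat^<t> and the gradient arriving via the next
# hidden state a^<t+1>. Shapes/values here are arbitrary.
n_a, m = 3, 2
grad_via_y = np.full((n_a, m), 0.5)       # path through yhat^<t>
grad_via_a_next = np.full((n_a, m), 0.2)  # path through a^<t+1>

da_t = grad_via_y + grad_via_a_next       # total gradient w.r.t. a^<t>
print(da_t[0, 0])  # 0.7
```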

But maybe you’re saying that da[:,:,t-1] is basically the same as what they mean by `da_prevt` in this formulation. I think that’s correct. In other words, what they are deriving is how to compute da[:,:,t-1], which is used to compute the gradients for the various weight matrices as well.

All this is pretty complicated, so please let me know if I’m missing your real point here.

Hi Paul,

in the programming exercise,

my [def rnn_cell_backward(da_next, cache): return gradients] function passed the tests.

But the next function [def rnn_backward(da, caches): return gradients] cannot pass the tests.

my code is like

for t in reversed(range(T_x)):
    # Compute gradients at time step t. Choose wisely the "da_next" and the "cache" to use in the backward propagation step. (≈1 line)
    print('t is: ', t, ', da_prevt==da[:,:,t]: ', da_prevt == da[:,:,t])
    gradients = rnn_cell_backward(da[:,:,t], caches[t])
    # Retrieve derivatives from gradients (≈1 line)
    dxt, da_prevt, dWaxt, dWaat, dbat = gradients["dxt"], gradients["da_prev"], gradients["dWax"], gradients["dWaa"], gradients["dba"]

Why, in each iteration, is da_prevt != da[:,:,t]?

it says Arguments:

da – Upstream gradients of all hidden states, of shape (n_a, m, T_x)

caches – tuple containing information from the forward pass (rnn_forward)

Does it mean da comes from both the y stream and the hidden state a stream?

Should I convert da[:,:,t] to a da that is only related to the hidden state a stream?

Sorry, I hadn’t looked at the backprop section in a while. It turns out they leave out the \hat{y} part of the backprop, according to this note in the instructions:

Note: `rnn_cell_backward` does not include the calculation of loss from y^{<t>}. This is incorporated into the incoming `da_next`. This is a slight mismatch with `rnn_cell_forward`, which includes a dense layer and softmax.

So they’re really sort of simplifying things a bit here, since we’re not really going to use this code. When we actually want to train a model, we’ll use TF and that handles all the backprop for us magically. This is just to give us some intuition about how things work.

What they mean by the “choose wisely” comment is that you need more than just da[:,:,t] to form `da_next`. It also includes the `da_prevt` value.

The point of `da_prevt` is that it was `da_next` from the next timestep, right? See the diagrams in the instructions. And you can see how that is computed in `rnn_cell_backward`. Notice in `rnn_backward`, they start by initializing `da_prevt` to all zeros, because we’re starting backprop at the *last* timestep, so there is no “next” in that case. Then we get it as an output from `rnn_cell_backward` at each subsequent (well, previous really) timestep.
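The loop structure described above can be sketched like this. To keep it self-contained (this is my own illustration, not the official solution), a dummy stand-in replaces the real `rnn_cell_backward`; the point is only how `da_prevt` is initialized to zeros, summed with da[:,:,t] at each step, and threaded backwards through the loop.

```python
import numpy as np

# Dummy stand-in for rnn_cell_backward, just to show the loop wiring.
def dummy_cell_backward(da_next, cache):
    # pretend half of da_next flows back to the previous hidden state
    return {"da_prev": 0.5 * da_next}

n_a, m, T_x = 2, 1, 3
da = np.ones((n_a, m, T_x))    # upstream gradients from the y-stream
da_prevt = np.zeros((n_a, m))  # the last timestep has no "next"

for t in reversed(range(T_x)):
    # the "choose wisely" step: pass da[:,:,t] PLUS the gradient coming
    # back from timestep t+1, not da[:,:,t] alone
    grads = dummy_cell_backward(da[:, :, t] + da_prevt, cache=None)
    da_prevt = grads["da_prev"]

print(da_prevt[0, 0])  # 0.875 after three steps with this dummy cell
```

With the real `rnn_cell_backward`, the body of the loop would also accumulate dWax, dWaa, and dba across timesteps and store dxt into dx.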