I have a question
when implementing the rnn_backward(da, caches) function,
da is given,
while gradients = rnn_cell_backward(da[:,:,t], caches[t])
dxt, da_prevt, dWaxt, dWaat, dbat = gradients["dxt"], gradients["da_prev"], gradients["dWax"], gradients["dWaa"], gradients["dba"]
da_prevt is calculated by rnn_cell_backward function
why is da_prevt != da[:,:,t-1]?
I am so confused.
I’m not sure I understand your question, but my take is that the point is that backward propagation goes backwards: we start from the cost and then propagate the gradients in the opposite direction from forward propagation. In an RNN, things are a bit more complicated because at each timestep we project “forwards” in two directions: towards the \hat{y}^{<t>} of the current timestep (at least in the case of a “many to many” RNN) and also towards the updated hidden state a^{<t>} that will be input to the next timestep. So when we go backwards, we get gradients from both of those directions.
But maybe you’re saying that da[:,:,t-1] is basically the same as what they mean by da_prevt
in this formulation. I think that’s correct. In other words, what they are deriving is how to compute da[:,:,t-1], which is then used to compute the gradients for the various weight matrices as well.
All this is pretty complicated, so please let me know if I’m missing your real point here.
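To make that concrete, here is the chain-rule picture I have in mind (my own notation, not taken from the notebook): a^{<t>} feeds both \hat{y}^{<t>} and a^{<t+1>}, so the total gradient arriving at a^{<t>} is the sum of the contributions coming back along those two paths.

```latex
% Sketch of the chain rule at timestep t (my notation, not the notebook's).
% a^{<t>} feeds both \hat{y}^{<t>} and a^{<t+1>}, so its total gradient is
% the sum of the gradients coming back along those two paths.
\frac{\partial \mathcal{L}}{\partial a^{<t>}}
  = \underbrace{\frac{\partial \mathcal{L}}{\partial \hat{y}^{<t>}}
                \,\frac{\partial \hat{y}^{<t>}}{\partial a^{<t>}}}_{\text{output branch}}
  + \underbrace{\frac{\partial \mathcal{L}}{\partial a^{<t+1>}}
                \,\frac{\partial a^{<t+1>}}{\partial a^{<t>}}}_{\text{recurrent branch}}
```

Roughly speaking, the output-branch term is what each slice da[:,:,t] carries, and the recurrent-branch term is what comes back as da_prevt (more on that below).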
Hi Paul,
in the programming exercise,
my [def rnn_cell_backward(da_next, cache): return gradients] function passed the tests.
But the next function [def rnn_backward(da, caches): return gradients] does not pass the tests.
my code is like
```python
for t in reversed(range(T_x)):
    # Compute gradients at time step t. Choose wisely the "da_next" and the "cache" to use in the backward propagation step. (≈1 line)
    print('t is: ', t, ', da_prevt==da[:,:,t]: ', da_prevt == da[:,:,t])
    gradients = rnn_cell_backward(da[:,:,t], caches[t])
    # Retrieve derivatives from gradients (≈ 1 line)
    dxt, da_prevt, dWaxt, dWaat, dbat = gradients["dxt"], gradients["da_prev"], gradients["dWax"], gradients["dWaa"], gradients["dba"]
```
why, in each iteration, is da_prevt != da[:,:,t]?
it says Arguments:
da – Upstream gradients of all hidden states, of shape (n_a, m, T_x)
caches – tuple containing information from the forward pass (rnn_forward)
does it mean da comes from both the y stream and the hidden state a stream?
Or should I convert da[:,:,t] into a da that relates only to the hidden state a stream?
Sorry, I hadn’t looked at the backprop section in a while. It turns out they leave out the \hat{y} part of the backprop, according to this note in the instructions:
Note: rnn_cell_backward does not include the calculation of loss from y^{<t>}; this is incorporated into the incoming da_next. This is a slight mismatch with rnn_cell_forward, which includes a dense layer and softmax.
So they’re really sort of simplifying things a bit here, since we’re not really going to use this code. When we actually want to train a model, we’ll use TF and that handles all the backprop for us magically. This is just to give us some intuition about how things work.
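To see what “incorporated into the incoming da_next” means in practice: if the output layer at each timestep were a dense layer plus softmax with cross-entropy loss, the gradient that reaches a^{<t>} through that branch would be W_{ya}^T (\hat{y}^{<t>} - y^{<t>}), and something along those lines is what each slice da[:,:,t] already contains. A quick NumPy sketch, with variable names that are mine and not the notebook’s:

```python
import numpy as np

# Sketch only: how the "upstream" slice da[:,:,t] could be produced by the
# output branch, assuming y_hat = softmax(Wya @ a + by) with cross-entropy.
# All names here (Wya, y_hat, y, da_t) are illustrative, not from the notebook.
n_a, n_y, m = 5, 2, 10
Wya = np.random.randn(n_y, n_a)

y_hat = np.random.rand(n_y, m)
y_hat /= y_hat.sum(axis=0, keepdims=True)           # pretend softmax outputs
y = np.eye(n_y)[:, np.random.randint(0, n_y, m)]    # one-hot labels

dy = y_hat - y        # dL/dz for softmax + cross-entropy at timestep t
da_t = Wya.T @ dy     # shape (n_a, m): gradient reaching a^<t> from y^<t>
# da_t plays the role of one slice da[:,:,t] that rnn_backward receives,
# which is why rnn_cell_backward does not redo the dense/softmax part.
```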
What they mean by the “choose wisely” comment is that you need more than just da[:,:,t] to form da_next: it also includes the da_prevt value.

The point of da_prevt is that it was da_next from the next timestep, right? See the diagrams in the instructions, and you can see how it is computed in rnn_cell_backward. Notice that in rnn_backward, they start by initializing da_prevt to all zeros, because we’re starting backprop at the last timestep, so there is no “next” in that case. Then we get it as an output from rnn_cell_backward at each subsequent (well, previous really) timestep.
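Putting that together, here is a minimal sketch of the loop structure I have in mind. It assumes the notebook’s rnn_cell_backward(da_next, cache) signature and gradient keys, and that the dimensions n_a, n_x, m, T_x have already been pulled out of the shapes of da and the caches, so it is not a drop-in solution:

```python
import numpy as np

# Minimal sketch, assuming rnn_cell_backward, da, caches and the dimensions
# n_a, n_x, m, T_x are already defined as in the notebook.
da_prevt = np.zeros((n_a, m))      # no "next" timestep after the last one
dx = np.zeros((n_x, m, T_x))
dWax = np.zeros((n_a, n_x))
dWaa = np.zeros((n_a, n_a))
dba = np.zeros((n_a, 1))

for t in reversed(range(T_x)):
    # da_next at step t = upstream gradient from this step's y branch
    # (da[:,:,t]) plus the gradient flowing back from step t+1 (da_prevt)
    gradients = rnn_cell_backward(da[:,:,t] + da_prevt, caches[t])
    dxt, da_prevt = gradients["dxt"], gradients["da_prev"]
    dWaxt, dWaat, dbat = gradients["dWax"], gradients["dWaa"], gradients["dba"]
    dx[:,:,t] = dxt
    dWax += dWaxt                  # weight/bias gradients accumulate over timesteps
    dWaa += dWaat
    dba += dbat

da0 = da_prevt                     # gradient w.r.t. the initial hidden state a^<0>
```

The key line is the da[:,:,t] + da_prevt sum: that is the “choose wisely” part that was missing in the loop above.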