C5, W1A1 optional RNN backpropagation

I need help with the function rnn_backward(da, caches).
I have initialized the gradients as follows but got incorrect results:
# initialize the gradients with the right sizes (≈6 lines)
dx = np.zeros((n_x, m, T_x))
dWax = np.zeros((n_a, n_x))
dWaa = np.zeros((n_a, n_a))
dba = np.zeros((n_a, 1))
da0 = np.zeros((n_a, m, T_x))
da_prevt = np.zeros((n_a, m, T_x))

Then I loop through all the time steps:

for t in reversed(range(T_x)):
    # Compute gradients at time step t. Choose wisely the "da_next" and the "cache" to use in the backward propagation step. (≈1 line)
    gradients = rnn_cell_backward(da[:, :, t], caches[t])

# Set da0 to the gradient of a which has been backpropagated through all time-steps (≈1 line) 
da0 = da_prevt

I got:
gradients["dx"][1][2] = [-0.15028183 -0.34554547 0.02071758 0.01483317]
as compared to the expected values:
gradients["dx"][1][2] = [-2.07101689 -0.59255627 0.02466855 0.01483317]

Note: the shapes are all correct though.

Thanks!

Can anybody help, please?

You are missing the addition.


Thank you for your time, Jonaslalin. I assume it is allowed to post my code other than the graded portion. Here is my completed rnn_backward function. Note: the rnn_cell_backward function was checked and its outputs matched the expected ones.

# UNGRADED FUNCTION: rnn_backward

def rnn_backward(da, caches):
    """
    Implement the backward pass for an RNN over an entire sequence of input data.

    Arguments:
    da -- Upstream gradients of all hidden states, of shape (n_a, m, T_x)
    caches -- tuple containing information from the forward pass (rnn_forward)

    Returns:
    gradients -- python dictionary containing:
                        dx -- Gradient w.r.t. the input data, numpy-array of shape (n_x, m, T_x)
                        da0 -- Gradient w.r.t the initial hidden state, numpy-array of shape (n_a, m)
                        dWax -- Gradient w.r.t the input's weight matrix, numpy-array of shape (n_a, n_x)
                        dWaa -- Gradient w.r.t the hidden state's weight matrix, numpy-array of shape (n_a, n_a)
                        dba -- Gradient w.r.t the bias, of shape (n_a, 1)
    """
    ### START CODE HERE ###

    # Retrieve values from the first cache (t=1) of caches (≈2 lines)
    (caches, x) = caches
    (a1, a0, x1, parameters) = caches[0]

    # Retrieve dimensions from da's and x1's shapes (≈2 lines)
    n_a, m, T_x = da.shape
    n_x, m = x1.shape

    # initialize the gradients with the right sizes (≈6 lines)
    dx = np.zeros((n_x, m, T_x))
    dWax = np.zeros((n_a, n_x))
    dWaa = np.zeros((n_a, n_a))
    dba = np.zeros((n_a, 1))
    da0 = np.zeros((n_a, m))
    da_prevt = np.zeros((n_a, m))

    # Loop through all the time steps
    for t in reversed(range(T_x)):
        # Compute gradients at time step t. Choose wisely the "da_next" and the "cache" to use in the backward propagation step. (≈1 line)
        gradients = rnn_cell_backward(da[:, :, t], caches[t])
        # Retrieve derivatives from gradients (≈ 1 line)
        dxt, da_prevt, dWaxt, dWaat, dbat = gradients["dxt"], gradients["da_prev"], gradients["dWax"], gradients["dWaa"], gradients["dba"]
        # Increment global derivatives w.r.t parameters by adding their derivative at time-step t (≈4 lines)
        dx[:, :, t] = dxt
        dWax += dWaxt
        dWaa += dWaat
        dba += dbat

    # Set da0 to the gradient of a which has been backpropagated through all time-steps (≈1 line)
    da0 = da_prevt
    ### END CODE HERE ###

    # Store the gradients in a python dictionary
    gradients = {"dx": dx, "da0": da0, "dWax": dWax, "dWaa": dWaa, "dba": dba}

    return gradients

Outputs:
gradients["dx"][1][2] = [-0.15028183 -0.34554547 0.02071758 0.01483317]
gradients["dx"].shape = (3, 10, 4)
gradients["da0"][2][3] = -0.17268893183890754
gradients["da0"].shape = (5, 10)
gradients["dWax"][3][1] = 4.081485734449453
gradients["dWax"].shape = (5, 3)
gradients["dWaa"][1][2] = 1.056012342849445
gradients["dWaa"].shape = (5, 5)
gradients["dba"][4] = [-0.12427391]
gradients["dba"].shape = (5, 1)
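
For reference, this is roughly the per-time-step math that a tanh-based rnn_cell_backward computes; it is only a sketch, using the cache layout (a_next, a_prev, xt, parameters) seen above, and the parameter key names "Wax" and "Waa" are assumptions rather than a verbatim copy of the assignment code:

def rnn_cell_backward_sketch(da_next, cache):
    # Sketch only: assumes cache = (a_next, a_prev, xt, parameters)
    # and a_next = tanh(Wax @ xt + Waa @ a_prev + ba)
    (a_next, a_prev, xt, parameters) = cache
    Wax, Waa = parameters["Wax"], parameters["Waa"]  # assumed key names

    # Backprop through tanh: d/dz tanh(z) = 1 - tanh(z)^2
    dtanh = (1 - a_next ** 2) * da_next

    # Gradients w.r.t. input, previous hidden state, weights, and bias
    dxt = np.dot(Wax.T, dtanh)
    dWax = np.dot(dtanh, xt.T)
    da_prev = np.dot(Waa.T, dtanh)
    dWaa = np.dot(dtanh, a_prev.T)
    dba = np.sum(dtanh, axis=1, keepdims=True)

    return {"dxt": dxt, "da_prev": da_prev, "dWax": dWax, "dWaa": dWaa, "dba": dba}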

You are missing something here. Hint: you are not using da_prevt.


Thank you! I was confused and thought that da[:, :, t] was the "da_next" being passed into rnn_cell_backward in the loop for each time step t. Now I guess (I am using the word 'guess' and will ask a question at the end of this paragraph) that the output da_prevt should be passed to rnn_cell_backward in the loop. So right after initializing da_prevt = np.zeros((n_a, m)), I set it to da_prevt = da[:, :, -1] outside of the loop. In the loop, I call rnn_cell_backward(da_prevt, caches[t]) to compute the gradients. I still get the wrong outputs and am not certain where I made a mistake. (Now the initialization part has 7 lines.) One more relevant question: if the da being passed into rnn_backward is the "upstream gradients of all hidden states, of shape (n_a, m, T_x)", shouldn't da[:, :, t] be passed in instead of da_prevt in the loop for each rnn_cell_backward call?

# initialize the gradients with the right sizes (≈6 lines)
dx = np.zeros((n_x, m, T_x))
dWax = np.zeros((n_a, n_x))
dWaa = np.zeros((n_a, n_a))
dba = np.zeros((n_a, 1))
da0 = np.zeros((n_a, m))
da_prevt = np.zeros((n_a, m))
da_prevt = da[:, :, -1]
# Loop through all the time steps
for t in reversed(range(T_x)):
    # Compute gradients at time step t. Choose wisely the "da_next" and the "cache" to use in the backward propagation step. (≈1 line)
    gradients = rnn_cell_backward(da_prevt, caches[t])
    # Retrieve derivatives from gradients (≈ 1 line)
    dxt, da_prevt, dWaxt, dWaat, dbat = gradients["dxt"], gradients["da_prev"], gradients["dWax"], gradients["dWaa"], gradients["dba"]
    # Increment global derivatives w.r.t parameters by adding their derivative at time-step t (≈4 lines)
    dx[:, :, t] = dxt
    dWax += dWaxt
    dWaa += dWaat
    dba += dbat

# Set da0 to the gradient of a which has been backpropagated through all time-steps (≈1 line)
da0 = da_prevt
### END CODE HERE ###

Outputs:
gradients["dx"][1][2] = [0.04036334 0.01590669 0.00395097 0.01483317]
gradients["dx"].shape = (3, 10, 4)
gradients["da0"][2][3] = -0.0007053016291385033
gradients["da0"].shape = (5, 10)
gradients["dWax"][3][1] = 8.452426371294356
gradients["dWax"].shape = (5, 3)
gradients["dWaa"][1][2] = 1.2707651799408062
gradients["dWaa"].shape = (5, 5)
gradients["dba"][4] = [-0.50815277]
gradients["dba"].shape = (5, 1)

Thank you so much!

Now you are missing da[:, :, t].

Thanks jonaslalin! :grinning: I got it! So da_next needs to be updated before it is passed into the rnn_cell_backward call.


You can think of da_next as da[:, :, t] + da_prevt,
where da_prevt is initialized with zeros for the last RNN cell.
Keep in mind that da_prevt is updated in each reversed iteration over t.
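
In code, the loop then looks roughly like this (just a sketch of the idea, reusing the variable names from the snippets posted above; only the da_next argument changes):

for t in reversed(range(T_x)):
    # da_next at step t = upstream gradient from the loss at this step
    # plus the gradient flowing back from the cell at step t+1
    # (da_prevt starts as zeros for the last time step)
    da_next = da[:, :, t] + da_prevt
    gradients = rnn_cell_backward(da_next, caches[t])
    dxt, da_prevt, dWaxt, dWaat, dbat = (gradients["dxt"], gradients["da_prev"],
                                         gradients["dWax"], gradients["dWaa"], gradients["dba"])
    # Accumulate the parameter gradients over all time steps
    dx[:, :, t] = dxt
    dWax += dWaxt
    dWaa += dWaat
    dba += dbat

# After the loop, da_prevt holds the gradient w.r.t. the initial hidden state
da0 = da_prevt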

Maybe this will help someone in the future - a bit of scribbles to understand what's going on :slight_smile:


How did you create this? Very good for understanding and keeping things in perspective.