# C5, W1A1 optional RNN back propagation

Need help on the function: rnn_backward(da, caches):
I have initialized the gradients as follows but got incorrect results.
```python
# initialize the gradients with the right sizes (≈6 lines)
dx = np.zeros((n_x, m, T_x))
dWax = np.zeros((n_a, n_x))
dWaa = np.zeros((n_a, n_a))
dba = np.zeros((n_a, 1))
da0 = np.zeros((n_a, m, T_x))
da_prevt = np.zeros((n_a, m, T_x))
```

```python
# Loop through all the time steps
for t in reversed(range(T_x)):
    # Compute gradients at time step t. Choose wisely the "da_next" and the
    # "cache" to use in the backward propagation step. (≈1 line)
    gradients = rnn_cell_backward(da[:, :, t], caches[t])

# Set da0 to the gradient of a which has been backpropagated through all time-steps (≈1 line)
da0 = da_prevt
```

I got:
gradients["dx"][1][2] = [-0.15028183 -0.34554547 0.02071758 0.01483317]
as compared to the expected values:
gradients["dx"][1][2] = [-2.07101689 -0.59255627 0.02466855 0.01483317]

Note: the shapes are all correct though.
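One quick way to double-check claims like this is to assert the shapes the docstring promises before returning the gradients. A minimal, self-contained sketch with made-up dimensions (the numbers here are illustrative, not the assignment's test values):

```python
import numpy as np

# Made-up dimensions for illustration only
n_x, n_a, m, T_x = 3, 5, 10, 4

dx = np.zeros((n_x, m, T_x))
da0 = np.zeros((n_a, m))        # docstring says (n_a, m) -- no T_x axis
da_prevt = np.zeros((n_a, m))   # carries the hidden-state gradient between steps

# Assert the documented shapes before returning them
assert dx.shape == (n_x, m, T_x)
assert da0.shape == (n_a, m)
assert da_prevt.shape == (n_a, m)
print("shapes OK")
```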

Thanks!


Thank you for your time, Jonaslalin. I am assuming it is allowed to post code other than the graded portion. Here is my completed rnn_backward function. Note: the rnn_cell_backward function was checked, and its outputs matched the expected ones.

```python
def rnn_backward(da, caches):
    """
    Implement the backward pass for a RNN over an entire sequence of input data.

    Arguments:
    da -- Upstream gradients of all hidden states, of shape (n_a, m, T_x)
    caches -- tuple containing information from the forward pass (rnn_forward)

    Returns:
    dx -- Gradient w.r.t. the input data, numpy array of shape (n_x, m, T_x)
    da0 -- Gradient w.r.t. the initial hidden state, numpy array of shape (n_a, m)
    dWax -- Gradient w.r.t. the input's weight matrix, numpy array of shape (n_a, n_x)
    dWaa -- Gradient w.r.t. the hidden state's weight matrix, numpy array of shape (n_a, n_a)
    dba -- Gradient w.r.t. the bias, of shape (n_a, 1)
    """

    ### START CODE HERE ###

    # Retrieve values from the first cache (t=1) of caches (≈2 lines)
    (caches, x) = caches
    (a1, a0, x1, parameters) = caches[0]

    # Retrieve dimensions from da's and x1's shapes (≈2 lines)
    n_a, m, T_x = da.shape
    n_x, m = x1.shape

    # initialize the gradients with the right sizes (≈6 lines)
    dx = np.zeros((n_x, m, T_x))
    dWax = np.zeros((n_a, n_x))
    dWaa = np.zeros((n_a, n_a))
    dba = np.zeros((n_a, 1))
    da0 = np.zeros((n_a, m))
    da_prevt = np.zeros((n_a, m))

    # Loop through all the time steps
    for t in reversed(range(T_x)):
        # Compute gradients at time step t. Choose wisely the "da_next" and the
        # "cache" to use in the backward propagation step. (≈1 line)
        gradients = rnn_cell_backward(da[:, :, t], caches[t])
        # Retrieve derivatives from gradients (≈ 1 line)
        # Increment global derivatives w.r.t parameters by adding their derivative at time-step t (≈4 lines)
        dx[:, :, t] = dxt
        dWax += dWaxt
        dWaa += dWaat
        dba += dbat

    # Set da0 to the gradient of a which has been backpropagated through all time-steps (≈1 line)
    da0 = da_prevt
    ### END CODE HERE ###

    # Store the gradients in a python dictionary
    gradients = {"dx": dx, "da0": da0, "dWax": dWax, "dWaa": dWaa, "dba": dba}

    return gradients
```

Output:
gradients["dx"][1][2] = [-0.15028183 -0.34554547 0.02071758 0.01483317]

You are missing something here. Hint: you are not using `da_prevt`.


Thank you! I was confused and thought that da[:, :, t] was the "da_next" being passed to rnn_cell_backward in the loop for each time step t. Now I guess (I use the word "guess" deliberately, and will ask a question at the end of this paragraph) that the output da_prevt should be passed to rnn_cell_backward in the loop. So right after initializing da_prevt = np.zeros((n_a, m)), I set da_prevt = da[:, :, -1] outside the loop. Inside the loop, I call rnn_cell_backward(da_prevt, caches[t]) to compute the gradients. I still get the wrong outputs, and I am not certain where I made a mistake. (The initialization part now has 7 lines.) One more related question: if the da passed into rnn_backward holds the "upstream gradients of all hidden states, of shape (n_a, m, T_x)", shouldn't da[:, :, t] be passed in, rather than da_prevt, in each rnn_cell_backward call in the loop?

```python
# initialize the gradients with the right sizes (≈6 lines)
dx = np.zeros((n_x, m, T_x))
dWax = np.zeros((n_a, n_x))
dWaa = np.zeros((n_a, n_a))
dba = np.zeros((n_a, 1))
da0 = np.zeros((n_a, m))
da_prevt = np.zeros((n_a, m))
da_prevt = da[:, :, -1]

# Loop through all the time steps
for t in reversed(range(T_x)):
    # Compute gradients at time step t. Choose wisely the "da_next" and the
    # "cache" to use in the backward propagation step. (≈1 line)
    gradients = rnn_cell_backward(da_prevt, caches[t])
    # Retrieve derivatives from gradients (≈ 1 line)
    # Increment global derivatives w.r.t parameters by adding their derivative at time-step t (≈4 lines)
    dx[:, :, t] = dxt
    dWax += dWaxt
    dWaa += dWaat
    dba += dbat

# Set da0 to the gradient of a which has been backpropagated through all time-steps (≈1 line)
da0 = da_prevt
### END CODE HERE ###
```

Output:
gradients["dx"][1][2] = [0.04036334 0.01590669 0.00395097 0.01483317]

Thank you so much!

Now you are missing `da[:, :, t]`.

Thanks jonaslalin! I got it! So da_next needs to be updated before it is passed into the rnn_cell_backward call.


You can think of da_next as da[:, :, t] + da_prevt,
where da_prevt is initialized with zeros for the last RNN cell.
Take into account that da_prevt is updated in each reversed iteration over t.
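Putting the hints together, the loop combines the upstream gradient at step t with the gradient flowing back from step t+1. Below is a self-contained sketch of that wiring. Note the rnn_cell_backward here is a toy tanh-cell stand-in, and the dictionary keys (dxt, da_prev, dWaxt, dWaat, dbat) are assumptions for illustration, not necessarily the assignment's exact names:

```python
import numpy as np

def rnn_cell_backward(da_next, cache):
    # Toy stand-in for the assignment's cell backward:
    # a_next = tanh(Waa @ a_prev + Wax @ xt + ba)
    a_next, a_prev, xt, parameters = cache
    Wax, Waa = parameters["Wax"], parameters["Waa"]
    dtanh = (1 - a_next ** 2) * da_next  # backprop through tanh
    return {
        "dxt": Wax.T @ dtanh,
        "da_prev": Waa.T @ dtanh,
        "dWaxt": dtanh @ xt.T,
        "dWaat": dtanh @ a_prev.T,
        "dbat": dtanh.sum(axis=1, keepdims=True),
    }

def rnn_backward(da, caches):
    caches_list, x = caches
    a1, a0, x1, parameters = caches_list[0]
    n_a, m, T_x = da.shape
    n_x, m = x1.shape

    dx = np.zeros((n_x, m, T_x))
    dWax = np.zeros((n_a, n_x))
    dWaa = np.zeros((n_a, n_a))
    dba = np.zeros((n_a, 1))
    da_prevt = np.zeros((n_a, m))  # no gradient flows in from beyond T_x

    for t in reversed(range(T_x)):
        # Key step: da_next is the upstream gradient at t PLUS the
        # gradient carried back from time step t + 1.
        gradients = rnn_cell_backward(da[:, :, t] + da_prevt, caches_list[t])
        da_prevt = gradients["da_prev"]  # updated every reversed iteration
        dx[:, :, t] = gradients["dxt"]
        dWax += gradients["dWaxt"]
        dWaa += gradients["dWaat"]
        dba += gradients["dbat"]

    da0 = da_prevt  # what remains after backpropagating through every step
    return {"dx": dx, "da0": da0, "dWax": dWax, "dWaa": dWaa, "dba": dba}

# Smoke test on random data: run a tiny forward pass to build the caches,
# then check the backward pass produces the documented shapes.
np.random.seed(1)
n_x, n_a, m, T_x = 3, 5, 10, 4
parameters = {"Wax": np.random.randn(n_a, n_x),
              "Waa": np.random.randn(n_a, n_a),
              "ba": np.random.randn(n_a, 1)}
x = np.random.randn(n_x, m, T_x)
a_prev = np.zeros((n_a, m))
caches_list = []
for t in range(T_x):
    a_next = np.tanh(parameters["Waa"] @ a_prev
                     + parameters["Wax"] @ x[:, :, t] + parameters["ba"])
    caches_list.append((a_next, a_prev, x[:, :, t], parameters))
    a_prev = a_next

grads = rnn_backward(np.random.randn(n_a, m, T_x), (caches_list, x))
print(grads["dx"].shape, grads["da0"].shape)  # (3, 10, 4) (5, 10)
```

The same accumulation pattern (sum the per-step parameter gradients, carry da_prev backwards) is what the assignment's version does; only the cell internals differ.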

Maybe this will help someone in the future - a bit of scribbling to understand what's going on.


How did you create this? Very good for understanding and keeping things in perspective.