Help Understanding RNN Backprop Exercise, Week 1 Assignment 1

In the coding exercise attached below, I don’t understand why we load the parameters Wya and by but never calculate their gradients.

def rnn_cell_backward(da_next, cache):
    """
    Implements the backward pass for the RNN-cell (single time-step).

    Arguments:
    da_next -- Gradient of loss with respect to next hidden state
    cache -- python tuple containing useful values (output of rnn_cell_forward())

    Returns:
    gradients -- python dictionary containing:
                        dxt -- Gradients of input data, of shape (n_x, m)
                        da_prev -- Gradients of previous hidden state, of shape (n_a, m)
                        dWax -- Gradients of input-to-hidden weights, of shape (n_a, n_x)
                        dWaa -- Gradients of hidden-to-hidden weights, of shape (n_a, n_a)
                        dba -- Gradients of bias vector, of shape (n_a, 1)
    """

    # Retrieve values from cache
    (a_next, a_prev, xt, parameters) = cache

    # Retrieve values from parameters
    Wax = parameters["Wax"]
    Waa = parameters["Waa"]
    Wya = parameters["Wya"]
    ba = parameters["ba"]
    by = parameters["by"]

    ### START CODE HERE ###
    # compute the gradient of the tanh term using a_next and da_next (≈1 line)
    dtanh = None

    # compute the gradient of the loss with respect to Wax (≈2 lines)
    dxt = None
    dWax = None

    # compute the gradient with respect to Waa (≈2 lines)
    da_prev = None
    dWaa = None

    # compute the gradient with respect to ba (≈1 line)
    dba = None

    ### END CODE HERE ###

    # Store the gradients in a python dictionary
    gradients = {"dxt": dxt, "da_prev": da_prev, "dWax": dWax, "dWaa": dWaa, "dba": dba}

    return gradients

In the thread title, please identify the week number and assignment number.
For example “C? W? A?”.

You can add this to the thread title using the “pencil” icon.

Sorry about that, I added it.

I think that’s just a copy-and-paste error: they copied too much code from rnn_cell_forward() and never noticed.


Right, it looks like a mistake. You don’t need those values to compute anything. At least I didn’t when I followed the instructions. Note that they make a couple of comments in the instructions about how they are basically leaving out the y path in the computations as well:

Note: rnn_cell_backward does not include the calculation of loss from y⟨t⟩. This is incorporated into the incoming da_next. This is a slight mismatch with rnn_cell_forward, which includes a dense layer and softmax.
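
For concreteness, here is a rough sketch of what the filled-in body ends up computing, assuming the standard tanh cell from rnn_cell_forward, i.e. a_next = tanh(Wax @ xt + Waa @ a_prev + ba). This is just an illustration, not the official solution, but notice that Wya and by never appear anywhere in it:

import numpy as np

def rnn_cell_backward_sketch(da_next, cache):
    # Unpack the cache exactly as in the assignment skeleton
    (a_next, a_prev, xt, parameters) = cache
    Wax = parameters["Wax"]
    Waa = parameters["Waa"]

    # Backprop through tanh: d/dz tanh(z) = 1 - tanh(z)**2 = 1 - a_next**2
    dtanh = (1 - a_next ** 2) * da_next

    # Input branch (Wax @ xt)
    dxt = Wax.T @ dtanh
    dWax = dtanh @ xt.T

    # Recurrent branch (Waa @ a_prev)
    da_prev = Waa.T @ dtanh
    dWaa = dtanh @ a_prev.T

    # Bias: sum over the batch dimension m
    dba = np.sum(dtanh, axis=1, keepdims=True)

    return {"dxt": dxt, "da_prev": da_prev, "dWax": dWax, "dWaa": dWaa, "dba": dba}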

And in the next section they say:

Note that this notebook does not implement the backward path from the Loss ‘J’ backwards to ‘a’.

* This would have included the dense layer and softmax which are a part of the forward path.
* This is assumed to be calculated elsewhere and the result passed to rnn_backward in ‘da’.

Sorry for the crummy formatting there. :scream_cat:
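
For what it’s worth, here is a hedged sketch of what that “calculated elsewhere” step would look like for a single time step, assuming the usual softmax + cross-entropy output layer; the names y_hat, y and a are just illustrative, not from the notebook. This is where dWya and dby would actually show up, and it is also where the da that rnn_backward receives comes from:

import numpy as np

def output_layer_backward_sketch(y_hat, y, a, Wya):
    # Hypothetical shapes: y_hat, y are (n_y, m); a is (n_a, m); Wya is (n_y, n_a)

    # Softmax + cross-entropy backprop: gradient with respect to the logits
    dz = y_hat - y

    # Gradients of the dense output layer (this is the only place Wya and by matter)
    dWya = dz @ a.T
    dby = np.sum(dz, axis=1, keepdims=True)

    # Gradient flowing back into the hidden state; this contribution is what
    # gets folded into the da / da_next that the recurrent backward pass receives
    da = Wya.T @ dz

    return dWya, dby, da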
