# Help Understanding RNN Backprop Exercise Week 1 Assignment 1

In the coding exercise attached below, I don't understand why we load the parameters `Wya` and `by` when we never calculate their gradients.

```python
def rnn_cell_backward(da_next, cache):
    """
    Implements the backward pass for the RNN-cell (single time-step).

    Arguments:
    da_next -- Gradient of loss with respect to next hidden state
    cache -- python dictionary containing useful values (output of rnn_cell_forward())

    Returns:
    dx -- Gradients of input data, of shape (n_x, m)
    da_prev -- Gradients of previous hidden state, of shape (n_a, m)
    dWax -- Gradients of input-to-hidden weights, of shape (n_a, n_x)
    dWaa -- Gradients of hidden-to-hidden weights, of shape (n_a, n_a)
    dba -- Gradients of bias vector, of shape (n_a, 1)
    """

    # Retrieve values from cache
    (a_next, a_prev, xt, parameters) = cache

    # Retrieve values from parameters
    Wax = parameters["Wax"]
    Waa = parameters["Waa"]
    Wya = parameters["Wya"]
    ba = parameters["ba"]
    by = parameters["by"]

    ### START CODE HERE ###
    # compute the gradient of dtanh term using a_next and da_next (≈1 line)
    dtanh = None

    # compute the gradient of the loss with respect to Wax (≈2 lines)
    dxt = None
    dWax = None

    # compute the gradient with respect to Waa (≈2 lines)
    da_prev = None
    dWaa = None

    # compute the gradient with respect to b (≈1 line)
    dba = None

    ### END CODE HERE ###

    # Store the gradients in a python dictionary
    gradients = {"dxt": dxt, "da_prev": da_prev, "dWax": dWax, "dWaa": dWaa, "dba": dba}
```
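For reference, here is a minimal sketch of what the filled-in body computes (variable names follow the notebook; the formulas are the standard gradients of a tanh RNN cell, not the official solution). Notice that `Wya` and `by` never appear:

```python
import numpy as np

def rnn_cell_backward_sketch(da_next, cache):
    # Unpack the cache produced by rnn_cell_forward
    (a_next, a_prev, xt, parameters) = cache
    Wax = parameters["Wax"]
    Waa = parameters["Waa"]

    # Backprop through tanh: a_next = tanh(z), so dz = (1 - a_next**2) * da_next
    dtanh = (1 - a_next ** 2) * da_next

    # Gradients of the loss with respect to the inputs and weights
    dxt = Wax.T @ dtanh                         # (n_x, m)
    dWax = dtanh @ xt.T                         # (n_a, n_x)
    da_prev = Waa.T @ dtanh                     # (n_a, m)
    dWaa = dtanh @ a_prev.T                     # (n_a, n_a)
    dba = np.sum(dtanh, axis=1, keepdims=True)  # (n_a, 1)

    return {"dxt": dxt, "da_prev": da_prev, "dWax": dWax,
            "dWaa": dWaa, "dba": dba}
```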

In the thread title, please identify the week number and assignment number.
For example “C? W? A?”.

You can add this to the thread title using the “pencil” icon.

I think that's just a copy-and-paste error: they copied too much code from rnn_cell_forward() and never noticed.

1 Like

Right, it looks like a mistake. You don’t need those values to compute anything. At least I didn’t when I followed the instructions. Note that they make a couple of comments in the instructions about how they are basically leaving out the y path in the computations as well:

Note: `rnn_cell_backward` does not include the calculation of loss from 𝑦⟨𝑡⟩. This is incorporated into the incoming `da_next`. This is a slight mismatch with `rnn_cell_forward`, which includes a dense layer and softmax.

And in the next section they say:

Note that this notebook does not implement the backward path from the Loss ‘J’ backwards to ‘a’.

* This would have included the dense layer and softmax which are a part of the forward path.
* This is assumed to be calculated elsewhere and the result passed to `rnn_backward` in `da`.
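To make that concrete, here is a rough sketch (my own illustration, not part of the assignment) of the y-path backward step the notebook assumes is computed elsewhere, for a dense layer plus softmax with cross-entropy loss at one time step. This is where the gradients of `Wya` and `by` would actually show up:

```python
import numpy as np

def y_path_backward(y_hat, y, Wya, a_t):
    # For softmax + cross-entropy, the gradient w.r.t. the logits is y_hat - y
    dz = y_hat - y                              # (n_y, m)
    # Gradient flowing back into the hidden state a<t> through the dense layer;
    # this is the 'da' that would be passed into rnn_backward
    da_t = Wya.T @ dz                           # (n_a, m)
    # Output-layer parameter gradients -- the dWya/dby the notebook never asks for
    dWya = dz @ a_t.T                           # (n_y, n_a)
    dby = np.sum(dz, axis=1, keepdims=True)     # (n_y, 1)
    return da_t, dWya, dby
```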

Sorry for the crummy formatting there.

1 Like