Question on backpropagation: Week 1, Programming Assignment 1

I have code for backpropagation, and it yields the right dimensions, but all outputs are zero. I just can't figure out what the problem is. Am I initializing something wrong? Am I selecting the wrong things to pass into the single-cell RNN backward step?
Maybe someone can comment. I'm pasting the code below; it is not graded.

import numpy as np

def rnn_backward(da, caches):
    """
    Implement the backward pass for an RNN over an entire sequence of input data.

    Arguments:
    da -- Upstream gradients of all hidden states, of shape (n_a, m, T_x)
    caches -- tuple containing information from the forward pass (rnn_forward)

    Returns:
    gradients -- python dictionary containing:
        dx -- Gradient w.r.t. the input data, numpy array of shape (n_x, m, T_x)
        da0 -- Gradient w.r.t. the initial hidden state, numpy array of shape (n_a, m)
        dWax -- Gradient w.r.t. the input's weight matrix, numpy array of shape (n_a, n_x)
        dWaa -- Gradient w.r.t. the hidden state's weight matrix, numpy array of shape (n_a, n_a)
        dba -- Gradient w.r.t. the bias, numpy array of shape (n_a, 1)
    """
    # Retrieve values from the first cache (t=1) of caches (≈2 lines)
    (caches, x) = caches
    (a1, a0, x1, parameters) = caches[0]

    # Retrieve dimensions from da's and x1's shapes (≈2 lines)
    n_a, m, T_x = da.shape
    n_x, m = x1.shape

    # Initialize the gradients with the right sizes (≈6 lines)
    dx = np.zeros((n_x, m, T_x))
    dWax = np.zeros((n_a, n_x))
    dWaa = np.zeros((n_a, n_a))
    dba = np.zeros((n_a, 1))
    da0 = np.zeros((n_a, m))
    da_prevt = np.zeros((n_a, m))

    # Loop through all the time steps
    for t in reversed(range(T_x)):
        # Compute gradients at time step t. Choose wisely the "da_next" and the "cache" to use in the backward propagation step. (≈1 line)
        gradients = rnn_cell_backward(da_prevt, caches[t])
        # Retrieve derivatives from gradients (≈1 line)
        dxt, da_prevt, dWaxt, dWaat, dbat = gradients["dxt"], gradients["da_prev"], gradients["dWax"], gradients["dWaa"], gradients["dba"]
        # Increment global derivatives w.r.t. parameters by adding their derivative at time step t (≈4 lines)
        dx[:, :, t] = dxt
        dWax += dWaxt
        dWaa += dWaat
        dba += dbat

    # Set da0 to the gradient of a which has been backpropagated through all time steps (≈1 line)
    da0 = da_prevt

    # Store the gradients in a python dictionary
    gradients = {"dx": dx, "da0": da0, "dWax": dWax, "dWaa": dWaa, "dba": dba}

    return gradients

Please check this recent thread where Paul answered your query.
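For anyone landing here later: the symptom (all-zero gradients with correct shapes) is consistent with passing only `da_prevt` into the cell backward step. Since `da_prevt` is initialized to zeros, every step then receives a zero upstream gradient; the per-step slice `da[:, :, t]` needs to be added in. Below is a minimal self-contained sketch showing why. Note that `toy_cell_backward` is a toy tanh-cell backward written for this demo, not the assignment's `rnn_cell_backward`:

```python
import numpy as np

def toy_cell_backward(da_next, cache):
    """Toy backward for a_t = tanh(Wax @ xt + Waa @ a_prev + ba).
    Same structure as a single-cell RNN backward, simplified for the demo."""
    a_next, a_prev, xt, Wax, Waa = cache
    dtanh = (1 - a_next ** 2) * da_next  # gradient through tanh
    return {
        "dxt": Wax.T @ dtanh,
        "da_prev": Waa.T @ dtanh,
        "dWax": dtanh @ xt.T,
        "dWaa": dtanh @ a_prev.T,
        "dba": dtanh.sum(axis=1, keepdims=True),
    }

rng = np.random.default_rng(0)
n_a, n_x, m, T_x = 3, 2, 4, 5
Wax = rng.standard_normal((n_a, n_x))
Waa = rng.standard_normal((n_a, n_a))
ba = np.zeros((n_a, 1))

# Tiny forward pass, just to build the caches
a_prev = np.zeros((n_a, m))
caches = []
for t in range(T_x):
    xt = rng.standard_normal((n_x, m))
    a_next = np.tanh(Wax @ xt + Waa @ a_prev + ba)
    caches.append((a_next, a_prev, xt, Wax, Waa))
    a_prev = a_next

da = rng.standard_normal((n_a, m, T_x))  # upstream gradient for every step

def run_backward(include_upstream):
    dWax_total = np.zeros((n_a, n_x))
    da_prevt = np.zeros((n_a, m))
    for t in reversed(range(T_x)):
        if include_upstream:
            da_next = da[:, :, t] + da_prevt  # upstream slice plus recurrent term
        else:
            da_next = da_prevt  # starts as zeros, so every step gets zero gradient
        g = toy_cell_backward(da_next, caches[t])
        da_prevt = g["da_prev"]
        dWax_total += g["dWax"]
    return dWax_total

assert np.allclose(run_backward(False), 0)      # buggy version: all zeros
assert not np.allclose(run_backward(True), 0)   # with da[:, :, t] added: nonzero
```

In other words, `da_prevt` only carries the gradient flowing back through the recurrent connection; the gradient arriving at each hidden state from above has to be summed with it before calling the cell backward.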