[C5W1A1] wrong results of lstm_backward

My lstm_backward function returns the wrong values. The shapes of the results are correct, and all the other related functions pass their tests.

Below are my code and results. I’ve been stuck on this for 2 days; please help me.

import numpy as np

def lstm_backward(da, caches):
    # caches = (list of per-step caches, x), as returned by lstm_forward
    (caches, x) = caches
    (a1, c1, a0, c0, f1, i1, cc1, o1, x1, parameters) = caches[0]

    # Retrieve dimensions from the shapes of da and x1
    n_a, m, T_x = da.shape
    n_x, m = x1.shape

    # Initialize the gradients with the right sizes
    dx = np.zeros((n_x, m, T_x))
    da0 = np.zeros((n_a, m))
    da_prevt = np.zeros((n_a, m))
    dc_prevt = np.zeros((n_a, m))
    dWf = np.zeros((n_a, n_a + n_x))
    dWi = np.zeros((n_a, n_a + n_x))
    dWc = np.zeros((n_a, n_a + n_x))
    dWo = np.zeros((n_a, n_a + n_x))
    dbf = np.zeros((n_a, 1))
    dbi = np.zeros((n_a, 1))
    dbc = np.zeros((n_a, 1))
    dbo = np.zeros((n_a, 1))

    # Loop back over the whole sequence
    for t in reversed(range(T_x)):
        # Upstream da at step t plus the gradient carried back from step t+1
        gradients = lstm_cell_backward(da[:,:,t] + da_prevt, dc_prevt, caches[t])
        da_prevt = gradients['da_prev']
        dc_prevt = gradients['dc_prev']
        dx[:,:,t] = gradients['dxt']
        # Parameter gradients are summed over all time steps
        dWf += gradients['dWf']
        dWi += gradients['dWi']
        dWc += gradients['dWc']
        dWo += gradients['dWo']
        dbf += gradients['dbf']
        dbi += gradients['dbi']
        dbc += gradients['dbc']
        dbo += gradients['dbo']

    # The gradient w.r.t. a0 is the one carried out of the first time step
    da0 = da_prevt

    gradients = {"dx": dx, "da0": da0, "dWf": dWf,"dbf": dbf, "dWi": dWi,"dbi": dbi,
                "dWc": dWc,"dbc": dbc, "dWo": dWo,"dbo": dbo}

    return gradients

Results:

gradients["dx"][1][2] = [ 0.01034214  1.03473735 -0.2398793  -0.43281115]
gradients["dx"].shape = (3, 10, 4)
gradients["da0"][2][3] = 0.5883931290038376
gradients["da0"].shape = (5, 10)
gradients["dWf"][3][1] = -0.02269017674887574
gradients["dWf"].shape = (5, 8)
gradients["dWi"][1][2] = 0.6099853844261891
gradients["dWi"].shape = (5, 8)
gradients["dWc"][3][1] = -0.013857139274558946
gradients["dWc"].shape = (5, 8)
gradients["dWo"][1][2] = 0.04772920545685257
gradients["dWo"].shape = (5, 8)
gradients["dbf"][4] = [-0.199665]
gradients["dbf"].shape = (5, 1)
gradients["dbi"][4] = [-0.7340795]
gradients["dbi"].shape = (5, 1)
gradients["dbc"][4] = [-0.56981661]
gradients["dbc"].shape = (5, 1)
gradients["dbo"][4] = [-0.24499124]
gradients["dbo"].shape = (5, 1)
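Since the shapes are right but the values are wrong, shape checks alone cannot locate the bug. One general debugging technique (not part of the assignment) is gradient checking: compare an analytic gradient against a centered finite difference. The sketch below uses a hypothetical helper `numerical_grad` and a toy one-layer `tanh` loss rather than the LSTM itself; the same idea can be applied to one parameter of lstm_cell_backward at a time.

```python
import numpy as np

def numerical_grad(f, theta, eps=1e-6):
    """Centered finite-difference gradient of a scalar function f at array theta."""
    grad = np.zeros_like(theta, dtype=float)
    for idx in np.ndindex(theta.shape):
        orig = theta[idx]
        theta[idx] = orig + eps
        f_plus = f(theta)
        theta[idx] = orig - eps
        f_minus = f(theta)
        theta[idx] = orig  # restore the entry before moving on
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

# Toy problem (assumption, not the LSTM): L(W) = sum(tanh(W @ X))
rng = np.random.default_rng(0)
W = rng.standard_normal((5, 3))
X = rng.standard_normal((3, 10))

def loss(W):
    return np.tanh(W @ X).sum()

# Analytic gradient: dL/dW = (1 - tanh(W X)^2) @ X^T
dW = (1 - np.tanh(W @ X) ** 2) @ X.T
dW_num = numerical_grad(loss, W)

rel_err = np.linalg.norm(dW - dW_num) / (np.linalg.norm(dW) + np.linalg.norm(dW_num))
print(f"relative error: {rel_err:.2e}")  # a very small value suggests dW is correct
```

A large relative error points at the specific gradient formula that is wrong, which is more informative than comparing a single printed entry.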

Do you still need help with this issue?

Hi wziz,

What I see is that you have dx with 3 dimensions instead of dxt with two dimensions.

Best,

Rosa
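For reference on the shapes involved: in the assignment, the per-step gradient `gradients['dxt']` returned by lstm_cell_backward is 2-D, `(n_x, m)`, while lstm_backward stacks one slice per time step into a 3-D `dx` of shape `(n_x, m, T_x)`. The sketch below uses the sizes from the posted results (`(3, 10, 4)`) and placeholder values instead of real gradients.

```python
import numpy as np

# Sizes inferred from the posted results: dx.shape == (3, 10, 4)
n_x, m, T_x = 3, 10, 4

dx = np.zeros((n_x, m, T_x))  # gradient w.r.t. the whole input sequence x
for t in reversed(range(T_x)):
    dxt = np.full((n_x, m), float(t))  # placeholder for gradients['dxt'] at step t
    dx[:, :, t] = dxt                  # each 2-D step gradient fills one time slice

print(dx.shape)           # (3, 10, 4)
print(dx[:, :, 2].shape)  # (3, 10)
```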

Hey
I have exactly the same result.
Can you help pinpoint the issue?

Also, it seems to be assumed that the derivative dc_next at the last time step is zero (it is the first value passed to lstm_cell_backward for the dc_next parameter). Why is that?
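A likely explanation: in the exercise, the loss reaches the cell state only through the hidden states (you are given `da`, but no gradient for `c`), and the cell state at the final step feeds no later cell. So nothing flows back into it "from the future", and the carried gradient starts at zero. The sketch below illustrates this on a hypothetical scalar recurrence `c_t = f * c_{t-1} + x_t`, `a_t = tanh(c_t)`, `loss = sum(a_t)`, not the actual LSTM, and checks the backward pass against finite differences.

```python
import numpy as np

def forward(x, f):
    """Toy recurrence: c_t = f * c_{t-1} + x_t, a_t = tanh(c_t), loss = sum_t a_t."""
    c = 0.0
    cs = []
    for xt in x:
        c = f * c + xt
        cs.append(c)
    cs = np.array(cs)
    return cs, np.tanh(cs).sum()

def backward(x, f):
    """Backpropagation through time for the toy recurrence above."""
    cs, _ = forward(x, f)
    dx = np.zeros_like(x)
    dc_next = 0.0  # no step after the last one, so no gradient arrives from the future
    for t in reversed(range(len(x))):
        # dL/dc_t = local path through a_t + gradient carried back from step t+1
        dc = (1 - np.tanh(cs[t]) ** 2) + dc_next
        dx[t] = dc        # dc_t/dx_t = 1
        dc_next = f * dc  # dc_t/dc_{t-1} = f, carried into step t-1
    return dx

x = np.array([0.5, -0.3, 0.8])
f = 0.7
eps = 1e-6
dx_num = np.array([(forward(x + eps * e, f)[1] - forward(x - eps * e, f)[1]) / (2 * eps)
                   for e in np.eye(len(x))])
print(np.allclose(backward(x, f), dx_num))  # True
```

Starting `dc_next` at anything other than zero would add a gradient term that the loss does not actually contain.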

I have the same code, but I get the correct results. I would say that the problem begins in the previous function, lstm_cell_backward (which was very painful to code, actually).


Hi Santiago,

Welcome to the community.

Yes, you also need to keep in mind the framework that you use for backpropagation: start with the sigmoid and follow with the tanh later.
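In the LSTM cell, the forget, update and output gates use the sigmoid, while the candidate value and the squashing of the cell state use tanh. Because the cache stores activations rather than pre-activations, it is convenient to write both derivatives in terms of the outputs. The sketch below (hypothetical helper names, not the full lstm_cell_backward) checks those two formulas numerically.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_grad_from_output(s):
    # sigma'(z) = sigma(z) * (1 - sigma(z)), expressed via the output s = sigma(z)
    return s * (1 - s)

def tanh_grad_from_output(t):
    # tanh'(z) = 1 - tanh(z)^2, expressed via the output t = tanh(z)
    return 1 - t ** 2

z = np.linspace(-3, 3, 7)
eps = 1e-6
num_sig = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
num_tanh = (np.tanh(z + eps) - np.tanh(z - eps)) / (2 * eps)

print(np.allclose(sigmoid_grad_from_output(sigmoid(z)), num_sig))  # True
print(np.allclose(tanh_grad_from_output(np.tanh(z)), num_tanh))    # True
```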

My problem was the initialization of a_next, which should be a_next = a0:
def lstm_forward(x, a0, parameters):

    # Initialize a_next and c_next (≈2 lines)
    a_next = a0  # <------ CHANGE HERE
    c_next = np.zeros((n_a, m))

    # loop over all time-steps
    for t in range(T_x):
        # Get the 2D slice 'xt' from the 3D input 'x' at time step 't'
        xt = x[:,:,t]
        # Update next hidden state, next memory state, compute the prediction, get the cache (≈1 line)
        a_next, c_next, yt, cache = lstm_cell_forward(xt, a_next, c_next, parameters)

My code is the same as yours, and I got the expected output. Are you sure you are running the cells in order, from the top to the bottom of the notebook?

I was running into a similar problem, and what I found was that one of the calculations in my lstm_cell_backward() function was incorrect. I carefully went back and compared the lstm_cell_backward() output with the expected output and found that one of the values was wrong. After I fixed that value, lstm_backward() started working correctly.
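Comparing outputs key by key can be made less error-prone with a small helper. The sketch below is a hypothetical `report_mismatches` function; the arrays in the usage example are placeholders, not the notebook's expected values, which you would type in yourself from the expected-output cell.

```python
import numpy as np

def report_mismatches(actual, expected, rtol=1e-5, atol=1e-8):
    """Return the keys whose values are missing or differ in shape or value."""
    bad = []
    for key, exp in expected.items():
        if key not in actual:
            bad.append(key)
            continue
        got = np.asarray(actual[key])
        exp = np.asarray(exp)
        if got.shape != exp.shape or not np.allclose(got, exp, rtol=rtol, atol=atol):
            bad.append(key)
    return bad

# Placeholder values for illustration only
expected = {"dxt": np.array([1.0, 2.0]), "dc_prev": np.array([0.5, -0.5])}
actual = {"dxt": np.array([1.0, 2.0]), "dc_prev": np.array([0.5, 0.5])}
print(report_mismatches(actual, expected))  # ['dc_prev']
```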

I had the same problem, but it was in the previous code. You should review lstm_cell_backward again; I had made a mistake in just one value, and that was the problem.
I had dc_prev = {moderator edit}
instead of: dc_prev = {moderator edit}

Here is all my code:

{Moderator Edit: Solution Code Removed}

And I have deleted all your code. Sharing your code, even to help other learners, is totally unacceptable and can lead to your account being suspended. So do not share your code, as this is against the community Honor Code.

I apologize; I won’t do that again.

My case is similar to @Witenberg’s. My lstm_forward was wrong, although it passed the tests.

I accidentally initialized c_next to zeros by taking a slice of c, like this:

a_next = a0
c_next = c[:, :, T_x] # <== Wrong

But, as mentioned in the notebook, setting one variable equal to another is a “copy by reference”. So as the loop iterates, c_next and c get completely messed up.
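Concretely, in NumPy a basic slice such as `c[:, :, 0]` is a view that shares memory with `c`, not a copy. One plausible reading of why the forward tests still pass: the first forward cell stores that view in its cache as the previous cell state, and the loop later writes the new cell state into `c[:, :, 0]`, so the cached value changes silently. The forward outputs are unaffected, but backpropagation reads the corrupted cache. The sketch below uses a plain dict as a stand-in for the assignment's cache tuple.

```python
import numpy as np

n_a, m, T_x = 2, 3, 2
c = np.zeros((n_a, m, T_x))

# Buggy pattern: the basic slice is a view into c, not a copy
c_prev = c[:, :, 0]
cache = {"c_prev": c_prev}          # stand-in for what a forward cell keeps for backprop
c[:, :, 0] = 5.0                    # later, the loop stores the new cell state into c
print(cache["c_prev"][0, 0])        # 5.0 -> the cached "previous" state was overwritten
print(np.shares_memory(c_prev, c))  # True

# Fix: start from an independent array (np.zeros(...) or c[:, :, 0].copy())
c = np.zeros((n_a, m, T_x))
c_prev = np.zeros((n_a, m))
cache = {"c_prev": c_prev}
c[:, :, 0] = 5.0
print(cache["c_prev"][0, 0])        # 0.0
print(np.shares_memory(c_prev, c))  # False
```

`np.shares_memory` is a quick way to confirm whether two arrays alias the same buffer when a result looks inexplicably wrong.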


This was it for me too. I had also set c_next as a reference into the c matrix/array, i.e. c_next = c[:,:,0] in lstm_forward. After I created a newly initialized variable instead, the answer in the last section, lstm_backward, was correct.

Good tip! The problem was that my lstm_cell_backward() result matched the given solution, but later, in lstm_backward(), I had a mismatch in dx.shape… Thanks!
