C5W1A1 - a problem with the lstm_forward function

I cannot get rid of this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-57-5b9326f2c416> in <module>
     14 parameters_tmp['by'] = np.random.randn(2, 1)
     15 
---> 16 a_tmp, y_tmp, c_tmp, caches_tmp = lstm_forward(x_tmp, a0_tmp, parameters_tmp)
     17 print("a[4][3][6] = ", a_tmp[4][3][6])
     18 print("a.shape = ", a_tmp.shape)

<ipython-input-56-f64054808a3c> in lstm_forward(x, a0, parameters)
     51         xt = x[:,:,t]
     52         # Update next hidden state, next memory state, compute the prediction, get the cache (≈1 line)
---> 53         a_next, c_next, yt, cache = lstm_cell_forward(xt, a_next, c_next, parameters)
     54         # Save the value of the new "next" hidden state in a (≈1 line)
     55         a[:,:,t] = a_next

<ipython-input-20-598d0d575d68> in lstm_cell_forward(xt, a_prev, c_prev, parameters)
     56     it = sigmoid(np.dot(Wi, concat)+bi)
     57     cct = np.tanh(np.dot(Wc, concat)+bc)
---> 58     c_next = ft*c_prev + it*cct
     59     ot = sigmoid(np.dot(Wo,concat)+bo)
     60     a_next = ot*np.tanh(c_next)

ValueError: operands could not be broadcast together with shapes (5,10) (10,10) 

My initializations of inputs for the lstm_cell_forward are:

    a_next = a0
    c_next = np.zeros((n_a, m))

What might be the problem here?

Looking at your traceback, either c_prev or cct (or both) has the shape (10, 10), so it cannot be broadcast against the (5, 10) operand.
I suggest tracking the shape of c_prev (or cct) with a print statement, just like you did for the others.
It is possible that "n_a" is not set correctly in lstm_forward. c_next is then initialized with that value, which would create the incorrect shape (10, 10).
In any case, the first thing to do is to track the shape of c_next in lstm_forward() and of c_prev in lstm_cell_forward().
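To see why those two shapes cannot be combined, here is a minimal sketch that reproduces the same error outside the assignment. The sizes n_a = 5 and m = 10 are inferred from the error message; the array names c_prev_bad and c_prev_ok are hypothetical and are not part of the notebook.

```python
import numpy as np

n_a, m = 5, 10                     # sizes inferred from the error message

ft = np.ones((n_a, m))             # a gate output, shape (5, 10)
c_prev_bad = np.zeros((m, m))      # wrongly sized memory state, shape (10, 10)
c_prev_ok = np.zeros((n_a, m))     # correctly sized memory state, shape (5, 10)

# Element-wise multiplication needs matching (or size-1) dimensions.
# The first dimensions here are 5 and 10, so NumPy refuses to broadcast.
try:
    ft * c_prev_bad
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print("ValueError:", err)

# With the correct shape, the product keeps the shape (n_a, m).
print("ok shape:", (ft * c_prev_ok).shape)
```

If the same message shows up in lstm_forward, printing c_next.shape right after it is initialized should reveal which of the two cases applies.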

After adding these print statements into lstm_cell_forward:

    cache = (a_next, c_next, a_prev, c_prev, ft, it, cct, ot, xt, parameters)
    print("ft shape", ft.shape)
    print("it shape", it.shape)
    print("cct shape", cct.shape)
    print("c_prev shape", c_prev.shape)
    print("c_next shape", c_next.shape)

I see:

    ft shape (5, 10)
    it shape (5, 10)
    cct shape (5, 10)
    c_prev shape (5, 10)
    c_next shape (5, 10)

So it looks like the problem is not in lstm_cell_forward.

I set n_y and n_a like this:

    n_y, n_a = a0.shape[0], a0.shape[1]


Yes, the value of n_a is wrong. If you look at the comment, it says

    # Retrieve dimensions from shapes of x and parameters['Wy'] (≈2 lines)

a0 is the initial hidden state, and its shape is (n_a, m). If you take n_a from a0.shape[1], you get m, not n_a.

Please revisit this.
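As a rough illustration of the difference, the sketch below compares the two ways of reading the dimensions. It assumes x has shape (n_x, m, T_x) and parameters['Wy'] has shape (n_y, n_a). The concrete sizes are hypothetical, chosen only to be consistent with the traceback (n_a = 5, m = 10, n_y = 2); n_x = 3 and T_x = 7 are made up.

```python
import numpy as np

# Hypothetical inputs, sized to be consistent with the traceback.
x = np.random.randn(3, 10, 7)               # assumed shape (n_x, m, T_x)
a0 = np.random.randn(5, 10)                 # shape (n_a, m)
parameters = {"Wy": np.random.randn(2, 5)}  # assumed shape (n_y, n_a)

# Reading from a0 as in the post: a0.shape[1] is m, not n_a.
wrong_n_y, wrong_n_a = a0.shape[0], a0.shape[1]

# Reading from x and Wy, as the notebook comment suggests.
n_x, m, T_x = x.shape
n_y, n_a = parameters["Wy"].shape

c_wrong = np.zeros((wrong_n_a, m))          # (10, 10): triggers the broadcast error
c_right = np.zeros((n_a, m))                # (5, 10): matches a0

print("wrong:", c_wrong.shape, "right:", c_right.shape)
```

With these sizes the mistaken version produces exactly the (10, 10) array that appears in the error message.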
