W1 LSTM Network

Need pointers to debug the code:

{moderator edit - solution code removed}

---> 17 a_next_tmp, c_next_tmp, yt_tmp, cache_tmp = lstm_cell_forward(xt_tmp, a_prev_tmp, c_prev_tmp, parameters_tmp)
     19 print("a_next[4] = \n", a_next_tmp[4])

in lstm_cell_forward(xt, a_prev, c_prev, parameters)
     56 it = sigmoid(np.dot(Wi, concat) + bi)
     57 cct = np.tanh(np.dot(Wc, concat) + bc)
---> 58 c_next = np.dot(ft, c_prev) + np.dot(it, cct)
     59 ot = sigmoid(np.dot(Wo, concat) + bo)
     60 a_next = np.dot(ot, np.tanh(c_next))

<__array_function__ internals> in dot(*args, **kwargs)

ValueError: shapes (5,10) and (5,10) not aligned: 10 (dim 1) != 5 (dim 0)


Hi @Ari_M

The problem here is the use of np.dot(), which performs inner product (matrix) multiplication. That is very different from elementwise multiplication of two matrices.

The error message explains why it fails: dim 1 of the first matrix has 10 elements, but dim 0 of the second matrix has only 5. For inner product multiplication, dim 1 of the first matrix and dim 0 of the second matrix must have the same size.
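To make the shape rule concrete, here is a minimal sketch that reproduces the mismatch with two (5, 10) arrays. The array contents are random placeholders, not the assignment's data, and the names are reused from the traceback only for readability.

```python
import numpy as np

rng = np.random.default_rng(0)
ft = rng.random((5, 10))      # placeholder values, shape (n_a, m)
c_prev = rng.random((5, 10))  # placeholder values, same shape

# Matrix multiplication needs ft.shape[1] == c_prev.shape[0]; here 10 != 5.
try:
    np.dot(ft, c_prev)
    dot_error = None
except ValueError as err:
    dot_error = str(err)

# Elementwise multiplication only needs matching (or broadcastable) shapes.
elementwise = ft * c_prev

print(dot_error)
print(elementwise.shape)
```

The elementwise product keeps the (5, 10) shape of its operands, while np.dot refuses to run at all.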

Transposing the tanh of c_next fixes the shape error, because it aligns the dimensions. However, it does not solve the problem, as the cell then fails one of the asserts:

a_next = np.dot(ot, np.tanh(c_next))


a_next = np.dot(ot, np.tanh(c_next).T)

New Error

AssertionError                            Traceback (most recent call last)
<ipython-input-18-7dee818208b9> in <module>
     28 # UNIT TEST
---> 29 lstm_cell_forward_test(lstm_cell_forward)

~/work/W1A1/public_tests.py in lstm_cell_forward_test(target)
    111     assert cache[1].shape == (n_a, m), f"Wrong shape for cache[1](c_next). {cache[1].shape} != {(n_a, m)}"
    112     assert cache[7].shape == (n_a, m), f"Wrong shape for cache[7](ot). {cache[7].shape} != {(n_a, m)}"
--> 113     assert cache[0].shape == (n_a, m), f"Wrong shape for cache[0](a_next). {cache[0].shape} != {(n_a, m)}"
    114     assert cache[8].shape == (n_x, m), f"Wrong shape for cache[8](xt). {cache[8].shape} != {(n_x, m)}"
    115     assert cache[2].shape == (n_a, m), f"Wrong shape for cache[2](a_prev). {cache[2].shape} != {(n_a, m)}"

AssertionError: Wrong shape for cache[0](a_next). (7, 7) != (7, 8)

Hi @Jonathan_Lugo

The result of inner product multiplication is different from that of elementwise multiplication. Why do you use np.dot()? Why not just use elementwise multiplication?
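A quick shape check shows where the (7, 7) in the assertion comes from. This is only a sketch with random placeholder arrays of the test's dimensions (n_a = 7, m = 8), not the assignment's actual values.

```python
import numpy as np

n_a, m = 7, 8
rng = np.random.default_rng(1)
ot = rng.random((n_a, m))      # placeholder values, shape (n_a, m)
c_next = rng.random((n_a, m))  # placeholder values, shape (n_a, m)

# (7, 8) times (8, 7): the shapes align, but the batch dimension m is summed away.
with_transpose = np.dot(ot, np.tanh(c_next).T)

# Elementwise product: every entry is multiplied independently, shape is preserved.
elementwise = ot * np.tanh(c_next)

print(with_transpose.shape)  # (7, 7)
print(elementwise.shape)     # (7, 8)
```

So the transpose silences the ValueError but computes a different quantity, one that mixes the examples in the batch together.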

Thanks for pointing out the problems with the c_next and a_next statements!

Hi @Kic , thanks for pointing that out. I just found the following in the hint, that is why I used it, but now I am not sure:

At this point I am not so sure when to use which in the assignment :thinking:

(sorry @Ari_M , I don’t want to hijack your post …)

The key thing to realize is the notational convention that Prof Ng has consistently used throughout all 5 of these courses:

When he means “elementwise” multiplication, he always and only uses * as the operator.

When he means “dot product” style matrix multiplication, he just writes the two operands adjacent to one another with no explicit operator. It’s been this way consistently since the very beginning of Course 1.

With that in mind, look at the mathematical expressions given in the instructions for this section. It’s all right there for you to see.

There are quite a few dot products, which is why they give the hint about np.dot, although you’d think by Course 5 such hints would be considered almost insulting. But there are also a couple of instances of elementwise multiply in the formulas for c^{<t>} and a^{<t>}. So it’s the same story as always: you need to know what it is you are trying to say mathematically. Only when you clearly understand that can you write the Python code to “make it so”. :nerd_face:
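As a generic illustration of that convention (deliberately not the assignment's formulas), the sketch below translates "W x + b" into np.dot and "g * s" into the * operator. All names here (W, x, b, s, g) and the sizes are made up for the example.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
n_out, n_in, m = 4, 6, 3                 # hypothetical sizes
W = rng.standard_normal((n_out, n_in))   # weight matrix
x = rng.standard_normal((n_in, m))       # input, one column per example
b = np.zeros((n_out, 1))                 # bias, broadcast across columns
s = rng.standard_normal((n_out, m))      # some state with the gate's shape

# Written as "W x + b" (operands adjacent, no operator): matrix multiply.
g = sigmoid(np.dot(W, x) + b)

# Written as "g * s" (explicit *): elementwise multiply.
gated = g * s

print(g.shape, gated.shape)
```

For 2-D arrays, np.dot(W, x) and W @ x compute the same matrix product, so either spelling works for the "adjacent operands" case.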

Thanks! The c_next and a_next statements should use the operator * instead of np.dot.

I appreciate your advice! I completed Courses 1 and 2 quite a while ago and then went straight to Course 5, hence the confusion. Thanks for your help on the basics!