The problem here comes from the use of np.dot(), which performs matrix (inner product) multiplication, a very different operation from elementwise multiplication of two matrices.
The error message explains exactly why it fails: dimension 1 of the first matrix has 10 elements, while dimension 0 of the second matrix has only 5. For matrix multiplication, dimension 1 of the first matrix must equal dimension 0 of the second.
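As a quick sketch (with hypothetical shapes matching the numbers in the error message), np.dot requires the inner dimensions to agree, while * requires the full shapes to match:

```python
import numpy as np

# For np.dot(A, B), A.shape[1] must equal B.shape[0].
A = np.ones((7, 10))   # dimension 1 has 10 elements
B = np.ones((5, 8))    # dimension 0 has only 5 elements
try:
    np.dot(A, B)
except ValueError as e:
    print("not aligned:", e)

# Elementwise multiplication (*) instead needs identical shapes
C = np.ones((7, 8))
D = np.ones((7, 8))
print((C * D).shape)   # (7, 8)
```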
Transposing the result of tanh applied to c_next makes the dimensions align, so the error goes away. HOWEVER, it does not actually solve the problem, because the cell then fails one of the asserts. I changed
a_next = np.dot(ot, np.tanh(c_next))
to
a_next = np.dot(ot, np.tanh(c_next).T)
New Error
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-18-7dee818208b9> in <module>
27
28 # UNIT TEST
---> 29 lstm_cell_forward_test(lstm_cell_forward)
~/work/W1A1/public_tests.py in lstm_cell_forward_test(target)
111 assert cache[1].shape == (n_a, m), f"Wrong shape for cache[1](c_next). {cache[1].shape} != {(n_a, m)}"
112 assert cache[7].shape == (n_a, m), f"Wrong shape for cache[7](ot). {cache[7].shape} != {(n_a, m)}"
--> 113 assert cache[0].shape == (n_a, m), f"Wrong shape for cache[0](a_next). {cache[0].shape} != {(n_a, m)}"
114 assert cache[8].shape == (n_x, m), f"Wrong shape for cache[8](xt). {cache[8].shape} != {(n_x, m)}"
115 assert cache[2].shape == (n_a, m), f"Wrong shape for cache[2](a_prev). {cache[2].shape} != {(n_a, m)}"
AssertionError: Wrong shape for cache[0](a_next). (7, 7) != (7, 8)
The result of matrix (inner product) multiplication is different from that of elementwise multiplication. Why do you use np.dot() here? Why not just use elementwise multiplication?
The key thing to realize is the notational convention that Prof Ng has consistently used throughout all 5 of these courses:
When he means “elementwise” multiplication, he always and only uses * as the operator.
When he means “dot product” style matrix multiplication, he just writes the two operands adjacent to one another with no explicit operator. It’s been this way consistently since the very beginning of Course 1.
With that in mind, look at the mathematical expressions given in the instructions for this section. It’s all right there for you to see.
There are quite a few dot products, which is why they give the hint about np.dot, although you'd think by Course 5 such hints would be considered almost insulting. But there are also a couple of instances of elementwise multiply in the formulas for c^{<t>} and a^{<t>}. So it's the same story as always: you need to know what it is you are trying to say mathematically. Only when you clearly understand that can you write the Python code to "make it so".
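To make that concrete, here is a minimal sketch (assuming n_a = 7 hidden units and m = 8 examples, the shapes from the failing assert above; ot and c_next are random placeholders, not the assignment's actual values) contrasting the two operators:

```python
import numpy as np

n_a, m = 7, 8
ot = np.random.rand(n_a, m)       # output gate activation, shape (n_a, m)
c_next = np.random.rand(n_a, m)   # next cell state, shape (n_a, m)

# a^{<t>} = Gamma_o * tanh(c^{<t>}) is elementwise, so use *
a_next = ot * np.tanh(c_next)
print(a_next.shape)               # (7, 8) -- the shape the unit test expects

# np.dot with a transpose "fixes" the alignment but collapses the m axis,
# producing exactly the (7, 7) shape the assert rejected
wrong = np.dot(ot, np.tanh(c_next).T)
print(wrong.shape)                # (7, 7)
```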
I appreciate your advice! I completed courses 1 and 2 quite a while ago, and then straight to course 5, that’s why the confusion. Thanks for your help on the basics!