I got all of the dimensions of the gradients correct, but their values were slightly off. It turned out that where I needed `np.square(np.tanh(Wax @ xt + Waa @ a_prev + ba))`, I had written `np.tanh(np.square(a_next))` instead of just `np.square(a_next)`. Since `a_next` is already the tanh of the pre-activation, squaring it is all that is needed. So the bottom line is to remember how `a_next` was calculated and avoid applying tanh a second time.
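To make the difference concrete, here is a minimal sketch of that step of the backward pass. The names (`Wax`, `Waa`, `ba`, `xt`, `a_prev`, `da_next`) follow the assignment's conventions, but the shapes and random values below are just illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_a, n_x, m = 5, 3, 10  # hidden units, input size, batch size (illustrative)
Wax = rng.standard_normal((n_a, n_x))
Waa = rng.standard_normal((n_a, n_a))
ba = rng.standard_normal((n_a, 1))
xt = rng.standard_normal((n_x, m))
a_prev = rng.standard_normal((n_a, m))
da_next = rng.standard_normal((n_a, m))

# Forward step: a_next is already the tanh of the pre-activation.
a_next = np.tanh(Wax @ xt + Waa @ a_prev + ba)

# Correct backward step: d/dz tanh(z) = 1 - tanh(z)^2 = 1 - a_next^2.
dtanh = (1 - np.square(a_next)) * da_next

# The bug described above: wrapping a_next in tanh a second time,
# which computes 1 - tanh(tanh(z))^2 and gives slightly-off values.
dtanh_buggy = (1 - np.square(np.tanh(a_next))) * da_next
```

Both versions produce gradients of the right shape, which is why the dimension checks pass while the values are wrong.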
Hi @Tom_Pham
Thanks for sharing. It is great that you have resolved the problem.