I got all of the dimensions of the gradients correct, but their values were slightly off. It turned out that I had replaced the squared-tanh term `np.square(np.tanh(Wax @ xt + Waa @ a_prev + ba))` with `np.tanh(np.square(a_next))` instead of just `np.square(a_next)`. So the bottom line is to remember how `a_next` was calculated (it already includes the tanh) and avoid applying tanh a second time.
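For anyone who wants to see the difference numerically, here is a minimal sketch. It assumes the usual forward step `a_next = np.tanh(Wax @ xt + Waa @ a_prev + ba)`; the shapes, the random inputs, and the names `dtanh_correct`, `dtanh_wrong` and `dtanh_numeric` are made up for illustration and are not from the assignment. It compares both versions of the tanh derivative against a finite-difference estimate.

```python
import numpy as np

# Hypothetical sizes and random inputs, chosen only for this illustration.
rng = np.random.default_rng(0)
n_x, n_a, m = 3, 5, 4
xt = rng.standard_normal((n_x, m))
a_prev = rng.standard_normal((n_a, m))
Wax = rng.standard_normal((n_a, n_x))
Waa = rng.standard_normal((n_a, n_a))
ba = rng.standard_normal((n_a, 1))

# Forward step (assumed): a_next already contains the tanh.
z = Wax @ xt + Waa @ a_prev + ba
a_next = np.tanh(z)

# d tanh(z) / dz = 1 - tanh(z)^2 = 1 - a_next^2
dtanh_correct = 1 - np.square(a_next)

# The mistake: an extra tanh wrapped around the squared activation.
dtanh_wrong = 1 - np.tanh(np.square(a_next))

# Central finite difference as an independent reference.
eps = 1e-6
dtanh_numeric = (np.tanh(z + eps) - np.tanh(z - eps)) / (2 * eps)

# Both versions have the same shape, so a shape check cannot catch the bug.
print("same shape:", dtanh_correct.shape == dtanh_wrong.shape)
print("correct matches numeric:", np.allclose(dtanh_correct, dtanh_numeric, atol=1e-6))
print("wrong matches numeric:", np.allclose(dtanh_wrong, dtanh_numeric, atol=1e-6))
```

Since the shapes are identical either way, only a value check like this (or the grader's expected output) exposes the extra tanh.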

