I got all of the dimensions of the gradients correct, but their values were a bit off. It turned out that I had replaced np.square(np.tanh(Wax @ xt + Waa @ a_prev + ba)) with np.tanh(np.square(a_next)) instead of just np.square(a_next). Since a_next is already the output of tanh, wrapping it in another tanh applies the activation twice. So the bottom line is to remember how a_next was calculated and avoid redundant tanh calls.
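For anyone hitting the same thing, here is a minimal sketch of the difference. The shapes, the random inputs, and the names dtanh_correct, dtanh_buggy, and dtanh_numeric are my own assumptions for illustration and are not taken from the assignment code. It compares both versions of the tanh derivative term against a finite-difference estimate:

```python
import numpy as np

# Illustrative sizes and random values (assumptions, not the assignment's data)
rng = np.random.default_rng(0)
n_x, n_a, m = 3, 5, 4
Wax = rng.standard_normal((n_a, n_x))
Waa = rng.standard_normal((n_a, n_a))
ba = rng.standard_normal((n_a, 1))
xt = rng.standard_normal((n_x, m))
a_prev = rng.standard_normal((n_a, m))

# Forward step: a_next is already the output of tanh
z = Wax @ xt + Waa @ a_prev + ba
a_next = np.tanh(z)

# d/dz tanh(z) = 1 - tanh(z)^2 = 1 - a_next^2
dtanh_correct = 1 - np.square(a_next)

# The mistake described above: an extra tanh around the squared activation
dtanh_buggy = 1 - np.tanh(np.square(a_next))

# Central finite difference as an independent reference
eps = 1e-6
dtanh_numeric = (np.tanh(z + eps) - np.tanh(z - eps)) / (2 * eps)

# Both candidates have the same shape, so a shape check cannot catch the bug
print(dtanh_correct.shape == dtanh_buggy.shape)
print(np.allclose(dtanh_correct, dtanh_numeric, atol=1e-6))
print(np.allclose(dtanh_buggy, dtanh_numeric, atol=1e-6))
```

Because both expressions produce arrays of the same shape, only a value check like this one (or the grader's expected values) reveals the extra tanh.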
Hi @Tom_Pham
Thanks for sharing. It is great that you have resolved the problem.