I figured out the issue, but I want to point out a small problem with the tests for this function. The activation calculation should use a_(t-1), while the y_pred calculation should use the activation at the current timestep, a_t. However, using a_(t-1) for both the a_t and y_t calculations also passes the test, even though it is incorrect.
This will cause an issue later when trying to implement rnn_forward.
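For anyone comparing the two versions, here is a minimal NumPy sketch of the difference described above. It is not the assignment's code: the flat argument list, the softmax helper, the rnn_cell_forward_buggy name, and the random shapes are assumptions made only for illustration.

```python
import numpy as np

def softmax(z):
    # Column-wise softmax, shifted by the column max for numerical stability.
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def rnn_cell_forward(xt, a_prev, Wax, Waa, Wya, ba, by):
    # The new hidden state is computed from the previous activation a_(t-1).
    a_next = np.tanh(Waa @ a_prev + Wax @ xt + ba)
    # The prediction is computed from the activation of the current timestep.
    yt_pred = softmax(Wya @ a_next + by)
    return a_next, yt_pred

def rnn_cell_forward_buggy(xt, a_prev, Wax, Waa, Wya, ba, by):
    # Hypothetical version of the mistake: same activation as above...
    a_next = np.tanh(Waa @ a_prev + Wax @ xt + ba)
    # ...but the prediction incorrectly reuses a_(t-1).
    yt_pred = softmax(Wya @ a_prev + by)
    return a_next, yt_pred

# Assumed sizes: n_x inputs, n_a hidden units, n_y outputs, m examples.
rng = np.random.default_rng(0)
n_x, n_a, n_y, m = 3, 5, 2, 4
xt = rng.standard_normal((n_x, m))
a_prev = rng.standard_normal((n_a, m))
Wax = rng.standard_normal((n_a, n_x))
Waa = rng.standard_normal((n_a, n_a))
Wya = rng.standard_normal((n_y, n_a))
ba = rng.standard_normal((n_a, 1))
by = rng.standard_normal((n_y, 1))

a_ok, y_ok = rnn_cell_forward(xt, a_prev, Wax, Waa, Wya, ba, by)
a_bad, y_bad = rnn_cell_forward_buggy(xt, a_prev, Wax, Waa, Wya, ba, by)

# Both versions agree on the activation and on every output shape;
# only the values of the prediction differ.
same_activation = np.allclose(a_ok, a_bad)
same_prediction = np.allclose(y_ok, y_bad)
```

In this sketch the two versions return identical activations and identically shaped predictions, so only a check on the values of yt_pred can tell them apart.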
Thank you for noticing this! Interestingly, another student made the same mistake that you were able to fix, so I was investigating the test code this afternoon to understand why it was broken. I have found the issue and reported a bug to the course staff. I hope they will be able to fix it soon; the fix is not complicated.
Thank you for your report, and congrats on solving the actual code bug that led you to discover the test problem.