The way to debug this is to do the “dimensional analysis”, which should give you a clear picture of where the problem is. Here’s a thread that walks through it for what I think is the same test case you are talking about.
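My guess is that you are not correctly managing the A_prev value for the output layer, which is handled after you fall out of the loop over the “hidden” layers. At that point the input to the output layer needs to be the A produced by the last iteration of the loop, not the A_prev that iteration started with.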
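
To make the dimensional analysis concrete, here is a minimal sketch that tracks only shapes, not values. The function name `forward_shapes`, the `layer_dims` list and the sizes used below are my own illustrative assumptions, not the notebook's code or the actual test case, so substitute the dimensions from your failing test.

```python
def forward_shapes(layer_dims, m):
    """Walk forward propagation tracking only shapes.

    layer_dims: hypothetical list [n_x, n_1, ..., n_L] of units per layer.
    m: number of examples (columns of X).
    Returns one tuple per layer: (layer, W shape, A_prev shape, A shape).
    """
    L = len(layer_dims) - 1        # number of layers that have W and b
    A_shape = (layer_dims[0], m)   # A0 is just X
    report = []

    # Hidden layers 1 .. L-1
    for l in range(1, L):
        A_prev_shape = A_shape     # input is the previous layer's output
        W_shape = (layer_dims[l], layer_dims[l - 1])
        assert W_shape[1] == A_prev_shape[0], f"layer {l}: W and A_prev do not line up"
        A_shape = (W_shape[0], m)  # Z = W @ A_prev + b, and A has the shape of Z
        report.append((l, W_shape, A_prev_shape, A_shape))

    # Output layer L, after the loop: its input is the A from the last
    # iteration, not the A_prev that iteration started with.
    A_prev_shape = A_shape
    W_shape = (layer_dims[L], layer_dims[L - 1])
    assert W_shape[1] == A_prev_shape[0], f"layer {L}: W and A_prev do not line up"
    A_shape = (W_shape[0], m)
    report.append((L, W_shape, A_prev_shape, A_shape))
    return report


# Illustrative sizes only: 5 inputs, hidden layers of 4 and 3 units, 1 output, 4 examples
for row in forward_shapes([5, 4, 3, 1], m=4):
    print(row)
```

With those made-up sizes, the output layer has W of shape (1, 3) and needs an input of shape (3, 4). If the stale A_prev from the last loop iteration, shape (4, 4), were passed instead, the matrix product would fail with a shape mismatch, which is the kind of error this analysis is meant to pin down.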