I have a question regarding the final output of dW and db. In the code, we sum dW and db across all m training examples. Do we need to divide them by the number of training examples m at the end? In the standard NN we implemented before, I remember we always divided dW and db by m.
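
To make the question concrete, here is a minimal NumPy sketch of the two versions I mean. It is not the assignment code: it assumes a plain linear layer, and the sizes, the random `X` and `dZ`, and all variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_in, n_out = 4, 3, 2               # hypothetical sizes
X = rng.standard_normal((n_in, m))     # inputs, one column per training example
dZ = rng.standard_normal((n_out, m))   # upstream gradient per example (assumed given)

# Version 1: accumulate each example's contribution, no division
dW_sum = np.zeros((n_out, n_in))
db_sum = np.zeros((n_out, 1))
for i in range(m):
    dW_sum += np.outer(dZ[:, i], X[:, i])
    db_sum += dZ[:, i].reshape(-1, 1)

# Version 2: the same sums divided by m, as in the standard NN
dW_mean = dW_sum / m
db_mean = db_sum / m

# Vectorized form of version 2, for comparison
dW_vec = (dZ @ X.T) / m
db_vec = dZ.sum(axis=1, keepdims=True) / m
```

As far as I can tell, the two versions differ only by the constant factor 1/m, so my question is really where that factor is supposed to be applied.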
Thanks,
Weijia