I have a question: when does the MSE become nan? As far as I know, this happens when the parameters are initialised to 0, but in my case the parameters were in the range 0 to 1.
The epochs ran normally up to epoch 98, which showed an MSE of 487, and then at epoch 99 the MSE turned to nan.
I’m not familiar with that course, but in general:
“nan” means “not a number”. This can happen if the training doesn’t converge and the cost explodes to infinity, since later arithmetic on infinities (for example, infinity minus infinity) gives nan. Or it can happen if you get an invalid operation, like divide-by-zero or taking the log of zero.
Often the cause is too large a learning rate.
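To make the learning-rate point concrete, here is a minimal sketch in plain Python. It assumes ordinary gradient descent on a one-parameter linear model with an MSE cost; the toy data, the `train` function, and the two learning rates are made up for illustration and are not taken from your course or your code. With the small rate the MSE shrinks towards zero; with the large rate each step overshoots by more than the last, the MSE overflows to inf, and a little later the weight update computes inf minus inf and everything becomes nan.

```python
import math

def train(lr, epochs=500):
    """Fit y = w * x by gradient descent and return the MSE seen at each epoch."""
    # Hypothetical toy data generated from y = 2x.
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0, 4.0, 6.0, 8.0]
    w = 0.5  # initial weight, deliberately non-zero and in the range 0 to 1
    history = []
    for _ in range(epochs):
        errs = [w * x - y for x, y in zip(xs, ys)]
        # e * e overflows to inf for huge errors (it does not raise an exception).
        mse = sum(e * e for e in errs) / len(xs)
        history.append(mse)
        if math.isnan(mse):
            break  # once the cost is nan, every later epoch would be nan too
        # Gradient of the MSE with respect to w.
        grad = 2.0 * sum(e * x for e, x in zip(errs, xs)) / len(xs)
        w -= lr * grad
    return history

stable = train(lr=0.01)   # small enough for this data: the cost decreases
diverged = train(lr=1.0)  # too large for this data: the cost grows every epoch

print("lr=0.01 final MSE:", stable[-1])
print("lr=1.0 final MSE:", diverged[-1], "after", len(diverged), "epochs")
```

If your run looks like the second case, with the MSE growing from epoch to epoch before it turns to nan, lowering the learning rate (or scaling the input features) is the first thing I would try. Checking the cost with `math.isnan` each epoch, as above, also makes it easier to see exactly where things go wrong.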