Hi Fellow Learners / Mentors,

I am currently in course 3, working on my own complete implementation, based on what I have learned so far in the specialization and also trying to apply the lessons of course 3 wherever I can.

Very frequently I am encountering an issue in numpy while trying different values of hyperparameters like learning rate and regularization parameters. It says,

**FloatingPointError: invalid value encountered in long_scalars**

Based on what I have found so far on this topic, most likely a floating-point overflow is happening in some of the computations, producing `NaN` values. I was using the np.float64 dtype for my implementation. I tried np.float128 as well, but that did not help much.
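In case it helps anyone debugging the same symptom: NumPy can be told to raise an exception at the first invalid or overflowing operation instead of silently propagating `NaN`, which makes it much easier to find the exact line where things go wrong. A minimal sketch (the overflowing multiplication here is just a stand-in for whatever computation in your network misbehaves):

```python
import numpy as np

def detect_invalid():
    """Demonstrate np.errstate raising at the first bad float op."""
    a = np.array([1e308])  # near the float64 maximum
    try:
        # Raise FloatingPointError instead of silently producing inf/NaN.
        with np.errstate(over="raise", invalid="raise", divide="raise"):
            a * a  # overflows float64 -> exception raised right here
        return False
    except FloatingPointError:
        return True
```

Wrapping your forward/backward pass in such an `np.errstate` block during debugging pinpoints the operation; you can remove it once the root cause is fixed.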

I should mention that I am initializing my parameters exactly as we did in the assignments, with `random` and `he` initialization. This issue does not always happen. For example, in my mini-batch gradient descent implementation, it occurs when the number of epochs is around 2000 with a learning rate of 0.0007, and it goes away when the number of epochs is increased to around 5000; I even get a decent outcome from the training.
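For reference, since overly large initial weights are another common cause of overflow, this is roughly how He initialization from the assignments looks; the layer sizes in the usage below are placeholders, and the function name is my own:

```python
import numpy as np

def initialize_parameters_he(layer_dims, seed=3):
    """He initialization: scale Gaussian weights by sqrt(2 / n_prev)
    so activations neither explode nor vanish with ReLU layers."""
    rng = np.random.default_rng(seed)
    parameters = {}
    for l in range(1, len(layer_dims)):
        parameters["W" + str(l)] = (
            rng.standard_normal((layer_dims[l], layer_dims[l - 1]))
            * np.sqrt(2.0 / layer_dims[l - 1])
        )
        parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return parameters

# e.g. a 2-input network with one hidden layer of 4 units:
# params = initialize_parameters_he([2, 4, 1])
```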

One takeaway is that we have to try multiple values for the number of epochs and all of our other hyperparameters. That said, I want to understand whether others have also encountered this issue while training a model.

I have never seen this problem while working on any of the assignments. Is this a common observation in practice, and what can I do to prevent it?

Thanks & Regards,

Chandan.

Hello @chandan1986.sarkar ,

While I don't know exactly what you are trying to do, I made a quick Google search and found several references to dividing by zero. Could this be the problem?

Hi @carloshvp, thanks a lot for the response and for taking the time to look into this. Yes, I have also searched a bit on Stack Overflow and found several ZeroDivisionError references. I am building a binary classification project from scratch, essentially taking motivation from what we learned in Courses 1 and 2. So far I have made some improvements:
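A frequent source of those divide-by-zero / invalid-value reports in this kind of project is the cross-entropy cost: when a sigmoid output saturates to exactly 0 or 1, `np.log` produces `-inf`, and the subsequent arithmetic turns that into `NaN`. A small sketch of the usual guard, clipping activations away from the endpoints (the epsilon value here is my own choice, not from the course):

```python
import numpy as np

def binary_cross_entropy(A, Y, eps=1e-12):
    """Cross-entropy cost with activations clipped away from 0 and 1,
    so np.log never sees an exact 0 (which would yield -inf/NaN)."""
    A = np.clip(A, eps, 1 - eps)
    m = Y.shape[1]
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
```

With the clip in place, even a fully saturated prediction produces a large-but-finite cost instead of `NaN`, so training diagnostics stay readable.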

- I identified a mistake in my cost function and in how I average the costs for mini-batch gradient descent. Earlier I was dividing each mini-batch's cost by the total number of examples m, which is most likely wrong.
- I corrected a few other smaller mistakes as well and am in a better situation now, but the issue has not gone away completely. By better, I mean that there are fewer instances of this error and my cost plots now look identical to those from the assignments. But I still get this issue from time to time in some cases.
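On the first point, the convention I believe the course follows is to divide each mini-batch's summed loss by that batch's own size, and then, if you want one number per epoch, weight each batch by its size (the last batch is often smaller than the rest). A sketch under that assumption; the function name is mine:

```python
def epoch_average_cost(batch_cost_sums, batch_sizes):
    """Combine per-batch *summed* losses into one per-example epoch average.

    Dividing the grand total of losses by the total number of examples
    is equivalent to weighting each batch's average cost by its size,
    which handles a smaller final batch correctly. Dividing every batch
    by the full training-set size m would systematically underestimate
    the cost."""
    return sum(batch_cost_sums) / sum(batch_sizes)
```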

I initially thought that, because I am iterating the gradient descent process many times (e.g., 5000 - 10000), some of my weights were becoming almost 0. But then I realized we also do that in the assignments in quite a few cases and never face this issue. So it is very much possible that there are still some mistakes in my implementation that I overlooked.

So I was wondering whether anyone has faced similar problems before, and if so, what the outcome of the troubleshooting was.

Thanks & Regards,

Chandan.