NaN in week 3 assignment

I passed all the tests successfully and even checked that all the numbers match.
But for some reason I get NaN values when I test the model.
I suspected that the learning_rate was too high (I don't see it specified anywhere, by the way), but this happens even if I pass a tiny number as the learning_rate.

Any idea?

Cost after iteration 0: 0.692739
Cost after iteration 1000: nan
Cost after iteration 2000: nan
Cost after iteration 3000: nan
Cost after iteration 4000: nan
Cost after iteration 5000: nan
Cost after iteration 6000: nan
Cost after iteration 7000: nan
Cost after iteration 8000: nan
Cost after iteration 9000: nan
W1 = [[nan nan]
 [nan nan]
 [nan nan]
 [nan nan]]
b1 = [[nan]
 [nan]
 [nan]
 [nan]]
W2 = [[nan nan nan nan]]
b2 = [[nan]]
 All tests passed.

Try debugging your output with a smaller number of iterations (e.g. 1) to catch when the NaNs first appear. If you are lucky, you'll see them right from the get-go, after the first iteration, which would make this easy to track down. My guess is a division by zero. You only need one NaN to pollute everything (for example, the error at the last layer gets corrupted and is then backpropagated, ruining all the parameters).
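One generic NumPy trick that can help pinpoint it (nothing assignment-specific): make NumPy raise an exception instead of silently producing inf/nan, so the traceback shows the first offending line.

    import numpy as np

    # Turn division-by-zero and invalid operations (like 0 * inf) into
    # exceptions instead of silent inf/nan results.
    np.seterr(divide="raise", invalid="raise")

After that, the first np.log(0) or 0/0 raises a FloatingPointError right where it happens.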


Hi @l.huracan,

Congrats on solving most of the problem! The answer from @yanivh points you in the right direction; here's a little more along those lines.

I would also print out the cost on every iteration.

        if print_cost:
            print("Cost after iteration %i: %f" % (i, cost))
#         # Print the cost every 1000 iterations
#         if print_cost and i % 1000 == 0:
#             print("Cost after iteration %i: %f" % (i, cost))

Run the example with a small number of iterations (say 5).

You can then print the inputs to the compute_cost function by adding the following before the call to compute_cost:

print ("Iteration %i" %i)
print("A2")
print(A2)
print("cache")
print(cache)

Now look at the inputs to your compute_cost function right before it returns nan. Is there anything wrong with them? If you pass in a nan, you will get nan back.

Try calling compute_cost with these values in a separate cell: do you get nan? Walk through the cost function calculation and see if you have a division by zero or some other problem.
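For example, here is a minimal repro of the usual failure mode (hypothetical A2/Y values; substitute the ones you printed):

    import numpy as np

    # Hypothetical inputs: the second activation has saturated to exactly 1.0.
    A2 = np.array([[0.5, 1.0, 1e-9]])
    Y = np.array([[1, 1, 0]])

    # For that entry, (1 - Y) * np.log(1 - A2) is 0 * (-inf), which is nan,
    # and a single nan makes the whole sum nan.
    logprobs = Y * np.log(A2) + (1 - Y) * np.log(1 - A2)
    print(logprobs)                        # roughly [[-0.693  nan  -0.]]
    print(-np.sum(logprobs) / Y.shape[1])  # nan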

Let us know how it goes. Good luck!

Petri


I have a similar problem with the cost function:

    cost = (-1 / m) * np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL), axis=1, keepdims=True)

There is no division by zero, but the log of 0 is undefined.
At what negative power of e will an AL value be deemed zero by the log function?
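A quick probe (plain float64 NumPy), for what it's worth:

    import numpy as np

    print(np.log(5e-324))     # about -744.44: even the smallest subnormal is fine
    print(np.log(0.0))        # -inf: only an input of exactly 0.0 blows up
    print(0.0 * np.log(0.0))  # nan: and one nan poisons the whole sum

    # In float64 the sigmoid saturates to exactly 1.0 once z is around 37,
    # so for those entries log(1 - AL) is log(0.0) = -inf.
    print(1.0 / (1.0 + np.exp(-37.0)) == 1.0)  # True

So log itself is fine down to the smallest subnormal (around 1e-323); the NaNs come from the entries of AL that are exactly 1.0, where 1 - AL is exactly 0.0.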

Here is the AL that generates NaN:

[[9.99999999e-01 1.00000000e+00 1.00000000e+00 1.00000000e+00
1.00000000e+00 9.99998697e-01 9.99689055e-01 9.98987296e-01
1.00000000e+00 1.00000000e+00 1.00000000e+00 1.00000000e+00
1.00000000e+00 1.00000000e+00 9.99997766e-01 1.00000000e+00
1.00000000e+00 1.00000000e+00 9.99975557e-01 1.00000000e+00
1.76802493e-09 3.45822717e-03 4.95418096e-05 3.09661718e-12
3.30659081e-21 1.62800772e-14 3.88169021e-18 2.83461154e-04
3.99641238e-12 9.09844786e-04 2.56082775e-07 1.57937604e-13
3.50683199e-17 2.64660330e-19 6.07843301e-12 6.97355840e-07
9.97489506e-01 1.92262499e-01 1.92262499e-01 3.88829639e-07
1.92262499e-01 1.92262499e-01 1.92262499e-01 1.17367222e-03
1.92262499e-01 9.30968322e-04 1.90474557e-11 1.14305555e-16
7.07184793e-19 2.79591223e-24 8.11220465e-13 6.66724188e-07
1.92212670e-09 1.63457889e-07 2.61283985e-18 3.13871697e-21
1.01108492e-19 1.68017100e-16 2.73019371e-12 2.13968595e-09
1.14859750e-08 1.17481339e-08 1.01151413e-03 8.90329842e-05
2.36488606e-05 5.10787093e-05 1.67056827e-12 5.36222634e-20
3.87110820e-21 4.68735063e-11 9.51776437e-16 2.85277910e-22
1.99680055e-19 1.28592778e-14 2.51287116e-17 5.07699327e-16
3.06743585e-13 1.92262499e-01 1.92262499e-01 5.28983880e-06
1.92262499e-01 1.00000000e+00 1.00000000e+00 9.99999873e-01
1.92262499e-01 1.00000000e+00 1.00000000e+00 1.00000000e+00
1.00000000e+00 1.00000000e+00 1.00000000e+00 1.00000000e+00
9.99845878e-01 1.00000000e+00 9.98268243e-01 9.99626018e-01
1.00000000e+00 9.99650146e-01 9.99999994e-01 9.99999478e-01
9.99998160e-01 9.99999999e-01 1.00000000e+00 1.00000000e+00
1.00000000e+00 1.00000000e+00 1.00000000e+00 9.99999979e-01
9.99980645e-01 1.00000000e+00 9.87470723e-01 9.99999908e-01
9.99999993e-01 1.00000000e+00 1.00000000e+00 1.00000000e+00
9.99796011e-01 1.92262499e-01 3.20185928e-04 1.92262499e-01
1.92262499e-01 9.99997600e-01 1.00000000e+00 1.00000000e+00
1.00000000e+00 1.00000000e+00 9.99999996e-01 1.00000000e+00
9.99999998e-01 1.00000000e+00 1.92262499e-01 1.00000000e+00
1.00000000e+00 1.00000000e+00 9.99906054e-01 1.00000000e+00
1.00000000e+00 1.00000000e+00 9.99999887e-01 9.99999995e-01
9.99251191e-01 1.92262499e-01 9.99999960e-01 1.00000000e+00
1.00000000e+00 9.99999880e-01 1.00000000e+00 1.00000000e+00
9.99999999e-01 9.99998947e-01 9.99159343e-01 1.92262499e-01
6.16672750e-11 1.49495403e-02 2.47824978e-08 1.53590881e-04
1.92262499e-01 5.93266437e-16 1.94438489e-04 1.92262499e-01
2.23802892e-09 8.39215765e-11 1.92262499e-01 5.34975231e-04
2.13053102e-04 1.34732518e-04 1.92262499e-01 3.84013600e-05
1.92262499e-01 1.87175856e-11 2.13609133e-08 1.67417856e-11
1.92262499e-01 6.48063759e-18 1.44794138e-12 3.83071664e-04
2.58898346e-03 3.55035615e-04 2.00564482e-05 1.92262499e-01
9.30148091e-04 6.58495710e-15 1.24295185e-05 2.04945662e-07
5.82963290e-08 2.18362467e-06 1.92262499e-01 8.56487656e-07
4.41973725e-07 1.82210084e-04 7.29905276e-12 1.00000000e+00
1.00000000e+00 9.99630138e-01 1.00000000e+00 1.00000000e+00
1.00000000e+00 1.00000000e+00 1.00000000e+00 9.99999989e-01
1.00000000e+00 9.99996425e-01 1.00000000e+00 9.99706698e-01
1.34293004e-11 9.99477172e-01 9.99334939e-01 9.97441531e-01
1.00000000e+00 1.92262499e-01 1.92262499e-01]]
Cost after iteration 20000: nan

Thank you for your quick responses!

In my case, I had overlooked a missing division by m in the backpropagation (it still said "All tests passed").
Now it works!
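For anyone hitting the same thing, here is a sketch of where the 1/m belongs, using the usual variable names (dummy shapes just to make it runnable):

    import numpy as np

    # Dummy shapes: 4 hidden units, 1 output unit, m = 3 examples.
    m = 3
    A1 = np.random.randn(4, m)
    A2 = np.random.rand(1, m)
    Y = np.random.randint(0, 2, (1, m))

    # Without the 1/m the gradients are m times too large, the weights
    # blow up, the sigmoid saturates to exactly 0 or 1, and the cost's
    # log() then produces inf/nan.
    dZ2 = A2 - Y
    dW2 = (1 / m) * np.dot(dZ2, A1.T)
    db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)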

Hi,
I am facing a similar problem. The cost doesn’t change. Why could that be happening? Can anyone explain this to me?
[screenshot of the cost output]

Thanks.

Hi mate, your model is not updating any parameters, so it prints the same cost for every iteration. Check your parameter-update part. If that part is correct, check whether you actually call it in the nn_model loop. Normally, what we expect is a large cost value from the first forward prop, and then decreasing cost values after each iteration, because backprop updates the parameters so that the cost function converges.
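The update step should look something like this (a sketch with toy values, just so it runs; the real grads come from backprop):

    import numpy as np

    # Toy parameters and grads in the usual dict layout.
    parameters = {"W1": np.ones((4, 2)), "b1": np.zeros((4, 1)),
                  "W2": np.ones((1, 4)), "b2": np.zeros((1, 1))}
    grads = {"dW1": np.full((4, 2), 0.1), "db1": np.zeros((4, 1)),
             "dW2": np.full((1, 4), 0.1), "db2": np.zeros((1, 1))}
    learning_rate = 1.2

    # Each parameter moves against its gradient, and the result must be
    # stored back into `parameters`, or the next iteration reuses old values.
    for key in ("W1", "b1", "W2", "b2"):
        parameters[key] = parameters[key] - learning_rate * grads["d" + key]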


Thanks for the reply, man.
I am not able to locate any error whatsoever.
I have backpropagated myself to the beginning of my code trying to update it. I guess my brain needs an upgrade itself. Maybe I will just write the whole thing again.

Hi @kshitijsharma,

I would investigate the loop in the nn_model further.

If the cost is not changing, it means the parameters are not being updated.

Start by checking the cost function calculation: does changing parameter values change the cost?

To examine the loop in nn_model a little more closely:
I would limit the number of iterations to 5 (or something small) and print out the gradients. If the grads are zero, the parameter values won't update. If that's the case, examine your backpropagation: why is it returning zeros for the grads?
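Something along these lines inside the nn_model loop (this uses the assignment's function names; adjust if yours differ):

    # Inside nn_model, with the iteration count temporarily set to 5:
    for i in range(5):
        A2, cache = forward_propagation(X, parameters)
        cost = compute_cost(A2, Y)
        grads = backward_propagation(parameters, cache, X, Y)
        print("Iteration %i, cost %f" % (i, cost))
        print("grads:", grads)  # all zeros? -> inspect backward_propagation
        parameters = update_parameters(parameters, grads)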

If the grads are non-zero but your parameters are still not updating, then your gradient descent step is not working as expected: non-zero grads should update the parameters, which in turn should change the cost.

Let me know what you find.

Good luck!

Best,
Petri


Hey man @petrifast
I investigated the loop in the nn_model further and the problem was…
I had spelled parameters as paramters, and that is why the parameters weren't getting updated. Very dumb of me.
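For anyone else who hits this, the bug was essentially:

    # Inside the nn_model loop: the typo assigns the result to a *new*
    # variable, so `parameters` itself never changes.
    paramters = update_parameters(parameters, grads)     # typo: no effect on the loop
    # parameters = update_parameters(parameters, grads)  # what was intended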
The probable cause is sleep deprivation.
Thank you for the reply.
Kshitij Sharma.


Congrats @kshitijsharma!

All bugs are obvious… once you see them!

It’s not a silly mistake.

You just debugged your machine learning model, that’s pretty cool and a key skill in being a data scientist.

Keep up the good work.

Best,
Petri
