'Cost at test w,b: nan' when applying logistic regression to the Breast Cancer dataset

I'm trying to build a Logistic Regression model for the Breast Cancer dataset, using the Course 1 Week 3 Logistic Regression assignment as a starting point. At the step ‘Compute and display cost with non-zero w’ I get ‘nan’ as the output:

while at the step ‘Compute and display cost with w initialized to zeroes’ the output was correct:

Here is my cost_function code:

{moderator edit: code removed}

Why does the cost function return ‘nan’ for non-zero w-values?

What is the purpose of your for-loop over the index ‘j’ if your loop doesn’t use ‘j’?


Also, does your sigmoid() function work correctly?


Also:

  • What’s the size of your ‘y_train’ array?
  • Does it contain only the values 0 and 1?
  • Is the number of ‘w’ values correct for the number of features in the x_train dataset?
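For a quick sanity check of all three, a sketch along these lines works (the parameter names are placeholders for whatever your notebook uses):

```python
import numpy as np

def check_inputs(x_train, y_train, w):
    # Sanity checks matching the three questions above.
    print("y_train shape:", y_train.shape)
    print("unique labels:", np.unique(y_train))  # expect exactly [0 1]
    # One weight per feature:
    assert x_train.shape[1] == w.shape[0], "w length must equal feature count"
```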

Thank you, Tom!

I’ve changed the code of the Cost function:

{moderator edit: code removed}

And now it returns the traceback:

It looks like I’ve made a mistake in my code but I can’t spot it :face_with_monocle:

My sigmoid() function works correctly:

[screenshot: sigmoid() code]

The size of my ‘y_train’ array and its values are:

[screenshot: y_train shape and values]

‘Is the number of ‘w’ values correct for the number of features in the x_train dataset?’:
the number of features in the x_train dataset is 30, so I used 30 ‘w’ values:

Please note that posting your code on the forum is not allowed by the course community standards.

If a mentor wants to see your code, we’ll ask you to send it to us via a private message - not by posting it on the forum.

I’ll edit your post to remove the code.


Note that your response to my question about why you’re using a for-loop over ‘j’ was the opposite of what I had hoped for.

Since you’re using the dot function, and weren’t using ‘j’, you didn’t need the for-loop at all.

However, since you are using nested for-loops over ‘i’ and ‘j’, you don’t need dot(); you can just use normal scalar multiplication.
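For illustration only (a minimal sketch of the idea, not the assignment's solution code), here are both styles, assuming x is an (m, n) array of features, y an (m,) array of 0/1 labels, w an (n,) array of weights, and b a scalar:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_loops(x, y, w, b):
    # Nested loops: plain scalar multiplication, no dot() needed.
    m, n = x.shape
    cost = 0.0
    for i in range(m):
        z_i = b
        for j in range(n):
            z_i += w[j] * x[i, j]  # here 'j' actually does some work
        f_i = sigmoid(z_i)
        cost += -y[i] * np.log(f_i) - (1 - y[i]) * np.log(1 - f_i)
    return cost / m

def cost_vectorized(x, y, w, b):
    # np.dot handles the sum over features, so no loops are needed.
    f = sigmoid(np.dot(x, w) + b)
    return -np.mean(y * np.log(f) + (1 - y) * np.log(1 - f))
```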


Tom, thank you for the answers. I’ve modified the code (I decided to use the for-loop and not use dot()). But I’ve come back to the first issue: after running the code with non-zero w-values, my cost function returns a ‘nan’ value.

“nan” means Not a Number. It means your code is trying to do something mathematically impossible. The most common issue is trying to compute log(0), since that’s undefined (not a real number).

The problem may be your choice of the initial weight values. Because of the way the logistic cost function works (it computes the log of values that may be very close to zero), you cannot simply pick initial weights and expect them to work. A choice of weights that drives the sigmoid output numerically to zero or one (within your machine’s floating-point limits) will cause log(0) to blow up.

What happens if you try all-zeros for the initial weight values? Using zeros for the initial logistic weights is a good idea, because the sigmoid of 0 = 0.5 regardless of the feature values, and that’s safely far away from 0.0 regardless of the size of the data set.

So please try all-zeros for the initial weights, and report back.
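To make the failure mode concrete, here is a tiny made-up example (the feature and weight values are hypothetical, not from the breast cancer data):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x_i = np.array([100.0, 250.0])  # hypothetical large-valued features

# Arbitrary non-zero weights: z is large, sigmoid saturates to exactly 1.0
# in float64, and log(1 - 1.0) = log(0) = -inf, which poisons the cost
# (0 * -inf evaluates to nan).
w = np.array([0.5, 0.5])
f = sigmoid(np.dot(w, x_i))
print(f, np.log(1 - f))  # prints: 1.0 -inf

# All-zero weights: f is exactly 0.5, safely away from both 0 and 1.
w0 = np.zeros(2)
print(sigmoid(np.dot(w0, x_i)))  # prints: 0.5
```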


Hello, Tom! Thank you for the comprehensive answer. I’ve heard the opinion that computers are not as good at maths as people. Now I see it :slight_smile:

My cost function = 14.930 after using all-zero initial weights.

Should I always choose all-zero w-values when training ML models?

When I use the code suggested in the final assignment (Week 3), which sets the initial w-values randomly and multiplies them by 0.01, my cost function returns ‘nan’.

What value are you using for the initial ‘b’ value?

Does your code give good results after training if you use all-zeros as the initial weights (including ‘b’)?

For logistic regression, you really can’t just guess at what the best initial weight values might be. Because there are a lot of features in the data set you’re using, you can get in trouble if you just pick random or arbitrary weight values (especially if their mean value isn’t zero).

Using all-zeros (for both w and b) is a safe choice, because the starting f_wb value is always going to be sigmoid(0) = 0.5.

Note that this advice only applies to logistic regression.

If you’re doing linear regression, you can pretty much start with any initial values (but all-zeros is still an easy choice).

If you’re using an NN, then using all-zeros is guaranteed to not work at all. You should never use all-zero initial weights for an NN.
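A minimal sketch of the two cases (n = 30 matches this dataset; the hidden-layer size is just a placeholder):

```python
import numpy as np

n = 30  # number of input features (30 for the breast cancer data)

# Logistic (or linear) regression: all-zeros is a safe starting point.
w = np.zeros(n)
b = 0.0

# Neural network layer: all-zero weights would make every hidden unit
# compute the same thing forever (the symmetry is never broken), so use
# small random values instead. Zero biases are still fine.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(25, n)) * 0.01  # 25 hidden units, hypothetical
b1 = np.zeros(25)
```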

I’ve rewritten the original assignment code ‘0.01 * (np.random.rand(2).reshape(-1,1) - 0.5)’ as ‘0.01 * (np.zeros(n).reshape(-1,1) - 0.5)’, and the initial b-value is -8. In this case, cost = ‘nan’.

When I changed the b-value to zero, cost = 33.85 at iteration 0, and cost = ‘nan’ from iteration 1000 onward.

Do you get correct predictions after training when you use all-zeros initialization?

My model predicts all ‘1’s, with accuracy = 91.314.

Try reducing the learning rate. “Learning rate too large” is typically what causes ‘nan’ during training with the gradient descent method.
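For context, here is a bare-bones gradient descent loop for logistic regression, just to show where the learning rate enters (the function name and defaults are illustrative, not the assignment's code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(x, y, alpha=1e-4, num_iters=10000):
    m, n = x.shape
    w, b = np.zeros(n), 0.0           # safe all-zeros start for logistic regression
    for _ in range(num_iters):
        f = sigmoid(x @ w + b)        # predictions, shape (m,)
        err = f - y                   # prediction error drives both gradients
        w -= alpha * (x.T @ err) / m  # a too-large alpha makes w overshoot,
        b -= alpha * err.mean()       # the sigmoid saturates, and cost goes nan
    return w, b
```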


Thank you, Tom! Your answers helped me to understand Logistic Regression better.

When I use a learning rate of 0.0001, the cost function no longer takes ‘nan’ values. Using a rate of 0.00001 leads to even better cost values, while the predicted values are still all ones :person_shrugging:

If you look at the code, I think the “all ones” result is from the predict() unit test - that’s not the result on the trained data.
