'Cost at test w,b: nan' when applying logistic regression to the Breast Cancer dataset

I'm trying to build a Logistic Regression model for the Breast Cancer dataset, using the Course 1 Week 3 Logistic Regression assignment as a starting point. At the step ‘Compute and display cost with non-zero w’ I get ‘nan’ as the output:

while at the step ‘Compute and display cost with w initialized to zeroes’ the output was correct:

Here is my cost_function code:

{moderator edit: code removed}

Why does the cost function return ‘nan’ for non-zero w-values?

What is the purpose of your for-loop over the index ‘j’ if your loop doesn’t use ‘j’?


Also, does your sigmoid() function work correctly?


Also:

  • What’s the size of your ‘y_train’ array?
  • Does it contain only the values 0 and 1?
  • Is the number of ‘w’ values correct for the number of features in the x_train dataset?
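For a quick sanity check of all three, a sketch along these lines works (the parameter names are placeholders for whatever your notebook uses):

```python
import numpy as np

def check_inputs(x_train, y_train, w):
    # Sanity checks matching the three questions above.
    print("y_train shape:", y_train.shape)
    print("unique labels:", np.unique(y_train))  # expect exactly [0 1]
    # One weight per feature:
    assert x_train.shape[1] == w.shape[0], "w length must equal feature count"
```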

Thank you, Tom!

I’ve changed the code of the Cost function:

{moderator edit: code removed}

And now it returns the traceback:

It looks like I’ve made a mistake in my code but I can’t spot it :face_with_monocle:

My sigmoid() function works correctly:

[screenshot: sigmoid() code]

The size of my ‘y_train’ array and its values are:

[screenshot: y_train shape and values]

‘Is the number of ‘w’ values correct for the number of features in the x_train dataset?’:
the number of features in the x_train dataset is 30, so I used 30 ‘w’ values:

Please note that posting your code on the forum is not allowed by the course community standards.

If a mentor wants to see your code, we’ll ask you to send it to us via a private message - not by posting it on the forum.

I’ll edit your post to remove the code.


Note that your response to my question about why you’re using a for-loop over ‘j’ was the opposite of what I had hoped for.

Since you’re using the dot function, and weren’t using ‘j’, you didn’t need the for-loop at all.

However, since you are using nested for-loops over ‘i’ and ‘j’, you don’t need dot(); you can just use normal scalar multiplication.
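For illustration only (a minimal sketch of the idea, not the assignment's solution code), here are both styles, assuming x is an (m, n) array of features, y an (m,) array of 0/1 labels, w an (n,) array of weights, and b a scalar:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_loops(x, y, w, b):
    # Nested loops: plain scalar multiplication, no dot() needed.
    m, n = x.shape
    cost = 0.0
    for i in range(m):
        z_i = b
        for j in range(n):
            z_i += w[j] * x[i, j]  # here 'j' actually does some work
        f_i = sigmoid(z_i)
        cost += -y[i] * np.log(f_i) - (1 - y[i]) * np.log(1 - f_i)
    return cost / m

def cost_vectorized(x, y, w, b):
    # np.dot handles the sum over features, so no loops are needed.
    f = sigmoid(np.dot(x, w) + b)
    return -np.mean(y * np.log(f) + (1 - y) * np.log(1 - f))
```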


Tom, thank you for the answers. I’ve modified the code (I decided to use the for-loop and not use dot()). But I’ve come back to the first issue: after running the code with non-zero w-values, my cost function returns a ‘nan’ value.

“nan” means Not a Number. It means your code is trying to do something mathematically impossible. The most common issue is trying to compute log(0), since that’s undefined (not a real number).

The problem may be your choice of the initial weight values. Because of the way the logistic cost function works (it computes the log of values that may be very close to zero), you cannot simply pick initial weights and expect them to work. A choice of weights that drives the sigmoid output numerically to zero or one (within your machine’s floating-point limits) will cause log(0) to blow up.

What happens if you try all-zeros for the initial weight values? Using zeros for the initial logistic weights is a good idea, because the sigmoid of 0 = 0.5 regardless of the feature values, and that’s safely far away from 0.0 regardless of the size of the data set.

So please try all-zeros for the initial weights, and report back.
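To make the failure mode concrete, here is a tiny made-up example (the feature and weight values are hypothetical, not from the breast cancer data):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x_i = np.array([100.0, 250.0])  # hypothetical large-valued features

# Arbitrary non-zero weights: z is large, sigmoid saturates to exactly 1.0
# in float64, and log(1 - 1.0) = log(0) = -inf, which poisons the cost
# (0 * -inf evaluates to nan).
w = np.array([0.5, 0.5])
f = sigmoid(np.dot(w, x_i))
print(f, np.log(1 - f))  # prints: 1.0 -inf

# All-zero weights: f is exactly 0.5, safely away from both 0 and 1.
w0 = np.zeros(2)
print(sigmoid(np.dot(w0, x_i)))  # prints: 0.5
```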


Hello, Tom! Thank you for the comprehensive answer. I’ve heard the opinion that computers are not as good at maths as people. Now I see it :slight_smile:

My cost function = 14.930 after using all-zero initial weights.

Should I always choose all-zero w-values when training ML models?

When I use the code suggested in the final assignment (Week 3), which sets the initial w-values randomly and multiplies them by 0.01, my cost function returns ‘nan’.

What value are you using for the initial ‘b’ value?

Does your code give good results after training if you use all-zeros as the initial weights (including ‘b’)?

For logistic regression, you really can’t just guess at what the best initial weight values might be. Because there are a lot of features in the data set you’re using, you can get in trouble if you just pick random or arbitrary weight values (especially if their mean value isn’t zero).

Using all-zeros (for both w and b) is a safe choice, because the starting f_wb value is always going to be sigmoid(0) = 0.5.

Note that this advice only applies to logistic regression.

If you’re doing linear regression, you can pretty much start with any initial values (but all-zeros is still an easy choice).

If you’re using an NN, then using all-zeros is guaranteed to not work at all. You should never use all-zero initial weights for an NN.
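A minimal sketch of the two cases (n = 30 matches this dataset; the hidden-layer size is just a placeholder):

```python
import numpy as np

n = 30  # number of input features (30 for the breast cancer data)

# Logistic (or linear) regression: all-zeros is a safe starting point.
w = np.zeros(n)
b = 0.0

# Neural network layer: all-zero weights would make every hidden unit
# compute the same thing forever (the symmetry is never broken), so use
# small random values instead. Zero biases are still fine.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(25, n)) * 0.01  # 25 hidden units, hypothetical
b1 = np.zeros(25)
```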

I’ve rewritten the original assignment code ‘0.01 * (np.random.rand(2).reshape(-1,1) - 0.5)’ as ‘0.01 * (np.zeros(n).reshape(-1,1) - 0.5)’, and the initial b-value is -8. In this case, cost = ‘nan’.

When I changed the b-value to zero, cost = 33.85 at iteration 0, and cost = ‘nan’ from iteration 1000 onward.

Do you get correct predictions after training when you use all-zeros initialization?

My model predicts all ‘1’s, with accuracy = 91.314.

Try reducing the learning rate. “Learning rate too large” is typically what causes ‘nan’ during training with the gradient descent method.
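For context, here is a bare-bones gradient descent loop for logistic regression, just to show where the learning rate enters (the function name and defaults are illustrative, not the assignment's code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(x, y, alpha=1e-4, num_iters=10000):
    m, n = x.shape
    w, b = np.zeros(n), 0.0           # safe all-zeros start for logistic regression
    for _ in range(num_iters):
        f = sigmoid(x @ w + b)        # predictions, shape (m,)
        err = f - y                   # prediction error drives both gradients
        w -= alpha * (x.T @ err) / m  # a too-large alpha makes w overshoot,
        b -= alpha * err.mean()       # the sigmoid saturates, and cost goes nan
    return w, b
```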


Thank you, Tom! Your answers helped me to understand Logistic Regression better.

When I use a learning rate of 0.0001, the cost function no longer takes ‘nan’ values. Using a rate of 0.00001 leads to even better cost values, while the predicted values are still all ones :person_shrugging:

If you look at the code, I think the “all ones” result is from the predict() unit test - that’s not the result on the trained data.
