Logistic regression log(0)

Hi everyone:

I have a problem implementing a basic logistic regression algorithm in Python. More precisely, I am trying to use gradient descent to classify some data into two classes. My training set contains 9 examples with 2 features each, and I am trying to learn the values of three parameters, w_1, w_2, and b, which are plugged into a linear function and then into a sigmoid function in order to compute the classification boundary.

For this, I am using the classical logarithmic cost function, and whenever I run the gradient descent algorithm, at some iteration the log inside the cost function receives an argument so close to zero that it raises a “math domain error”.
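To make the setup concrete, here is a minimal sketch of the kind of cost function I mean (the function and variable names are just illustrative, not necessarily the exact ones in my notebook):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def log_cost(X, y, w, b):
    # X: list of [x1, x2] examples, y: list of 0/1 labels, w: [w1, w2], b: scalar
    m = len(X)
    total = 0.0
    for i in range(m):
        z = w[0] * X[i][0] + w[1] * X[i][1] + b
        f = sigmoid(z)
        # math.log raises "math domain error" if f rounds to exactly 0.0 or 1.0
        total += -y[i] * math.log(f) - (1 - y[i]) * math.log(1 - f)
    return total / m
```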

Does anybody know how I can solve this problem? I’m attaching my Python notebook.
ClassAlgorithm.ipynb (103.1 KB)

Two tips you might try.

  • Try a smaller learning rate.
  • Normalize the features. This makes selecting a learning rate much easier. Ideally the features would all be in a range of about -3 to +3 (see the normalization sketch after this list).
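For example, a plain z-score normalization could look roughly like this (a sketch, assuming the training features are in a NumPy array of shape (m, 2)):

```python
import numpy as np

def zscore_normalize(X):
    # Scale each column (feature) to zero mean and unit variance,
    # which usually puts the values roughly in the -3 to +3 range.
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

# Usage (X_train assumed to be a NumPy array of shape (m, 2)):
# X_norm, mu, sigma = zscore_normalize(X_train)
# Any new example must be scaled with the same mu and sigma before predicting.
```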

I am curious, is this notebook from a course you are attending?

Hi! Thank you very much. Changing the learning rate did not yield positive results, but I’ll give the normalization a try.

I have indeed been following a machine learning course; namely, Andrew Ng’s 3-week Machine Learning course for beginners on the Coursera platform. The code you see was entirely written by myself, but it follows very closely the process described in the course, which I highly recommend.

Thanks again!

First, before you normalize the data, try initializing the weights to zero instead of one, and decrease the learning rate a lot (maybe 0.001 instead of 0.1).

What is the name of the course specifically? I’d like to know what material it covers vs. the method you’ve used in your code. For example, did that course discuss normalization?

Here’s what I think is happening, and why setting the initial weights to 0 will probably fix it.

When you use your sigmoid function (which in your file you call “sigma()” for some reason), if the argument is greater than about 35, the calculation in floating point precision will be identically 1.0.

This causes problems when you compute the log of (1-f_wb). That is what throws the error.

Looking at your data set, all of the raw features are fairly large and positive. This means when (w*X + b) is computed, it’s going to be a large positive value. This will cause the sigmoid to hit the upper limit of +1.0, and will trigger the numerical problem.
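You can see the saturation directly in plain Python (a tiny illustration, independent of your notebook):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

print(sigmoid(40.0))      # 1.0 -- the true value is closer to 1 than float64 can represent
print(1 - sigmoid(40.0))  # 0.0
# math.log(1 - sigmoid(40.0))  # would raise ValueError: math domain error
```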

This is why typically the weights and biases are initialized to 0, or to small random values (if you are using a neural network).

So, without normalizing the features, I recommend you try initializing the weights to 0, and then use a very small learning rate, and increase the number of iterations.
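Roughly what I mean, as a sketch (not your notebook’s exact code; the function and variable names are just illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gradient_descent(X, y, alpha=0.001, num_iters=40_000):
    m, n = X.shape
    w = np.zeros(n)   # initialize the weights to 0 instead of 1
    b = 0.0
    for _ in range(num_iters):
        f_wb = sigmoid(X @ w + b)       # predictions for all m examples
        err = f_wb - y                  # prediction error
        w -= alpha * (X.T @ err) / m    # gradient of the log cost w.r.t. w
        b -= alpha * err.sum() / m      # gradient of the log cost w.r.t. b
    return w, b
```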

If you normalize the features, then you can choose a larger learning rate so you can use fewer iterations.


Hi Mosh.

Thanks again for your work. Here is the course I am following:
Supervised Machine Learning: Regression and Classification | Coursera

About the weights, I have tried initializing them to 0 instead of 1, but that does not yield positive results.

When I change both the learning rate and the number of iterations, however, the results change and I no longer get the same problem.

I may also try normalizing the features, which, after what we have discussed so far, I consider the best general method to avoid this problem. However, for that I need to review the documentation first.

Anyway, your method worked perfectly well. Thank you very much!

Good that you’re finding success.

Note that the vertical axis scaling on your lower-left plot is incorrect, and the axes are different from those in the plots earlier in the notebook.

That dataset isn’t linearly separable, so the best 2D fit isn’t ever going to be very good.

I ran your notebook and got the same plots as you, by only changing the weights and bias to initial zeros, and setting the learning rate to 0.001. I also increased the iterations to 40,000, as the convergence wasn’t complete after 10,000.

And the course you’re attending is Course 1 in the Machine Learning Specialization.