Logistic regression log(0)

Hi everyone:

I have a problem implementing a basic logistic regression algorithm in Python. More precisely, I am trying to use gradient descent to classify some data into two classes. My training set contains 9 examples with 2 features each, and I am trying to learn the values of three parameters, w_1, w_2, and b, which are plugged into a linear function and then into a sigmoid function in order to compute the classification boundary.

For this, I am using the classical logarithmic cost function, and whenever I run the gradient descent algorithm, at some iteration the log inside the cost function receives an argument so close to zero that it raises a “math domain error”.
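To make the setup concrete, here is a minimal sketch of the kind of cost function I mean (the function and variable names are just illustrative, not necessarily the exact ones in my notebook):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def log_cost(X, y, w, b):
    # X: list of [x1, x2] examples, y: list of 0/1 labels, w: [w1, w2], b: scalar
    m = len(X)
    total = 0.0
    for i in range(m):
        z = w[0] * X[i][0] + w[1] * X[i][1] + b
        f = sigmoid(z)
        # math.log raises "math domain error" if f rounds to exactly 0.0 or 1.0
        total += -y[i] * math.log(f) - (1 - y[i]) * math.log(1 - f)
    return total / m
```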

Does anybody know how I can solve this problem? I’m attaching my Python notebook.
ClassAlgorithm.ipynb (103.1 KB)

Two tips you might try.

  • Try a smaller learning rate.
  • Normalize the features. This makes selecting a learning rate much easier. Ideally the features would all be in a range of about -3 to +3 (see the normalization sketch after this list).
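For example, a plain z-score normalization could look roughly like this (a sketch, assuming the training features are in a NumPy array of shape (m, 2)):

```python
import numpy as np

def zscore_normalize(X):
    # Scale each column (feature) to zero mean and unit variance,
    # which usually puts the values roughly in the -3 to +3 range.
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

# Usage (X_train assumed to be a NumPy array of shape (m, 2)):
# X_norm, mu, sigma = zscore_normalize(X_train)
# Any new example must be scaled with the same mu and sigma before predicting.
```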

I am curious, is this notebook from a course you are attending?

Hi! Thank you very much. Changing the learning rate did not yield positive results, but I’ll give the normalization a try.

I have indeed been following a machine learning course; namely, Andrew Ng’s 3-week Machine Learning course for beginners on the Coursera platform. The code you see was entirely written by myself, but it follows very closely the process described in the course, which I highly recommend.

Thanks again!

First, before you normalize the data, try initializing the weights to zero instead of one, and decrease the learning rate a lot (maybe 0.001 instead of 0.1).

What is the name of the course specifically? I’d like to know what material it covers vs. the method you’ve used in your code. For example, did that course discuss normalization?

Here’s what I think is happening, and why setting the initial weights to 0 will probably fix it.

When you use your sigmoid function (which in your file you call “sigma()” for some reason), if the argument is greater than about 35, the calculation in floating point precision will be identically 1.0.

This causes problems when you compute the log of (1-f_wb). That is what throws the error.

Looking at your data set, all of the raw features are fairly large and positive. This means when (w*X + b) is computed, it’s going to be a large positive value. This will cause the sigmoid to hit the upper limit of +1.0, and will trigger the numerical problem.
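You can see the saturation directly in plain Python (a tiny illustration, independent of your notebook):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

print(sigmoid(40.0))      # 1.0 -- the true value is closer to 1 than float64 can represent
print(1 - sigmoid(40.0))  # 0.0
# math.log(1 - sigmoid(40.0))  # would raise ValueError: math domain error
```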

This is why typically the weights and biases are initialized to 0, or to small random values (if you are using a neural network).

So, without normalizing the features, I recommend you try initializing the weights to 0, and then use a very small learning rate, and increase the number of iterations.
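Roughly what I mean, as a sketch (not your notebook’s exact code; the function and variable names are just illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gradient_descent(X, y, alpha=0.001, num_iters=40_000):
    m, n = X.shape
    w = np.zeros(n)   # initialize the weights to 0 instead of 1
    b = 0.0
    for _ in range(num_iters):
        f_wb = sigmoid(X @ w + b)       # predictions for all m examples
        err = f_wb - y                  # prediction error
        w -= alpha * (X.T @ err) / m    # gradient of the log cost w.r.t. w
        b -= alpha * err.sum() / m      # gradient of the log cost w.r.t. b
    return w, b
```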

If you normalize the features, then you can choose a larger learning rate so you can use fewer iterations.


Hi Mosh.

Thanks again for your work. Here is the course I am following:
Supervised Machine Learning: Regression and Classification | Coursera

About the weights, I have tried initializing them to 0 instead of 1, but that does not yield positive results.

When I change both the learning rate and the number of iterations, however, the results change and I no longer get the same problem.

I may also try normalizing the features, which, after what we have discussed so far, I consider the best general method to avoid this problem. However, for that I need to review the documentation first.

Anyway, your method worked perfectly well. Thank you very much!

Good that you’re finding success.

Note that the vertical axis scaling on your lower-left plot is incorrect, and the axes are different from those in the plots earlier in the notebook.

That dataset isn’t linearly separable, so the best 2D fit isn’t ever going to be very good.

I ran your notebook and got the same plots as you, by only changing the weights and bias to initial zeros, and setting the learning rate to 0.001. I also increased the iterations to 40,000, as the convergence wasn’t complete after 10,000.

And the course you’re attending is Course 1 in the Machine Learning Specialization.