Logistic Regression fundamental question

I am working my way through Course 1 (Regression and Classification), as some of you may know, but I have a fundamental question about logistic regression.

Why does Andrew use z = w \cdot x + b in the first place when computing g(z)?

What is the motivation for using a linear regression prediction model as an argument to the function g(z)?

1 Like

Hi @ai_is_cool

The function g(z) (the sigmoid) in logistic regression is applied to a linear combination of the input features, z = w \cdot x + b, because this formulation provides a way to model probabilities while keeping the model interpretable.

The linear expression w \cdot x + b is a weighted sum of the input features (just as in linear regression), which captures the relationship between the input variables and the predicted outcome. However, since that linear expression can take any unbounded value, applying the sigmoid function constrains the output to lie between 0 and 1, which is what we need for binary classification.
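To make that concrete, here is a minimal NumPy sketch of the computation; the weights, bias, and input values are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the open interval (0, 1)
    return 1 / (1 + np.exp(-z))

# Made-up parameters and a single made-up input example
w = np.array([0.7, -1.2])
x = np.array([2.0, 0.5])
b = -0.3

z = np.dot(w, x) + b   # weighted sum: can be any real number
p = sigmoid(z)         # bounded in (0, 1): interpreted as P(y = 1 | x)
print(z, p)
```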

Hope it helps! Feel free to ask if you need further assistance.

1 Like

Thanks Alireza.

Can you explain more mathematically what you mean by…

“ because this formulation provides a way to model probabilities and also keep interpretability.”

I can see that the expression

f_{w, b}(x^{(i)}) = wx^{(i)} + b

maps tumor size to any positive or negative number, but is that number an estimate of the probability of the tumor being malignant under linear regression?

Thanks

The point, as I see it, is that the model is trained with target values of 0 and 1, where 1 means a certain outcome (say, “the tumor is malignant”) and 0 means the opposite. The logistic function maps the value of the linear combination of inputs to a number between 0 and 1, and the closer that number is to 1, the more likely the “yes” outcome. There is a quick worked example below. Hope this helps.
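As a quick worked example (the numbers are just for illustration): if the linear part gives z = 2, then g(2) = \frac{1}{1 + e^{-2}} \approx 0.88, which the model reads as roughly an 88% chance of the “yes” outcome, while z = -2 gives g(-2) \approx 0.12, i.e. the “no” outcome is much more likely.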

Thanks Luis,

Do you know how the initial values of w and b are chosen before the logistic regression algorithm produces its best estimate of whether a tumor is malignant or benign?

1 Like

The values of w and b are set during training using gradient descent. The training process starts with some small random values, and gradient descent changes them step by step, searching for the values that minimize the loss function on the training set. If you get a small loss on the training set, chances are that the final values of w and b will work fairly well when the model is applied to new cases (provided the model architecture suits the kind of problem you are dealing with, the training set is big and representative enough, etc.)
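In case it helps to see the shape of that training loop, here is a rough sketch (this is not the course's lab code; the toy dataset, learning rate, and iteration count are all assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Made-up toy data: one feature (say, tumor size), labels 0/1
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 1])

m, n = X.shape
w = np.zeros(n)   # small/zero starting values
b = 0.0
alpha = 0.1       # learning rate (an assumption, not a course value)

for _ in range(1000):
    p = sigmoid(X @ w + b)       # current predicted probabilities
    dw = X.T @ (p - y) / m       # gradient of the log loss w.r.t. w
    db = np.sum(p - y) / m       # gradient w.r.t. b
    w -= alpha * dw              # one gradient-descent step
    b -= alpha * db

print(w, b, sigmoid(X @ w + b))
```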

1 Like

Typically what would you define as “…random small values…” for w and b?

The course deals with all that kind of stuff. You will find details in the coming lessons, so be patient! :slight_smile:

3 Likes

For Logistic Regression, it is not necessary to break symmetry by using random initialization. It can learn from initial values of all zeros or any other symmetric initialization. Here is a thread which discusses that and shows the math behind this point. What you will see on that thread is that symmetry breaking is required once we graduate to real neural networks, but LR is a special case.

Now, it may be possible that using random values could allow faster learning, but I did a few quick experiments a couple of years ago and it didn’t seem to make any significant difference in training performance (the CPU time and wall-clock time to reach a given level of convergence). But perhaps I didn’t use a sophisticated enough initialization algorithm. I have not taken MLS, so I don’t know what is discussed there, but initialization algorithms are covered in DLS Course 1 and Course 2 in some detail. So as Luis says, “stay tuned” for more information on that. If you want to try the comparison yourself, a rough sketch is below.
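Something along these lines is enough to check it on a toy problem (again, the data, learning rate, and iteration count are made-up assumptions, not anything from the course):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def final_log_loss(w, b, X, y, alpha=0.1, iters=1000):
    # Train with gradient descent, then report the log loss on the training set
    m = X.shape[0]
    for _ in range(iters):
        p = sigmoid(X @ w + b)
        w = w - alpha * (X.T @ (p - y)) / m
        b = b - alpha * np.sum(p - y) / m
    p = sigmoid(X @ w + b)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

X = np.array([[1.0], [2.0], [3.0], [4.0]])   # same made-up toy data as above
y = np.array([0, 0, 1, 1])

loss_zeros = final_log_loss(np.zeros(1), 0.0, X, y)
loss_rand = final_log_loss(rng.normal(scale=0.01, size=1), 0.0, X, y)
print(loss_zeros, loss_rand)   # both converge; LR has no symmetry problem
```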

1 Like

That is the definition of the Logistic Regression algorithm. He could have used a polynomial function of higher degree, but using a linear function allows every feature to have a tunable effect on the output. There is also a nice geometric interpretation of LR: the decision boundary is a hyperplane in the input space, with all the “yes” answers on one side of the plane and all the “no” answers on the other. Of course, the decision boundary is:

g(z) = 0.5

and since g(z) is sigmoid, we know that means:

z = 0

or

\displaystyle \sum_{i = 1}^n w_i x_i + b = 0

If you think about what that means geometrically, w is the normal vector to the plane, and b sets the perpendicular distance from the origin to the decision boundary plane (that distance is |b| / \|w\|).
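A small sketch of that geometric picture, with made-up values of w and b: the sign of w \cdot x + b says which side of the boundary a point lies on, and dividing by \|w\| gives the signed perpendicular distance.

```python
import numpy as np

# Made-up 2-D parameters; the decision boundary is the line where w·x + b = 0
w = np.array([1.0, 2.0])   # normal vector to the boundary
b = -4.0

points = np.array([[0.0, 0.0], [1.0, 2.0], [4.0, 1.0]])   # made-up query points

z = points @ w + b                  # sign says which side of the boundary
label = (z > 0).astype(int)         # 1 where g(z) > 0.5, else 0
dist = z / np.linalg.norm(w)        # signed perpendicular distance to the boundary
print(label, dist)
```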

What do you mean by “…break symmetry…”?

Did you read the thread I linked?

Just to be consistent with the nomenclature in the course shouldn’t the last summation equation be written as:

\sum_{j = 1}^{n} w_j x_j^{(i)} + b = 0

As I mentioned, I’m not familiar with the material in MLS, but I am very familiar with the material in DLS. It also covers Logistic Regression in some detail. I used i instead of j as the index for the loop, but I could just as well have used fred or barney. In math, the symbols are just placeholders, right? They don’t really mean anything: the question is what operations we perform on them.

If you initialize all the weights to be zero, then they are symmetric, meaning they are all the same. If you initialize them to random values, then they are all different values. So they are no longer the same and no longer symmetric. Thus we say that symmetry is “broken”.

The thread I linked shows the math proving that LR can be trained with weights that are symmetric and, in particular, all zero.

We’re not using a linear regression prediction model. What you’re highlighting is an internal part of the logistic regression model.