Logistic Regression using the sigmoid function

When the sigmoid function is computed, shouldn’t z be the following…

z = w·x + b - 0.5

This would mean that the sigmoid function is centered on the vertical axis passing through the point:

(0, 0.5)

And then the sigmoid function would have to be shifted right to map its values to the corresponding input feature values of tumor size?


Hi @ai_is_cool

The sigmoid is \sigma(z) = \frac{1}{1 + e^{-z}} with z = w·x + b. Subtracting 0.5 is unnecessary: the function already outputs 0.5 when z = 0, and adjusting b is enough to align it with input features like tumor size.
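A quick check in Python (w, b, and x here are made-up illustrative values, not anything from the course labs):

```python
import numpy as np

def sigmoid(z):
    # Standard logistic sigmoid: maps any real z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

w, b = 0.8, -4.0   # illustrative parameters
x = 5.0            # e.g. a tumor size, in arbitrary units

print(sigmoid(0.0))                # 0.5 -- the curve already passes through (0, 0.5)
print(sigmoid(w * x + b - 0.5))    # the proposed "- 0.5" version...
print(sigmoid(w * x + (b - 0.5)))  # ...is identical to simply learning a smaller bias
```

So the constant would just be absorbed into b during training.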

Also, if you want something centered at zero, why not use tanh, which is a symmetric alternative?
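(For reference, tanh is just a rescaled and shifted sigmoid:

\tanh(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}} = 2\sigma(2z) - 1

so it is centered at (0, 0) with outputs in (-1, 1), whereas the sigmoid is centered at (0, 0.5) with outputs in (0, 1).)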


I don’t think you understand logistic regression and the use of the sigmoid function.

All I wanted to say is that shifting activation functions is unnecessary because they already represent probabilities correctly. Adjustments to the decision boundary should be managed through good tuning of weights and biases, not by changing the inherent properties of the activation function.
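To make “tuning of weights and biases” concrete, here is a minimal gradient-descent sketch on a tiny made-up tumor-size dataset (illustrative numbers only, not the course’s lab code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up dataset: tumor sizes with benign (0) / malignant (1) labels
x = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

w, b, lr = 0.0, 0.0, 0.1
for _ in range(5000):
    p = sigmoid(w * x + b)        # predicted probabilities
    dw = np.mean((p - y) * x)     # gradient of the logistic cost w.r.t. w
    db = np.mean(p - y)           # gradient w.r.t. b
    w -= lr * dw
    b -= lr * db

print(w, b)        # learned parameters
print(-b / w)      # decision boundary: the x where w*x + b = 0 (about 4.5 here)
```

No constant ever needs to be subtracted from z; gradient descent places the boundary wherever the data requires.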

If you still think that you didn’t get your answer, just remove the solution label from my reply, so others can explain it differently!

You don’t need to subtract 0.5.

Note that if an additional 0.5 offset were needed (which it isn’t), it could be rolled into the learned value of ‘b’.
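Concretely:

\sigma(w \cdot x + b - 0.5) = \sigma(w \cdot x + b') \quad \text{where } b' = b - 0.5

Gradient descent would simply learn b' directly, so the extra constant adds nothing.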

Sigmoid does not need to be offset from zero, because the output of sigmoid is a value between 0 and 1.

For an activation function that is symmetric around zero, use tanh().

Please explain mathematically your answer.

Why? What if b = 0.5, so that z = w·x? Is your statement about it passing through (0, 0.5) still valid? If not, what condition(s) are necessary for your statement to be valid?


I have already explained “why”.

Please explain mathematically why you think otherwise.

Do you understand logistic regression using the sigmoid function?

What is an “activation function”? That term is not used by Andrew in the course. Please use only terms used in the course.

The expression

z = w·x + b

is a linear regression prediction model where w and b are determined by iterating until a best-fit straight line is found using the tumor size data set.

So the prediction model must attain a value of 0.5 somewhere, since the output values of the training set are either 0 or 1.

The sigmoid function maps the weighted sum of inputs, z=w⋅x+b, to a value between 0 and 1, which is then interpreted as a probability. This is all that’s needed for classification tasks, as the sigmoid function naturally centers at 0.5 when z=0.

Adding or subtracting constants like 0.5 from z isn’t necessary because the function’s behavior can already be adjusted through tuning of weights and bias. Adjustments like these are enough to align the sigmoid’s output with the data and define decision boundaries.
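To make that concrete: the output is exactly 0.5 where z = 0, and the location of that point along the tumor-size axis is determined entirely by w and b:

\sigma(w \cdot x + b) = 0.5 \iff w \cdot x + b = 0 \iff x = -\frac{b}{w}

Shifting the decision boundary left or right is therefore a matter of learning different values of w and b, not of modifying the sigmoid itself.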

As I said, the sigmoid function maps z to a range between 0 and 1, and this output is interpreted as a probability. For example:

  • When z=0, \sigma(z) = 0.5.
  • When z is positive and large, \sigma(z) approaches 1.
  • When z is negative and large (in magnitude), \sigma(z) approaches 0.

Here, z itself doesn’t need to attain a value of 0.5 because the sigmoid function handles this mapping naturally. The values of w and b are optimized during training so that the predictions match with the labels in the dataset.
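A quick numeric check of those three cases:

```python
import math

sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))

for z in (0.0, 10.0, -10.0):
    print(z, sigmoid(z))
# 0.0    0.5
# 10.0   0.9999546...   (approaches 1)
# -10.0  0.0000454...   (approaches 0)
```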

Please explain mathematically what you mean.

What do you mean by “…weighted sum of inputs…”?

Please explain using mathematics.

But z = 0 is where the linear regression model meets the horizontal axis of the input feature values, where the probability estimate is 0.

As you might know, logistic regression models the probability of a binary outcome using the sigmoid function. The raw prediction is computed as:

z = w \cdot x + b

where x is the input feature (for example: tumor size), w is the weight, and b is the bias. However, instead of using z directly, logistic regression applies the sigmoid function:

\sigma(z) = \frac{1}{1 + e^{-z}}

This function maps any real number to the range (0, 1), which makes it better suited for probability estimation.
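Putting the two formulas together for a single example (w, b, and x below are made-up illustrative values):

```python
import math

w, b = 1.2, -5.0   # illustrative learned parameters
x = 4.8            # tumor size, arbitrary units

z = w * x + b                    # raw prediction: 0.76
p = 1.0 / (1.0 + math.exp(-z))   # sigmoid squashes z into (0, 1): ~0.68

print(p)   # interpreted as the probability that the tumor is malignant
```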

I’m not “…adding or subtracting constants like 0.5…” - I’m subtracting 0.5. I’m doing nothing else.

What do you mean by “…raw prediction…”? Please use only terms used in the course.

It means the linear combination of input features in a model. Mathematically, this is expressed as:

z = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b

or:

z = \sum_{i=1}^{n} w_i x_i + b

This shows a linear function that transforms the input data into a single scalar value z, which is then passed through a non-linear function (in this case, the sigmoid) to produce a probability estimate.
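As a sketch with more than one feature (made-up numbers):

```python
import numpy as np

w = np.array([0.5, -1.2, 0.3])   # one weight per feature
x = np.array([2.0,  1.0, 4.0])   # one example with three features
b = 0.1

z = np.dot(w, x) + b   # z = w_1*x_1 + w_2*x_2 + w_3*x_3 + b = 1.1
print(z)
```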

The raw prediction is the value z, which is computed before applying the sigmoid function. In the course, Andrew Ng describes this as the linear combination of inputs plus the bias term. It is not a probability, but rather an intermediate step in computing the final classification output.
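As a final sketch of that classification step (made-up numbers again): thresholding the probability at 0.5 is equivalent to checking the sign of the raw prediction z.

```python
import math

def predict(w, x, b):
    z = w * x + b                    # raw prediction: any real number
    p = 1.0 / (1.0 + math.exp(-z))   # probability in (0, 1)
    return 1 if p >= 0.5 else 0      # same as: 1 if z >= 0 else 0

print(predict(w=1.2, x=4.8, b=-5.0))   # 1  (z = 0.76,  p ~ 0.68)
print(predict(w=1.2, x=3.0, b=-5.0))   # 0  (z = -1.40, p ~ 0.20)
```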