The equation z = w \cdot x + b does not represent a probability directly. Instead, it is an intermediate value that is transformed by the sigmoid function.
This function’s output is always between 0 and 1, representing a probability. The decision boundary occurs at z = 0, where:
\sigma(0) = \frac{1}{1 + e^{0}} = \frac{1}{2}
When z = 0, the model predicts a probability of 0.5.
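As a quick sanity check, here is a minimal sketch in plain NumPy (the z values are arbitrary) showing that the sigmoid maps any z into the open interval (0, 1) and gives exactly 0.5 at z = 0:

```python
import numpy as np

def sigmoid(z):
    """Squash any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

for z in [-5.0, -1.0, 0.0, 1.0, 5.0]:
    print(f"z = {z:+.1f}  ->  sigmoid(z) = {sigmoid(z):.4f}")
# The z = +0.0 line prints sigmoid(z) = 0.5000: the decision boundary.
```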
The placement of the decision boundary is determined by the choice of w and b, not by manually shifting the function by subtracting constants like 0.5.
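For a single feature this is easy to see: setting z = 0 and solving for x gives the boundary location directly,

w \cdot x + b = 0 \quad\Longrightarrow\quad x = -\frac{b}{w}

so with hypothetical values w = 1 and b = -10, the boundary sits at tumor size x = 10. Changing w or b moves the boundary, with no need to subtract 0.5 anywhere.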
The problem is that z = 0 produces a probability of 0.5, but z = w \cdot x + b = 0 does not produce a probability of 0.5, since w \cdot x + b = 0 occurs where the line w \cdot x + b intercepts the horizontal axis of tumor size, and there the data set has a probability of 0.
By “intermediate value,” I mean that the expression z = w \cdot x + b is not directly a probability.
The term “transformed” means that this value z is passed through the sigmoid function, which maps z to a value between 0 and 1. The transformation is essential because the output of the linear model is not constrained to the [0, 1] range that a probability requires.
It might be helpful to rewatch the course to gain a clearer understanding of why and where these values are used.
When z = w \cdot x + b = 0, the sigmoid output is indeed \sigma(0) = 0.5 (a 50% probability). Mathematically, z = 0 is where the decision boundary lies: the model is equally likely to classify a data point as belonging to either class, and the probability of each class is 0.5.
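Put differently, thresholding the probability at 0.5 is equivalent to thresholding z at 0. A small sketch with made-up parameters and tumor sizes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b = 1.0, -10.0                 # hypothetical trained parameters
x = np.array([6.0, 10.0, 14.0])   # hypothetical tumor sizes

z = w * x + b                     # [-4, 0, 4]
p = sigmoid(z)                    # [~0.018, 0.5, ~0.982]
yhat = (p >= 0.5).astype(int)     # identical to testing (z >= 0)
print(p, yhat)                    # yhat -> [0 1 1]
```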
I don’t think you understand the mathematics of logistic regression using the sigmoid function with an input computed from linear regression of the data set.
Yes, a linear function w \cdot x + b is used to compute z, but the overall model is logistic regression because we use the sigmoid to map that output to a probability.
@ai_is_cool, I think the confusion is coming from treating the linear expression as a separate step that we optimize independently. This is logistic regression, where we are optimizing the function \sigma(z) with z = w \cdot x + b. As you point out, z = 0 corresponds to a probability of 0.5, and this happens when w \cdot x + b = 0. That’s what we’re trying to solve for.
See this screenshot from Prof. Ng’s logistic regression Gradient Descent Implementation video that highlights the difference in the function we’re using for logistic regression:
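For anyone who can’t view the screenshot: the gradient descent update for logistic regression has the same *form* as linear regression’s, but f(x) is \sigma(w \cdot x + b) rather than w \cdot x + b. A minimal sketch with hypothetical 1-D data (not Prof. Ng’s actual code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical tumor sizes and diagnoses (0 = benign, 1 = malignant)
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

w, b, alpha = 0.0, 0.0, 0.1
for _ in range(5000):
    f = sigmoid(w * x + b)            # the full model, not w*x + b alone
    # Same update form as linear regression, but f includes the sigmoid,
    # which changes what w and b converge to.
    w -= alpha * np.mean((f - y) * x)
    b -= alpha * np.mean(f - y)

print(w, b)      # learned parameters
print(-b / w)    # decision boundary (z = 0); lands near x = 7 here
```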
Isn’t linear regression used to determine w and b from the dataset of tumor size and diagnosis, benign (0) or malignant (1)?
And then the linear regression model w \cdot x + b is used in logistic regression by setting z equal to that prediction?
I can see you have labelled a mathematical expression from the course as “looks like linear regression”, but this is exactly linear regression.
That label in the screenshot that says “looks like linear regression” is actually from Prof. Ng’s video. He is showing how the general approach for solving logistic regression looks like linear regression. In both cases, you are solving for w & b, but for logistic regression, the function you’re using is the logistic regression function shown in the screenshot.
Important to keep in mind that we’re solving for w & b in training, so for logistic regression, we need to use the full logistic function so that the sigmoid is involved in determining w & b.
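Concretely, the cost minimized during training already contains the sigmoid. Writing the model as f_{w,b}(x) = \sigma(w \cdot x + b), the logistic (log) loss from the course is:

J(w, b) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log f_{w,b}\left(x^{(i)}\right) + \left(1 - y^{(i)}\right) \log\left(1 - f_{w,b}\left(x^{(i)}\right)\right) \right]

Because f includes the sigmoid, the gradients of J with respect to w and b flow through \sigma, so training finds the w and b that make \sigma(z) match the 0/1 labels, not the ones that make a straight line fit the data points.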