This image is from “Optional lab: Sigmoid function and logistic regression” in which the model has just one feature:
Based on this discussion, I want to believe that z (the orange line) is the decision boundary. But that doesn’t make sense to me. In the lecture on decision boundaries, Prof. Ng says that you need to set z equal to some number (0, for example) to get the decision boundary. So in the lab example above:
z = 0.77x - 1.18 = 0
x ~= 1.53
So, the decision boundary is x = 1.53, which would be a vertical line on the graph at tumor size = 1.53. This is not the same as the orange line. So, how should I interpret the orange line?
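Just to spell out that arithmetic, here is a quick sketch (using the w = 0.77 and b = -1.18 that the lab prints, if I’m reading the output correctly):

```python
# Solve w*x + b = 0 for the single-feature decision boundary.
# w and b are assumed from the lab output quoted above.
w, b = 0.77, -1.18
x_boundary = -b / w
print(x_boundary)  # ~1.53
```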
If ‘z’ is the orange line, then the blue line is sigmoid(z). Typically the decision boundary is sigmoid(z) = 0.5. It’s not shown in the figure, but I’ll sketch it here:
The boundary is at 0.5 almost always - I cannot think of a case where I used anything else.
Thank you for the response. I agree that sigmoid(z) = 0.5 is the decision boundary in probability space (where tumor size is plotted against sigmoid(z)). But in feature space (where tumor size is plotted against malignant/benign), the decision boundary is the vertical line x = 1.53. With that in mind, I still don’t understand how to interpret the orange line.
I’m doing the same thing Prof. Ng did in this slide, but applying it to the lab example, which has only one feature. I’m concluding that the orange line (z) is not the same as the decision boundary (x = 1.53) in feature space, so this leaves me confused as to how to interpret the orange line.
The decision boundary of sigmoid(z) = 0.5 is equivalent to z = 0 because, well, sigmoid(z=0) = 0.5.
I think the key source of @James_Goff’s confusion is that the plot shows y against x for the blue line and the data points, whereas the orange line is z against x, but it is drawn on the same y–x plane. Reading a z-against-x curve in a y-against-x space is confusing.
Maybe it’d be better to have a secondary y-axis for the z dimension.
To me, the purpose of the orange line is just to show what z looks like before applying the sigmoid: the orange line is before the sigmoid, and the blue line is after. It is good to see how the sigmoid squashes very large logit values toward the horizontal level of one, and very small (very negative) logit values toward the level of zero.
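If it helps, here is a minimal matplotlib sketch of the same idea (the w and b values are just assumed to match the 0.77 and -1.18 quoted earlier in the thread), with a secondary y-axis for z as suggested above:

```python
import numpy as np
import matplotlib.pyplot as plt

w, b = 0.77, -1.18           # assumed to match the values quoted earlier
x = np.linspace(0, 4, 200)   # tumor size
z = w * x + b                # the orange line: the logit, before the sigmoid
p = 1 / (1 + np.exp(-z))     # the blue curve: sigmoid(z), after the sigmoid

fig, ax1 = plt.subplots()
ax1.plot(x, p, color='blue', label='sigmoid(z)')
ax1.axhline(0.5, color='gray', linestyle='--', label='threshold 0.5')
ax1.axvline(-b / w, color='gray', linestyle=':', label='boundary x = -b/w')
ax1.set_xlabel('tumor size x')
ax1.set_ylabel('sigmoid(z)')

ax2 = ax1.twinx()            # secondary y-axis so z is not read on the probability axis
ax2.plot(x, z, color='orange', label='z = wx + b')
ax2.set_ylabel('z')

fig.legend(loc='upper left')
plt.show()
```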
I’m in the same boat as the OP. Does ‘z’ represent the straight-line fit to the training data, as you would get from linear regression? I tried flipping the “logistic” flag to False to get the linear regression … The values of w and b for that seem very different from what is obtained when logistic = True… Below are the snapshots. I’m comparing the ‘y’ of linear regression with the ‘z’ of logistic regression. My understanding was that both are the same, but it doesn’t look like it. The straight-line fit obtained in logistic regression seems very different. Can you please point out where I’m going wrong?
The orange ‘z’ line in the plot is completely useless. It doesn’t tell you anything important.
I do not know why it’s even shown.
If you use linear regression (i.e., don’t apply a sigmoid activation), you’ll get completely different values for w and b, because the activation plays a significant role in the predicted values.
Thanks so much for the quick response ! I understand the orange line much better now.
It seems to just denote a line representing wx + b = 0 and the w, b are quite different when running linear and logistic regression.
I tried running both linear and logistic regression on the same training set… As you mentioned, the output values of ‘w’ and ‘b’ are quite different… Logistic regression seems to nicely fit the use case where we just want to classify. The cut-off can always be set to 0.5 even when there is new training data.
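In case anyone wants to reproduce the comparison outside the lab’s own helper, here is a rough sketch using scikit-learn on made-up data (not the lab’s dataset), just to show that the two fits land on different (w, b):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Made-up single-feature data (hypothetical, not the lab's dataset)
X = np.array([[0.5], [1.0], [1.5], [2.0], [3.0], [3.5]])
y = np.array([0, 0, 0, 1, 1, 1])

lin = LinearRegression().fit(X, y)
log = LogisticRegression().fit(X, y)

print("linear regression:   w =", lin.coef_[0], " b =", lin.intercept_)
print("logistic regression: w =", log.coef_[0, 0], " b =", log.intercept_[0])
# The (w, b) pairs differ because each model minimizes a different cost function.
```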
@TMosh, thanks for correcting me here. I understand why you said the orange line in this graph doesn’t help much. As @rmwkwok mentioned in an earlier response, the orange line plots ‘z’ on axes whose y-axis represents sigmoid(z), which is quite confusing…
What I understood, and was trying to convey, is that wx + b = 0 represents the cut-off / decision boundary in terms of the input feature.
In the 1-D case (i.e., only one feature), this reduces to a single point, perhaps? So when we solve 4.79x - 11.77 = 0 here, we get x = 2.457… So with this model, any sample with x >= 2.457 will be classified as “true”.
When we have two features, the cut-off becomes a line w0x0 + w1x1 + b = 0. So any input sample (x0, x1) for which w0x0 + w1x1 + b >= 0 will be classified as true, otherwise false.
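A quick sketch of that rule with hypothetical weights (not the lab’s values):

```python
import numpy as np

# Hypothetical weights for a two-feature example
w = np.array([1.0, -2.0])
b = 0.5

def predict(x):
    # Classify by the sign of z = w.x + b; z >= 0 is the same as sigmoid(z) >= 0.5
    z = np.dot(w, x) + b
    return z >= 0

print(predict(np.array([3.0, 1.0])))  # z = 1.5  -> True
print(predict(np.array([0.0, 1.0])))  # z = -1.5 -> False
```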
That is true. But it has nothing to do with the orange line.
If z = 0, then sigmoid(z) = 0.5. That’s the traditional decision boundary for classification (>= 0.5).
z = w*x + b works for any size of w and x. They can both be vectors, in which case * represents a dot product.
But in z = wx + b there’s no sigmoid activation, or is there? Isn’t the orange line z different from the straight-line fit of linear regression only because we’re no longer using the sum of squared errors as the cost function?
I am still confused as to how the two lines are different.
The orange line does not include the sigmoid activation.
The goal of the logistic cost function is not to model the data set with a straight line.
The goal is to create a boundary between the “true” and “false” regions.
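If it helps, here is a small sketch (hypothetical data and values, not the lab’s) of the two cost functions side by side; they are minimized by different (w, b), which is why the two fitted lines differ:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Logistic (cross-entropy) cost, as used for logistic regression
def logistic_cost(w, b, x, y):
    f = sigmoid(w * x + b)
    return np.mean(-y * np.log(f) - (1 - y) * np.log(1 - f))

# Squared-error cost, as minimized by linear regression
def squared_error_cost(w, b, x, y):
    return np.mean((w * x + b - y) ** 2) / 2

# Hypothetical single-feature data
x = np.array([0.5, 1.0, 1.5, 2.0, 3.0, 3.5])
y = np.array([0, 0, 0, 1, 1, 1])

# The same (w, b) gives different costs under the two objectives,
# so gradient descent on each objective lands on different parameters.
print(logistic_cost(0.77, -1.18, x, y))
print(squared_error_cost(0.77, -1.18, x, y))
```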