Motivation for Logistic Regression

The first video for week 3 mentions that if we use linear regression for classification, everything to the left of the vertical blue line (which passes through the point where the regression line, call it l, intersects y = 0.5) is classified as 0, and everything to its right is classified as 1. I don’t quite get that.

Wouldn’t everything above the line y = 0.5 (on both sides of line l) be classified as 1 and everything below as 0? In other words, if we were to split the graph based on where examples are classified as positive, would the dividing line be x = 0.5 or y = 0.5?

Hello @mvrbiguv,

From the slide, we do not know whether that point is located at x = 0.5.

Since we know that point is located at y = 0.5, we can say that anything above y = 0.5 can be classified as 1. Moreover, since that line in the slide is strictly increasing, you may also say anything to the right of that point should be classified as 1.


Note also that this is a very simple linear problem with one feature. Therefore, if you know the location of that division point (x_0, y_0), you are free to describe the boundary either way: to the right of the point, or above the point.
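To make that equivalence concrete, here is a minimal sketch with hypothetical fitted parameters w and b (not from the slide). For a one-feature line ŷ = w·x + b with w > 0, thresholding the prediction (ŷ > 0.5) picks out exactly the same examples as thresholding the input (x > x_0), where x_0 is the x-coordinate of the division point:

```python
# Hypothetical fitted parameters for y_hat = w*x + b (w > 0, i.e. strictly increasing)
w, b = 0.8, 0.1
x0 = (0.5 - b) / w       # division point: the x where y_hat = 0.5

def predict(x):
    return w * x + b

for x in [0.0, 0.3, 0.5, 0.7, 1.0]:
    above_threshold = predict(x) > 0.5   # "above y = 0.5"
    right_of_point = x > x0              # "to the right of the point"
    # Because the line is strictly increasing, the two descriptions agree
    assert above_threshold == right_of_point
```

If w were negative (a decreasing line), "above the threshold" would instead correspond to the *left* of the point, which is why the slide's statement relies on the line being increasing.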

When you are dealing with a multi-feature problem, you can no longer tell what is "left" or "right". However, since you still have only one label (which is y), you can always describe the rule as "classify as 1 if y > 0.5", or call it "above" the threshold of y = 0.5.
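A quick sketch of that point, using hypothetical weights w and bias b (not values from the course): with several features there is no single left/right direction, but the rule "classify as 1 if ŷ > 0.5" still applies unchanged.

```python
import numpy as np

# Hypothetical fitted parameters for y_hat = w . x + b with three features
w = np.array([0.4, -0.3, 0.2])
b = 0.25

def predict(x):
    return w @ x + b          # y_hat = w . x + b

x = np.array([1.0, 0.5, 0.2])
y_hat = predict(x)            # 0.4 - 0.15 + 0.04 + 0.25 = 0.54
label = int(y_hat > 0.5)      # 1: "above" the y = 0.5 threshold
```

The thresholding step is identical to the one-feature case; only the geometric "left/right" picture is lost.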

The graph shows y-hat (ŷ), that is, the algorithm’s prediction, so the Y-coordinate of each point is either 0 or 1 based on that prediction. Since everything to the left of the threshold (0.5) will be predicted as benign, there can’t be any point with Y = 1 (ŷ = 1) there. If you wanted to show a misclassified sample, you’d draw a “cross”-labeled (×) point to the left, but still at Y = 0 (a false negative).