Hi Jaskeerat,
Your intuition, I think, is good. They are not so different: ordinary linear regression and logistic regression. Let’s tackle that first.
Consider the simple linear function: y(i) = w1*x1(i) + b, where y(i) is the i-th example of the target variable and x1(i) is the corresponding (explanatory) feature value for the i-th example. With gradient descent, we are attempting to “learn” the parameters w1 and b that minimize the cost (the average loss) between the predicted value (y-hat) and the observed value y. Forget about the particular form of the loss function for now; it is not important here (though it is the MSE cost function, nevertheless).
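If it helps to make “learn w1 and b” concrete, here is a minimal sketch in plain NumPy. The numbers are made up and this is not the course lab code, just the idea of repeatedly nudging w1 and b down the MSE gradient:

```python
import numpy as np

x1 = np.array([1.0, 1.5, 2.0, 2.5, 3.0])             # size in 1000s of sq ft (made up)
y  = np.array([300.0, 420.0, 500.0, 610.0, 720.0])   # price in $1000s (made up)

w1, b = 0.0, 0.0
alpha = 0.01                          # learning rate
for _ in range(10000):
    err = (w1 * x1 + b) - y           # y-hat minus y
    w1 -= alpha * (err * x1).mean()   # dJ/dw1 for the MSE cost
    b  -= alpha * err.mean()          # dJ/db
print(w1, b)                          # the learned regression line: y = w1*x1 + b
```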
Go back to the house-price example of week 1. One aim was to use the learned values of w and b to predict the value of some other house not in the training set (say, one that you may want to buy or sell) based on a number of features (e.g. sq feet, # of bedrooms). Pick one feature for simplicity (as above): size measured in square feet.
Now visualize the scatter plot in x1-y space. The line (defined by the learned values of w1 and b) that minimizes the average loss is the regression line. Stick a pin in that! That is one of the two lines you are trying to keep straight in your head. The other is the decision boundary, which we haven’t talked about yet.
Now let’s add another feature in addition to house size (x1). Say, number of bedrooms (x2). Now we have:
y(i) = w1*x1(i) + w2*x2(i) + b
Gradient descent now learns w1, w2, and b by minimizing the average loss (as before). But now the regression “line” is a two-dimensional object living in a three-dimensional space, so it is visualized as a plane. Conceptually, though, it is still the “line” that we stuck the pin in above.
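The same sketch with two features, again with made-up numbers, just to show that nothing changes except the number of parameters being learned:

```python
import numpy as np

X = np.array([[1.0, 2.0],         # [size, bedrooms] per house (made up)
              [1.5, 3.0],
              [2.0, 3.0],
              [2.5, 4.0],
              [3.0, 4.0]])
y = np.array([300.0, 420.0, 500.0, 610.0, 720.0])

w = np.zeros(2)                   # [w1, w2]
b = 0.0
alpha = 0.01
for _ in range(50000):
    err = X @ w + b - y           # predictions minus targets
    w -= alpha * (X.T @ err) / len(y)
    b -= alpha * err.mean()
print(w, b)                       # the fitted "line" is now the plane y = w1*x1 + w2*x2 + b
```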
Suppose you want to know which values of the features make a house very expensive, say at least $1 million. So fix y = y*, where y* = $1M. You are interested in the set of (x1, x2) for which the model predicts a price of at least y*:
(x1, x2) such that: w1*x1 + w2*x2 + b >= y*
So now you ask: what values of x1 and x2 separate the expensive houses from the not-so-expensive ones? (This is a Silicon Valley perspective, by the way.) Set the left-hand side of the above expression equal to the right-hand side. Since y is fixed at y*, you now have an equation (a line!) in x1 and x2. Solve it for x2 and plot it in x1-x2 space. That line is your decision boundary. Points (x1, x2) that lie above that line (assuming w2 > 0) predict an “expensive house”; those below, a not-so-expensive house. So there’s that other line that has been troubling you. Mathematically, it defines a level set.
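Here is the level-set idea in code. The parameter values below are invented, standing in for whatever gradient descent would have learned; the point is only the algebra of the boundary:

```python
w1, w2, b = 150.0, 60.0, 40.0     # pretend these came out of gradient descent (made up)
y_star = 1000.0                   # the "expensive" threshold, same units as y

def boundary_x2(x1):
    # The decision boundary: solve w1*x1 + w2*x2 + b = y_star for x2 (assumes w2 != 0)
    return (y_star - b - w1 * x1) / w2

def predicts_expensive(x1, x2):
    # Above the boundary line (when w2 > 0) means the predicted price is at least y_star
    return w1 * x1 + w2 * x2 + b >= y_star

print(boundary_x2(4.0))               # the x2 value on the boundary when x1 = 4.0
print(predicts_expensive(4.0, 8.0))   # a point above the line -> True
```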
Note well: We have not even talked about logistic regression yet.
Exercise: Describe the decision “boundary” in the simple regression (one feature) case.
Exercise: Suppose that you wish to use a regression equation to predict the proportion of voters who will vote for one of two candidates for President in a national election. Note that y must be contained in the interval [0, 1]. If the number of voters is so large as to be essentially infinite, the range of y is (0, 1). Majority rule applies: a candidate must achieve at least y = 0.5 to win, i.e. y* = 0.5. I propose the following linear regression model (so, no new concepts!):
log((1 - y)/y) = w*x + b
Try the following:
(1) Explain why this is simply a linear regression model. Hint: a data series z = log((1 - y)/y) could be computed directly from the data (a small sketch of this appears below the list).
(2) Describe/interpret the argument to the natural log function. (It helps if you go to the horse races and think about y as a probability.)
(3) Solve for y. What do you have?
(4) Think about how your manipulations in solving for y might have changed (transformed) the nature of the MSE cost function.
(5) What might your dataset look like? Identify two appropriate features and compute the decision boundary in (x1, x2) space.
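To make the hint in (1) concrete without giving away the rest: z = log((1 - y)/y) is something you can compute directly from your data, and then learning w and b is ordinary linear regression on (x, z). A rough sketch with made-up numbers (not a worked solution):

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])        # one made-up feature
y = np.array([0.73, 0.62, 0.50, 0.38, 0.27])     # made-up vote shares, strictly in (0, 1)

z = np.log((1 - y) / y)          # the computable data series from the hint

# Ordinary least squares on (x, z); gradient descent on the MSE cost would do the same job.
w, b = np.polyfit(x, z, 1)       # slope and intercept of the fitted line in x-z space
print(w, b)
```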
Lastly, it’s OK to think about y as a probability (y=p).