C1W1_nb03: intuition of the line that separates positive and negative regions

In the third lab, titled "Visualizing tweets and Logistic Regression models", we plot a line to show the cutoff between the positive and negative regions. The gray line is where the dot product of theta and x equals 0, meaning the two vectors are perpendicular. What is the intuition behind this?

You are correct that the dot product of two orthogonal vectors is zero.

But that’s not what’s happening here.

The dot product is an efficient way of computing the linear combination of the weights and features of an example. If that value is positive (or zero), you have a “True” result. If it’s negative, you have a “False” result.
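
Here is a minimal sketch of that decision rule (the weights and features below are made up for illustration, not the lab's actual values):

```python
import numpy as np

# Hypothetical trained weights: [bias, weight_1, weight_2]
theta = np.array([0.5, -1.2, 2.0])

# Hypothetical feature vector for one example: [1 (bias term), feature_1, feature_2]
x = np.array([1.0, 0.8, 0.3])

# The dot product computes the linear combination of weights and features
z = np.dot(theta, x)

# Positive (or zero) -> "True"; negative -> "False"
prediction = z >= 0
print(z, prediction)  # prints z and True here, since z > 0
```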

Thanks for your reply! Could you please elaborate on why, when \theta \cdot x is positive, it’s a “True” result, and “False” otherwise? Is this specific to logistic regression?

Yes. For logistic regression, that is true by definition.

Remember that we take sigmoid(\theta \cdot x), and that value is interpreted as the probability of a “yes” answer. Note that sigmoid(0) = 0.5 and sigmoid is monotonic, so a positive input gives you a probability > 0.5 and a negative input gives you a probability < 0.5.
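
To make that concrete, here is a tiny sketch of the sigmoid and its behavior around 0 (the inputs are arbitrary example values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))   # 0.5: exactly on the decision boundary
print(sigmoid(2.3))   # > 0.5: a positive theta . x means "yes"
print(sigmoid(-2.3))  # < 0.5: a negative theta . x means "no"
```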

As Tom says, that’s the definition of Logistic Regression. The other important point is that this doesn’t “just happen”: we train the function so that it learns the coefficients (the elements of \theta) that give the best possible match to the training data we are using. Of course there is no guarantee that your data is “linearly separable”, so Logistic Regression may not work well in all classification cases. In many cases we need a more complex decision boundary, which requires more expressive functions. One approach to that is to graduate to Neural Networks, which will be covered in a later NLP course.
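
Tying this back to the original question about the gray line: if we assume the bias is folded into \theta as \theta_0 and the features are x = [1, x_1, x_2], then the boundary is the set of points where \theta \cdot x = 0, and solving for x_2 gives the equation of the line. A sketch with made-up weights (not the lab’s actual code):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical weights: [bias, weight on x1, weight on x2]
theta = np.array([0.5, -1.2, 2.0])

# On the boundary, theta0 + theta1*x1 + theta2*x2 = 0,
# so x2 = -(theta0 + theta1*x1) / theta2
x1 = np.linspace(-5.0, 5.0, 100)
x2 = -(theta[0] + theta[1] * x1) / theta[2]

plt.plot(x1, x2, color="gray")  # the separating line
plt.xlabel("$x_1$")
plt.ylabel("$x_2$")
plt.title("theta . x = 0: points on opposite sides get opposite labels")
plt.show()
```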

Thank you for the detailed explanation.

Thank you!
