Why a straight line from logistic regression? | Week 3 practice lab

In the final assignment of Week 3, we are using two exam scores (x1 and x2) to predict whether students will get selected or not.

The final decision boundary looks like this:

My question is: why didn’t we get a curved decision boundary from logistic regression? A curve would have been a better fit than a straight line.

Is there any way to improve the results so that the decision boundary curves?

The decision boundary being straight (with respect to the input features) is a characteristic of logistic regression. This characteristic originates from the fact that the model takes each feature as is:

y = \sigma(w_1x_1+w_2x_2+b)

Note that z = w_1x_1+w_2x_2+b is a straight line with respect to the features x_1 and x_2, and when z > 0, \sigma(z) > 0.5 and the binary prediction is 1.
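To make this concrete, here is a small numpy sketch. The weights are hand-picked, hypothetical values (not the lab's trained ones); it just shows that the prediction flips exactly where the straight line w_1x_1 + w_2x_2 + b = 0 is crossed:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hand-picked, hypothetical weights (not the lab's trained values).
w1, w2, b = 0.2, 0.2, -25.0

def predict(x1, x2):
    z = w1 * x1 + w2 * x2 + b     # z is linear in x1 and x2
    return int(sigmoid(z) > 0.5)  # predicts 1 exactly when z > 0

# The boundary z = 0 is the straight line x1 + x2 = 125.
print(predict(90.0, 90.0))  # z = 11 > 0 -> 1
print(predict(40.0, 40.0))  # z = -9 < 0 -> 0
```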

If you want a boundary that is curved with respect to x_1 and x_2, you can do feature engineering and create new, polynomial input features such as x_1^2 and x_2^2. For example, the following model produces an elliptical boundary with respect to x_1 and x_2 (circular when w_1 = w_2), provided the weights have suitable signs (e.g. w_1, w_2 > 0 and b < 0):

y = \sigma(w_1x_1^2 + w_2x_2^2+b)

(Let me know if you have questions about the following statement:)
Note that, although it is circular with respect to x_1 and x_2, it is straight with respect to x_1^2 and x_2^2.
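Here is a quick sketch of that circular case, with weights set by hand for illustration (in a real run they would come from gradient descent): with w_1 = w_2 = 1 and b = -4, the boundary x_1^2 + x_2^2 = 4 is a circle of radius 2.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hand-picked weights (an assumption for illustration, not trained):
# the boundary x1^2 + x2^2 = 4 is a circle of radius 2.
w1, w2, b = 1.0, 1.0, -4.0

def predict(x1, x2):
    # z is still linear in the engineered features x1^2 and x2^2,
    # but curved in the original features x1 and x2.
    z = w1 * x1**2 + w2 * x2**2 + b
    return int(sigmoid(z) > 0.5)

print(predict(0.5, 0.5))  # inside the circle  -> 0
print(predict(3.0, 0.0))  # outside the circle -> 1
```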

Knowing that engineering polynomial features can give you a curved boundary, the remaining question is what polynomial features you need in order to bend the boundary the way you want.
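One common approach is to generate all monomials of the two features up to some degree and let training decide which ones matter. The helper below is a hypothetical sketch, similar in spirit to scikit-learn's `PolynomialFeatures`:

```python
import numpy as np

def map_features(x1, x2, degree=2):
    """Expand (x1, x2) into all monomials x1^i * x2^j with 1 <= i + j <= degree."""
    out = []
    for total in range(1, degree + 1):
        for j in range(total + 1):
            out.append((x1 ** (total - j)) * (x2 ** j))
    return np.array(out)

# degree=2 yields [x1, x2, x1^2, x1*x2, x2^2]
print(map_features(2.0, 3.0, degree=2))  # [2. 3. 4. 6. 9.]
```

Feeding these expanded features into plain logistic regression keeps the model linear in its inputs while letting the boundary curve in the original (x_1, x_2) space.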

It would be a better fit, but note that it would be better only for the training data. Fitting a perfect boundary to the training data is not very difficult; our objective is to train a model that predicts well on non-training data, because that is how we determine whether a model generalizes well to future, currently unknown data.

In the model development stage, if the model fits the training data very well but performs poorly on non-training data (such as a cross-validation, or CV, set), we have a well-known problem called overfitting. You will find videos about this in Course 2. We want to avoid overfitting so that the trained model performs well on non-training data, not just on the training data.
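A minimal sketch of that workflow, assuming a toy dataset and hand-picked weights standing in for a trained model: hold out part of the data before training, then compare performance on both sets. A large gap (high training accuracy, low CV accuracy) is the signature of overfitting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy, hypothetical data: 100 examples whose label depends on x1 + x2.
X = rng.uniform(0, 100, size=(100, 2))
y = (X[:, 0] + X[:, 1] > 100).astype(int)

# Hold out 30% as a cross-validation (CV) set before any training.
X_train, y_train = X[:70], y[:70]
X_cv, y_cv = X[70:], y[70:]

def accuracy(X, y, w, b):
    z = X @ w + b
    return np.mean((z > 0).astype(int) == y)

# Stand-in weights (an assumption; in practice they come from
# minimizing the loss on the training set only).
w, b = np.array([1.0, 1.0]), -100.0

# Here both are 1.0 because the stand-in weights encode the true rule;
# an overfit model would score high on train but low on cv.
print("train:", accuracy(X_train, y_train, w, b))
print("cv:   ", accuracy(X_cv, y_cv, w, b))
```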