In the assignment for course 1, week 3, there is a function called map_feature in the utils file. I think it is used in other labs as well. I see what it is doing, and how it is being used along with regularization to create a nonlinear decision boundary that can be used to fit data better than just a linear decision boundary. Here are a couple of questions about it that I’m hoping to get thoughts on:
1.) In the Assignment, the “Microchip test” data being looked at is plotted (“Figure 3: Plot of Training Data”) and appears to already be normalized when it is called in (based on the small range of values), or it is a data set that never needed to be normalized. The map_feature function is then used on that data before the weights are learned/regression is done. So, my question is, if you have data that needs to be normalized, then do you always normalize before you use the map_feature function?
2.) I see how the map_feature function creates a polynomial of the 2 original input features, but how do we know which polynomial to use? Right now I’m just naively thinking you just “use something with a lot of terms and let regularization work its magic”. Is there a reason for the specific polynomial that is being created in map_feature?
3.) What is the thing being done by the map_feature function called? Is it “feature mapping”, or something else? It seems to be pretty closely related to what is being done by “kernel methods” such as the radial basis function ‘RBF’ used inside of sklearn.svm.SVC for example. I’m not sure how related they are, but do they both have the same end goal: make a nonlinear decision boundary?
4.) Just thought of a 4th question: In the Assignment there are only 2 input features, so map_feature is used only on those 2 features to get a nonlinear decision boundary. What would happen if there were more than 2 features (say 6 features total before using the map_feature function)? I don’t think it would make sense to make a nonlinear decision boundary and plot it in 2D space, right? But could you use the map_feature function on all 6 inputs, and then just have a logistic regression model that is making a nonlinear decision, but you just can’t plot it? I’m mostly concerned with being able to have multiple inputs and a nonlinear decision boundary, and less concerned with being able to plot it.
No. We make sure all features are normalized before sending them to the model, and it is preferred to normalize them in one shot, right before sending them to the model. So we don’t normalize some features first, then map features, then normalize the mapped features again. That makes the process unnecessarily complicated.
Instead, we map features, then normalize, then send them to the model for training or for predictions.
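To make the order concrete, here is a minimal sketch in plain Python. The helper names `poly_map` and `zscore_columns` and the toy data are made up for illustration; they are not from the course's utils file, and z-scoring is just one common choice of normalization.

```python
from itertools import combinations_with_replacement
from math import prod
from statistics import mean, pstdev


def poly_map(row, degree):
    """All monomials of the inputs from degree 1 up to `degree` (no bias term)."""
    return [
        prod(combo)
        for d in range(1, degree + 1)
        for combo in combinations_with_replacement(row, d)
    ]


def zscore_columns(rows):
    """Z-score each column; return the scaled rows and the (mean, std) per column."""
    columns = list(zip(*rows))
    stats = [(mean(c), pstdev(c) or 1.0) for c in columns]  # guard constant columns
    scaled = [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]
    return scaled, stats


# Toy data with two features on very different scales (hypothetical values).
raw = [[1.0, 200.0], [2.0, 400.0], [3.0, 100.0], [4.0, 300.0]]

mapped = [poly_map(r, degree=2) for r in raw]  # step 1: map
scaled, stats = zscore_columns(mapped)         # step 2: normalize once
# Step 3: fit the model on `scaled`. For predictions, map the new inputs the
# same way and scale them with the stored `stats` from the training set.
```

The point of keeping `stats` is that new inputs should be scaled with the training means and standard deviations, not with their own.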
One way is to pick a feature and plot it versus the label. If the plot looks non-linear, then a polynomial may be needed. The number of turns in that plot can give us an idea of the degree of polynomial needed. For example, if there is one turn and the curve does not look like it will turn again, then degree 2 can be chosen; if there are two turns and it does not look like it will turn again, then degree 3. But this is not hard science.
PolynomialFeatures (the sklearn transformer) gives us not just x_1^3, x_3^3 and so on, but also the cross terms (e.g. x_1^2 x_3). The problem with the above approach is that we can’t observe the cross terms, because we are just examining one feature at a time. Imagine we have 3 features: we obviously can’t plot them all in one graph because that would be 4-dimensional, so we have to resort to multiple graphs, and that can become an unmanageable amount of work when the number of features is high.
Then for the set of degree 2 polynomial feature, we will have x_i^2 (i from 0 to 5), and x_i x_j (i and j both from 0 to 5, with i > j), in addition to the 6 original features themselves.
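As a quick check on how many degree-2 terms that is for 6 features, the index pairs can be enumerated directly. This is only a counting sketch; the variable names are made up for illustration.

```python
from itertools import combinations

n_features = 6  # features indexed 0..5, as in the text above

# Squared terms x_i^2: one per feature.
squares = [(i, i) for i in range(n_features)]

# Cross terms x_i * x_j with i > j: combinations() yields (j, i) with j < i.
cross = [(i, j) for j, i in combinations(range(n_features), 2)]

print(len(squares), len(cross), len(squares) + len(cross))
```

So the degree-2 terms alone already add 21 columns for 6 inputs, and the count grows quickly with both the number of features and the degree.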
We can’t plot all 6 features because that needs 6D. That said, nothing stops us from picking just 2 of the features at a time (and fixing the other dimensions at some constant values) to make one 2D plot. How useful those plots would be, however, remains to be seen.
This is all great, thanks! Just to clarify on some points:
A.) Is this correct?: Don’t do: normalize, map, then normalize the mapped features. Okay to do: normalize, map, then fit.
B.) I’m not sure I understand this sentence that starts with:
“Then for the set of degree 2 polynomial feature, we will have…”
If we had 3 input features for example, x, y, and z, and degree of 2, then would the mapped features look like this?:
x, y, z, x^2, y^2, z^2, xy, xz, yz.
If so, then you would have to modify the map_feature function, correct?
Sorry I should have made it more clear in my last reply. I will edit it.
I actually didn’t check the map_feature function before replying to you. However, in general, the set of degree-2 polynomial features will give you all 9 resulting features that you have listed. This is also how the sklearn library behaves (its PolynomialFeatures additionally includes a constant bias column unless include_bias=False). Maybe map_feature is specialized to only produce some of those?
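For reference, a general version that works for any number of inputs could look like the sketch below. `term_names` is a hypothetical helper written for this reply, not the course's map_feature; it only lists which terms would be generated, so the result can be compared with the list above.

```python
from itertools import combinations_with_replacement


def term_names(names, degree):
    """Names of all monomials of the given variables, degree 1 up to `degree`."""
    terms = []
    for d in range(1, degree + 1):
        # Each combination with replacement is one monomial, e.g. ("x", "y") -> x*y.
        for combo in combinations_with_replacement(names, d):
            terms.append("*".join(combo))
    return terms


print(term_names(["x", "y", "z"], 2))
# ['x', 'y', 'z', 'x*x', 'x*y', 'x*z', 'y*y', 'y*z', 'z*z']
```

That is the same 9 terms as in your list, just in a different order. A mapping function hard-coded for two inputs would indeed need to be rewritten along these lines to handle three or more.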