The dataset examples used in the lectures have either one or two features, so they are easy to visualize. From the plot it is easy to choose what type of polynomial curve will potentially fit the data best. The decision boundary lecture uses w*x + b as one example and, for two features, w1 * x1^2 + w2 * x2^2 + b, which gives a circular boundary (when w1 = w2).
My question is about training sets with many features, where visualization is not possible. In that case, how do you choose the right polynomial before you start computing the weights to fine-tune the model curve?
Two ideas.
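Try all of the combinations of polynomial terms, increasing the order until you get good enough results. Use the cost as the only metric, ideally measured on data held out from training, since the training cost alone can only go down as you add terms.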
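Here is a rough sketch of that first idea, not a definitive recipe. It assumes a regression setting with a squared-error cost and a plain least-squares fit in NumPy. The synthetic data, the `poly_features` and `fit_and_score` helpers, the validation split, and the `GOOD_ENOUGH` tolerance are all made up for illustration.

```python
from itertools import combinations_with_replacement

import numpy as np


def poly_features(X, degree):
    """Build every monomial of the columns of X up to `degree`, plus a bias column."""
    n, d = X.shape
    cols = [np.ones(n)]
    for k in range(1, degree + 1):
        for idx in combinations_with_replacement(range(d), k):
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(cols)


def fit_and_score(X_tr, y_tr, X_va, y_va, degree):
    """Least-squares fit on the training split; return (training cost, validation cost)."""
    A_tr = poly_features(X_tr, degree)
    w, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)

    def cost(A, y):
        return float(np.mean((A @ w - y) ** 2) / 2)

    return cost(A_tr, y_tr), cost(poly_features(X_va, degree), y_va)


# Hypothetical data: three features, target is secretly a degree-2 polynomial plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 3))
y = 1.0 + 2.0 * X[:, 0] * X[:, 1] - X[:, 2] ** 2 + rng.normal(scale=0.05, size=300)
X_tr, y_tr, X_va, y_va = X[:200], y[:200], X[200:], y[200:]

GOOD_ENOUGH = 0.01  # assumed tolerance on the validation cost; pick your own
chosen_degree = None
for degree in range(1, 6):
    tr_cost, va_cost = fit_and_score(X_tr, y_tr, X_va, y_va, degree)
    print(f"degree={degree} train_cost={tr_cost:.5f} val_cost={va_cost:.5f}")
    if va_cost <= GOOD_ENOUGH:
        chosen_degree = degree  # stop at the lowest order that is good enough
        break
```

Keep in mind that the number of polynomial terms grows quickly with the number of features and the order, so with many features this search gets expensive fast.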
Or use a neural network. The nonlinear activation in the hidden layer will automatically generate a complex model, without you having to engineer any additional features.
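To illustrate the neural network idea, here is a minimal sketch in plain NumPy that feeds in only the raw x1 and x2 (no squared terms) and still learns a circular boundary. The toy data, the hidden layer size, the learning rate, and the number of steps are my own assumptions and would need tuning on a real problem.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: label 1 inside the unit circle, 0 outside.
X = rng.uniform(-1.5, 1.5, size=(400, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1.0).astype(float).reshape(-1, 1)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(X, params):
    """One tanh hidden layer followed by a sigmoid output unit."""
    W1, b1, W2, b2 = params
    H = np.tanh(X @ W1 + b1)
    P = sigmoid(H @ W2 + b2)
    return H, P


def cross_entropy(P, y):
    eps = 1e-12  # avoids log(0)
    return float(-np.mean(y * np.log(P + eps) + (1 - y) * np.log(1 - P + eps)))


def train(X, y, hidden=8, lr=0.3, steps=10000):
    """Full-batch gradient descent on the mean cross-entropy cost."""
    n, d = X.shape
    params = [rng.normal(size=(d, hidden)), rng.normal(size=(1, hidden)),
              rng.normal(size=(hidden, 1)), np.zeros((1, 1))]
    losses = []
    for step in range(steps):
        W1, b1, W2, b2 = params
        H, P = forward(X, params)
        if step % 1000 == 0:
            losses.append(cross_entropy(P, y))
        dZ2 = (P - y) / n                    # gradient at the output pre-activation
        dW2 = H.T @ dZ2
        db2 = dZ2.sum(axis=0, keepdims=True)
        dZ1 = (dZ2 @ W2.T) * (1.0 - H ** 2)  # back through tanh
        dW1 = X.T @ dZ1
        db1 = dZ1.sum(axis=0, keepdims=True)
        params = [W1 - lr * dW1, b1 - lr * db1, W2 - lr * dW2, b2 - lr * db2]
    losses.append(cross_entropy(forward(X, params)[1], y))
    return params, losses


params, losses = train(X, y)
predictions = forward(X, params)[1] >= 0.5
accuracy = float(np.mean(predictions == (y == 1.0)))
print(f"first recorded cost={losses[0]:.4f} final cost={losses[-1]:.4f} accuracy={accuracy:.3f}")
```

The point is that nothing here tells the model the boundary is quadratic. The hidden units work out a usable nonlinear combination of the inputs on their own, which is what makes this approach attractive when you cannot plot the data.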