Hey fellow learners,
There's a good chance this question is stupid because I'm not good at math, but why are we using a quadratic equation that overfits instead of choosing something simpler that would be a better fit? Is it because we can't avoid some features? But if we keep the features that are important and then regularize the weights to make them small, doesn't that basically make those features unimportant? Does this make sense, or should I try learning it again?
The best advice is to go through it again, for sure.
You choose a linear, quadratic, or other fit depending on the spread of your data points (and the features chosen): whichever fits the problem best.
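The regularization and shrinking of the weights should happen as a result of training, not manually by you. The aim is to get a better fit to the overall spread of the data rather than to the noise in it. A small weight doesn't remove a feature; it only reduces how much that feature influences the prediction.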
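If it helps to see it in numbers, here is a small sketch of that idea. Everything in it is made up for illustration: the toy data, the degree-6 polynomial, the value `lam = 1.0`, and the helper names `polynomial_features` and `fit_ridge`. It also solves the regularized least-squares problem in closed form instead of running gradient descent as a course would, just to keep it short.

```python
import numpy as np

def polynomial_features(x, degree):
    # Columns: 1, x, x^2, ..., x^degree (hypothetical helper for this sketch)
    return np.vander(x, degree + 1, increasing=True)

def fit_ridge(X, y, lam):
    # Minimise ||X w - y||^2 + lam * ||w[1:]||^2.
    # The intercept w[0] is left unpenalised, a common convention.
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

# Toy data: roughly a straight line plus noise (assumed, not from the thread)
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 20)
y = 1.0 + 2.0 * x + rng.normal(scale=0.3, size=x.shape)

# Deliberately too flexible: a degree-6 polynomial for near-linear data
X = polynomial_features(x, degree=6)

w_plain = fit_ridge(X, y, lam=0.0)  # no regularization
w_reg = fit_ridge(X, y, lam=1.0)    # with regularization

print("no regularization:", np.round(w_plain, 2))
print("lam = 1.0        :", np.round(w_reg, 2))
```

Nobody sets the small weights by hand here. The penalty term makes the fit prefer smaller weights overall, so the higher-order terms end up with less influence while every feature stays in the model.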