In the lecture, we were taught that using a lot of features, say 100, or many polynomial terms of the same features when fitting a model can cause overfitting.
To avoid overfitting, a regularization term is added to the cost function.
According to the new cost formula, this regularization term should reduce the impact of some features or polynomial terms and thus avoid overfitting.
But my question is: the regularization term decreases the value of all the weights equally, so how does it reduce the effect of a few features while retaining the effect of the other, important features?
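For reference, the cost function I mean has this shape (I am writing the common L2-regularized squared-error form here; the exact scaling constants in the lecture may differ):

$$ J(\mathbf{w}) = \frac{1}{2m}\sum_{i=1}^{m}\left(f_{\mathbf{w}}\bigl(x^{(i)}\bigr) - y^{(i)}\right)^2 \;+\; \frac{\lambda}{2m}\sum_{j=1}^{n} w_j^2 $$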
It is compact and gives a good overview of regularization.
As you mentioned, the purpose of regularization is to reduce model complexity by penalising it and thereby reduce overfitting. So it is about reducing the model's dependency on many parameters (illustrated in the sketch after this list), e.g. by:
driving weights exactly to zero (L1 regularization) or
driving weights close to zero (L2 regularization)
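To make the difference tangible, here is a minimal sketch using scikit-learn's Lasso (L1) and Ridge (L2) on synthetic data in which only a few features actually matter; the feature count and penalty strengths are illustrative choices, not anything from your lecture:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: only the first 3 of 20 features actually matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
true_w = np.zeros(20)
true_w[:3] = [4.0, -3.0, 2.0]
y = X @ true_w + rng.normal(scale=0.5, size=200)

# L1 penalty (Lasso) tends to drive irrelevant weights exactly to zero.
lasso = Lasso(alpha=0.1).fit(X, y)
# L2 penalty (Ridge) shrinks all weights towards zero, but rarely to exactly zero.
ridge = Ridge(alpha=10.0).fit(X, y)

print("Lasso weights at exactly zero:", np.sum(lasso.coef_ == 0))
print("Ridge weights at exactly zero:", np.sum(ridge.coef_ == 0))
print("Lasso first three weights:", np.round(lasso.coef_[:3], 2))
print("Ridge first three weights:", np.round(ridge.coef_[:3], 2))
```

With the L1 penalty most of the irrelevant coefficients typically end up exactly at zero, while the L2 penalty leaves them small but non-zero.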
In this source, you can also find a concrete answer to your question of how the weights are influenced (in this example for L2 regularization, depending on the regularization parameter λ). I believe it's useful because you also find example histograms there to visualize it.
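To spell out the λ-dependence a bit more, here is a sketch of the usual gradient-descent update when the penalty term is $\frac{\lambda}{2m}\sum_j w_j^2$ (assuming learning rate α and writing the unregularized data loss as $L_{\text{data}}$; the exact constants depend on the convention used):

$$ w_j \leftarrow w_j - \alpha\left(\frac{\partial L_{\text{data}}}{\partial w_j} + \frac{\lambda}{m}\,w_j\right) = w_j\left(1 - \frac{\alpha\lambda}{m}\right) - \alpha\,\frac{\partial L_{\text{data}}}{\partial w_j} $$

So the penalty shrinks every weight by the same factor, not by the same amount, and the data-loss gradient pushes back hardest for the weights that genuinely reduce the training error. That is why the weights of truly useful features stay relatively large, while the others are pulled towards zero as λ grows.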