We use regularization to push the weights of useless features close to 0.
But the regularization term in the loss shrinks every weight, and in particular it shrinks large weights much more (a concrete form is sketched below).
So I'm wondering: do useless features actually tend to have large weights, or is there something I'm missing?
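For concreteness, here is the standard L2 (ridge) penalty I have in mind (my own notation: $\mathbf{w}$ are the weights, $m$ the number of examples, $\lambda$ the regularization strength; not quoted from any specific material):

$$
J(\mathbf{w}) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(y_i - \mathbf{w}^\top\mathbf{x}_i\bigr)^2 + \lambda\sum_j w_j^2,
\qquad
\frac{\partial}{\partial w_k}\Bigl(\lambda\sum_j w_j^2\Bigr) = 2\lambda w_k .
$$

The penalty's gradient is proportional to the weight itself, so a weight of 10 is pushed toward 0 five times as hard as a weight of 2, which is what I meant by shrinking large weights much more.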
I thought about it some more, and I think I was wrong on the point above.
Regularization can help with “useless” features. The model may overfit to them, and adding regularization lessens the impact of the model paying too much attention to those features.
Useless features may or may not have larger values than other features; it really depends on how the data engineer chooses to preprocess the data. Regardless of their values, regularization can shrink the weights on those features and make the model less likely to overfit to them.
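Here is a minimal numerical sketch of that shrinking effect. Everything in it is my own toy setup (the synthetic data, the choice of `lam = 10.0`, and the closed-form ridge solution), so treat it as an illustration rather than anyone's reference implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# One informative feature and one "useless" (pure-noise) feature.
x_useful = rng.normal(size=n)
x_useless = rng.normal(size=n)
y = 3.0 * x_useful + rng.normal(scale=0.5, size=n)  # y never looks at x_useless

X = np.column_stack([x_useful, x_useless])

# Ordinary least squares: w = (X^T X)^{-1} X^T y
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge (L2-regularized) regression: w = (X^T X + lam * I)^{-1} X^T y
lam = 10.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

print("OLS weights:  ", w_ols)
print("Ridge weights:", w_ridge)
```

With OLS, the useless feature still picks up a small nonzero weight from noise; the ridge version pulls all weights down, and the useless one ends up even closer to 0.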
In addition to lessening the impact of useless features, regularization helps the model generalize better. Intuitively, if you think about the fitted curve of a regression model with many parameters (for example, one built on high-degree polynomial features), the regularization term reduces the abrupt twists and turns in that curve.
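To make that intuition a bit more concrete, here is another small sketch (again my own toy example: the sine data, `degree = 9`, and the `lam` values are assumptions). It fits a high-degree polynomial with and without an L2 penalty and measures how much the fitted curve swings up and down:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 10)
y = np.sin(np.pi * x) + rng.normal(scale=0.1, size=x.size)

degree = 9
X = np.vander(x, degree + 1)  # columns are x^9, x^8, ..., x^0

def fit(lam):
    # Closed-form ridge solution; lam = 0 reduces to plain least squares.
    return np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y)

grid = np.linspace(-1, 1, 500)
for lam in (0.0, 1e-2):
    w = fit(lam)
    pred = np.vander(grid, degree + 1) @ w
    # Bigger numbers here mean a curve that swings around more between data points.
    print(f"lambda={lam:g}: max |weight| = {np.abs(w).max():.1f}, "
          f"total up/down movement = {np.abs(np.diff(pred)).sum():.2f}")
```

The unregularized fit chases the noise with large coefficients and a wigglier curve; the regularized one trades a little training error for a smoother curve.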
TL;DR - Regularization’s purpose is to reduce overfitting, and one way it achieves that is by reducing the effect of “useless” features during training (so the model doesn’t fit too much to those features).