Hi,
Can anyone help me understand the intuition for what exactly regularization does to the model? I mean, regularization acts identically on each w_i and prevents all of them from becoming too large, so how can it choose some of the features and reduce their effect on the model?
Thank you for reading.
It doesn’t. All features have the same regularization applied.
This is one reason why normalizing the features is very important.
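A quick sketch of what that looks like, assuming the L2 (ridge) penalty from the course; the function and variable names (`compute_cost`, `lambda_`) are just illustrative, not from the course labs. Because one shared lambda multiplies every w_j, the penalty only treats features "fairly" when the features themselves are on comparable scales, which is where normalization comes in:

```python
import numpy as np

# Minimal sketch of an L2-regularized linear-regression cost, assuming the
# convention J = (squared error)/2m + (lambda/2m) * sum(w_j^2).
# compute_cost and lambda_ are placeholder names, not from the course labs.
def compute_cost(X, y, w, b, lambda_):
    m = X.shape[0]
    err = X @ w + b - y                    # prediction errors
    data_term = (err @ err) / (2 * m)      # how well we fit the data
    # One shared lambda_ multiplies every w_j -- no feature is singled out.
    reg_term = (lambda_ / (2 * m)) * np.sum(w ** 2)
    return data_term + reg_term
```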
I’m just taking the course and I don’t understand everything.
But I think regularization increases the size of the gradient-descent update for a parameter in proportion to how large that parameter is, thereby shrinking the large parameter values faster than the small ones.
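To check my own intuition in code, here is a minimal sketch of one update step, assuming the standard rule w_j := w_j - alpha * (data gradient + (lambda/m) * w_j); `gd_step`, `alpha`, and `lambda_` are placeholder names, not from the course labs:

```python
import numpy as np

# Sketch of one gradient-descent step with L2 regularization.
def gd_step(w, grad_data, alpha, lambda_, m):
    # The penalty's gradient is (lambda/m) * w_j: proportional to w_j itself,
    # so large weights get pulled toward zero harder than small ones.
    return w - alpha * (grad_data + (lambda_ / m) * w)

w = np.array([5.0, 0.1])
# With the data gradient zeroed out, only the penalty acts: the large weight
# shrinks by 0.5 while the small one shrinks by only 0.01.
print(gd_step(w, np.zeros(2), alpha=0.1, lambda_=1.0, m=1))  # [4.5  0.09]
```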
Please correct me if I’m wrong. Thank you
No. Regularization adds a penalty to the cost to keep the model from overfitting, while gradient descent decreases the cost to fit the training data as well as it can. This combination gives us a “just good” fit: not overfit, not underfit. That holds no matter what the values of the parameters are, large or small, negative or positive.
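Written out (assuming squared-error loss, as in the course), both pressures are visible in the cost that gradient descent minimizes, so it has to trade fit against weight size:

```latex
J(\mathbf{w}, b) =
  \underbrace{\frac{1}{2m}\sum_{i=1}^{m}\left(f_{\mathbf{w},b}\!\left(\mathbf{x}^{(i)}\right) - y^{(i)}\right)^{2}}_{\text{data-fit term: pushes toward the training set}}
  \;+\;
  \underbrace{\frac{\lambda}{2m}\sum_{j=1}^{n} w_{j}^{2}}_{\text{penalty term: pushes every } w_j \text{ toward } 0}
```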