How does penalising all wj terms help in reducing only the unimportant wj parameters in regularization / gradient descent?


All features are eligible for reduction via regularization. The algorithm has no knowledge of a feature’s importance. It’s just a mathematical process.

But features that aren’t important tend to end up with very small weights anyway, so regularization doesn’t shrink them by much in absolute terms.
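To make that concrete, here is a minimal NumPy sketch (the weights, learning rate, lambda, and m below are made-up values for illustration, not from the course labs). The L2 penalty (lam / (2m)) · Σ wj² contributes (lam / m) · wj to the gradient of each weight, so the pull toward zero is proportional to the weight itself:

```python
import numpy as np

# Hypothetical weights: w[0] plays the role of an "important" (large) weight,
# w[1] an "unimportant" (small) one. The values are purely illustrative.
w = np.array([5.0, 0.01])

alpha = 0.1   # learning rate (assumed)
lam = 1.0     # regularization strength lambda (assumed)
m = 100       # number of training examples (assumed)

# Gradient of the L2 penalty (lam / (2*m)) * sum(w_j**2) w.r.t. each w_j
# is (lam / m) * w_j, i.e. proportional to the weight itself.
reg_grad = (lam / m) * w

# Apply only the regularization part of one gradient-descent step,
# ignoring the data-fit gradient, to isolate the shrinkage effect.
w_after = w - alpha * reg_grad

print("before:", w)        # [5.    0.01]
print("after: ", w_after)  # [4.995   0.00999]
```

The large weight drops by 0.005 in one step while the tiny one moves by only 0.00001, so the penalty mostly restrains the weights that are big enough to cause overfitting.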


@TMosh thank you for your reply!!

But I have follow-up questions:

  1. If unimportant features are going to have small weights anyway, how does regularisation help at all?

  2. If unimportant features are going to have small weights, shouldn’t that mean the model won’t overfit in any case?

You don’t know in advance which features are more or less important. So you have to apply regularization to all of them, and let the optimizer figure out the details.

The machine is doing the learning, so you don’t have to.
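For reference, here is a minimal sketch of what “apply regularization to all of them” looks like for linear regression with an L2 penalty. The function name and shapes are my own, and the bias b is left unregularized, as is conventional:

```python
import numpy as np

def regularized_cost(X, y, w, b, lam):
    """Mean-squared-error cost with an L2 penalty on every weight.

    The penalty (lam / (2m)) * sum(w_j**2) is applied uniformly, with no
    notion of feature importance; gradient descent decides how far each
    w_j can shrink while still fitting the data.
    """
    m = X.shape[0]
    predictions = X @ w + b          # linear model f(x) = w.x + b
    mse = np.sum((predictions - y) ** 2) / (2 * m)
    penalty = (lam / (2 * m)) * np.sum(w ** 2)
    return mse + penalty
```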