It was mentioned in the course, and demonstrated in the practice lab, that by adjusting the value of the regularization parameter we can control the amount of overfitting. How can we understand how a large regularization parameter minimizes the coefficients that correspond to the higher-order polynomial terms?
Specifically, how can we see this from the update step? Shouldn't a large regularization parameter have the same effect as a large learning rate alpha, i.e. the coefficients bouncing between values instead of converging (imagine lambda is much bigger than the derivative term)?
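For reference, here is the update step I have in mind (my sketch, assuming the regularized linear-regression update and notation from the course, with m training examples):

```latex
w_j := w_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} w_j \right]
    = w_j \left( 1 - \alpha \frac{\lambda}{m} \right) - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
```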
Regularization helps suppress the contribution of features (their weights) that make the function fluctuate too much and add too much noise. The learning rate alpha, on the other hand, controls the size of the step you take toward an optimum.
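To make the distinction concrete, here is a minimal sketch of one gradient-descent step for regularized linear regression with a squared-error cost (my own illustration, not code from the lab; the function name and arguments are made up):

```python
import numpy as np

def gradient_step(w, b, X, y, alpha, lambda_):
    """One gradient-descent step for regularized linear regression (sketch).

    lambda_ only adds (lambda_ / m) * w to the gradient of each weight,
    pulling every weight toward zero; alpha scales the whole step.
    """
    m = X.shape[0]
    err = X @ w + b - y                          # prediction errors, shape (m,)
    dj_dw = (X.T @ err) / m + (lambda_ / m) * w  # data term + regularization term
    dj_db = err.mean()                           # the bias b is not regularized
    w_new = w - alpha * dj_dw                    # = w*(1 - alpha*lambda_/m) - alpha*(data term)
    b_new = b - alpha * dj_db
    return w_new, b_new
```

So lambda never scales the step as a whole; it only determines how strongly each weight is pulled toward zero, while alpha decides how big a step you take in that combined direction.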
This video here I think does a pretty good job explaining regularization:
This video describes the idea of using an additional term to minimize the coefficients of the high-order polynomial terms. I understand the idea, but I don't see how it follows from the equation. I am asking how this is actually achieved at the iteration step.
I understand the learning rate. Just take a look at the equation of the iteration. If the first term is dominating, what will you have? Roughly w := -w * (very large number). What happens at the next iteration? The weight will oscillate between very large negative and very large positive values, much like what happens if the learning rate is too large.
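To see exactly when that happens, here is a tiny numerical sketch (hypothetical numbers) of just the regularization part of the step, which multiplies w by (1 - alpha*lambda/m) each iteration; it keeps shrinking w toward zero while that factor stays between -1 and 1, and only blows up in the oscillating way described above once alpha*lambda/m exceeds 2:

```python
def iterate_regularization_only(w0, alpha, lambda_, m, steps=5):
    # Apply only the weight-decay part of the update, w <- w * (1 - alpha*lambda_/m),
    # ignoring the data-gradient term as in the argument above.
    w, history = w0, [w0]
    for _ in range(steps):
        w = w * (1 - alpha * lambda_ / m)
        history.append(w)
    return history

print(iterate_regularization_only(w0=10.0, alpha=0.01, lambda_=50.0, m=100))
# alpha*lambda/m = 0.005 -> w shrinks smoothly toward zero
print(iterate_regularization_only(w0=10.0, alpha=0.01, lambda_=30000.0, m=100))
# alpha*lambda/m = 3 -> w flips sign and grows each iteration
```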