Programming Assignment: Regularization

In this assignment, lambda is set to 0.7. How can we find the optimal value? Also, for lambda = 0.1, the training score is 0.93 and the test score is 0.95. What does that tell us about the model?

Hi, @Shirsendu_Dhar1!

The lambda value for regularization is a way to penalize large weight values in the loss function calculation, so the optimizer tries to reduce them. If changing lambda from 0.1 to 0.7 does not really affect the metrics, it means the weights of the model were well distributed in the first place. Lambda controls how much regularization you want for the model (greater values mean more regularization).
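To make that concrete, here is a minimal sketch of how the L2 penalty enters the cost, following the cost formula used in this course. It assumes numpy; `base_loss`, `weights`, `lambd`, and `m` are illustrative names, not anything from the assignment code:

```python
import numpy as np

def l2_regularized_cost(base_loss, weights, lambd, m):
    """Add an L2 penalty to an unregularized cost.

    base_loss : cross-entropy (or other) cost for the batch
    weights   : list of weight matrices, one per layer
    lambd     : regularization strength (larger -> more regularization)
    m         : number of training examples
    """
    # (lambd / 2m) * sum of squared weights over all layers
    penalty = (lambd / (2 * m)) * sum(np.sum(np.square(W)) for W in weights)
    return base_loss + penalty

# Example with two small random weight matrices
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(4, 3)), rng.normal(size=(1, 4))]
print(l2_regularized_cost(0.6931, Ws, lambd=0.7, m=100))
```

Because the penalty grows with the squared magnitude of the weights, the optimizer is pushed toward smaller weights as lambda increases.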

Hi @alvaroramajo ,
I understood the importance of lambda. My question is: how do I find the optimal lambda and dropout values?

Hi, @Shirsendu_Dhar1.

Normally those values are determined empirically. You can roughly gauge whether you need a stronger regularization effect from the metrics you get across different experiments and by checking your weight distributions. There is no mathematical equation that gives you an optimal value.
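As a rough illustration of "checking your weight distributions", here is a small sketch. The dict layout and layer names are hypothetical; adapt it to however your model stores its parameters:

```python
import numpy as np

def summarize_weights(weights):
    """Print simple statistics for each weight matrix.

    weights: dict mapping layer name -> numpy array (hypothetical layout).
    """
    for name, W in weights.items():
        print(f"{name}: mean={W.mean():+.4f}  std={W.std():.4f}  "
              f"max|w|={np.abs(W).max():.4f}")

# Example with random stand-in weights
rng = np.random.default_rng(0)
summarize_weights({"W1": rng.normal(0, 0.5, (4, 3)),
                   "W2": rng.normal(0, 0.5, (1, 4))})
```

A few weights that are very large relative to the rest can be a hint that more regularization may help.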

One of the main topics of Week 1 and Week 2 of this course (DLS C2) is how to tune hyperparameters. Prof Ng spent quite a bit of time on this in the lectures. There are some more complicated cases in which you are trying to tune more than one hyperparameter at the same time, and he explains a "grid approach". In the case of a single parameter like \lambda for L2 regularization or the "keep probability" for dropout, you can simply use a range of values. In both those cases a linear range would probably suffice, e.g. for the keep probability, try sampling a range from 0.55 to 0.95 in increments of 0.05. If you have a huge training set, you might want to do the exploration with a smaller subset for efficiency.

Then see if there is a pattern in the results. If you find a couple of values that seem to give the best results, you might then want to do another finer grained linear search between the best values. Prof Ng also shows us some cases, like the \beta value for exponentially weighted averages, in which a linear spacing is not good and you need to take into account the exponential nature of the value and use a logarithmic range.
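A minimal sketch of that search procedure, assuming numpy. Here `dev_accuracy` is a hypothetical stand-in for your own train-and-evaluate routine; the dummy score inside it exists only so the sweep runs end to end:

```python
import numpy as np

def dev_accuracy(keep_prob):
    # Hypothetical stand-in: train your model with this keep probability
    # and return accuracy on the dev set. The dummy score below just
    # pretends 0.8 is best so the example is runnable.
    return 1.0 - abs(keep_prob - 0.8)

# Coarse linear sweep, 0.55 to 0.95 in steps of 0.05
coarse = np.arange(0.55, 0.96, 0.05)
scores = {kp: dev_accuracy(kp) for kp in coarse}
best = max(scores, key=scores.get)
print(f"best coarse keep_prob: {best:.2f}")

# Finer linear sweep around the coarse winner
fine = np.arange(best - 0.05, best + 0.05 + 1e-9, 0.01)
print("fine candidates:", np.round(fine, 2))

# For a value like beta in exponentially weighted averages,
# sample on a log scale instead (e.g. beta in [0.9, 0.999]):
r = np.random.uniform(-3, -1, size=5)
betas = 1 - 10 ** r
print("log-scale beta samples:", np.round(betas, 4))
```

The same coarse-then-fine pattern works for \lambda; only the candidate range changes.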

But the high level point here is that Prof Ng covers this issue in detail in the lectures. If you missed that, perhaps it’s worth another look.

Thanks @paulinpaloalto, I am actually in Week 1, but thanks for the insight.