Programming Assignment: Regularization

In this assignment, lambda is set to 0.7. How can we find the optimal value? Also, for lambda = 0.1, the training score is 0.93 and the test score is 0.95. What does that tell us about the model?

Hi, @Shirsendu_Dhar1!

The lambda value for regularization is a way to penalize large weight values in the loss function calculation, so the optimizer tries to reduce them. If changing lambda from 0.1 to 0.7 does not really affect the metrics, it means the weights of the model were well distributed in the first place. Lambda controls how much regularization you want for the model (greater values mean more regularization).
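To make that concrete, here is a minimal sketch of how the L2 penalty enters the cost, following the cost formula used in this course. It assumes numpy; `base_loss`, `weights`, `lambd`, and `m` are illustrative names, not anything from the assignment code:

```python
import numpy as np

def l2_regularized_cost(base_loss, weights, lambd, m):
    """Add an L2 penalty to an unregularized cost.

    base_loss : cross-entropy (or other) cost for the batch
    weights   : list of weight matrices, one per layer
    lambd     : regularization strength (larger -> more regularization)
    m         : number of training examples
    """
    # (lambd / 2m) * sum of squared weights over all layers
    penalty = (lambd / (2 * m)) * sum(np.sum(np.square(W)) for W in weights)
    return base_loss + penalty

# Example with two small random weight matrices
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(4, 3)), rng.normal(size=(1, 4))]
print(l2_regularized_cost(0.6931, Ws, lambd=0.7, m=100))
```

Because the penalty grows with the squared magnitude of the weights, the optimizer is pushed toward smaller weights as lambda increases.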

Hi @alvaroramajo ,
I understood the importance of lambda. My question is: how do I find the optimal lambda and dropout values?

Hi, @Shirsendu_Dhar1.

Normally those values are determined empirically. You can roughly gauge whether you need a stronger regularization effect from the metrics you get across different experiments and by checking your weight distributions. There is no mathematical equation that gives you an optimal value.
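As a rough illustration of "checking your weight distributions", here is a small sketch. The dict layout and layer names are hypothetical; adapt it to however your model stores its parameters:

```python
import numpy as np

def summarize_weights(weights):
    """Print simple statistics for each weight matrix.

    weights: dict mapping layer name -> numpy array (hypothetical layout).
    """
    for name, W in weights.items():
        print(f"{name}: mean={W.mean():+.4f}  std={W.std():.4f}  "
              f"max|w|={np.abs(W).max():.4f}")

# Example with random stand-in weights
rng = np.random.default_rng(0)
summarize_weights({"W1": rng.normal(0, 0.5, (4, 3)),
                   "W2": rng.normal(0, 0.5, (1, 4))})
```

A few weights that are very large relative to the rest can be a hint that more regularization may help.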

One of the main topics of Week 1 and Week 2 of this course (DLS C2) is how to tune hyperparameters. Prof Ng spent quite a bit of time on this in the lectures. There are some more complicated cases in which you are trying to tune more than one hyperparameter at the same time, and he explains a "grid approach". In the case of a single parameter like \lambda for L2 regularization or the "keep probability" for dropout, you can simply use a range of values. In both those cases a linear range would probably suffice, e.g. for the keep probability, try sampling a range from 0.55 to 0.95 in increments of 0.05. If you have a huge training set, you might want to do the exploration with a smaller subset for efficiency.

Then see if there is a pattern in the results. If you find a couple of values that seem to give the best results, you might then want to do another finer grained linear search between the best values. Prof Ng also shows us some cases, like the \beta value for exponentially weighted averages, in which a linear spacing is not good and you need to take into account the exponential nature of the value and use a logarithmic range.
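A minimal sketch of that search procedure, assuming numpy. Here `dev_accuracy` is a hypothetical stand-in for your own train-and-evaluate routine; the dummy score inside it exists only so the sweep runs end to end:

```python
import numpy as np

def dev_accuracy(keep_prob):
    # Hypothetical stand-in: train your model with this keep probability
    # and return accuracy on the dev set. The dummy score below just
    # pretends 0.8 is best so the example is runnable.
    return 1.0 - abs(keep_prob - 0.8)

# Coarse linear sweep, 0.55 to 0.95 in steps of 0.05
coarse = np.arange(0.55, 0.96, 0.05)
scores = {kp: dev_accuracy(kp) for kp in coarse}
best = max(scores, key=scores.get)
print(f"best coarse keep_prob: {best:.2f}")

# Finer linear sweep around the coarse winner
fine = np.arange(best - 0.05, best + 0.05 + 1e-9, 0.01)
print("fine candidates:", np.round(fine, 2))

# For a value like beta in exponentially weighted averages,
# sample on a log scale instead (e.g. beta in [0.9, 0.999]):
r = np.random.uniform(-3, -1, size=5)
betas = 1 - 10 ** r
print("log-scale beta samples:", np.round(betas, 4))
```

The same coarse-then-fine pattern works for \lambda; only the candidate range changes.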

But the high level point here is that Prof Ng covers this issue in detail in the lectures. If you missed that, perhaps it’s worth another look.

Thanks @paulinpaloalto, I am actually in Week 1, but thanks for the insight.