Hyperparameter lambda: how is this wrong?

I have seen two explanations:

Explanation 1: by decreasing the hyperparameter lambda, we decrease the penalty from the L2 regularization term, and therefore we end up with high variance.

Explanation 2: by decreasing the hyperparameter lambda, we reduce the magnitude of the feature weights, and therefore reduce the complexity of the model, and therefore reduce high variance.
These two explanations sound contradictory.

Hello @Arisha_Prasain,

I think the problem is in explanation 2.

The correct way is: if we increase lambda, we decrease the values of the weights.

If we increase lambda, the contribution of the regularization term to the cost function increases, so to effectively reduce the cost, the weights need to drop further.
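To make that concrete, here is a minimal sketch (not course code) assuming a linear model with squared-error loss plus the usual L2 penalty (lambda / (2m)) * sum(w²); the function name and variable names are just illustrative:

```python
import numpy as np

def cost_and_gradients(w, b, X, y, lambd):
    """Squared-error cost with an L2 penalty, plus its gradients."""
    m = X.shape[0]
    err = X @ w + b - y
    cost = (err @ err) / (2 * m) + (lambd / (2 * m)) * np.sum(w ** 2)
    # The penalty adds (lambd / m) * w to the weight gradient, so a larger
    # lambda pushes each gradient-descent step harder toward w = 0.
    dw = (X.T @ err) / m + (lambd / m) * w
    db = np.sum(err) / m
    return cost, dw, db
```

In the update w = w - alpha * dw, that extra (lambda / m) * w term acts like weight decay: each step shrinks w by roughly a factor of (1 - alpha * lambda / m) before applying the data gradient, which is why a larger lambda ends up with smaller weights.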

Cheers,
Raymond


I was confused by this question as well, given the “must” qualifier, since there are multiple ways to reduce variance; e.g., you can reduce high variance by getting more training data.

I imagine you could even decrease lambda, if you got enough additional training data to substantially decrease the variance, and still end up with less net variance than before.

Is this answer not False?

The answer should be “True”, as the grader response explains. It’s what Raymond said above: if you increase \lambda, you are strengthening the effect of L2 regularization. Doing that makes the value of the L2 penalty term larger, which will tend to make the W values smaller to compensate. That in turn should reduce overfitting and variance.

The “must” is maybe a bit misleading, but it’s just a question of interpretation. You are totally correct that there are other ways to address overfitting besides regularization (e.g. getting more training data), but that’s not what they are asking about in this question. I think the statement is consistent if you take the interpretation that the whole point of the question is that you are doing L2 regularization. Otherwise there is no \lambda involved, right? So if you are using L2 regularization, then you must increase \lambda to reduce overfitting.
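If it helps to see the effect numerically, here is a small, self-contained demo (not from the course; it uses scikit-learn’s Ridge, whose alpha parameter plays the role of \lambda on synthetic data): as the penalty grows, the fitted weights shrink toward zero.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic regression data; Ridge's alpha is the L2 strength (our lambda).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = X @ rng.normal(size=10) + 0.5 * rng.normal(size=50)

for alpha in [0.01, 1.0, 10.0, 100.0]:
    w = Ridge(alpha=alpha).fit(X, y).coef_
    print(f"lambda = {alpha:7.2f}   ||w|| = {np.linalg.norm(w):.3f}")
```

You should see the norm of the coefficient vector decrease as alpha increases, which is the “stronger regularization, smaller weights, lower variance” relationship the question is getting at.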
