Using L2 regularization, what basically happens is that we always end up with weight values smaller than the ones we would obtain without regularization. Regularization is used to address overfitting. My question is: can we address overfitting by increasing the values of the weights instead of decreasing them through regularization?
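(For reference, by L2 regularization I mean adding the usual penalty to the cost, as in the course; I'm writing it from memory, so the notation might be slightly off:)

J = \frac{1}{m}\sum_{i=1}^{m}\mathcal{L}(\hat{y}^{(i)}, y^{(i)}) + \frac{\lambda}{2m}\sum_{l=1}^{L}||W^{[l]}||_F^2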
No, increasing the weights has not been shown to be helpful.
Ooh, thanks! Can you give me some explanation of why it might not be helpful?
Hello @Anindhya_rao! I’m Su!
May I ask what you mean by increasing the value of the weights? Do you mean making the weights larger?
If yes, I want you to recall why small weights are recommended rather than large ones. Large weights make the network unstable: minor variations in the input can lead to big differences in the output.
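Here is a tiny numerical sketch of that sensitivity (my own toy example, not from the course): the same small input perturbation moves the layer's output much more when the weights are scaled up.

```python
# Toy illustration: how much does one linear layer's output change when the
# input is nudged slightly, for small vs. large weights?
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=4)
x_perturbed = x + 0.01 * rng.normal(size=4)   # a tiny change to the input

W_small = 0.1 * rng.normal(size=(3, 4))       # small weights
W_large = 100 * W_small                       # the same weights, scaled up

for name, W in [("small weights", W_small), ("large weights", W_large)]:
    change = np.linalg.norm(W @ x - W @ x_perturbed)
    print(f"{name}: output change = {change:.4f}")
```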
Hi @Anindhya_rao,
Interesting question.
You can try this and see what happens. To increase the weights, you could, for example, add the “regularization” term \frac{\lambda}{||v||^2} instead of \lambda ||v||^2. To minimize the cost function you would then want a larger value of ||v||. For a large enough \lambda, you’ll find that v goes to infinity (diverges) during gradient descent, but for a sufficiently small value of \lambda you might find that gradient descent converges to a value of v with larger components than it would have without the “regularization” term.

I have not tried this myself, but doing so would answer your question. My expectation is that it would not help with overfitting. In any case, this kind of exercise is always worth doing. Curiosity is what drives innovation.
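If you want a concrete starting point, here is a rough sketch of the experiment (the toy least-squares setup and all names are my own, not anything from the course): fit the same model once without regularization and once with the inverted term \frac{\lambda}{||v||^2}, then compare the size of the learned weights.

```python
# Sketch: gradient descent with an inverted "regularization" term lam / ||v||^2
# that rewards larger weights, compared against no regularization at all.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([0.5, -0.3, 0.2]) + 0.1 * rng.normal(size=50)

def fit(lam, steps=2000, lr=0.01):
    v = rng.normal(size=3)
    for _ in range(steps):
        grad_data = 2 * X.T @ (X @ v - y) / len(y)    # gradient of the mean squared error
        norm_sq = v @ v
        grad_reg = -2 * lam * v / (norm_sq ** 2)      # gradient of lam / ||v||^2
        v -= lr * (grad_data + grad_reg)
    return v

v_plain = fit(lam=0.0)
v_inverted = fit(lam=0.01)
print("||v|| without the term  :", np.linalg.norm(v_plain))
print("||v|| with lam/||v||^2  :", np.linalg.norm(v_inverted))
```

You could then check the train/dev error of each fit to see whether the larger weights actually change the overfitting behaviour.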
If you do this, make sure to let me know the results!
Best,
Alex