Ref: https://www.coursera.org/learn/machine-learning/lecture/UZTPk/cost-function-with-regularization
In this lecture I understand the purpose of lambda in regularization and its impact: for example, higher values of lambda lead to smaller values of the parameters w. But since we are adding the regularization term to the cost, how does that result in decreasing the parameters w? Can someone explain via a mathematical proof? Any links or supporting material would be highly appreciated.
The core of the regularization term is the square of the norm of the w vector, i.e. the sum of the squares of its elements. The elements of w are the weights, so if you minimize that sum then you are minimizing the magnitudes of the individual w_j values. Then it’s just a question of how pronounced that effect is relative to the MSE cost, and that balance is controlled by the value of \lambda. You are minimizing the sum of the MSE term plus the regularization term. If the gradient of the MSE term points in the direction of increasing the magnitude of some of the w_j values, then where the total minimum of the cost ends up is a competition between the pure suppression of all w_j expressed by the regularization term and the effects of the MSE cost.
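To make that competition concrete, here is a sketch of the gradient descent update. It assumes the regularized cost has the form I believe the lecture uses for linear regression, with m training examples, learning rate \alpha and model output f_{w,b}(x); those symbols are my assumption and do not appear earlier in this thread.

$$
J(w,b) = \frac{1}{2m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)^2 + \frac{\lambda}{2m}\sum_{j=1}^{n} w_j^2
$$

$$
\frac{\partial J}{\partial w_j} = \frac{1}{m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\,w_j
$$

$$
w_j \leftarrow w_j - \alpha\,\frac{\partial J}{\partial w_j} = \left(1 - \frac{\alpha\lambda}{m}\right)w_j - \frac{\alpha}{m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)x_j^{(i)}
$$

The second term of the update is the ordinary MSE step. The first term multiplies w_j by a factor that lies between 0 and 1 whenever 0 < \alpha\lambda/m < 1, so every iteration shrinks each weight a little before the MSE step is applied. A larger \lambda makes that factor smaller, which is why the weights end up smaller.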
Hi @icybergenome
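When a larger value of \lambda is chosen, the second (regularization) term in the equation carries more weight in the total cost. The model’s aim is still to reduce J, so large weights become expensive and it lowers the second term by shrinking the weights, even if that means the first term (the fit to the training data) gets slightly worse.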
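If you want to see this numerically, here is a minimal sketch, not the course's code. It assumes plain linear regression without a bias term on synthetic data, and it uses the closed-form ridge solution instead of gradient descent. The function name `ridge_weights` and the data are made up for illustration.

```python
import numpy as np

def ridge_weights(X, y, lam):
    """Minimize ||Xw - y||^2 + lam * ||w||^2 in closed form (hypothetical helper)."""
    n_features = X.shape[1]
    # Normal equations with the L2 penalty added on the diagonal
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Synthetic data: 50 examples, 3 features, known weights plus a little noise
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([3.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=50)

lambdas = [0.0, 1.0, 10.0, 100.0, 1000.0]
norms = [float(np.linalg.norm(ridge_weights(X, y, lam))) for lam in lambdas]

for lam, norm in zip(lambdas, norms):
    print(f"lambda={lam:>7.1f}  ||w||={norm:.4f}")
```

The printed norm of w should get smaller on every row as \lambda grows. Scaling both terms by 1/(2m), as the lecture's cost does, would not change which w minimizes the cost, so the trend is the same.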
Also, I found this link which might be helpful:
https://stats.stackexchange.com/questions/388642/why-increasing-lambda-parameter-in-l2-regularization-makes-the-co-efficient-valu
Hope it helps! Feel free to ask if you need further assistance!