First, understand why we need the regularization term. When our model overfits, doing a good job on the training data but a poor job on the test data, it means our model does not generalize well. So, we need to penalize it. Penalizing means adding to the cost (J). When we add that extra regularization term, we are making the cost higher, and the larger the parameters (W), the higher it gets. At the same time, we are also trying to decrease the cost through optimization (gradient descent).

So, gradient descent tries to reduce the cost while the regularization term tries to increase it. That is a clash, right? The larger the lambda value, the larger the penalty for the same parameters, and the higher the cost. But gradient descent is trying hard to minimize the total cost, and the only way to bring the penalty down is to reduce the values of the parameters (W). That is why a larger lambda leads to smaller weights.
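To see this effect in numbers, here is a minimal sketch. It assumes an L2 penalty of the form (lambda / 2m) * sum(W^2) added to a mean squared error cost on a linear model. The function name `train`, the synthetic data, and the learning rate are all made up for illustration and are not taken from any course code.

```python
import numpy as np

def train(X, y, lam, lr=0.1, steps=2000):
    """Gradient descent on J = (1/2m)*||XW - y||^2 + (lam/2m)*||W||^2."""
    m, n = X.shape
    W = np.zeros(n)
    for _ in range(steps):
        err = X @ W - y
        # Gradient of the data term plus gradient of the L2 penalty.
        grad = X.T @ err / m + (lam / m) * W
        W -= lr * grad
    return W

# Synthetic data (an assumption for this demo only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
true_W = np.array([3.0, -2.0, 1.5, 0.0, 4.0])
y = X @ true_W + 0.1 * rng.normal(size=50)

# Train with increasing lambda and record the size of the learned weights.
norms = {lam: np.linalg.norm(train(X, y, lam)) for lam in (0.0, 1.0, 10.0, 100.0)}
for lam, norm in norms.items():
    print(f"lambda={lam:>5}: ||W|| = {norm:.3f}")
```

The printed norm of W should go down as lambda goes up. You can also see why from the update itself: the penalty contributes (lambda / m) * W to the gradient, so every step pulls each weight a little toward zero, and a larger lambda pulls harder.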

Best,

Saif.