week-3
In gradient descent we are already doing this, so isn't the w we get after each iteration already the smaller one?
Why do we still need to add this term to the cost function? Could someone explain it in a simple way with the math, please?
![image](https://global.discourse-cdn.com/dlai/original/3X/2/c/2cfa1bbf1eab11b9301d98d9920dc9c28eb43386.png)
The regularization term in the cost function encourages the learned weights to be slightly smaller. This helps prevent overfitting.
The part circled in red in your first image is the portion of the gradients that accounts for the regularization term in the cost function.
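For reference, and if I have the week-3 notation right, this is the regularized logistic-regression cost and its gradient; the last term in each expression is what the regularization adds (b is not regularized):

$$J(\mathbf{w},b) = \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - (1-y^{(i)})\log\left(1 - f_{\mathbf{w},b}(\mathbf{x}^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n} w_j^2$$

$$\frac{\partial J}{\partial w_j} = \frac{1}{m}\sum_{i=1}^{m}\left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m} w_j$$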
But the idea of gradient descent is to find the best-fit "w" and "b". So after we apply regularization in gradient descent, don't we already end up with the regularized "w"? Then why don't we just use that regularized "w" in the cost function without lambda?
The formula you posted is not regularized. It’s extremely likely to cause overfitting of the training set.
I think the "regularization" in gradient descent is derived from the regularization in the cost function. So to apply regularization to your algorithm (to avoid overfitting), you have to add the regularization term to both the cost function and the gradient descent update. (The gradient descent update uses the partial derivatives of the cost function anyway, so they are related!) I hope this helps!
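To make that connection concrete, here is a small numpy sketch (my own illustrative function names, assuming the logistic-regression cost with the λ/(2m) penalty from the lectures). The (λ/m)·w piece in the gradient comes directly from differentiating the regularization term in the cost:

```python
import numpy as np

def sigmoid(z):
    # Logistic function.
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost_reg(X, y, w, b, lambda_):
    # Cross-entropy cost plus the (lambda / 2m) * sum(w^2) penalty.
    m = X.shape[0]
    f = sigmoid(X @ w + b)
    cross_entropy = -np.mean(y * np.log(f) + (1.0 - y) * np.log(1.0 - f))
    reg = (lambda_ / (2.0 * m)) * np.sum(w ** 2)  # regularization term
    return cross_entropy + reg

def compute_gradients_reg(X, y, w, b, lambda_):
    # Partial derivatives of the regularized cost above.
    m = X.shape[0]
    err = sigmoid(X @ w + b) - y                  # shape (m,)
    dj_dw = (X.T @ err) / m + (lambda_ / m) * w   # extra (lambda/m)*w_j term
    dj_db = np.mean(err)                          # b is not regularized
    return dj_dw, dj_db

# One gradient-descent step using both pieces:
# dj_dw, dj_db = compute_gradients_reg(X, y, w, b, lambda_=1.0)
# w, b = w - alpha * dj_dw, b - alpha * dj_db
```

Note that b is left unregularized, which matches the convention used in the course.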
It all starts with the cost function.
That should include a regularization term.
Then since the gradients are the partial derivatives of the cost function, this gives you the expression for the regularized gradients.
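Writing the resulting update out (same notation as the cost above) makes the effect visible. With the regularized gradient, each step becomes

$$w_j := w_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}w_j\right] = \left(1 - \alpha\frac{\lambda}{m}\right)w_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right)x_j^{(i)}$$

The factor (1 − αλ/m) shrinks each w_j slightly on every iteration. That shrinking only happens because the λ term is in the cost; drop it from the cost and the gradients, and the update no longer pushes the weights toward smaller values.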