Gradient descent, cost J, regularization relationship

Here is my understanding of the relationship between gradient descent, cost J, and regularization so far. Could anyone check whether it is correct?


Gradient descent uses the derivative of the cost function J with respect to w: dJ/dw = d/dw (J).

dJ/dw is the derivative, i.e. it tells us how much J will change when w changes.

a (alpha) is the learning rate, which controls how big each step is. If dJ/dw = 4, then when w changes by 0.0001, J changes by roughly 0.0001 * 4.
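A quick numeric check of that idea, using a made-up cost J(w) = 2 * w**2 (illustration only; its derivative dJ/dw = 4 * w equals 4 at w = 1):

```python
# Hypothetical cost function for illustration only: J(w) = 2 * w**2
def J(w):
    return 2 * w ** 2

w = 1.0                     # at w = 1, dJ/dw = 4 * w = 4
dw = 0.0001                 # small change in w
print(J(w + dw) - J(w))     # ~0.0004, i.e. roughly 0.0001 * 4
```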

Every iteration, w is updated (decreased when dJ/dw is positive) by:

w = w − a * (dJ/dw)

Then cost J also decreases, by roughly (the change in w) * derivative. As w moves closer to the minimum and cost J gets smaller, dJ/dw also gets smaller, so the next gradient descent step for w is smaller than the last one (e.g. if w decreased by 1 last round, this round it may only decrease by 0.9). Correspondingly, the rate at which cost J decreases also slows down.
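A minimal sketch of that behaviour, using the same made-up cost J(w) = 2 * w**2 from above (not the course's implementation), just to show the steps in w and the decrease in J both shrinking over the iterations:

```python
# Minimal gradient descent sketch on the toy cost J(w) = 2 * w**2 (dJ/dw = 4 * w)
alpha = 0.1                     # learning rate "a" (assumed value)
w = 1.0
for i in range(5):
    grad = 4 * w                # dJ/dw at the current w
    step = alpha * grad         # how much w changes this round
    w = w - step                # gradient descent update: w = w - a * dJ/dw
    print(f"iter {i}: w decreased by {step:.4f}, J is now {2 * w**2:.4f}")
# Each printed step is smaller than the previous one, and J's decrease slows down.
```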

That is why the cost J graph looks the way it does: J drops steeply at first and then the curve flattens out over the iterations.

Regularization:


As we add the lambda (regularization) term into the cost function J, cost J is increased, so in order to cancel out the effect of the lambda term, the w values need to be as small as possible.
J_regularized = J_original + (lambda / (2m)) * Σ wⱼ²
Since cost J is part of gradient descent, we add the lambda term to the gradient descent update as well:
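(Assuming the usual form from the course, the per-weight update then becomes w_j = w_j − a * (dJ/dw_j + (lambda/m) * w_j).)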

Since we are adding this lambda term, dJ/dw becomes bigger, so w decreases faster under gradient descent, and cost J decreases faster as well.
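A toy comparison of that effect on a single weight, using the same made-up base gradient dJ/dw = 4 * w as before (alpha, lambda, and m are assumed values, not from the course):

```python
# Toy comparison: one weight updated with and without the regularization term
alpha, lam, m = 0.1, 1.0, 10

def update(w, regularized):
    grad = 4 * w                      # original dJ/dw (made-up)
    if regularized:
        grad += (lam / m) * w         # extra term added by the lambda regularization
    return w - alpha * grad           # gradient descent step

w_plain = w_reg = 1.0
for _ in range(5):
    w_plain = update(w_plain, regularized=False)
    w_reg = update(w_reg, regularized=True)
print(w_plain, w_reg)                 # regularized w ends up smaller (~0.0778 vs ~0.0715)
```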

I believe you are correct on all points.
