Here is my understanding so far of the relationship between gradient descent, the cost J, and regularization. Can anyone check whether it is correct?
Gradient descent uses the cost function J through its derivative, dJ/dw = (d/dw) J.
dJ/dw is the derivative, so it describes how much J will change when w changes.
α (alpha) is the learning rate, which controls how big each update step is. The derivative itself tells us the sensitivity: if dJ/dw = 4 and w changes by 0.0001, then J will change by about 0.0001 * 4 = 0.0004.
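To sanity-check that idea, here is a tiny numerical example of my own (not from the course), using J(w) = w² so that dJ/dw = 2w:

```python
# Check that "dJ/dw tells how much J changes when w changes a little".
# Example cost J(w) = w**2, so the analytic derivative is dJ/dw = 2*w.

def J(w):
    return w ** 2

w = 2.0                 # at w = 2, dJ/dw = 2*w = 4
dJ_dw = 2 * w           # analytic derivative
delta_w = 0.0001        # small change in w

predicted_change = dJ_dw * delta_w      # 4 * 0.0001 = 0.0004
actual_change = J(w + delta_w) - J(w)   # almost the same for small delta_w

print(predicted_change)  # 0.0004
print(actual_change)     # ~0.00040001
```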
Every time, w is decreased by the learning rate times the derivative: w := w - α * dJ/dw.
Then the cost J will also decrease by roughly w's change times the derivative. And since cost J is getting smaller (closer to the minimum), dJ/dw is also getting smaller, so the next round of gradient descent will decrease w by less than the last round. For example, if last round w decreased by 1, this round it may only decrease by 0.9. As a reflection of that, the decrease of cost J will also slow down.
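A minimal sketch of that behaviour (my own illustration, again using J(w) = w² and an assumed learning rate alpha = 0.1):

```python
# Gradient descent on J(w) = w**2: the step shrinks as dJ/dw shrinks.

alpha = 0.1   # learning rate
w = 5.0       # starting point

for i in range(5):
    dJ_dw = 2 * w         # derivative of J(w) = w**2
    step = alpha * dJ_dw  # how much w moves this round
    w = w - step          # gradient descent update
    print(f"round {i}: step = {step:.4f}, w = {w:.4f}, J = {w**2:.4f}")

# Each round the step gets smaller (1.0, 0.8, 0.64, ...),
# so the decrease of J also slows down.
```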
That is why, in the graph of cost J versus iterations, the curve drops quickly at first and then flattens out.
Regularization:
As we add the regularization term (with λ) into the cost function J, the cost J is increased, so in order to cancel out the effect of that term, the w values need to be as small as possible.
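Writing that out as a sketch (my own example, assuming the usual squared-error cost plus a (λ / 2m) * Σ w² regularization term; the names cost_J, lambda_, X, y are mine):

```python
import numpy as np

# Regularized cost: squared-error term plus (lambda_ / (2m)) * sum(w**2).
# A larger lambda_ penalizes large w more, pushing the w values to be small.

def cost_J(w, b, X, y, lambda_):
    m = X.shape[0]
    predictions = X @ w + b
    squared_error = np.sum((predictions - y) ** 2) / (2 * m)
    reg_term = (lambda_ / (2 * m)) * np.sum(w ** 2)  # b is usually not regularized
    return squared_error + reg_term
```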

As the cost J is part of gradient descent, we add the regularization term's derivative into the gradient descent update as well:
Since we are adding this λ term, dJ/dw is bigger, so w will decrease faster in each gradient descent step. I think the cost J will also decrease faster as a result.
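A minimal sketch of that update (again my own illustration for linear regression, assuming the regularized gradient is the usual gradient plus (λ / m) * w):

```python
import numpy as np

# One regularized gradient descent step (sketch).
# The extra (lambda_ / m) * w term makes dJ/dw larger for large w,
# so each step shrinks w a bit more than it would without regularization.

def gradient_step(w, b, X, y, alpha, lambda_):
    m = X.shape[0]
    error = X @ w + b - y
    dJ_dw = (X.T @ error) / m + (lambda_ / m) * w  # regularization added here
    dJ_db = np.sum(error) / m                      # b is not regularized
    w = w - alpha * dJ_dw
    b = b - alpha * dJ_db
    return w, b
```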