Here is my understanding so far of the relationship between gradient descent, the cost J, and regularization. Can anyone check whether it is correct?
Gradient descent uses the cost function J through its derivative, dJ/dw = (d/dw) J.
dJ/dw is the derivative, so it describes how much J will change when w changes.
α (alpha) is the learning rate, which controls how big each update step is. The derivative itself tells us the sensitivity: if dJ/dw = 4 and w changes by 0.0001, then J will change by about 0.0001 * 4 = 0.0004.
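To sanity-check that idea, here is a tiny numerical example of my own (not from the course), using J(w) = w² so that dJ/dw = 2w:

```python
# Check that "dJ/dw tells how much J changes when w changes a little".
# Example cost J(w) = w**2, so the analytic derivative is dJ/dw = 2*w.

def J(w):
    return w ** 2

w = 2.0                 # at w = 2, dJ/dw = 2*w = 4
dJ_dw = 2 * w           # analytic derivative
delta_w = 0.0001        # small change in w

predicted_change = dJ_dw * delta_w      # 4 * 0.0001 = 0.0004
actual_change = J(w + delta_w) - J(w)   # almost the same for small delta_w

print(predicted_change)  # 0.0004
print(actual_change)     # ~0.00040001
```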
Every time, w is decreased by the learning rate times the derivative: w := w - α * dJ/dw.
Then the cost J will also decrease by roughly w's change times the derivative. And since cost J is getting smaller (closer to the minimum), dJ/dw is also getting smaller, so the next round of gradient descent will decrease w by less than the last round. For example, if last round w decreased by 1, this round it may only decrease by 0.9. As a reflection of that, the decrease of cost J will also slow down.
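A minimal sketch of that behaviour (my own illustration, again using J(w) = w² and an assumed learning rate alpha = 0.1):

```python
# Gradient descent on J(w) = w**2: the step shrinks as dJ/dw shrinks.

alpha = 0.1   # learning rate
w = 5.0       # starting point

for i in range(5):
    dJ_dw = 2 * w         # derivative of J(w) = w**2
    step = alpha * dJ_dw  # how much w moves this round
    w = w - step          # gradient descent update
    print(f"round {i}: step = {step:.4f}, w = {w:.4f}, J = {w**2:.4f}")

# Each round the step gets smaller (1.0, 0.8, 0.64, ...),
# so the decrease of J also slows down.
```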
That is why, in the graph of cost J versus iterations, the curve drops quickly at first and then flattens out.
Regularization:
As we add the regularization term (with λ) into the cost function J, the cost J is increased, so in order to cancel out the effect of that term, the w values need to be as small as possible.
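Writing that out as a sketch (my own example, assuming the usual squared-error cost plus a (λ / 2m) * Σ w² regularization term; the names cost_J, lambda_, X, y are mine):

```python
import numpy as np

# Regularized cost: squared-error term plus (lambda_ / (2m)) * sum(w**2).
# A larger lambda_ penalizes large w more, pushing the w values to be small.

def cost_J(w, b, X, y, lambda_):
    m = X.shape[0]
    predictions = X @ w + b
    squared_error = np.sum((predictions - y) ** 2) / (2 * m)
    reg_term = (lambda_ / (2 * m)) * np.sum(w ** 2)  # b is usually not regularized
    return squared_error + reg_term
```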

As the cost J is part of gradient descent, we add the regularization term's derivative into the gradient descent update as well:
Since we are adding this λ term, dJ/dw is bigger, so w will decrease faster in each gradient descent step. I think the cost J will also decrease faster as a result.
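A minimal sketch of that update (again my own illustration for linear regression, assuming the regularized gradient is the usual gradient plus (λ / m) * w):

```python
import numpy as np

# One regularized gradient descent step (sketch).
# The extra (lambda_ / m) * w term makes dJ/dw larger for large w,
# so each step shrinks w a bit more than it would without regularization.

def gradient_step(w, b, X, y, alpha, lambda_):
    m = X.shape[0]
    error = X @ w + b - y
    dJ_dw = (X.T @ error) / m + (lambda_ / m) * w  # regularization added here
    dJ_db = np.sum(error) / m                      # b is not regularized
    w = w - alpha * dJ_dw
    b = b - alpha * dJ_db
    return w, b
```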