Question about Gradient Descent: Modifying Update Rules and Using Derivatives

YehonatanK · November 15, 2023, 9:23am

In the context of gradient descent, when updating the parameters in the optimization algorithm, we often subtract the derivative of the cost function from the current parameter values. The negative sign is typically incorporated into the learning rate, and this subtraction is consistent with moving towards minimizing the cost function. However, would it be valid to consider adding the gradient and then use the cost function itself (rather than its derivative) in the update rule, with a condition to multiply by -1 when needed? I’m trying to understand the implications of modifying the update rule in this way and whether it aligns with standard practices in optimization.

Additionally, what is the rationale behind using the derivative of the cost function in the update rule instead of the gradient itself? I’m curious about the reasoning behind using the derivative during the optimization process.

Mohammad_Omar_Adde · November 15, 2023, 9:45am

@YehonatanK

first of all the negative sign aligns the direction of movement with the steepest decrease in the cost function.

Secondly the use of the derivative, representing the slope or rate of change of the cost function, offers local information facilitating precise adjustments to parameters

Thirdly the derivative acts as a guide, indicating how much and in what direction the parameters should be adjusted to reach a lower cost.

may be mathematically u can test, but this is my note I collected from the M4ML course calculus.

Thanks to Luis Serrano (Instructor).

TMosh · November 15, 2023, 6:17pm

These are exactly the same thing. By definition, the gradients are the partial derivatives of the cost function.

YehonatanK · November 16, 2023, 7:29am

Thanks

Topic		Replies	Views
Gradient descent formula Supervised ML: Regression and Classification week-1	8	758	May 27, 2023
Gradient Descent Slope Neural Networks and Deep Learning coursera-platform	1	613	April 7, 2022
Course 1 Week 2,Logistic Regression Gradient Descent Neural Networks and Deep Learning coursera-platform	6	881	March 26, 2022
Why in the formula we are multiplying by - sign? Supervised ML: Regression and Classification week-3	4	495	January 11, 2023
About role of partial derivatives in gradient descent Neural Networks and Deep Learning coursera-platform	1	457	September 5, 2023

Question about Gradient Descent: Modifying Update Rules and Using Derivatives

Related topics