About role of partial derivatives in gradient descent

allegro6335 · September 5, 2023, 9:22am

In the gradient descent algorithm, we update the parameters like this:
w = w - α ∂J/∂w
b = b - α ∂J/∂b

What is the exact role of those partial derivatives?
Does the actual value (or magnitude) of these derivatives matter?
Or do they just indicate which direction to take a step?

Jamal022 · September 5, 2023, 11:13am

Hello @allegro6335,

The partial derivatives play a crucial role in determining how to update the parameters (w and b) of a machine learning model. They give the slope (gradient) of the cost function J with respect to each parameter at the current point in parameter space. If the derivative with respect to a parameter is positive, increasing that parameter would increase J, so the update decreases it; if the derivative is negative, the update increases it.
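To make the direction part concrete, here is a minimal sketch. It assumes a toy one-parameter cost J(w) = (w - 3)^2, which is not from the course; the names `cost_gradient` and `gradient_step` are made up for illustration.

```python
def cost_gradient(w):
    """Derivative of the assumed toy cost J(w) = (w - 3)**2."""
    return 2.0 * (w - 3.0)


def gradient_step(w, alpha=0.1):
    """Apply one update: w := w - alpha * dJ/dw."""
    return w - alpha * cost_gradient(w)


# Start to the right of the minimum (w = 3): the derivative is positive,
# so the update makes w smaller.
w_right = gradient_step(5.0)

# Start to the left of the minimum: the derivative is negative,
# so the update makes w larger.
w_left = gradient_step(1.0)

print(w_right, w_left)
```

In both cases the sign of the derivative sends w toward the minimum at w = 3.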

Now, coming to the second part of your question: “Does the actual value (or magnitude) of these derivatives matter?”

The answer is yes, it matters, because the size of each step is the learning rate α multiplied by the derivative. For a fixed α, a larger derivative gives a larger step and a smaller derivative gives a smaller step.

So the magnitude of these derivatives helps control the size of the steps you take during parameter updates, which affects how quickly your model learns and converges to a good solution.
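Here is a small sketch of the magnitude effect, again assuming the toy cost J(w) = (w - 3)^2 and a fixed learning rate of 0.1. Both choices are just for illustration, and `run_gradient_descent` is a hypothetical helper, not course code.

```python
def cost_gradient(w):
    """Derivative of the assumed toy cost J(w) = (w - 3)**2."""
    return 2.0 * (w - 3.0)


def run_gradient_descent(w, alpha=0.1, num_steps=5):
    """Run a few updates and record the absolute size of each step."""
    step_sizes = []
    for _ in range(num_steps):
        step = alpha * cost_gradient(w)  # step = learning rate * derivative
        w = w - step
        step_sizes.append(abs(step))
    return w, step_sizes


w_final, step_sizes = run_gradient_descent(w=10.0)
for i, size in enumerate(step_sizes, start=1):
    print(f"step {i}: size = {size:.4f}")
print(f"w after {len(step_sizes)} steps: {w_final:.4f}")
```

With α held fixed, the steps shrink as w approaches the minimum because the derivative itself shrinks there. That is the sense in which the magnitude, and not only the sign, shapes the update.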

I hope it makes sense now,
Regards,
Jamal
