Unable to understand Gradient descent intuition

Tom19 · February 7, 2025, 7:41am

In this graph, we want to find the value of w, for which J(w) is the minimum.
Also, d(J(w))/d(w) can be stated as the rate of change of the function J(w) with respect to w (if I am not wrong),
and the equation to update w is
w = w - alpha * d(J(w))/d(w)

What I cannot understand is why we are using d(J(w))/d(w) to update w.

d(J(w))/d(w) tells us how the function J(w) changes as w changes. So how can we use that term to update ‘w’?

TMosh · February 7, 2025, 8:29am

dJ/dw is the slope of the curve you drew.

If the slope is positive, we need to reduce w so we get closer to the minimum.
If the slope is negative, we need to increase w so we get closer to the minimum.
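
To make the sign argument concrete, here is a minimal sketch in Python. The cost J(w) = (w - 3)^2, the starting points, and the names `cost`, `slope`, and `gradient_descent` are all assumptions chosen for illustration; they are not from the course or the graph in the question.

```python
# Hypothetical bowl-shaped cost with its minimum at w = 3 (an assumption for this sketch).
def cost(w):
    return (w - 3.0) ** 2


# dJ/dw for the cost above.
def slope(w):
    return 2.0 * (w - 3.0)


# Repeatedly apply the update rule from the question: w = w - alpha * dJ/dw.
def gradient_descent(w, alpha=0.1, steps=50):
    for _ in range(steps):
        w = w - alpha * slope(w)
    return w


# Starting right of the minimum: the slope is positive, so the update reduces w.
w_right = 10.0
step_right = w_right - 0.1 * slope(w_right)

# Starting left of the minimum: the slope is negative, so the update increases w.
w_left = -4.0
step_left = w_left - 0.1 * slope(w_left)

print("right start:", w_right, "-> after one step:", step_right)
print("left start:", w_left, "-> after one step:", step_left)
print("after many steps:", gradient_descent(w_right), gradient_descent(w_left))
```

From either side, subtracting alpha * slope moves w toward the minimum, which is why the same update rule works without checking the sign explicitly.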

rmwkwok · February 7, 2025, 9:17am

I think @TMosh has made the point. Setting aside the magnitude, which is scaled by the learning rate, I think we rely on the slope for its sign: it tells us whether to increase or decrease the weight to get it closer to a cost minimum.

If we are worried about w and dJ/dw being in different units, let’s not forget that we still have the learning rate (“unit of w per unit of slope”) to make the units come out right.
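
The unit bookkeeping can be written out as a rough sketch. The bracket notation [x], meaning "units of x", is an assumption introduced here for illustration:

```latex
\[
w \leftarrow w - \alpha \frac{dJ}{dw},
\qquad
\left[\frac{dJ}{dw}\right] = \frac{[J]}{[w]},
\qquad
[\alpha] = \frac{[w]}{[J]/[w]} = \frac{[w]^2}{[J]}
\;\Longrightarrow\;
\left[\alpha \frac{dJ}{dw}\right] = [w].
\]
```

So if the learning rate is read as carrying units of w per unit of slope, the term subtracted from w has the same units as w.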

Cheers,
Raymond

Tom19 · February 8, 2025, 5:36am

Thank you @TMosh and @rmwkwok .

@TMosh Thanks for helping me understand the equation from a different angle. Instead of thinking about how the cost function changes with respect to w, if I think about which direction to move in to get to a local minimum with respect to w, the parameter update equation makes complete sense.

@rmwkwok You actually figured out what my real contention was, which was the units. I didn’t think about the learning rate (I assumed it was just a constant). Thank you.

rmwkwok · February 8, 2025, 10:58am

You are welcome, @Tom19!
