Unable to understand gradient descent intuition

In this graph, we want to find the value of w for which J(w) is at its minimum.
Also, dJ(w)/dw can be stated as the rate of change of the function J(w) w.r.t. w (if I am not wrong),
and the equation to update w is:
w = w - alpha * dJ(w)/dw

What I cannot understand is why we are using dJ(w)/dw to update w.

dJ(w)/dw tells us how the function J(w) changes as w changes, so how can we use that term to update w?
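For concreteness, here is a single update step with made-up numbers: if w = 5, alpha = 0.1, and dJ(w)/dw = 4 at that point, the update gives w = 5 - 0.1 * 4 = 4.6.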


dJ/dw is the slope of the curve you drew.

  • If the slope is positive, we need to reduce w so we get closer to the minimum.
  • If the slope is negative, we need to increase w so we get closer to the minimum.
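As a minimal sketch of this sign behavior (assuming a hypothetical cost J(w) = (w - 3)**2, whose slope is dJ/dw = 2 * (w - 3) and whose minimum sits at w = 3):

```python
# Hypothetical cost J(w) = (w - 3)**2 with its minimum at w = 3.
def dJ_dw(w):
    """Slope of J(w) = (w - 3)**2."""
    return 2 * (w - 3)

alpha = 0.1  # learning rate

# Start to the RIGHT of the minimum: slope is positive, so w decreases.
w = 5.0
for _ in range(25):
    w = w - alpha * dJ_dw(w)
print(w)  # ~3.01, w moved left toward the minimum

# Start to the LEFT of the minimum: slope is negative, so w increases.
w = 1.0
for _ in range(25):
    w = w - alpha * dJ_dw(w)
print(w)  # ~2.99, w moved right toward the minimum
```

In both runs the same update rule w = w - alpha * dJ_dw(w) pushes w toward the minimum, because subtracting the slope always moves against the direction of increasing cost.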

I think @TMosh has made the key point: rather than the magnitude (which is scaled by the learning rate), we are relying mainly on the sign of the slope to tell us whether to increase or decrease the weight to move it closer to a cost minimum.

If we are worried about w and dJ/dw being in different units, let's not forget we still have the learning rate (in "units of w per unit of slope") to make the units consistent.
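As a quick dimensional check (a sketch, with [x] meaning "units of x"): the update subtracts alpha * dJ/dw from w, so for the subtraction to be consistent we need

[alpha] * [J]/[w] = [w], i.e. [alpha] = [w]^2 / [J]

which is exactly the "unit of w per unit of slope" role the learning rate plays.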

Cheers,
Raymond


Thank you, @TMosh and @rmwkwok.

@TMosh Thanks for helping me understand the equation from a different perspective. Instead of thinking about how the cost function changes w.r.t. w, if I think about which direction to move w in to reach a local minimum, the parameter update equation makes complete sense.

@rmwkwok You actually figured out what my real contention was: the units. I didn't think about the learning rate (I assumed it was just a constant). Thank you.


You are welcome, @Tom19!