What do we mean by this statement?
The arrow sizes reflect the magnitude of the gradient at that point. The direction and slope of the arrow reflect the ratio of ∂J/∂w and ∂J/∂b at that point. Note that the gradient points away from the minimum. Review equation (3) above. The scaled gradient is subtracted from the current value of w and b. This moves the parameter in a direction that will reduce cost.
Can anybody explain? I didn’t understand a single point from the above.
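One way to make the quoted passage concrete is to compute a single update step by hand. The sketch below is only an illustration and rests on assumptions that are not in the passage: that J(w, b) is the squared-error cost for a one-feature linear model, that equation (3) is the usual update w = w - alpha * ∂J/∂w and b = b - alpha * ∂J/∂b, and that the two data points and the learning rate alpha are made up. The function names are hypothetical too.

```python
# Assumed cost: J(w, b) = 1/(2m) * sum((w*x + b - y)^2)  (not stated in the quote)
def compute_cost(x, y, w, b):
    m = len(x)
    return sum((w * xi + b - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)


def compute_gradient(x, y, w, b):
    """Return (dJ/dw, dJ/db) for the assumed squared-error cost."""
    m = len(x)
    dj_dw = sum((w * xi + b - yi) * xi for xi, yi in zip(x, y)) / m
    dj_db = sum((w * xi + b - yi) for xi, yi in zip(x, y)) / m
    return dj_dw, dj_db


def gradient_step(x, y, w, b, alpha):
    """Subtract the scaled gradient from the current w and b (assumed form of eq. 3)."""
    dj_dw, dj_db = compute_gradient(x, y, w, b)
    return w - alpha * dj_dw, b - alpha * dj_db


# Made-up data; it is fit exactly by w = 200, b = 100, so that is the minimum.
x = [1.0, 2.0]
y = [300.0, 500.0]

w, b = 0.0, 0.0   # starting point, below and to the left of the minimum
alpha = 0.01      # assumed learning rate

dj_dw, dj_db = compute_gradient(x, y, w, b)
cost_before = compute_cost(x, y, w, b)
w_new, b_new = gradient_step(x, y, w, b, alpha)
cost_after = compute_cost(x, y, w_new, b_new)

print("gradient (dJ/dw, dJ/db):", dj_dw, dj_db)
print("arrow length (gradient magnitude):", (dj_dw ** 2 + dj_db ** 2) ** 0.5)
print("w, b after one step:", w_new, b_new)
print("cost before and after:", cost_before, cost_after)
```

At the starting point both partial derivatives are negative, while the minimum lies at larger w and b, so the gradient arrow points away from the minimum. Subtracting the scaled gradient therefore increases w and b, moving them toward the minimum, and the cost drops. The arrow's length corresponds to the gradient magnitude, and its tilt comes from how large ∂J/∂b is relative to ∂J/∂w.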