I am struggling to understand the description (caption) of Fig. 3 of the exercise.
- I cannot tell the difference between the red and blue lines.
- Then the caption says "Rather than just following the gradient, the gradient is allowed to influence v". What are those two gradients, and how do they differ?
These are just some of the points that are unclear to me; honestly, I do not understand the description at all. I think I get the intuition that the path to the minimum should become less noisy (fewer oscillations) with momentum, but I cannot connect that with the given description.
Below are the image and the caption.
Caption - “The red arrows show the direction taken by one step of mini-batch gradient descent with momentum. The blue points show the direction of the gradient (with respect to the current mini-batch) on each step. Rather than just following the gradient, the gradient is allowed to influence v and then take a step in the direction of v.”
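To make the caption concrete, here is a minimal sketch, assuming the exponentially weighted form of momentum, v = βv + (1 − β)g followed by w = w − αv, with β = 0.9. The function name `momentum_steps`, the elongated quadratic loss, and the Gaussian noise standing in for mini-batch sampling are all hypothetical choices made only for illustration. In this reading, `g` is the raw gradient of the current mini-batch (the blue points) and `v` is the direction actually stepped in (the red arrows). Both mentions of "the gradient" in the caption would then refer to the same `g`: it is not followed directly, it only nudges `v`.

```python
import numpy as np


def momentum_steps(grad_fn, w0, lr=0.1, beta=0.9, n_steps=5, seed=0):
    """Run momentum on noisy gradients and record, per step,
    the raw mini-batch gradient g ("blue") and the step direction v ("red").

    Assumption: the noise added to grad_fn(w) is a hypothetical stand-in
    for the randomness of sampling a mini-batch.
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float)
    v = np.zeros_like(w)  # velocity starts at zero
    history = []
    for _ in range(n_steps):
        # Noisy "mini-batch" gradient at the current point.
        g = grad_fn(w) + rng.normal(scale=0.5, size=w.shape)
        # The gradient only *influences* v (exponentially weighted average).
        v = beta * v + (1 - beta) * g
        # The parameter step is taken along v, not along g.
        w = w - lr * v
        history.append((g.copy(), v.copy()))
    return w, history


def quad_grad(w):
    """Gradient of the hypothetical loss 0.5 * (w0**2 + 10 * w1**2)."""
    return np.array([w[0], 10.0 * w[1]])


if __name__ == "__main__":
    _, hist = momentum_steps(quad_grad, [1.0, 1.0])
    for t, (g, v) in enumerate(hist, start=1):
        print(f"step {t}: gradient (blue) = {np.round(g, 2)}, "
              f"step direction v (red) = {np.round(v, 2)}")
```

Printing `g` next to `v` at each step shows why the two arrows point in different directions: `g` jumps around from one mini-batch to the next, while `v` averages over recent gradients and changes more smoothly, which is where the reduced oscillation comes from. Another common variant drops the (1 − β) factor and uses v = βv + g; the direction of `v` is the same, only its scale changes.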