I am struggling to understand the description (caption) of Fig. 3 of the exercise.

For example,

  1. I cannot tell the difference between the red and blue lines.
  2. Then it is mentioned that "Rather than just following the gradient, the gradient is allowed to influence v". What are those two gradients, and how do they differ?

These are some of the points that are not clear to me, but, honestly, I do not understand the description at all. I think I get the intuition that the path to the minimum should become less noisy (fewer oscillations) with momentum, but I cannot connect that with the given description.

Below is the image and the caption.

Caption - “The red arrows show the direction taken by one step of mini-batch gradient descent with momentum. The blue points show the direction of the gradient (with respect to the current mini-batch) on each step. Rather than just following the gradient, the gradient is allowed to influence v and then take a step in the direction of v.”


Hi @henrikh,

I understand it this way: after each mini-batch, you compute the gradient, which gives you the blue line. Going in that direction may produce a big oscillation, since the mini-batch gradients can vary a lot (the figure shows that the blue lines change direction abruptly with respect to the previous red line).

To avoid abrupt changes in the direction of the update (oscillations), the method does not go directly in the direction of the blue line, but rather adjusts its direction using the directions of the previous gradients. As a result, you get the red line, which is the direction of the actual update, and so on.
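
To make that concrete, here is a minimal sketch of one momentum step. It assumes the common exponentially weighted average form of the update, which may not match the exercise exactly. The function name `momentum_step`, the values `beta=0.9` and `lr=0.1`, and the two example gradients are all made up for illustration. `grad` plays the role of the blue direction and `v` the role of the red one.

```python
def momentum_step(w, v, grad, beta=0.9, lr=0.1):
    """One update of gradient descent with momentum (illustrative sketch).

    grad: gradient on the current mini-batch (the "blue" direction).
    v:    running average of past gradients (the "red" direction).
    """
    # The gradient only influences v; it does not set the step directly.
    v = [beta * vi + (1 - beta) * gi for vi, gi in zip(v, grad)]
    # The parameters then move in the direction of v, not of grad.
    w = [wi - lr * vi for wi, vi in zip(w, v)]
    return w, v


# Hypothetical mini-batch gradients whose second component flips sign,
# mimicking the zig-zag of the blue directions in the figure.
w = [1.0, 1.0]
v = [0.0, 0.0]
for grad in ([1.0, 2.0], [1.0, -2.0]):
    w, v = momentum_step(w, v, grad)
```

After the second step, the second component of `v` is much smaller in magnitude than that of the raw gradient, because the two opposite-signed gradients largely cancel in the average, while the first component keeps accumulating. That cancellation is what damps the oscillation.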

Thanks! I had the intuition that the blue lines are the not-yet-corrected paths and the red ones are the corrected paths, but your explanation is much clearer.

