From my understanding, gradient descent is expected to follow a path perpendicular to the contour lines of the cost function. I understand that the learning rate will influence the path taken, e.g., too high a learning rate will lead to a zigzag path, whereas a small learning rate will lead to a smoother path towards the minimum of the cost function. But the vector from one step to the next should still be perpendicular to the contour lines. However, in the optional lab 4 on gradient descent, the visualization of the path gradient descent takes is not perpendicular to the contour lines:
Is the visualization function incorrect and the gradient descent path in fact perpendicular to the contour lines, or, if the visualization is correct, why is the path not perpendicular to the contour lines?
Hi @ptschanz, great question!
In general, the path that gradient descent follows towards the minimum of the cost function is expected to be perpendicular to the contour lines of the cost function. This is because the negative gradient of the cost function at a particular point is the direction of steepest descent, and the gradient (and therefore its negative) is perpendicular to the contour line through that point.
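To make that concrete, here is a tiny numerical sketch (an illustrative quadratic cost I made up, not the lab's code): a small step along the contour direction leaves the cost essentially unchanged, while the same-sized step along the negative gradient produces the largest drop.

```python
import numpy as np

# Illustrative elliptical cost J(w, b) with its minimum at (0, 0)
def J(w, b):
    return 2.0 * w**2 + 0.5 * b**2

def grad_J(w, b):
    return np.array([4.0 * w, 1.0 * b])    # [dJ/dw, dJ/db]

w0, b0 = 3.0, 2.0
g = grad_J(w0, b0)

# A unit vector tangent to the contour line is perpendicular to the gradient
t = np.array([-g[1], g[0]]) / np.linalg.norm(g)
d = -g / np.linalg.norm(g)                  # unit vector of steepest descent

eps = 1e-4
print(J(w0 + eps * t[0], b0 + eps * t[1]) - J(w0, b0))  # ~0: moving along the contour
print(J(w0 + eps * d[0], b0 + eps * d[1]) - J(w0, b0))  # negative: the steepest drop
```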
However, it’s important to note that the path that gradient descent follows may not always look exactly perpendicular to the contour lines, especially if the learning rate is not well-tuned. If the learning rate is too large, gradient descent may overshoot the minimum and oscillate around it, resulting in a zigzag path whose long segments cut across the contour lines rather than meeting each of them at a right angle. On the other hand, if the learning rate is too small, gradient descent takes many tiny steps and converges very slowly; each of those steps is still along the local negative gradient, so the path stays close to the perpendicular direction.
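For example, here is a rough sketch (made-up cost and learning rates, not the lab's values) showing how the same update rule produces a zigzag path for a large learning rate and a smooth one for a small learning rate:

```python
import numpy as np

def grad_J(p):
    w, b = p
    return np.array([4.0 * w, 1.0 * b])     # gradient of J = 2*w^2 + 0.5*b^2

def run_gd(alpha, steps=15, start=(3.0, 2.0)):
    p = np.array(start)
    path = [p.copy()]
    for _ in range(steps):
        p = p - alpha * grad_J(p)           # plain gradient descent update
        path.append(p.copy())
    return np.array(path)

zigzag = run_gd(alpha=0.45)   # near the stability limit in w: overshoots and flips sign
smooth = run_gd(alpha=0.05)   # small steps: follows the negative-gradient direction closely

print(zigzag[:5, 0])          # w alternates in sign -> zigzag across the valley
print(smooth[:5, 0])          # w shrinks monotonically -> smooth descent
```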
As for the visualization you mentioned, it’s possible that the visualization function is not showing the path taken by gradient descent exactly as it occurred, but rather as a smoothed version of the path for the purpose of visualization. In any case, the key takeaway is that the path taken by gradient descent should generally be in the direction of steepest descent, which is perpendicular to the contour lines at each point.
In addition: depending on the solver, momentum (along with other solver characteristics) can also influence the course of gradient descent, e.g. in the case of Adam. See also: Why not always use Adam optimizer - #4 by Christian_Simonis
The graph is just a simplified sketch. Your intuition about the expected path is correct - or at least sufficiently correct for Week 1 of an introduction to Machine Learning.
In addition to what the other mentors said: gradient descent is the basic convergence algorithm for tuning the weights, and its behavior also depends on the learning rate. If the contour plot of the cost versus the weights is elliptical rather than circular, or if the gradient oscillates around the global minimum of the cost, other optimization algorithms such as Adam or momentum can be more powerful. If you want to learn more about these algorithms, I advise you, after completing this specialization, to start the Deep Learning Specialization, which will teach you much more about these fantastic algorithms.
Thank you very much for all your replies. I understood the part about overshooting, but I expected that when I decreased the learning rate, the path would more closely follow a path perpendicular to the contour lines. But this was not the case when I ran gradient descent with different learning rates using the provided plotting function, so I started doubting my intuition.
In this case, I’ll settle for general intuition at this time, but will keep this in mind for later when I look more deeply into various optimizers and when taking the Deep Learning Specialization.
If you use the zoomed-in version of the contour plot with the gradient descent path, then you can see the path is perpendicular to the contour lines. Not sure why the first plot (not zoomed in) doesn’t show the path being perpendicular to the contour lines.
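One possible explanation (an assumption on my part; I haven't checked the lab's plotting code) is that the full-range contour plot uses very different scales on the two axes, so an angle that is 90° in (w, b) coordinates does not look like 90° on screen, while zooming in happens to make the two scales more similar. A small sketch with a made-up cost illustrates the effect:

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up elliptical cost, not the lab's data or plotting code. The short segment
# drawn at (w0, b0) points along the negative gradient, so it is exactly
# perpendicular to the contour through that point in (w, b) coordinates --
# but it only *looks* perpendicular when both axes use the same scale.
w = np.linspace(-40, 40, 200)
b = np.linspace(-4, 4, 200)
W, B = np.meshgrid(w, b)
J = (W / 10.0)**2 + B**2

w0, b0 = 30.0, 3.0
g = np.array([2 * w0 / 100.0, 2 * b0])      # gradient of J at (w0, b0)
seg = -0.5 * g                              # a short step along the negative gradient

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
for ax, title in [(ax1, "default (auto) axis scaling"),
                  (ax2, "equal axis scaling")]:
    ax.contour(W, B, J, levels=12)
    ax.plot([w0, w0 + seg[0]], [b0, b0 + seg[1]], lw=2)
    ax.set_title(title)
ax2.set_aspect("equal")                     # 90° in data coordinates now looks like 90°
plt.show()
```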