Hi, I am a beginner in deep learning, currently on DLS Course 2. I took a course on nonlinear operations research where we used the steepest descent algorithm for faster convergence. Here is the pseudocode for steepest descent:
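In Python it looks roughly like this (a minimal sketch of my own, not the assignment's actual implementation; it assumes a toy quadratic cost and uses a simple ternary search over [0, 1] to find the step size):

```python
import numpy as np

def steepest_descent(f, grad, w0, tol=1e-8, max_iter=1000):
    """Gradient descent where, instead of a fixed learning rate,
    each iteration picks the alpha in [0, 1] that minimizes
    f(w - alpha * grad(w)) along the negative gradient direction."""
    w = w0.astype(float)
    for _ in range(max_iter):
        g = grad(w)
        if np.linalg.norm(g) < tol:
            break
        # Ternary search for the best step size on [0, 1]
        # (valid here because the cost is unimodal along the ray)
        lo, hi = 0.0, 1.0
        for _ in range(60):
            m1 = lo + (hi - lo) / 3
            m2 = hi - (hi - lo) / 3
            if f(w - m1 * g) < f(w - m2 * g):
                hi = m2
            else:
                lo = m1
        w = w - 0.5 * (lo + hi) * g
    return w

# Toy convex cost: f(w) = 0.5 * w^T A w - b^T w, minimized where A w = b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
f = lambda w: 0.5 * w @ A @ w - b @ w
grad = lambda w: A @ w - b

w_star = steepest_descent(f, grad, np.zeros(2))
```

The only difference from plain gradient descent is the inner line search, which is why each iteration costs more wall-clock time.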

It is the same as basic gradient descent, but instead of using a constant learning rate (alpha), steepest descent finds the learning rate that minimizes the cost function along the negative gradient direction at every iteration. With this optimal step size, the algorithm converges to the optimum much faster. I also applied this algorithm to Course 1 Week 4's assignment: the algorithm searches for the best learning rate between 0 and 1 at every iteration. Because of this per-iteration learning-rate optimization, each iteration takes longer than basic gradient descent, but it converges in fewer iterations. For example, in Week 4's assignment, gradient descent reaches a cost of 0.08… after 2500 iterations in 13 minutes, while steepest descent reaches the same 0.08… cost in only 350 iterations in 8 minutes.

I searched the web for steepest descent in neural networks, but I couldn't find an answer to my question. Why is steepest descent never used as an optimization algorithm in neural networks?