Hi, I am a beginner in deep learning, currently on DLS Course 2. I took a course on nonlinear operations research where we used the steepest descent algorithm for faster convergence. Here is the pseudocode for steepest descent:
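In Python it looks roughly like this (a minimal sketch of my own, not the assignment's actual implementation; it assumes a toy quadratic cost and uses a simple ternary search over [0, 1] to find the step size):

```python
import numpy as np

def steepest_descent(f, grad, w0, tol=1e-8, max_iter=1000):
    """Gradient descent where, instead of a fixed learning rate,
    each iteration picks the alpha in [0, 1] that minimizes
    f(w - alpha * grad(w)) along the negative gradient direction."""
    w = w0.astype(float)
    for _ in range(max_iter):
        g = grad(w)
        if np.linalg.norm(g) < tol:
            break
        # Ternary search for the best step size on [0, 1]
        # (valid here because the cost is unimodal along the ray)
        lo, hi = 0.0, 1.0
        for _ in range(60):
            m1 = lo + (hi - lo) / 3
            m2 = hi - (hi - lo) / 3
            if f(w - m1 * g) < f(w - m2 * g):
                hi = m2
            else:
                lo = m1
        w = w - 0.5 * (lo + hi) * g
    return w

# Toy convex cost: f(w) = 0.5 * w^T A w - b^T w, minimized where A w = b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
f = lambda w: 0.5 * w @ A @ w - b @ w
grad = lambda w: A @ w - b

w_star = steepest_descent(f, grad, np.zeros(2))
```

The only difference from plain gradient descent is the inner line search, which is why each iteration costs more wall-clock time.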

It is the same as basic gradient descent, but instead of using a constant learning rate (alpha), steepest descent finds the learning rate that minimizes the cost function along the negative gradient direction at every iteration. With this optimal step size, the algorithm converges to the optimum much faster. I also applied this algorithm to Course 1 Week 4's assignment: the algorithm searches for the best learning rate between 0 and 1 at every iteration. Because of this per-iteration learning-rate optimization, each iteration takes longer than basic gradient descent, but it converges in fewer iterations. For example, in Week 4's assignment, gradient descent reaches a cost of 0.08… after 2500 iterations in 13 minutes, while steepest descent reaches the same 0.08… cost in only 350 iterations in 8 minutes.

I searched the web for steepest descent in neural networks, but I couldn't find an answer to my question. Why is steepest descent never used as an optimization algorithm in neural networks?