AFAIK, conjugate gradients can be much more efficient than plain gradient descent for optimizing functions with a large number of parameters. Is it used to train neural networks? I'd expect it could help them train faster.
It's used in some optimizers as an alternative to fixed-rate gradient descent, and not just for NNs.
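For intuition, here's a toy comparison on a simple quadratic objective (a minimal sketch, not how any deep-learning framework actually trains; the function names `conjugate_gradient` and `gradient_descent` and the test problem are made up for illustration). Linear CG and fixed-rate gradient descent minimize the same quadratic, and CG typically needs far fewer iterations:

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-8, max_iter=1000):
    """Minimize f(x) = 0.5 x^T A x - b^T x for SPD A (i.e. solve Ax = b)."""
    x = np.zeros_like(b)
    r = b - A @ x            # residual = negative gradient at x
    p = r.copy()             # first search direction: steepest descent
    rs_old = r @ r
    for i in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)      # exact line search along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            return x, i + 1
        p = r + (rs_new / rs_old) * p  # make next direction A-conjugate
        rs_old = rs_new
    return x, max_iter

def gradient_descent(A, b, lr, tol=1e-8, max_iter=100_000):
    """Fixed-rate gradient descent on the same quadratic."""
    x = np.zeros_like(b)
    for i in range(max_iter):
        g = A @ x - b                  # gradient of the quadratic
        if np.linalg.norm(g) < tol:
            return x, i
        x -= lr * g
    return x, max_iter

# Diagonal SPD matrix for simplicity, condition number 100.
A = np.diag(np.linspace(1.0, 100.0, 100))
b = np.random.default_rng(0).normal(size=100)

x_cg, it_cg = conjugate_gradient(A, b)
x_gd, it_gd = gradient_descent(A, b, lr=1.0 / 100.0)  # lr = 1 / largest eigenvalue
print(it_cg, it_gd)
```

On ill-conditioned problems the gap grows roughly with the square root of the condition number, which is why CG-style methods were attractive before SGD variants with adaptive step sizes became the default for NNs.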