Maximum Likelihood vs. Gradient Descent for Generalized Linear Model

GLMs (generalized linear models) are typically fit by maximum likelihood in most package implementations, usually via an algorithm such as iteratively reweighted least squares (IRLS).

Does gradient descent have any advantages over maximum likelihood estimation?
One potential shortcoming of gradient descent is that it can converge to a local minimum rather than the global minimum.

I am interested in your thoughts on which algorithm is better for fitting a GLM.

Gradient descent is more computationally efficient when the data set is large (in the number of features, the number of examples, or both), since each step only needs the gradient rather than solving a weighted least-squares system.
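As a minimal sketch of what this looks like in practice, here is full-batch gradient descent on the negative log-likelihood of a logistic-regression GLM, using only NumPy. All names (`X`, `y`, `w`, `lr`, `nll`) and the simulated data are illustrative, not taken from any particular package:

```python
import numpy as np

# Simulate data from a logistic-regression GLM (illustrative example).
rng = np.random.default_rng(0)
n, d = 5000, 3
X = rng.normal(size=(n, d))
true_w = np.array([1.5, -2.0, 0.5])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_w))))

def nll(w):
    # Average negative log-likelihood; log(1 + exp(z)) is computed in a
    # numerically stable form as max(z, 0) + log1p(exp(-|z|)).
    z = X @ w
    return np.mean(np.maximum(z, 0) + np.log1p(np.exp(-np.abs(z))) - y * z)

def grad(w):
    # Gradient of the average negative log-likelihood.
    mu = 1.0 / (1.0 + np.exp(-(X @ w)))   # predicted probabilities
    return X.T @ (mu - y) / n

w = np.zeros(d)
lr = 0.5                                  # step size (assumed, not tuned)
for _ in range(2000):
    w -= lr * grad(w)
```

Each iteration costs one pass over the data (two matrix-vector products), which is why gradient descent, or its stochastic variant, scales to problems where forming and solving the IRLS least-squares system is too expensive.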

Local minima are not a problem unique to gradient descent: maximum-likelihood fitting also runs into trouble (no unique solution) when the cost function is not convex. For a GLM with a canonical link function, however, the negative log-likelihood is convex, so any local minimum gradient descent finds is also the global minimum.
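The convexity point can be illustrated empirically: for logistic regression (a canonical-link GLM), gradient descent started from very different points converges to the same solution. This is a sketch with illustrative names (`fit`, `w_a`, `w_b`) and simulated data, assuming a fixed step size that is small enough for convergence:

```python
import numpy as np

# Simulated logistic-regression data (illustrative).
rng = np.random.default_rng(1)
n, d = 2000, 2
X = rng.normal(size=(n, d))
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ np.array([1.0, -1.0])))))

def fit(w0, lr=0.5, steps=3000):
    # Full-batch gradient descent on the average negative log-likelihood.
    w = w0.astype(float)
    for _ in range(steps):
        mu = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (mu - y) / n
    return w

w_a = fit(np.full(d, 5.0))    # start far away in one direction
w_b = fit(np.full(d, -5.0))   # start far away in the other
```

Because the objective is convex, `w_a` and `w_b` end up at (numerically) the same point, the maximum-likelihood estimate. With a non-convex cost, there would be no such guarantee for either gradient descent or any other local optimizer.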