Hi there,
In addition: depending on the solver, "momentum" (alongside other solver characteristics) can also influence the course of gradient descent, e.g. in the case of Adam. See also: Why not always use Adam optimizer - #4 by Christian_Simonis
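
To make this a bit more concrete, here is a minimal sketch in plain Python. It assumes a toy objective f(x) = x**2, and the function names, starting point, learning rate, and beta values are my own illustrative choices, not taken from any particular library. It compares plain gradient descent, gradient descent with classical momentum, and a simplified Adam update.

```python
import math


def gradient(x):
    # Gradient of the toy objective f(x) = x**2 (assumed for illustration)
    return 2.0 * x


def plain_gd(x0, lr, steps):
    # Vanilla gradient descent: each step uses only the current gradient
    x, path = x0, [x0]
    for _ in range(steps):
        x -= lr * gradient(x)
        path.append(x)
    return path


def momentum_gd(x0, lr, beta, steps):
    # Classical momentum: the velocity accumulates past gradients
    x, v, path = x0, 0.0, [x0]
    for _ in range(steps):
        v = beta * v + gradient(x)
        x -= lr * v
        path.append(x)
    return path


def adam(x0, lr, steps, beta1=0.9, beta2=0.999, eps=1e-8):
    # Simplified scalar Adam: momentum-like first moment plus
    # a second-moment estimate that rescales the step size
    x, m, v, path = x0, 0.0, 0.0, [x0]
    for t in range(1, steps + 1):
        g = gradient(x)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)  # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
        path.append(x)
    return path


plain_path = plain_gd(5.0, 0.1, 20)
momentum_path = momentum_gd(5.0, 0.1, 0.9, 20)
adam_path = adam(5.0, 0.1, 20)

print("plain:   ", [round(p, 3) for p in plain_path[:6]])
print("momentum:", [round(p, 3) for p in momentum_path[:6]])
print("adam:    ", [round(p, 3) for p in adam_path[:6]])
```

With these settings, plain gradient descent shrinks x steadily towards the minimum, while the momentum variant builds up velocity and overshoots past zero before coming back. Adam's first step is roughly the size of the learning rate regardless of the gradient magnitude. So the same loss surface can produce quite different trajectories depending on the solver.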
Best regards
Christian