For the sklearn estimators where you fit the model to the X_train and y_train data, does fitting compute the gradient of a cost function and then run gradient descent on it to find a local minimum?

Not all of them.

I believe this one works that way, as you can tell from its name.
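To make the difference concrete, here is a minimal sketch. The choice of estimators is an assumption for illustration only: `LinearRegression` stands in for an estimator that solves for its coefficients directly, and `SGDRegressor` stands in for one that takes iterative gradient steps. The synthetic data and the name `true_coef` are made up for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, SGDRegressor

# Synthetic training data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))
true_coef = np.array([1.5, -2.0, 0.5])
y_train = X_train @ true_coef + 0.01 * rng.normal(size=200)

# LinearRegression solves the least-squares problem directly,
# so fit() does not run a gradient descent loop.
direct = LinearRegression().fit(X_train, y_train)

# SGDRegressor updates its weights with stochastic gradient steps
# on the loss, one sample at a time, over several passes (epochs).
sgd = SGDRegressor(max_iter=1000, tol=1e-6, random_state=0)
sgd.fit(X_train, y_train)

print("direct solve coefficients:", direct.coef_)
print("SGD coefficients:", sgd.coef_)
print("SGD epochs run:", sgd.n_iter_)
```

Both should land on roughly the same coefficients here, but only the second gets there by following gradients of the cost function.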

Cheers,

Raymond

That makes sense, thank you so much!