Optional lab: Gradient descent with and without using scikit-learn

Why are the parameters obtained using scikit-learn different from those obtained using the gradient_descent function provided in the lab?


Hello @Cristhian_David_Pere

The parameter settings are different in the first place. Also, sklearn’s LogisticRegression isn’t using gradient descent. The best we can hope for is that, given consistent parameters, the two approaches produce very similar results, not results identical down to the last digit.



As you can see from the cost history, the gradient descent run has not yet converged: the cost is still decreasing at the final iteration, so it has not reached its minimum.

sklearn is using the lbfgs minimizer, not gradient descent. This is a more advanced optimizer that runs more efficiently than gradient descent and finds a converged solution in fewer iterations.
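To see the effect concretely, here is a hedged sketch (toy synthetic data and my own minimal `gradient_descent`, not the lab’s exact code): both approaches land near the same weights, but not digit for digit, and sklearn gets there with its lbfgs solver rather than gradient descent.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iters=10000):
    # plain batch gradient descent on the logistic-regression cost
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(iters):
        p = sigmoid(X @ w + b)           # predicted probabilities
        err = p - y
        w -= alpha * (X.T @ err) / m     # gradient of cost w.r.t. w
        b -= alpha * err.mean()          # gradient of cost w.r.t. b
    return w, b

# toy, noisy (non-separable) data so a finite optimum exists
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 2 * X[:, 1] + rng.normal(size=200) > 0).astype(float)

w_gd, b_gd = gradient_descent(X, y)

# sklearn's default solver is lbfgs; a large C weakens regularization
# so the objective is close to the unregularized one used above
clf = LogisticRegression(solver="lbfgs", C=1e6, max_iter=1000).fit(X, y)
print("gradient descent:", w_gd, b_gd)
print("sklearn (lbfgs): ", clf.coef_[0], clf.intercept_[0])
```

Note the `C=1e6` assumption: sklearn always applies L2 regularization, so to compare against an unregularized gradient-descent run you have to make it nearly negligible.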

You can learn about the lbfgs minimizer here:


Is it safe to assume that sklearn.LogisticRegression is using something other than the sigmoid function to fit the training data?
If it is using a faster algorithm/solver to converge with the same setup, I would assume the weights/bias will be similar.
I also noticed LogisticRegression doesn’t ask for an input cost function, which leads me to believe it may be solving a different “setup”.

I am new to this, and the documentation on sklearn’s LogisticRegression isn’t that easy to digest. Therefore, any insight you can share is greatly appreciated.



The standard for logistic regression is a linear model with sigmoid() applied to the output. I’m certain that is what sklearn uses also.

The cost function is built-in, since that determines the gradients, which are also built-in.
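For reference, here is a minimal sketch of those built-in pieces: the standard binary cross-entropy cost and the gradients that follow from it (function names are my own for illustration, not sklearn internals).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, y, w, b):
    # binary cross-entropy (log loss), the standard logistic-regression cost
    p = sigmoid(X @ w + b)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def gradients(X, y, w, b):
    # analytic gradients derived directly from the cost above
    err = sigmoid(X @ w + b) - y
    return X.T @ err / len(y), err.mean()

# tiny worked example
X = np.array([[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]])
y = np.array([1.0, 0.0, 1.0])
w = np.array([0.1, -0.2])
b = 0.05
print(cost(X, y, w, b), gradients(X, y, w, b))
```

Because the gradient is fully determined by this cost, a library can hard-code both and only ask you for data and hyperparameters.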

You can pick among several different optimization methods. They should all give very close to the same weights, just with different tradeoffs between processor load, memory usage, and speed of convergence.
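A quick sketch of that claim (the solver names are real sklearn options; the data is synthetic): the smooth solvers all minimize the same regularized log loss, so their weights come out very close.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=4, random_state=0)

# fit the same model with several solvers; only the optimizer changes
for solver in ("lbfgs", "newton-cg", "liblinear"):
    clf = LogisticRegression(solver=solver, max_iter=1000).fit(X, y)
    print(solver, np.round(clf.coef_[0], 3), np.round(clf.intercept_, 3))
```

One caveat: liblinear regularizes the intercept slightly differently from the other solvers, so its numbers can drift a little more than lbfgs vs. newton-cg.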