Use of squared error with sigmoid and applying gradient descent

Here’s another thread from a while ago that discusses this and also shows a graph of what the loss surface looks like if you use MSE as the cost function for logistic regression. Sometimes a picture gets the message across better than words. The earlier replies on that thread are worth reading as well.
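
If you want to see the effect numerically rather than in a plot, here is a minimal sketch. It is not taken from the linked thread: the setup (a single training example with x = 1, y = 1, one weight `w`, no bias) and the helper names `mse_loss`, `cross_entropy_loss` and `is_convex_on_grid` are all assumptions made for illustration. It evaluates both costs on a grid of `w` values and checks the sign of the discrete second differences, which must all be non-negative for a convex curve.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mse_loss(w, x, y):
    # Squared error applied to the sigmoid output.
    return np.mean((sigmoid(w * x) - y) ** 2)

def cross_entropy_loss(w, x, y):
    # Binary cross-entropy in a numerically stable form:
    # -[y*log(a) + (1-y)*log(1-a)] = log(1 + exp(z)) - y*z, with a = sigmoid(z).
    z = w * x
    return np.mean(np.logaddexp(0.0, z) - y * z)

def is_convex_on_grid(values, tol=1e-12):
    # On an evenly spaced grid, a convex function has non-negative
    # second differences f(w-h) - 2 f(w) + f(w+h).
    second_diff = values[:-2] - 2.0 * values[1:-1] + values[2:]
    return bool(np.all(second_diff >= -tol))

# Hypothetical toy data: one example with label 1 (an assumption for illustration).
x = np.array([1.0])
y = np.array([1.0])

ws = np.linspace(-10.0, 10.0, 401)
mse = np.array([mse_loss(w, x, y) for w in ws])
ce = np.array([cross_entropy_loss(w, x, y) for w in ws])

print("MSE convex on grid:          ", is_convex_on_grid(mse))
print("Cross-entropy convex on grid:", is_convex_on_grid(ce))
```

In this toy setup the MSE curve flattens out and bends the wrong way for strongly negative `w`, which is exactly where the prediction is most wrong and where gradient descent would get the weakest signal. The cross-entropy curve passes the convexity check over the same range.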
