What does the cost function of logistic regression look like?

We use the standard “cross entropy” loss function for logistic regression, and also for neural networks whose predictions are binary classifications (yes/no). The cost function is convex in the case of logistic regression, but not in the case of neural networks: the cost function maps all the way from the input values to the final cost, so for a neural network all the non-linear hidden layers are included in that mapping.
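To make that concrete, here is a minimal sketch of the cross entropy cost averaged over m examples. The function name and the small `eps` clipping value are my own additions (the clipping is just a common numerical safeguard against log(0), not part of the mathematical definition):

```python
import numpy as np

def cross_entropy_cost(y, y_hat, eps=1e-12):
    """Binary cross entropy cost, averaged over the m examples.

    y     : array of true labels in {0, 1}
    y_hat : array of predicted probabilities in (0, 1)
    eps   : tiny constant so we never take log(0)
    """
    y_hat = np.clip(y_hat, eps, 1 - eps)
    m = y.shape[0]
    # -(1/m) * sum( y*log(y_hat) + (1-y)*log(1-y_hat) )
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)) / m

# Confident, correct predictions give a small cost:
y = np.array([1, 0, 1, 0])
y_hat = np.array([0.9, 0.1, 0.8, 0.2])
print(cross_entropy_cost(y, y_hat))
```

Note that for each example only one of the two terms is non-zero, depending on whether the label is 1 or 0.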

Here’s a thread which shows the graph of log(\hat{y}), which is the core of the cross entropy loss if you just look at the function applied to the output, as opposed to the entire mapping from inputs to cost. You can clearly see that it is convex if you use the full -log(\hat{y}) function, which flips the graph shown about the x axis.
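If you want to convince yourself numerically rather than from the graph, here is a small sketch (my own illustration, not from the thread) that checks the midpoint convexity property of -log(\hat{y}): a convex function f satisfies f((a+b)/2) <= (f(a)+f(b))/2 for any a and b in its domain.

```python
import numpy as np

# f(y_hat) = -log(y_hat) on the open interval (0, 1)
f = lambda p: -np.log(p)

# Sample many random pairs (a, b) and check the midpoint inequality
rng = np.random.default_rng(0)
a = rng.uniform(0.01, 0.99, 1000)
b = rng.uniform(0.01, 0.99, 1000)
assert np.all(f((a + b) / 2) <= (f(a) + f(b)) / 2 + 1e-12)
print("midpoint convexity holds on all sampled pairs")
```

Of course this is a sanity check, not a proof; the real proof is that the second derivative of -log(p), which is 1/p^2, is positive everywhere on (0, 1).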

Here’s a thread which shows the graphs of the cross entropy cost surfaces versus an MSE cost function on a binary classifier. It’s not a mathematical proof, but a picture is worth a lot of words. :nerd_face:
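Here’s a tiny 1-D version of that comparison you can run yourself (my own toy setup, not the one in the thread): one training example with x = 1 and label y = 0, so the model output is \hat{y} = sigmoid(w) and the cost is a function of the single weight w. The midpoint convexity test fails for MSE but holds for cross entropy:

```python
import numpy as np

sigmoid = lambda w: 1 / (1 + np.exp(-w))

# One example: x = 1, y = 0, prediction y_hat = sigmoid(w)
mse = lambda w: sigmoid(w) ** 2            # MSE cost (y = 0 case)
ce = lambda w: -np.log(1 - sigmoid(w))     # cross entropy cost (y = 0 case)

# Midpoint convexity test: convex f has f((a+b)/2) <= (f(a)+f(b))/2
a, b = 1.0, 5.0
print(mse((a + b) / 2) <= (mse(a) + mse(b)) / 2)  # False: MSE is non-convex in w
print(ce((a + b) / 2) <= (ce(a) + ce(b)) / 2)     # True: cross entropy is convex in w
```

This matches the pictures in the thread: with a sigmoid output, MSE produces a cost surface with flat non-convex regions, while cross entropy in the weights of plain logistic regression stays convex.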

Here’s a thread which discusses more about the non-convexity of the cost function of an NN-based classifier.