In the Week 2 Logistic Regression cost function video,
we are told that we should not use MSE as the loss function, because the resulting cost is not convex and has many local optima, and Andrew drew a "potential" curve to illustrate this.
Could you please show what the actual equation/curve looks like, to convince us that it is not convex and has many local optima?

I understand that the standard cost function is convex.
I'm just not convinced without seeing the actual curve/equation.

We use the standard "cross entropy" loss function for logistic regression and also for neural networks whose predictions are binary classifications (yes/no). The cost function is convex in the case of Logistic Regression, but not in the case of Neural Networks: the cost function maps all the way from the input values to the final cost, which means that all the non-linear hidden layers are included in the case of a Neural Network.
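You can also check this numerically yourself. Here is a minimal sketch (the tiny dataset and the single-weight, no-bias model are my own illustrative assumptions, not from the course): with one parameter $w$, we can scan each cost as a function of $w$ and look at the discrete second differences. A convex function has non-negative second differences everywhere; the sigmoid-composed MSE does not.

```python
import numpy as np

# Hypothetical 1-D dataset: a single weight w and no bias term, so the
# cost is a function of one variable and its curvature is easy to inspect.
x = np.array([-2.0, -0.5, 1.0, 3.0])
y = np.array([0.0, 0.0, 1.0, 1.0])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy_cost(w):
    y_hat = sigmoid(w * x)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def mse_cost(w):
    y_hat = sigmoid(w * x)
    return np.mean((y_hat - y) ** 2)

ws = np.linspace(-10.0, 10.0, 2001)
for name, cost in [("cross-entropy", cross_entropy_cost), ("MSE", mse_cost)]:
    c = np.array([cost(w) for w in ws])
    curvature = np.diff(c, 2)  # discrete second derivative along w
    print(name, "min curvature:", curvature.min())
```

The cross-entropy curvature stays non-negative (up to floating-point noise), while the MSE curvature goes clearly negative: the sigmoid saturates, so the MSE cost flattens out at both ends and cannot be convex. With the full non-linear stack of a neural network, the same composition effect is what breaks convexity there too.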

Here’s a thread which shows the graph of log(\hat{y}), which is the core of the cross entropy loss, if you just look at the function applied to the output, as opposed to the entire mapping from inputs to cost. You can clearly see that it is convex if you use the full -log(\hat{y}) function, which flips the graph shown about the x axis.
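If you'd rather not trust a picture, here is a quick numeric sanity check (my own sketch, not from the thread): -log(\hat{y}) has second derivative 1/\hat{y}^2 > 0 on (0, 1), so its discrete second differences on a uniform grid are all positive.

```python
import numpy as np

# Sample -log(y_hat) on a uniform grid strictly inside (0, 1).
y_hat = np.linspace(0.01, 0.99, 99)
loss = -np.log(y_hat)

# For a convex function, second differences on a uniform grid are positive.
print((np.diff(loss, 2) > 0).all())  # → True
```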

Here’s a thread which shows the graphs of the cross entropy cost surfaces versus an MSE cost function on a binary classifier. It’s not a mathematical proof, but a picture is worth a lot of words.

Here’s a thread which discusses the non-convexity of the cost function for NN based classifiers in more detail.