In the video on the logistic regression cost function, it is mentioned that using MSE as the loss function for logistic regression makes the optimization problem non-convex. Can someone prove this (both mathematically and visually) or help me develop an intuition for it?
Explanations I found unconvincing:
Statistical ML theory: I understand the statistical ML argument that the loss function is the negative log-likelihood (NLL) of the model, and that the best parameters are the ones that minimize the NLL. One can show that linear regression assumes a Gaussian distribution and logistic regression a Bernoulli distribution, so the NLL works out to MSE and cross-entropy respectively.
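For concreteness, this is the derivation I mean. For linear regression with $y_i \sim \mathcal{N}(\hat y_i, \sigma^2)$, minimizing the NLL over the parameters is equivalent to minimizing MSE:

$$-\log L = \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \hat y_i\right)^2 + \text{const.}$$

For logistic regression with $y_i \sim \text{Bernoulli}(\hat y_i)$, where $\hat y_i = 1/(1 + e^{-w^\top x_i})$, the NLL is the cross-entropy:

$$-\log L = -\sum_{i=1}^{n}\left[y_i \log \hat y_i + (1 - y_i)\log(1 - \hat y_i)\right].$$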
But this still doesn't answer the question of what makes MSE non-convex in logistic regression but not in linear regression.
Penalization theory: There is also the so-called "penalization" argument: cross-entropy penalizes a wrong prediction by an arbitrarily large amount (approaching infinity), whereas MSE penalizes it by at most 1, so cross-entropy gives the loss a much larger range.
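In symbols, for a single positive example ($y = 1$) with prediction $\hat y \in (0, 1)$:

$$\lim_{\hat y \to 0^+} \left(-\log \hat y\right) = \infty, \qquad \left(y - \hat y\right)^2 \le 1.$$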
All of the above explains the rationale for using cross-entropy rather than MSE in logistic regression.
However, my question is: WHAT EXACTLY MAKES MSE NON-CONVEX FOR LOGISTIC REGRESSION? What I read on the web is that it is due to the non-linear nature of the sigmoid, which makes the loss function non-convex. I am still not able to visualize this or develop an intuition for it (I tried plotting the loss myself; see the sketch at the end of this post).
Nor am I able to link the above theories to my question.
Can someone please explain this to me?
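For reference, here is a minimal sketch of what I tried: a toy 1-D dataset with binary labels and a single weight $w$ (no bias), plotting MSE-with-sigmoid against $w$ alongside cross-entropy. The data and names like `w_grid` are just my own choices for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy 1-D dataset: a few points with binary labels,
# deliberately not perfectly separable by sign.
x = np.array([-2.0, -0.5, 1.0, 3.0])
y = np.array([0.0, 1.0, 0.0, 1.0])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Sweep a single weight w (no bias term) and compute both losses.
w_grid = np.linspace(-10, 10, 500)
mse = [np.mean((y - sigmoid(w * x)) ** 2) for w in w_grid]
ce = [np.mean(-(y * np.log(sigmoid(w * x) + 1e-12)
                + (1 - y) * np.log(1 - sigmoid(w * x) + 1e-12)))
      for w in w_grid]

plt.plot(w_grid, mse, label="MSE with sigmoid")
plt.plot(w_grid, ce, label="Cross-entropy")
plt.xlabel("w")
plt.ylabel("loss")
plt.legend()
plt.show()
```

The MSE curve flattens toward a constant for large $|w|$ (because the sigmoid saturates), but I still cannot connect that picture to a formal argument for non-convexity.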