Why MSE is non-convex for Logistic regression

In the video on the logistic regression cost function, it is mentioned that using MSE as the loss function for logistic regression makes the optimization problem non-convex. Can someone prove this (both mathematically and visually) or help me develop an intuition for it?

Explanations I find unconvincing:

Statistical ML theory: I understand the argument that the loss function is the negative log-likelihood (NLL) of the model, and that we find the best parameters by minimizing the NLL. One can show that linear regression assumes a Gaussian noise model while logistic regression assumes a Bernoulli distribution, so the NLL works out to MSE and cross-entropy respectively.
But this still doesn’t answer the question of what makes MSE non-convex for logistic regression but not for linear regression.
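For reference, here is the correspondence I mean, in standard notation (my own summary, not from the video): under Gaussian noise the NLL reduces to squared error, and under a Bernoulli model it reduces to cross-entropy:

$$-\log p(y \mid x) = \frac{(y - w^\top x)^2}{2\sigma^2} + \text{const} \qquad \text{(Gaussian} \Rightarrow \text{MSE)}$$

$$-\log p(y \mid x) = -\big[\, y \log \hat{y} + (1 - y)\log(1 - \hat{y}) \,\big] \qquad \text{(Bernoulli} \Rightarrow \text{cross-entropy)}$$

where $\hat{y} = \sigma(w^\top x + b)$.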

Penalization theory: Another explanation is what might be called “penalization theory”: cross-entropy penalizes a wrong prediction by an arbitrarily large amount (approaching infinity as the predicted probability approaches the wrong extreme), whereas MSE penalizes a wrong prediction by at most 1. This gives the cross-entropy loss a much larger range.
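A quick numerical sketch of that contrast (my own example, with true label y = 1 and shrinking predicted probabilities): cross-entropy grows without bound while squared error stays capped at 1.

```python
import numpy as np

y = 1.0  # true label
for y_hat in (0.5, 0.1, 0.01, 1e-6):
    ce = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))  # cross-entropy loss
    mse = (y_hat - y) ** 2                                   # squared-error loss
    print(f"y_hat = {y_hat:<8}  CE = {ce:8.3f}  MSE = {mse:.6f}")
```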

All of the above explain the rationale for using cross-entropy in logistic regression as opposed to MSE.
However, my question is: what exactly makes MSE non-convex for logistic regression? What I read on the web is that it is due to the non-linear nature of the sigmoid, which makes the loss function non-convex. I am still not able to visualize that or develop an intuition for it.

Nor am I able to link the above theories to my question.

Can someone please explain this to me?

When I searched the web, I found this link: Squared Error vs Log Loss of Sigmoid

Okay, this helps me see visually that MSE is non-convex. But in the video mentioned above (Logistic Regression Cost Function), Andrew Ng says the problem is multiple local optima. How?

Hello @mayankb2103 and welcome to the DL specialization. MSE loss is the natural choice for linear regression. Minimizing the average MSE loss (the ordinary least squares estimator) is a convex quadratic problem, which guarantees a unique (i.e. global) minimum. (It also coincides with the maximum likelihood estimator if you assume Gaussian errors.) This is the first slide of the “Gradient Descent” lecture from week 2. Nice pictures there.
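In matrix form (notation mine, not from the slide), the OLS objective is

$$J(w) = \frac{1}{2m}\,\|Xw - y\|^2, \qquad \nabla^2 J(w) = \frac{1}{m}\,X^\top X \succeq 0,$$

so the Hessian is positive semidefinite everywhere and $J$ is convex; when $X$ has full column rank it is strictly convex and the minimizer is unique.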

Consider the least squares loss function:

$$\mathcal{L}(\hat{y}, y) = \frac{1}{2}\,(\hat{y} - y)^2$$

To prove non-convexity of the MSE loss function with the logistic model, I would substitute

$$\hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = w^\top x + b$$

into the quadratic loss function above, differentiate with respect to z, and set the derivative equal to zero. All you have to do is show that there is at least one other solution to that equation (checking that it’s a local minimum and not a local maximum). I have not done this; I hope you do! :slight_smile: That said, I am guessing that the affine form of z has no bearing on the proof. It might, in which case you would need to differentiate with respect to w and b and solve the higher-dimensional system for zero (the zero vector). Ouch.
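In the meantime, here is a minimal numerical sketch of that idea (my own code, assuming a single example with label y = 1, so the loss as a function of z is L(z) = ½(σ(z) − y)²): a convex function must have a non-negative second derivative everywhere, but here the sign flips.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mse_loss(z, y=1.0):
    # Quadratic loss with the logistic model substituted: L(z) = 0.5 * (sigmoid(z) - y)^2
    return 0.5 * (sigmoid(z) - y) ** 2

def second_derivative(f, z, h=1e-4):
    # Central-difference estimate of f''(z)
    return (f(z + h) - 2.0 * f(z) + f(z - h)) / h**2

# Convexity requires L''(z) >= 0 for all z; the sign change below disproves it.
for z in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"z = {z:+.1f}   L''(z) = {second_derivative(mse_loss, z):+.6f}")
```

For y = 1 the second derivative works out (my calculation) to $\sigma(z)\,(1-\sigma(z))^2\,(3\sigma(z)-1)$, which is negative for $\sigma(z) < 1/3$ (roughly z < −0.69) and positive above it, so the loss is concave on one side and convex on the other. A single example still has only one basin; it is the sum of many such non-convex per-example losses over a dataset that can produce the multiple local optima mentioned in the video.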

