Can anyone help me understand why squared error as a cost function does not work well with gradient descent for the sigmoid function (logistic regression), while cross-entropy does? Thanks. Reference: Week 2 lecture titled "Logistic Regression Cost Function".

Hi @muneer321, welcome to the community! While the squared error cost function works well for linear regression, it poses problems when used with the sigmoid function in logistic regression. In logistic regression and neural networks, using the squared error as the cost function with a sigmoid act…

The cost functions are different because the goals are different. Linear regression tries to make a model that fits the examples. Logistic regression tries to create a boundary between the “true” and “false” examples.

It helped; however, I did not understand how it works mathematically. I mean, when differentiating the squared error, which includes the sigmoid function in ŷ, how does the cost function become non-convex?

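Mathematically, a twice-differentiable function of one variable is convex if its second derivative is always positive or zero. With several parameters, the equivalent condition is that the Hessian (the matrix of second partial derivatives) is positive semidefinite everywhere.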
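
To make that test concrete, here is a minimal numerical sketch. It is not from the course materials: the four-point dataset, the single weight `w` with no bias term, and the helper names are all assumptions chosen for illustration. It estimates the second derivative of each cost with a central finite difference and checks its sign over a range of `w` values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def squared_error_cost(w, xs, ys):
    # (1 / 2m) * sum of squared errors, with the sigmoid as the prediction
    m = len(xs)
    return sum((sigmoid(w * x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

def cross_entropy_cost(w, xs, ys):
    # Mean binary cross-entropy (log loss)
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        a = sigmoid(w * x)
        total += -(y * math.log(a) + (1 - y) * math.log(1 - a))
    return total / m

def second_difference(cost, w, xs, ys, h=1e-3):
    # Central finite-difference estimate of the second derivative in w
    return (cost(w + h, xs, ys) - 2 * cost(w, xs, ys) + cost(w - h, xs, ys)) / (h * h)

# Made-up 1-D dataset (assumption): negative x -> label 0, positive x -> label 1
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

ws = [i * 0.5 for i in range(-10, 11)]  # w from -5.0 to 5.0
mse_curv = [second_difference(squared_error_cost, w, xs, ys) for w in ws]
ce_curv = [second_difference(cross_entropy_cost, w, xs, ys) for w in ws]

print("squared error: curvature goes negative?", min(mse_curv) < 0)
print("cross-entropy: curvature stays positive?", min(ce_curv) > 0)
```

On this toy data the squared error curvature turns negative where the model is confidently wrong (large negative `w`), while the cross-entropy curvature stays positive across the whole range checked.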

Certainly! The squared error cost function combined with the sigmoid activation in logistic regression results in a non-convex cost function because its Hessian (the matrix of second derivatives) is not always positive semidefinite: depending on the input data and parameters, the curvature can be negative.
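
For anyone who wants to see where the negative curvature comes from, here is a sketch of the derivation for the simplest case. The simplifications are assumptions made for readability, not the lecture's notation: a single training example $(x, y)$, one weight $w$, no bias term, and a factor of $\tfrac{1}{2}$ on the squared error.

```latex
% Setup (assumed): z = w x, \quad a = \sigma(z), \quad \sigma'(z) = a(1 - a)

% Squared error: L_{\mathrm{SE}} = \tfrac{1}{2}(a - y)^2
\frac{\partial L_{\mathrm{SE}}}{\partial w} = (a - y)\, a(1 - a)\, x

\frac{\partial^2 L_{\mathrm{SE}}}{\partial w^2}
  = a(1 - a)\,\bigl[\, a(1 - a) + (a - y)(1 - 2a) \,\bigr]\, x^2

% y = 1: \quad a(1 - a)^2 (3a - 1)\, x^2, \quad \text{negative when } a < \tfrac{1}{3}
% y = 0: \quad a^2 (1 - a)(2 - 3a)\, x^2, \quad \text{negative when } a > \tfrac{2}{3}

% Cross-entropy: L_{\mathrm{CE}} = -\bigl[\, y \log a + (1 - y)\log(1 - a) \,\bigr]
\frac{\partial L_{\mathrm{CE}}}{\partial w} = (a - y)\, x

\frac{\partial^2 L_{\mathrm{CE}}}{\partial w^2} = a(1 - a)\, x^2 \;\ge\; 0
```

So with squared error the second derivative goes negative exactly when the prediction is far on the wrong side of the label, whereas with cross-entropy the sigmoid's $a(1 - a)$ factor cancels in the first derivative and the second derivative can never be negative.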

Use of squared error with sigmoid and applying gradient descent


paulinpaloalto September 29, 2024, 2:57pm 4

Here’s another thread from a while ago which discusses this and also shows a graph of what the loss surface looks like if you use MSE for logistic regression. Sometimes a picture gets the message across better than words. It would be worth reading the earlier replies on that thread as well.

Topic		Replies	Views
Visualizing Squared Error Cost function for Logistic regression in 2D Supervised ML: Regression and Classification week-module-3	3	889	February 17, 2024
Logistic Regression Derivative of J(w,b) Supervised ML: Regression and Classification week-module-3	12	1222	May 16, 2023
Logistic Regression: Difference between cost function & gradient descent Supervised ML: Regression and Classification week-module-3	5	636	August 8, 2022
Why is Squared Error Cost for Logistic Regression non-convex? Supervised ML: Regression and Classification week-module-3	1	635	July 31, 2022
Week 3: Gradient Descent Implementation Supervised ML: Regression and Classification week-module-3 , coursera-platform	4	64	March 28, 2026


Related topics