Why is it that in the Gradient Descent algorithm for logistic regression, when we compute the derivative of J(w,b), we use the derivative of the Squared Error Cost Function that we use in the Linear Regression model?
In the Logistic Regression model we have a different Cost Function, as shown in the attached image. Why don't we use the derivative of that Cost Function to update the values of the parameters w and b when running gradient descent?
It's confusing that in the Logistic Regression model the Cost Function is different from the Cost Function whose derivative is taken while running Gradient Descent.
But here, f_{w,b}(x^{(i)}) is different from the f_{w,b}(x^{(i)}) used in the Squared Error Cost Function.
In linear regression, f_{w,b}(x^{(i)}) = w.x + b but in logistic regression, f_{w,b}(x^{(i)}) = \displaystyle \frac {1}{1+e^{-(w.x + b)}}
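For reference, writing out the logistic regression Cost Function (assuming the standard log-loss form from the lectures) with this sigmoid plugged in, where m is the number of training examples:

J(w,b) = -\displaystyle \frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\left(f_{w,b}(x^{(i)})\right) + \left(1 - y^{(i)}\right) \log\left(1 - f_{w,b}(x^{(i)})\right) \right]

So even when a formula looks familiar, what it computes depends on which f_{w,b}(x^{(i)}) is inside it.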
In addition to Saif's answer, note that if you take the derivative of the Logistic Regression cost function with respect to its parameters w and b, you obtain derivatives of the same form as the derivatives of the Squared Error cost function; the only difference lies in the function f_{w,b}(x^{(i)}) used in each case, as Saif previously mentioned.
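As a quick sketch of the algebra: using the chain rule and the fact that the sigmoid f satisfies \frac{\partial f}{\partial z} = f(1-f), the log terms collapse and the derivatives of the log-loss cost come out as

\frac{\partial J}{\partial w_j} = \displaystyle \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x_j^{(i)}, \qquad \frac{\partial J}{\partial b} = \displaystyle \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)

which is exactly the same form as the Squared Error derivatives for linear regression, except that f_{w,b}(x^{(i)}) here is the sigmoid.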
You are right, but in the earlier video where Andrew explains the Cost Function of Logistic Regression, he mentions that if we use the Squared Error Cost Function in logistic regression, the resulting graph of J(w,b) versus w and b is non-convex, due to which we can get stuck in a local minimum and may not reach the global minimum of J(w,b), as shown in the attached image.
And I'm trying to understand the reason behind using the Squared Error Cost Function while running gradient descent for the Logistic Regression model (as shown in my main question).
In this image, under the red-marked area we can see the Cost Function that we use for the Logistic Regression model, since the Squared Error Cost Function ends up being non-convex.
This means that we'll run gradient descent on this Cost Function (under the red-marked area) to avoid getting stuck in a local minimum.
However, in the image we can see that we run gradient descent using the Squared Error Cost Function (under the blue-marked area). This would mean that our graph of J(w,b) is non-convex and we might end up in a local minimum.
Why are we doing this? Instead of using the derivative of Logistic Regression's Cost Function, why are we using the Squared Error Cost Function?
It seems to me that your reasoning is: because the equations in the blue box look the same as the derivatives of the squared error cost for linear regression, those equations cannot be the derivatives of the logistic cost for logistic regression.
That is not correct: the two sets of derivatives really do take the same form, and you would have seen it if you had worked out the derivatives step by step yourself. Check out this post for the steps.
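If you'd rather not push through the algebra, a quick numerical check tells the same story. The sketch below is only an illustration (helper names like logistic_cost and the toy data are made up, not course code): it compares the analytic gradient \frac{1}{m}\sum \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x^{(i)} with a finite-difference estimate of the log-loss cost, and the two agree.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(w, b, X, y):
    # Log-loss (cross-entropy) cost for logistic regression
    f = sigmoid(X @ w + b)
    return -np.mean(y * np.log(f) + (1 - y) * np.log(1 - f))

def analytic_gradient(w, b, X, y):
    # Same form as the squared-error gradient, but f is the sigmoid
    f = sigmoid(X @ w + b)
    err = f - y
    return X.T @ err / len(y), np.mean(err)

# Toy data, made up purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = (rng.random(20) > 0.5).astype(float)
w, b = rng.normal(size=3), 0.1

dw, db = analytic_gradient(w, b, X, y)

# Finite-difference check on each weight
eps = 1e-6
for j in range(len(w)):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[j] += eps
    w_minus[j] -= eps
    numeric = (logistic_cost(w_plus, b, X, y) - logistic_cost(w_minus, b, X, y)) / (2 * eps)
    print(f"dw[{j}]: analytic = {dw[j]:.6f}, finite-difference = {numeric:.6f}")
```

The printed analytic and finite-difference values match to several decimal places, even though the cost being differentiated is the log loss, not the squared error.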