Why is it that in the Gradient Descent algorithm for logistic regression, when we compute the derivative of J(w,b), we use the derivative of the Squared Error Cost Function that we use in the Linear Regression model?
In the Logistic Regression model we have a different Cost Function, as shown in the attached image. Why don't we use the derivative of that Cost Function to update the values of the parameters w and b when running gradient descent?
It's confusing that in the Logistic Regression model the Cost Function is different from the Cost Function whose derivative is taken while running Gradient Descent.
But here, f_{w,b}(x^{(i)}) is different from the f_{w,b}(x^{(i)}) used in the Squared Error Cost Function.
In linear regression, f_{w,b}(x^{(i)}) = w.x + b but in logistic regression, f_{w,b}(x^{(i)}) = \displaystyle \frac {1}{1+e^{-(w.x + b)}}
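For reference, writing out the logistic regression Cost Function (assuming the standard log-loss form from the lectures) with this sigmoid plugged in, where m is the number of training examples:

J(w,b) = -\displaystyle \frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\left(f_{w,b}(x^{(i)})\right) + \left(1 - y^{(i)}\right) \log\left(1 - f_{w,b}(x^{(i)})\right) \right]

So even when a formula looks familiar, what it computes depends on which f_{w,b}(x^{(i)}) is inside it.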
In addition to Saif's answer, note that if you take the derivative of the Logistic Regression cost function with respect to its parameters w and b, you obtain derivatives of the same form as the derivatives of the Squared Error cost function; the only difference lies in the function f_{w,b}(x^{(i)}) used in each case, as Saif previously mentioned.
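As a quick sketch of the algebra: using the chain rule and the fact that the sigmoid f satisfies \frac{\partial f}{\partial z} = f(1-f), the log terms collapse and the derivatives of the log-loss cost come out as

\frac{\partial J}{\partial w_j} = \displaystyle \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x_j^{(i)}, \qquad \frac{\partial J}{\partial b} = \displaystyle \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)

which is exactly the same form as the Squared Error derivatives for linear regression, except that f_{w,b}(x^{(i)}) here is the sigmoid.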
You are right, but in the earlier video where Andrew explains the Cost Function of Logistic Regression, he mentions that if we use the Squared Error Cost Function in logistic regression, the resulting graph of J(w,b) versus w and b is non-convex, due to which we can get stuck in a local minimum and may not reach the global minimum of J(w,b), as shown in the attached image.
And I'm trying to understand the reason behind using the Squared Error Cost Function while running gradient descent for the Logistic Regression model (as shown in my main question).
In this image, under the red-marked area we can see the Cost Function that we use for the Logistic Regression model, since the Squared Error Cost Function ends up being non-convex.
This means that we'll run gradient descent on this Cost Function (under the red-marked area) to avoid getting stuck in a local minimum.
However, in the image we can see that we run gradient descent using the Squared Error Cost Function (under the blue-marked area). This would mean that our graph of J(w,b) is non-convex and we might end up in a local minimum.
Why are we doing this? Instead of using the derivative of Logistic Regression's Cost Function, why are we using the Squared Error Cost Function?
It seems to me that your reasoning is: because the equations in the blue box look the same as the derivatives of the squared error cost for linear regression, those equations cannot be the derivatives of the logistic cost for logistic regression.
That is not correct: the two sets of derivatives really do take the same form, and you would have seen it if you had worked out the derivatives step by step yourself. Check out this post for the steps.
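If you'd rather not push through the algebra, a quick numerical check tells the same story. The sketch below is only an illustration (helper names like logistic_cost and the toy data are made up, not course code): it compares the analytic gradient \frac{1}{m}\sum \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x^{(i)} with a finite-difference estimate of the log-loss cost, and the two agree.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(w, b, X, y):
    # Log-loss (cross-entropy) cost for logistic regression
    f = sigmoid(X @ w + b)
    return -np.mean(y * np.log(f) + (1 - y) * np.log(1 - f))

def analytic_gradient(w, b, X, y):
    # Same form as the squared-error gradient, but f is the sigmoid
    f = sigmoid(X @ w + b)
    err = f - y
    return X.T @ err / len(y), np.mean(err)

# Toy data, made up purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = (rng.random(20) > 0.5).astype(float)
w, b = rng.normal(size=3), 0.1

dw, db = analytic_gradient(w, b, X, y)

# Finite-difference check on each weight
eps = 1e-6
for j in range(len(w)):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[j] += eps
    w_minus[j] -= eps
    numeric = (logistic_cost(w_plus, b, X, y) - logistic_cost(w_minus, b, X, y)) / (2 * eps)
    print(f"dw[{j}]: analytic = {dw[j]:.6f}, finite-difference = {numeric:.6f}")
```

The printed analytic and finite-difference values match to several decimal places, even though the cost being differentiated is the log loss, not the squared error.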