I understand that we need J(w,b) to be convex, and that’s why this equation with the log is used as J(w,b) for logistic regression. And of course, it can be used to monitor the cost along the way. But I don’t see how it contributes to logistic regression’s gradient descent. The gradient descent looks very similar to linear regression’s; only f(x) is different.

Shouldn’t gradient descent for logistic regression somehow come from logistic regression’s J(w,b)?

What else is J(w,b) used for in logistic regression?

I have the same question!

The partial derivative for linear regression was calculated directly from the squared error cost function. Why isn’t the cost function used to calculate the partial derivative in logistic regression then?!?

Hello @Jinyan_Liu , @Schwan_Ray ,

We use logistic regression’s cost function to derive logistic regression’s gradients as well. If you are looking for the steps to derive them, check out this post’s “Derivation steps 2: logistic regression”. You will see how the result ends up looking the same as the gradients for linear regression.

Cheers,

Raymond

The cost equation is what the gradient equations were derived from.

The cost value itself isn’t used except for monitoring, as you mentioned. The key to making gradient descent work is the code that computes the gradients.

Thanks!

But in the steps, the loss function for both linear and logistic regression is just called L. I don’t fully understand the steps, but to me, L appeared and then disappeared. It looks like what is actually in L does not matter for computing the gradients?

How is that done?

True, you don’t really need the loss value itself during gradient descent. It is handy to monitor that the cost is decreasing, but it isn’t essential.

It’s an application of calculus, starting from the cost equation, computing the partial derivatives of the cost with respect to the weights and bias. This gives the equations for the gradients.
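As a rough sketch of that calculus for a single training example, the chain rule can be written out as below. The symbols z (the linear part), σ (the sigmoid), the superscript (i) for the i-th example, and m for the number of examples are notational assumptions here, chosen to match the usual course convention; the linked derivation post may lay the steps out differently.

```latex
\begin{align*}
z &= \mathbf{w}\cdot\mathbf{x} + b, \qquad f = \sigma(z) = \frac{1}{1+e^{-z}} \\
L &= -y\log f - (1-y)\log(1-f) \\
\frac{\partial L}{\partial f} &= -\frac{y}{f} + \frac{1-y}{1-f} = \frac{f-y}{f(1-f)} \\
\frac{\partial f}{\partial z} &= f(1-f), \qquad \frac{\partial z}{\partial w_j} = x_j, \qquad \frac{\partial z}{\partial b} = 1 \\
\frac{\partial L}{\partial w_j} &= \frac{f-y}{f(1-f)}\cdot f(1-f)\cdot x_j = (f-y)\,x_j \\
\frac{\partial J}{\partial w_j} &= \frac{1}{m}\sum_{i=1}^{m}\bigl(f^{(i)}-y^{(i)}\bigr)x_j^{(i)}, \qquad
\frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}\bigl(f^{(i)}-y^{(i)}\bigr)
\end{align*}
```

The log terms of L are used in the third line and then cancel against the sigmoid's derivative f(1-f) in the fifth line, which is why L seems to appear and then disappear. For linear regression with L = ½(f-y)² and f = z, the same chain rule gives ∂L/∂f = f-y and ∂f/∂z = 1, so the final expression has the same form.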

What I mean is: what’s actually inside the loss function L does not matter at all when you derive gradient descent, right?

Does the loss function equation matter in the process?

The cost function (the average of the per-example loss L over the training set) is necessary because that’s where the gradients come from.

Yes, that is how I understood it. It’s just that the loss and cost functions for linear and logistic regression are so different, yet their gradient descent updates are so similar, which makes it very hard to understand what’s happening in the middle.

This is a happy circumstance. It works out this way because of the sigmoid function in the predicted y-hat value for logistic regression, combined with how the logistic cost is defined: the derivative of the sigmoid cancels the denominator that comes from differentiating the log terms.
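One way to convince yourself without following the algebra is a numerical check: differentiate the log cost directly with finite differences and compare against the simple (f - y) * x formula. The sketch below is only an illustration; the function names, the tiny data set, and the starting w and b are all made up for this example and are not from the course code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_cost(w, b, X, y):
    """Average log loss J(w, b) over the data set."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        f = sigmoid(sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b)
        total += -y_i * math.log(f) - (1 - y_i) * math.log(1 - f)
    return total / len(X)

def analytic_gradient(w, b, X, y):
    """Gradient from the closed form (1/m) * sum((f - y) * x)."""
    m = len(X)
    dw = [0.0] * len(w)
    db = 0.0
    for x_i, y_i in zip(X, y):
        err = sigmoid(sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b) - y_i
        for j, x_j in enumerate(x_i):
            dw[j] += err * x_j / m
        db += err / m
    return dw, db

def numeric_gradient(w, b, X, y, eps=1e-6):
    """Central finite differences taken directly on the cost J(w, b)."""
    dw = []
    for j in range(len(w)):
        w_plus, w_minus = list(w), list(w)
        w_plus[j] += eps
        w_minus[j] -= eps
        diff = logistic_cost(w_plus, b, X, y) - logistic_cost(w_minus, b, X, y)
        dw.append(diff / (2 * eps))
    db = (logistic_cost(w, b + eps, X, y) - logistic_cost(w, b - eps, X, y)) / (2 * eps)
    return dw, db

# Hypothetical toy data and parameters, chosen only for illustration.
X = [[0.5, 1.5], [1.0, 1.0], [1.5, 0.5], [3.0, 0.5], [2.0, 2.0], [1.0, 2.5]]
y = [0, 0, 0, 1, 1, 1]
w, b = [0.3, -0.2], 0.1

dw_a, db_a = analytic_gradient(w, b, X, y)
dw_n, db_n = numeric_gradient(w, b, X, y)
max_diff = max(abs(a - n) for a, n in zip(dw_a + [db_a], dw_n + [db_n]))
print("largest difference between the two gradients:", max_diff)
```

The numeric version only ever touches the cost function, while the analytic version never mentions a log, and the two should agree up to finite-difference error. That agreement is the sense in which the simple-looking gradient comes from the cost.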

Thanks! I will just think of it as somehow ending up this way. But the gradients used in gradient descent do get calculated from the cost or loss functions.

Hi @Jinyan_Liu,

How do you think I get the underlined equation?

I used the loss function for logistic regression.

Raymond

Hello @Jinyan_Liu,

You don’t need to compute the loss to do gradient descent. However, you need to compute a metric (which can just be the loss itself) over the training set and the test set to monitor how they change over iterations. You will learn about it in Course 2 Week 3.
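To make that concrete, here is a minimal sketch of a training loop in which the parameter update uses only the gradients, and the cost is computed purely for monitoring. The function names, the learning rate, the iteration count, and the toy data are assumptions made for this illustration, not the course’s lab code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    return sigmoid(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b)

def cost(w, b, X, y):
    """Average log loss; called below only to monitor progress."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        f = predict(w, b, x_i)
        total += -y_i * math.log(f) - (1 - y_i) * math.log(1 - f)
    return total / len(X)

def gradient_descent(X, y, alpha=0.1, iters=1000, monitor_every=100):
    m, n = len(X), len(X[0])
    w, b = [0.0] * n, 0.0
    history = []
    for it in range(iters):
        # The update needs only the gradients, never the cost value.
        errors = [predict(w, b, x_i) - y_i for x_i, y_i in zip(X, y)]
        dw = [sum(e * x_i[j] for e, x_i in zip(errors, X)) / m for j in range(n)]
        db = sum(errors) / m
        w = [w_j - alpha * dw_j for w_j, dw_j in zip(w, dw)]
        b -= alpha * db
        # Optional monitoring; removing these two lines leaves w and b unchanged.
        if it % monitor_every == 0:
            history.append(cost(w, b, X, y))
    return w, b, history

# Hypothetical toy data, chosen only for illustration.
X = [[0.5, 1.5], [1.0, 1.0], [1.5, 0.5], [3.0, 0.5], [2.0, 2.0], [1.0, 2.5]]
y = [0, 0, 0, 1, 1, 1]

w, b, history = gradient_descent(X, y)
print("monitored cost values:", history)
```

Deleting the monitoring lines would not change the learned w and b, but keeping them is how you would notice a learning rate that is too large or a model that has stopped improving.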

Cheers,

Raymond

Ah, thank you! Great that you pointed it out for me! Thank you! I don’t have this math background, so I couldn’t understand the steps all by myself!

Thank you so much! Now I understand! And it’s so interesting that the gradient descent for linear and logistic regression is so similar after the calculation! Thank you!