Thoughts on the Learning Rate Derivative in Gradient Descent for Logistic Regression


Professor Ng established that the cost function can be written as the average of the losses computed for each training example. He expressed the cost function in terms of the loss function, and defined loss functions for both logistic regression and linear regression. I thought that to compute the gradient for gradient descent in logistic regression, we would take the derivative of the loss function, that is, the derivative of equation (2) in the attached diagram. Or would that derivative have the same effect as differentiating the cost function described?
Could someone help clarify this?

Both methods should give the same result.
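One way to convince yourself of this numerically (a quick sketch with NumPy; the data `X`, `y` and parameters `w`, `b` are made-up illustrative values, not from the course): average the per-example loss derivatives, and compare against a finite-difference gradient of the averaged cost.

```python
import numpy as np

# Tiny made-up dataset (illustrative values only)
X = np.array([[0.5, 1.2], [-1.0, 0.3], [0.8, -0.7], [1.5, 2.0]])
y = np.array([1.0, 0.0, 0.0, 1.0])
w = np.array([0.2, -0.4])
b = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, b):
    # Cost = average of the per-example logistic losses
    p = sigmoid(X @ w + b)
    losses = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return losses.mean()

# Method A: average the per-example loss derivatives
# (for logistic loss, d(loss)/dw_j = (p - y) * x_j per example)
p = sigmoid(X @ w + b)
grad_w = ((p - y)[:, None] * X).mean(axis=0)
grad_b = (p - y).mean()

# Method B: finite-difference gradient of the averaged cost
eps = 1e-6
fd_w = np.array([
    (cost(w + eps * np.eye(2)[j], b) - cost(w - eps * np.eye(2)[j], b)) / (2 * eps)
    for j in range(2)
])
fd_b = (cost(w, b + eps) - cost(w, b - eps)) / (2 * eps)

print(np.allclose(grad_w, fd_w, atol=1e-6))  # True
print(abs(grad_b - fd_b) < 1e-6)             # True
```

Both routes land on the same gradient, because the cost is just the average of the losses.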

Hello @tobiademola

I am not following, because it's hard to see how the effect of the derivative and the cost could be "the same". The derivative drives gradient descent; the cost measures the error. They have different uses.

However, if the question is whether differentiating the loss and averaging gives the same gradient as differentiating the cost, then I understand it, and the answer is YES.

I am not sure the following is relevant to your question, but I feel like writing it down:

Loss is for one sample. Cost is for all samples: it is the average of all the losses.
The derivative of the loss is for one sample. The derivative of the cost is for all samples: it is the average of all the loss derivatives.
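The two points above can be written out in symbols (using the course's notation, where $f_{\vec{w},b}(\vec{x}^{(i)})$ is the model's prediction for example $i$ and $m$ is the number of examples):

$$J(\vec{w}, b) = \frac{1}{m}\sum_{i=1}^{m} L\big(f_{\vec{w},b}(\vec{x}^{(i)}),\, y^{(i)}\big)$$

$$\frac{\partial J}{\partial w_j} = \frac{1}{m}\sum_{i=1}^{m} \frac{\partial}{\partial w_j} L\big(f_{\vec{w},b}(\vec{x}^{(i)}),\, y^{(i)}\big)$$

The second line follows from the first simply by differentiating both sides, since the derivative of an average is the average of the derivatives.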

Cheers,
Raymond


Right! The derivative of a sum is the sum of the derivatives, and the derivative of an average is the average of the derivatives, because differentiation is a linear operation.
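This linearity is easy to check symbolically (a sketch with SymPy; the two "losses" `L1` and `L2` are arbitrary illustrative functions of `w`, not the course's loss):

```python
import sympy as sp

w = sp.symbols('w')

# Two arbitrary per-example "losses" as functions of w (illustrative choices)
L1 = (w - 1)**2
L2 = sp.log(1 + sp.exp(w))

J = (L1 + L2) / 2                             # cost = average of the losses
lhs = sp.diff(J, w)                           # derivative of the average
rhs = (sp.diff(L1, w) + sp.diff(L2, w)) / 2   # average of the derivatives

print(sp.simplify(lhs - rhs) == 0)  # True
```

The same identity holds for any number of terms, which is why the gradient of the cost equals the average of the per-example loss gradients.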


Thanks, Paul! It's the average, not the sum. Sometimes I write it that way, which is sloppy. I should correct that in my post.