Misunderstandings On The Analytical Equations of GD In Logistic Regression

Hi all,

I’m currently on the Gradient Descent Implementation video of Week 3.

When the gradient descent algorithm was updated with the logistic regression model of z, it seems to have been overlooked that the cost function includes the big formula with the ‘logs’, which does not appear inside the sum of the final calculation.

Even if we talk about the g(z) formula, is it equivalent to the logarithmic loss and cost functions from the beginning? Was that stated somewhere earlier in the course, or is there something I do not understand? Exponential expressions shouldn’t be equal to logarithmic loss and cost functions.

Looking forward to your reply.

Thanks in advance!

If you have a question about something in the video, please include the time mark.

The g(z) function only references the activation function, which in this assignment is the sigmoid.

g(z) is used to compute f_wb.

f_wb is used to compute the cost. This is shown at time 0:54 in that video.
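To make that chain concrete, here is a minimal sketch of how g(z) feeds into f_wb and how f_wb feeds into the cost. It assumes NumPy; the function names `sigmoid` and `compute_cost` and the toy data are my own illustration, not the assignment's code.

```python
import numpy as np

def sigmoid(z):
    # g(z): the activation function, maps any real z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost(X, y, w, b):
    # z = w.x + b for every example, then f_wb = g(z)
    f_wb = sigmoid(X @ w + b)
    # The logs live here, in the per-example loss, not inside g(z)
    loss = -y * np.log(f_wb) - (1 - y) * np.log(1 - f_wb)
    # The cost is the average loss over the m examples
    return loss.mean()

# Hypothetical toy data: 4 examples, 2 features
X = np.array([[0.5, 1.5], [1.0, 1.0], [1.5, 0.5], [3.0, 0.5]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w = np.zeros(2)
b = 0.0

cost = compute_cost(X, y, w, b)
print(cost)
```

So the exponential appears in g(z) and the logarithms appear in the cost; they are two different steps of the same computation, not two expressions that are claimed to be equal.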

At 1:49, Andrew shows that once you compute the partial derivatives of the cost equation, you get the equation for the gradients.

Gradient descent uses


At time mark 1:43, where the partial derivative of the cost function is calculated (for the gradient descent algorithm), there are no logarithms inside the sum.

I am referring to the calculations for w and b. Why is that?

Looking forward to your reply.


Hello @Menelaos_Gkikas,

Essentially it is about how to derive the gradients.

l = -y\log{p} - (1-y)\log{(1-p)}
p = \frac{1}{1+\exp{(-z)}}
z = wx +b

Because \frac{\partial{l}}{\partial{w}} = \frac{\partial{l}}{\partial{p}} \frac{\partial{p}}{\partial{z}} \frac{\partial{z}}{\partial{w}}, we only need to derive each of the terms separately:

\frac{\partial{l}}{\partial{p}} = -\frac{y}{p} + \frac{1-y}{1-p}
\frac{\partial{p}}{\partial{z}} = p(1-p)
\frac{\partial{z}}{\partial{w}} = x

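The \log is gone when we differentiate the loss with respect to p. If you multiply the above 3 terms together, the p and (1-p) factors cancel the denominators:

\frac{\partial{l}}{\partial{w}} = \left(-\frac{y}{p} + \frac{1-y}{1-p}\right) p(1-p) x = \left(-y(1-p) + (1-y)p\right) x = (p - y) x

Replace p with f(x) and you recover the formula in the slide, but for one sample only. You recover the full formula if you put the summation and 1/m back in:

\frac{\partial{J}}{\partial{w}} = \frac{1}{m} \sum_{i=1}^{m} (f(x^{(i)}) - y^{(i)}) x^{(i)}

The same steps with \frac{\partial{z}}{\partial{b}} = 1 give the gradient for b, which is the same sum without the x^{(i)} factor.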
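If you would like to convince yourself numerically, here is a rough sketch that compares the log-free gradient formula against a finite-difference estimate taken directly from the cost with the logs in it. It assumes NumPy; the function names, the toy data, and the step size `eps` are my own choices for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, y, w, b):
    # Cost with the logs, averaged over the m examples
    p = sigmoid(X @ w + b)
    return (-y * np.log(p) - (1 - y) * np.log(1 - p)).mean()

def analytic_gradients(X, y, w, b):
    # The formula from the slide: no logs, only (f(x) - y)
    err = sigmoid(X @ w + b) - y
    return X.T @ err / len(y), err.mean()

def numeric_gradients(X, y, w, b, eps=1e-6):
    # Central differences on the cost itself, one parameter at a time
    dj_dw = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = eps
        dj_dw[j] = (cost(X, y, w + step, b) - cost(X, y, w - step, b)) / (2 * eps)
    dj_db = (cost(X, y, w, b + eps) - cost(X, y, w, b - eps)) / (2 * eps)
    return dj_dw, dj_db

# Hypothetical toy data and parameters
X = np.array([[0.5, 1.5], [1.0, 1.0], [1.5, 0.5], [3.0, 0.5]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w = np.array([0.3, -0.2])
b = 0.1

dw_a, db_a = analytic_gradients(X, y, w, b)
dw_n, db_n = numeric_gradients(X, y, w, b)
print(dw_a, dw_n)
print(db_a, db_n)
```

The two pairs should agree to several decimal places, which is the numerical counterpart of the chain-rule argument: the logs are in the cost, and differentiating removes them.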