Has anyone actually done the maths to calculate the partial derivative of the cost function J(\vec w, b) with respect to w_j?
I have gone through my calculations twice, and I get the same result as Andrew in his video “Gradient Descent Implementation”, but with a minus sign in front of the result.
I have taken screenshots of my calculations and pasted them here in case you want to check them.
It's been many years since I last calculated partial derivatives, so I may have made an error somewhere.
Hopefully someone with more recent experience can spot the error.
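For reference, here is a sketch of the derivation as I understand it, assuming the binary cross-entropy cost with the natural log and the sigmoid model from the lectures (the notation below is mine, so please check it against the slides):

J(\vec w, b) = -\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)} \log f_{\vec w,b}(\vec x^{(i)}) + \left(1-y^{(i)}\right) \log\left(1 - f_{\vec w,b}(\vec x^{(i)})\right)\right]

f_{\vec w,b}(\vec x) = \sigma(\vec w \cdot \vec x + b), \quad \sigma(z) = \frac{1}{1+e^{-z}}, \quad \sigma'(z) = \sigma(z)\left(1-\sigma(z)\right)

Writing f^{(i)} for f_{\vec w,b}(\vec x^{(i)}) and applying the chain rule:

\frac{\partial J}{\partial w_j} = -\frac{1}{m}\sum_{i=1}^{m}\left[\frac{y^{(i)}}{f^{(i)}} - \frac{1-y^{(i)}}{1-f^{(i)}}\right] f^{(i)}\left(1-f^{(i)}\right) x_j^{(i)} = -\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)} - f^{(i)}\right) x_j^{(i)} = \frac{1}{m}\sum_{i=1}^{m}\left(f^{(i)} - y^{(i)}\right) x_j^{(i)}

A stray minus sign usually comes from dropping the leading minus of the cost, or from stopping at (y^{(i)} - f^{(i)}) instead of flipping it to (f^{(i)} - y^{(i)}).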
Apparently the non-linear log function in the cost equation is counteracted by the non-linear exponential in the sigmoid function that is part of f(w, b).
That was just an intuition, not the result of a mathematical derivation.
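For what it is worth, that intuition can be checked in one line. With \sigma(z) = \frac{1}{1+e^{-z}}, differentiating the log of the sigmoid cancels the exponential:

\frac{d}{dz}\log \sigma(z) = \frac{\sigma'(z)}{\sigma(z)} = 1 - \sigma(z), \qquad \frac{d}{dz}\log\left(1 - \sigma(z)\right) = -\sigma(z)

The exponential disappears and only terms linear in \sigma(z) remain, which is why the final gradient takes such a simple form.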
If the first derivative of the loss did not tend to zero as the error tends to zero, then gradient descent would keep pushing the parameters away even when the prediction is already correct, which is not a favourable consequence.
By “error”, I meant the (f_{\vec w,b}(\vec x^{(i)}) - y^{(i)}) term in your work. It was called the error because it is the difference between the prediction and the truth.
Doesn’t this argument sound reasonable to you? I am a Physics graduate, and we always like to discuss the intuitive understanding of maths formulas, though it’s not always easy to do.
Cheers,
Raymond
P.S. You have presented mathematically the result that the derivative of the loss is proportional to the error.
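As a quick numerical sanity check (a minimal sketch, not course code; the function and variable names below are my own), the per-example gradient (f - y)\, x_j does indeed shrink to zero as the prediction approaches the label:

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def loss_gradient_wj(x_j, y, f):
    """Per-example gradient of the logistic loss w.r.t. w_j: (f - y) * x_j."""
    return (f - y) * x_j

x_j = 2.0   # a single feature value, made up for illustration
y = 1.0     # the true label

# As the raw score z grows, f = sigmoid(z) approaches y = 1,
# so both the error and the gradient shrink towards zero.
for z in [0.0, 1.0, 3.0, 6.0]:
    f = sigmoid(z)
    print(f"z = {z:>3}: f = {f:.4f}, error = {f - y:+.4f}, "
          f"dL/dw_j = {loss_gradient_wj(x_j, y, f):+.4f}")
```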
The slides might not have specified the base for the log (I haven’t checked all of them), but if we work backwards from the final gradient descent result for logistic regression, the base must be e.
This is a Machine Learning class, and I can’t say what the general Machine Learning convention is for writing log, but this Machine Learning Specialization uses base e.
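One hedged way to see why it has to be e: if the loss had used \log_a for some other base a, the chain rule would introduce a constant factor, because

\frac{d}{dx}\log_a x = \frac{1}{x \ln a}

so the gradient would come out as \frac{1}{\ln a}\cdot\frac{1}{m}\sum_{i=1}^{m}\left(f^{(i)} - y^{(i)}\right) x_j^{(i)}. The clean result shown in the lectures, with no extra constant, only happens when \ln a = 1, i.e. when the base is e.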