Derivative of the Cost Function in Logistic Regression

I’d like to ask whether there’s a walkthrough of how the cost function is differentiated with respect to ‘w’ and ‘b’ in the gradient descent algorithm. I want to understand why the result looks the same as it does for linear regression.


Hi @Charnchai_Phanichop,

Check this out, and let me know if it is not easy to follow.


The short form of the answer is that the magic happens because of the form of the partial derivative of sigmoid().
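Sketched out, using the standard per-example cross-entropy loss $L = -\bigl[y\ln a + (1-y)\ln(1-a)\bigr]$ with prediction $a = \sigma(z)$ and $z = w \cdot x + b$, the derivation goes like this:

$$\frac{d\sigma}{dz} = \sigma(z)\bigl(1-\sigma(z)\bigr) = a(1-a)$$

$$\frac{\partial L}{\partial z} = \frac{\partial L}{\partial a}\,\frac{da}{dz} = \left(-\frac{y}{a} + \frac{1-y}{1-a}\right)a(1-a) = a - y$$

$$\frac{\partial L}{\partial w} = (a - y)\,x, \qquad \frac{\partial L}{\partial b} = a - y$$

That final $(\text{prediction} - \text{label})\times\text{input}$ form is exactly what you get from the squared-error cost in linear regression.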

Since the logistic hypothesis includes sigmoid() - which uses exp() - and the cost function includes the natural log, a whole lot of factors in the partial derivatives cancel out, and you end up with a very simple form for the gradients.
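If you want to convince yourself numerically, here's a small NumPy sketch (variable names `w`, `b`, `X`, `y` are just illustrative) that compares the simple closed-form gradients against finite-difference gradients of the cross-entropy cost:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, b, X, y):
    # Mean binary cross-entropy cost for logistic regression
    a = sigmoid(X @ w + b)
    return -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))

def analytic_grads(w, b, X, y):
    # The simple closed form: the error term (a - y) plays the same
    # role as (prediction - label) in linear regression
    m = len(y)
    a = sigmoid(X @ w + b)
    dw = X.T @ (a - y) / m
    db = np.mean(a - y)
    return dw, db

# Random toy data
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = (rng.random(20) > 0.5).astype(float)
w = rng.normal(size=3)
b = 0.1

dw, db = analytic_grads(w, b, X, y)

# Central-difference approximation of the same gradients
eps = 1e-6
dw_num = np.array([
    (cost(w + eps * e, b, X, y) - cost(w - eps * e, b, X, y)) / (2 * eps)
    for e in np.eye(3)
])
db_num = (cost(w, b + eps, X, y) - cost(w, b - eps, X, y)) / (2 * eps)

print(np.allclose(dw, dw_num, atol=1e-5), np.isclose(db, db_num, atol=1e-5))
```

The two gradient estimates agree to numerical precision, even though the analytic version never had to differentiate the log or the exp explicitly - all of that cancelled out.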