I’d like to inquire if there’s an explanation for the derivative process of the cost function with respect to ‘b’ and ‘w’ used in the gradient descent algorithm. I want to understand why the result is similar to that of linear regression.


The short form of the answer is that the magic happens because of the form of the partial derivative of sigmoid().

Since the logistic hypothesis includes sigmoid() - which uses exp() - and the cost function includes the natural log, a whole lot of factors in the partial derivatives cancel out. The key identity is sigmoid’(z) = sigmoid(z) · (1 − sigmoid(z)), which cancels against the derivative of the log terms in the cost, so you end up with the same simple form for the gradients as in linear regression: the prediction error (f(x) − y) times the inputs, averaged over the examples.
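You can verify this numerically. Here is a small sketch (my own illustration, not from the course materials) that computes the logistic-regression gradients using the linear-regression-style formula, error times input, and checks them against a finite-difference gradient of the log-loss cost:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, b, X, y):
    # Binary cross-entropy (log loss) for logistic regression
    f = sigmoid(X @ w + b)
    return -np.mean(y * np.log(f) + (1 - y) * np.log(1 - f))

def analytic_grads(w, b, X, y):
    # Same form as linear regression: the error (f - y)
    # times the inputs, averaged over the m examples
    m = X.shape[0]
    err = sigmoid(X @ w + b) - y
    dw = (X.T @ err) / m
    db = np.mean(err)
    return dw, db

# Compare against a central-difference numerical gradient on random data
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.integers(0, 2, size=20).astype(float)
w = rng.normal(size=3)
b = 0.5

dw, db = analytic_grads(w, b, X, y)

eps = 1e-6
dw_num = np.zeros_like(w)
for j in range(len(w)):
    wp, wm = w.copy(), w.copy()
    wp[j] += eps
    wm[j] -= eps
    dw_num[j] = (cost(wp, b, X, y) - cost(wm, b, X, y)) / (2 * eps)
db_num = (cost(w, b + eps, X, y) - cost(w, b - eps, X, y)) / (2 * eps)

print(np.allclose(dw, dw_num, atol=1e-6))  # the simple formula matches
print(np.isclose(db, db_num, atol=1e-6))
```

If you replace the log-loss cost with squared error here, the simple formula no longer matches the numerical gradient - which shows the cancellation really does depend on pairing sigmoid() with the log-based cost.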