In gradient descent, for both linear and logistic regression, the derivatives for w_j and b are different: there is an x_j[i] factor outside the brackets for w_j. So my question is: why do we have the x_j[i] multiplication for w_j but not for b?
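For reference, the two derivatives I mean (written out in the course's notation, as I understand it) are roughly:

dj_dw_j = (1/m) * sum over i of ( f_wb(x[i]) - y[i] ) * x_j[i]
dj_db   = (1/m) * sum over i of ( f_wb(x[i]) - y[i] )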
That’s simply what comes out when you do the calculus for the partial derivative of the cost with respect to b.
Intuitively, if you look at f_wb = w*x + b, you can see that ‘w’ is scaled by x, but ‘b’ is not. When you apply the chain rule, the partial derivative of f_wb with respect to w_j is x_j, while the partial derivative with respect to b is just 1, so the extra x_j[i] factor only survives in dj_dw. This is the basis for why dj_dw and dj_db have different forms.
Thanks for your reply. Intuitively, I had also thought about the absence of x in the b term, but after your reply I’m confident about it.