I understand the need to find the right values for w and b in order to minimize the cost function J. I just don’t understand why the update steps for w and b are different. The only difference I noticed is the multiplication by x^{(i)} at the end, and I’d like to understand where that difference comes from.
Perhaps a better question is: why are the update rules for w and b different from each other?
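For reference, these are the two update rules I mean (a sketch assuming the standard univariate linear-regression setup, with model $f_{w,b}(x) = wx + b$, cost $J(w,b)$, $m$ training examples, and learning rate $\alpha$):

$$
w := w - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x^{(i)}
$$

$$
b := b - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)
$$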
Another angle to look at it is this: the model is its weights multiplied by the features, plus a bias, but we can view the bias term as nothing more than another weight that multiplies a virtual feature which is always equal to 1:
$$
b = w_{\text{bias term disguised as a weight}} \, x_{\text{virtual}}, \quad \text{where } x_{\text{virtual}} = 1.
$$
Since the bias is just another weight (in disguise), we apply the top (w) update but set x^{(i)} = 1, and we get exactly the bottom (b) update.
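To make this concrete, here is a minimal sketch in Python (the data, the learning rate, and names like `X_aug` are my own illustrative choices, not from the course): by appending a constant-1 column to the inputs, b becomes an ordinary weight, and a single gradient formula updates both parameters.

```python
import numpy as np

# Toy data for a univariate linear model f(x) = w*x + b (made-up values).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])  # generated by y = 2x + 1
m = len(x)

# Augment each example with a "virtual feature" that is always 1,
# so the bias b becomes just another weight.
X_aug = np.column_stack([x, np.ones(m)])  # shape (m, 2)
theta = np.zeros(2)                       # theta[0] = w, theta[1] = b
alpha = 0.01                              # learning rate (arbitrary choice)

for _ in range(5000):
    predictions = X_aug @ theta           # f(x^(i)) for every example
    errors = predictions - y              # f(x^(i)) - y^(i)
    # One gradient formula for both parameters: (1/m) * sum(error * feature).
    # For theta[0] the feature is x^(i); for theta[1] it is the constant 1,
    # which is exactly why the x^(i) factor "disappears" in the b update.
    grad = (X_aug.T @ errors) / m
    theta -= alpha * grad

print(theta)  # approaches [2.0, 1.0], i.e., w = 2 and b = 1
```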