Hello!
I’m struggling to come to terms here, pun intended. Specifically, J(w,b).
For the cost function, it’s defined as one expression, which includes a factor of 1/(2m).
For gradient descent, it’s defined as another expression, which instead has a factor of 1/m.
As I’m typing this out I’m realizing that the latter is NOT actually J(w,b) but rather dJ(w,b)/dw. Now I presume that multiplying J(w,b) and d/dw somehow yields the gradient descent formula from the cost function. Admittedly, I’m not entirely sure what I’m saying and greatly appreciate anyone taking the time to help me understand. I also realize understanding this may be out of the scope of my math knowledge and it’s just something I’ll have to accept.
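
For anyone following along, here is a minimal sketch of how the two factors relate. It assumes the usual single-feature linear regression setup, which the post does not state explicitly: a model f(x) = w*x + b and a squared-error cost J(w,b) = (1/(2m)) * sum((w*x_i + b - y_i)^2). Under that assumption, the gradient descent term comes from differentiating J with respect to w, not from multiplying by anything. By the chain rule, the derivative of (w*x_i + b - y_i)^2 with respect to w is 2*(w*x_i + b - y_i)*x_i, and that 2 cancels the 1/2 in 1/(2m), leaving 1/m. The function names and sample data below are made up for illustration; the code just checks the analytic formula against a finite-difference slope of J.

```python
# Assumption (not from the post): model f(x) = w*x + b with squared-error cost.
# All names and data here are illustrative only.

def cost(w, b, xs, ys):
    """J(w,b) = (1/(2m)) * sum((w*x + b - y)^2)."""
    m = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

def dcost_dw(w, b, xs, ys):
    """dJ/dw = (1/m) * sum((w*x + b - y) * x); the 2 from the power rule cancels the 1/2."""
    m = len(xs)
    return sum((w * x + b - y) * x for x, y in zip(xs, ys)) / m

def numeric_dcost_dw(w, b, xs, ys, eps=1e-6):
    """Central finite-difference estimate of the slope of J in the w direction."""
    return (cost(w + eps, b, xs, ys) - cost(w - eps, b, xs, ys)) / (2 * eps)

# Small made-up dataset and parameter values.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.1, 5.9]
w, b = 0.5, 0.1

analytic = dcost_dw(w, b, xs, ys)
numeric = numeric_dcost_dw(w, b, xs, ys)
print("analytic dJ/dw:", analytic)
print("numeric  dJ/dw:", numeric)
```

The two printed values should agree to several decimal places, which suggests the 1/m expression really is the slope of the 1/(2m) cost. The 1/2 is there only so the derivative comes out clean; dropping it would rescale the cost and its gradient by 2 without moving the minimum.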