Why do we plot a cost function J(w,b) in linear regression to find the best-fit values of w and b? If I simply want the values of w and b for which the model fits the data, can’t I just differentiate with respect to w and b and find the global minimum (depending on the range of the data set included)? At that point I would have the desired w and b. Won’t it work this way? Why do I need to write down another function J(w,b) when I already have the model f(x) = wx + b? Can anyone clarify?

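Yes, you could. That’s called the “normal equation” solution: you set the partial derivatives of the cost J(w,b) with respect to w and b to zero and solve for w and b directly. Note that what you differentiate is still the cost J, not the model f(x) = wx + b itself, since f is linear in w and b and has no minimum of its own. The normal equation works fine as long as the matrix of data examples isn’t too big (because the solution requires multiplying by the X matrix and inverting the resulting X^T X matrix).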
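For a concrete picture, here is a minimal NumPy sketch of the normal equation for the one-feature model f(x) = wx + b. The function name `fit_normal_equation` and the demo data are made up for illustration, and it assumes the usual squared-error cost.

```python
import numpy as np

def fit_normal_equation(x, y):
    """Least-squares fit of f(x) = w*x + b using the normal equation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix X: one column of x values, one column of ones for b.
    X = np.column_stack([x, np.ones_like(x)])
    # Solve (X^T X) theta = X^T y. Using solve() avoids building an
    # explicit inverse, which is cheaper and numerically safer.
    w, b = np.linalg.solve(X.T @ X, X.T @ y)
    return w, b

# Hypothetical noise-free data generated from w = 2, b = 1.
x_demo = np.array([0.0, 1.0, 2.0, 3.0])
y_demo = 2.0 * x_demo + 1.0
w_fit, b_fit = fit_normal_equation(x_demo, y_demo)
print(w_fit, b_fit)
```

With n features, X^T X is an n-by-n matrix, so the cost of this step grows quickly as n grows.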

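But that closed-form solution is specific to the squared-error cost function used in linear regression. Most other cost functions, such as the one used for logistic regression, have no such solution. It also doesn’t work efficiently if the data set is large, especially when there are many features, because the matrix to invert grows with the number of features. In those cases gradient descent is a better solution.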
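For comparison, here is a rough sketch of gradient descent on the same kind of problem. It assumes the cost J(w,b) = (1/(2m)) * sum((w*x + b - y)^2); the function name `gradient_descent`, the learning rate `alpha`, and the iteration count `num_iters` are illustrative choices, not values from this thread.

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, num_iters=5000):
    """Minimize J(w,b) = (1/(2m)) * sum((w*x + b - y)**2) iteratively."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = x.shape[0]
    w, b = 0.0, 0.0  # arbitrary starting point
    for _ in range(num_iters):
        err = w * x + b - y        # prediction error per example
        dj_dw = (err @ x) / m      # partial derivative of J w.r.t. w
        dj_db = err.sum() / m      # partial derivative of J w.r.t. b
        # Update both parameters using gradients from the same step.
        w -= alpha * dj_dw
        b -= alpha * dj_db
    return w, b

# Hypothetical noise-free data generated from w = 2, b = 1.
x_gd = np.array([0.0, 1.0, 2.0, 3.0])
y_gd = 2.0 * x_gd + 1.0
w_gd, b_gd = gradient_descent(x_gd, y_gd)
print(w_gd, b_gd)
```

Each iteration only needs the gradient, with no matrix to invert, and the same loop works for other differentiable cost functions once the two derivative lines are swapped out.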