Purpose of the cost function (Course 1 Week 1 assignment)

I have just finished the first assignment of the first week of the first course, but I still have a question about the implementation. In the gradientDescent function we calculate J and return it, but I do not see where this value is actually used. It is printed out whenever we train the model; I had thought it was something we needed to calculate so that we could reduce it during training.

That’s an interesting point and a good observation! It turns out that the J value itself is only useful as a proxy for judging whether Gradient Descent is converging. What we really care about are the gradients, and those do not contain the value of J itself: they are derived from J, but the actual quantities used in the parameter updates are the partial derivatives of J with respect to the parameters.
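Here is a minimal sketch of what that looks like in code, assuming the single-feature linear regression setup from this week. The function and variable names (compute_cost, compute_gradient, etc.) are illustrative, not necessarily the assignment’s exact API; the point is that the update step touches only dJ/dw and dJ/db, while J is computed separately just for printing:

```python
import numpy as np

def compute_cost(X, y, w, b):
    """Mean squared error cost J(w, b), using the 1/(2m) convention from the course."""
    m = len(y)
    errors = X @ w + b - y
    return (errors @ errors) / (2 * m)

def compute_gradient(X, y, w, b):
    """Partial derivatives of J w.r.t. w and b. Note: the value of J never appears here."""
    m = len(y)
    errors = X @ w + b - y
    dj_dw = (X.T @ errors) / m
    dj_db = errors.mean()
    return dj_dw, dj_db

def gradient_descent(X, y, w, b, alpha, num_iters):
    for i in range(num_iters):
        dj_dw, dj_db = compute_gradient(X, y, w, b)
        # The parameter update uses only the gradients
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
        # J is computed purely for monitoring, exactly as you observed
        if i % 100 == 0:
            print(f"iter {i:5d}: J = {compute_cost(X, y, w, b):.6f}")
    return w, b
```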

Beyond that, the J value is not really meaningful except in comparison with its own earlier values. The real metric of success is the prediction accuracy of the model, but that is not convenient for judging, iteration by iteration, whether convergence is working. So we watch J until it stops decreasing, then stop, look at the prediction accuracy, and consider whether that is “good enough” for whatever our purpose is.
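Concretely, that “watch J until it stops decreasing” idea might look like the sketch below. Everything here is made up for illustration: the synthetic data, the learning rate, and the tolerance are arbitrary, and RMSE stands in for “prediction accuracy” since this week’s model is a regression:

```python
import numpy as np

# Made-up synthetic data so the sketch runs end to end (not the assignment's data)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 3.0 * X[:, 0] + 1.0 + rng.normal(scale=0.1, size=100)

w, b = np.zeros(X.shape[1]), 0.0
alpha, tol, prev_J = 0.1, 1e-9, float("inf")

for i in range(10_000):
    errors = X @ w + b - y
    J = (errors @ errors) / (2 * len(y))  # cost at the current parameters
    if prev_J - J < tol:                  # J has effectively stopped decreasing
        break
    prev_J = J
    # The gradient step itself uses only the derivatives of J
    w -= alpha * (X.T @ errors) / len(y)
    b -= alpha * errors.mean()

# Only now do we look at the metric we actually care about: prediction quality
rmse = np.sqrt(np.mean((X @ w + b - y) ** 2))
print(f"stopped after {i + 1} iterations, J = {J:.6f}, RMSE = {rmse:.4f}")
```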