Hi,
I have a question regarding the definition of accuracy and the cost function. I have seen that in the assignments the accuracy of the models is usually calculated, but in the lectures, unfortunately, the concept of accuracy and the formula used for it have not been discussed.
What is the meaning of accuracy and what formula have you used? Also, if in model "A" the cost function after "N" iterations reaches a lower value than in model "B", does that mean the accuracy of model A is higher than that of model B?
Thanks so much.
Best regards,
Mohammad
Accuracy is simple to define: the fraction of correct predictions that the model makes on a given input (labeled) dataset, e.g. the training set or the test set. This can be expressed as a percentage or a decimal number between 0 and 1. To calculate it, you simply apply forward propagation to the dataset (without any regularization) and then compare the outputs to the labels. In the assignments they usually have us build a function called predict, which takes the model parameters and a dataset and produces the predictions, as in:
Y_pred = predict(parameters, X_train)
Then you can calculate the accuracy like this:
m = Y_train.shape[1]                   # number of examples (columns of Y_train)
acc = np.sum(Y_train == Y_pred) / m    # fraction of predictions that match the labels
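If it helps, here is a tiny self-contained version of the same calculation with made-up labels and predictions (the arrays are just illustrative, not from any assignment):

import numpy as np

# made-up labels and thresholded predictions, shape (1, m) as in the assignments
Y_train = np.array([[1, 0, 1, 1, 0]])
Y_pred  = np.array([[1, 0, 0, 1, 0]])   # one prediction is wrong

m = Y_train.shape[1]                    # number of examples
acc = np.sum(Y_train == Y_pred) / m
print(acc)                              # 0.8, i.e. 80% accuracy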
Then the next question is how cost relates to accuracy. There is only an indirect relationship between the two. The easiest way to see this is to consider one sample whose label is "true". Suppose you train your model for 1000 iterations and the model then gives a correct prediction of 0.55 (the sigmoid output or \hat{y} value), which maps to "true" since it is > 0.5. Now you train for another 1000 iterations and the new model gives a prediction of 0.75 for that sample. The cost will be lower with the second model, but the accuracy will be the same, at least with respect to that one sample. The point is that accuracy is "quantized": a lower cost does not imply higher accuracy. For a single sample, you could prove that lower cost implies no worse accuracy.
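To put numbers on that example, the per-sample cross entropy loss is L = -(y \log(\hat{y}) + (1 - y) \log(1 - \hat{y})), so with y = 1:

import numpy as np

y = 1                                   # the "true" label
for y_hat in (0.55, 0.75):              # predictions after 1000 and 2000 iterations
    loss = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    prediction = int(y_hat > 0.5)       # thresholded prediction
    print(y_hat, round(loss, 3), prediction)

# 0.55 0.598 1
# 0.75 0.288 1   <- lower loss, same (correct) prediction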
Of course the cost J is the average of the individual loss values across all the samples in the batch, and more training can make the predictions on some samples better and on others worse, which further muddies the relationship between cost and accuracy. Generally speaking, if the J value decreases, it likely means that the accuracy is at least not worse, but that is not a mathematically provable relationship, just a "rule of thumb".
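Here is a contrived illustration of that point, with made-up labels and sigmoid outputs for a tiny batch of 4 samples; the second set of outputs has a lower cost J but actually a lower accuracy:

import numpy as np

Y = np.array([[1, 1, 0, 1]])                     # labels for a tiny batch

def cost_and_accuracy(Y, A):
    # mean cross entropy cost and accuracy for sigmoid outputs A
    m = Y.shape[1]
    J = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    acc = np.sum(Y == (A > 0.5)) / m
    return J, acc

A_early = np.array([[0.55, 0.55, 0.45, 0.55]])   # all four barely correct
A_late  = np.array([[0.99, 0.99, 0.01, 0.45]])   # three very confident, one now wrong

print(cost_and_accuracy(Y, A_early))             # (~0.60, 1.00)
print(cost_and_accuracy(Y, A_late))              # (~0.21, 0.75)  <- lower cost, lower accuracy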
It turns out that the actual scalar J cost value is not really useful for much other than as an inexpensive way to monitor the convergence of your training. But the cost function is absolutely critical, since it is the gradients of that function which drive back propagation towards better solutions. It's just that the scalar J value is not that meaningful by itself. You can also calculate accuracy periodically during your training and use that to assess how the training is progressing.
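For example, here is a tiny self-contained logistic regression sketch (the data and hyperparameters are made up purely for illustration, not from any assignment) that prints both the cost and the training accuracy every few hundred iterations; you will typically see the cost keep falling even after the accuracy has stopped improving:

import numpy as np

np.random.seed(0)
X = np.random.randn(2, 200)                       # 2 features, 200 made-up samples
Y = (X[0:1, :] + X[1:2, :] > 0).astype(int)       # a linearly separable toy label

w = np.zeros((2, 1)); b = 0.0
lr = 0.1
for i in range(1001):
    A = 1 / (1 + np.exp(-(w.T @ X + b)))          # sigmoid forward pass
    J = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    dZ = A - Y                                    # gradient of the cost w.r.t. z
    w -= lr * (X @ dZ.T) / X.shape[1]
    b -= lr * np.mean(dZ)
    if i % 200 == 0:
        acc = np.mean(Y == (A > 0.5))             # periodic accuracy check
        print(f"iteration {i}: cost = {J:.4f}, train accuracy = {acc:.4f}")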