In the Course 1, Week 3 assignment, section 5.2 calculates the accuracy with what looks like a cross-entropy-style formula: accuracy = (np.dot(Y, predictions.T) + np.dot(1 - Y, 1 - predictions.T)) / float(Y.size) * 100
Where did the “np.dot(Y, predictions.T) + np.dot(1 - Y, 1 - predictions.T)” part come from? It looks just like the cross-entropy loss formula, which has the form “y*log(y_hat)”.
I thought that accuracy for binary classification comes from TP, TN, etc.: (TP + TN) / (TP + TN + FP + FN), not from a cross-entropy derivation. Does anybody know why the course uses this formula?
Okay, then how is the accuracy derived from that formula with np.dot(Y, predict) and np.dot(1 - Y, 1 - predict)? Where did this come from? Does it appear anywhere in the theory/notes? I cannot find anything related to it.
It’s simple: accuracy is the percentage of correct predictions on a given batch of inputs. So that formula has nothing to do with cross entropy loss, even though it may superficially resemble it (notice there are no logarithms there).
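To make that concrete, here is a tiny sketch with made-up toy arrays (the Y and predictions values below are purely illustrative, not from the assignment):

```python
import numpy as np

Y = np.array([[1, 0, 1, 1, 0]])            # toy labels, row vector as in the assignment
predictions = np.array([[1, 0, 0, 1, 1]])  # toy 0/1 predictions

# 3 of the 5 predictions match the labels, so the accuracy is 60%
accuracy = np.mean(predictions == Y) * 100
print(accuracy)  # 60.0
```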
If Y is a vector of labels (0 or 1) and \hat{Y} are the predictions based on the output of the model (also 0 or 1), then think about what this dot product will give you:
Y \cdot \hat{Y}^T
If Y_i = 1 and \hat{Y}_i = 1, then that product will be one; otherwise it will be zero, right? So the dot product adds those up, and you get the number of cases in which the label is 1 and the prediction was also correct (1), right?
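On the same toy arrays from the sketch above, that first term counts the correctly predicted 1s:

```python
import numpy as np

Y = np.array([[1, 0, 1, 1, 0]])            # toy labels
predictions = np.array([[1, 0, 0, 1, 1]])  # toy 0/1 predictions

# Each product Y_i * predictions_i is 1 only when both are 1; the dot product sums them up
true_positives = np.dot(Y, predictions.T)
print(true_positives)  # [[2]] -- positions 0 and 3 are correctly predicted 1s
```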
Now apply that same reasoning to the other term, (1 - Y) \cdot (1 - \hat{Y})^T, and it should all make sense.
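Putting both terms together reproduces the assignment's formula; on the toy arrays above it gives the same 60% as the direct count:

```python
import numpy as np

Y = np.array([[1, 0, 1, 1, 0]])            # toy labels
predictions = np.array([[1, 0, 0, 1, 1]])  # toy 0/1 predictions

# correctly predicted 1s plus correctly predicted 0s, as a (1, 1) array
correct = np.dot(Y, predictions.T) + np.dot(1 - Y, 1 - predictions.T)

# divide by the number of examples and convert to a percentage
accuracy = correct.item() / Y.size * 100   # .item() plays the role of float(...) in the assignment
print(accuracy)  # 60.0
```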
The quantities you show (TP, TN, FP, and FN) are also combined in other ways to assess a model's results: they give you the “precision”, the “recall”, and the “F score”. Here is the Wikipedia page about that. Accuracy is a much simpler and more straightforward metric, and in fact your (TP + TN) / (TP + TN + FP + FN) formula computes exactly the same thing: the two dot products count TP and TN, and Y.size is the total.
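For completeness, here is a hedged sketch (same made-up toy arrays as above) of how those four counts relate to accuracy, precision, recall, and F1:

```python
import numpy as np

Y = np.array([[1, 0, 1, 1, 0]])            # toy labels
predictions = np.array([[1, 0, 0, 1, 1]])  # toy 0/1 predictions

TP = int(np.sum((Y == 1) & (predictions == 1)))  # correctly predicted 1s
TN = int(np.sum((Y == 0) & (predictions == 0)))  # correctly predicted 0s
FP = int(np.sum((Y == 0) & (predictions == 1)))  # 0s predicted as 1
FN = int(np.sum((Y == 1) & (predictions == 0)))  # 1s predicted as 0

accuracy  = (TP + TN) / (TP + TN + FP + FN)       # same number as the dot-product formula
precision = TP / (TP + FP)                        # fraction of predicted 1s that were right
recall    = TP / (TP + FN)                        # fraction of actual 1s that were found
f1        = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)
```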