I have been going through the training accuracy calculation in the last section, 3.3. The code calculates the training accuracy by comparing the output of the linear function (Z3) with the categorical variable Y_train. Is it all right to compare these two variables directly to find the accuracy? Don’t we need to apply a softmax function before comparing, or is it unnecessary here in the same way that it is unnecessary for the cross-entropy loss computation in tf.keras.losses?
Hello Nijaj, welcome to the community!
You’re right! When calculating the training accuracy for a multi-class classification problem, you usually apply a softmax function to the output Z3 to get the predicted probabilities for each class before comparing them with the true labels (Y_train). The softmax function converts the logits (the unnormalized output of the linear function Z3) into probabilities, which allows you to select the class with the highest probability as the predicted class.
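To make the logits-to-probabilities step concrete, here is a minimal sketch of softmax for a single example in plain Python. The values in `z3_example` are made up for illustration and are not taken from the assignment.

```python
import math

def softmax(logits):
    """Numerically stable softmax for one example (a list of floats)."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a 3-class problem (illustrative values only).
z3_example = [2.0, 1.0, 0.1]

probs = softmax(z3_example)                # non-negative, sums to 1
predicted_class = probs.index(max(probs))  # class with the highest probability
print(probs, predicted_class)
```

The largest logit ends up with the largest probability, so the predicted class here is the index of the largest entry in `z3_example`.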
In TensorFlow, you can skip the manual application of softmax when calculating accuracy with a built-in metric such as tf.keras.metrics.CategoricalAccuracy. The metric compares the index of the largest predicted value with the index of the 1 in the one-hot label, so it gives the same result on raw logits as on softmax probabilities. The loss is a different matter: cross-entropy needs probabilities, so you either apply softmax yourself or pass from_logits=True so that the loss function applies it for you.
If you were to code the accuracy calculation manually instead of using TensorFlow’s built-in metric, taking the argmax of Z3 for each example and comparing it with the argmax of the one-hot label would be enough. You would only need softmax if you also wanted the predicted probabilities themselves.
I hope this helps!
Note that softmax is monotonic, so you can convert the logits output or the softmax output into categorical predictions by simply taking argmax of either input. The answer will be the same because of monotonicity: the maximum value gives the predicted category. That is probably what the CategoricalAccuracy metric logic is doing to handle both cases.
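The monotonicity argument above can be checked with a small NumPy sketch. The arrays are made-up illustrative data, and the layout (one row per example, one column per class) is an assumption; the assignment's Z3 may use a different orientation, in which case the axis would change.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax; each row is one example's logits."""
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def accuracy(scores, y_onehot):
    """Fraction of rows where argmax of scores matches the one-hot label."""
    return float(np.mean(np.argmax(scores, axis=1) == np.argmax(y_onehot, axis=1)))

# Hypothetical logits (4 examples, 3 classes) and one-hot labels.
logits = np.array([[ 2.0, 1.0, 0.1],
                   [ 0.5, 2.5, 0.3],
                   [ 1.2, 0.2, 3.0],
                   [-1.0, 0.9, 0.4]])
y_onehot = np.array([[1, 0, 0],
                     [0, 1, 0],
                     [0, 0, 1],
                     [1, 0, 0]])

acc_from_logits = accuracy(logits, y_onehot)
acc_from_probs = accuracy(softmax(logits), y_onehot)
print(acc_from_logits, acc_from_probs)
```

Both calls report the same accuracy, because softmax changes the values in each row but not which entry is the largest.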
Great observation! This is why tf.keras.metrics.CategoricalAccuracy can operate directly on logits without explicitly applying softmax. The metric uses argmax internally on both the predictions and the one-hot labels, and since softmax preserves the ordering of the logits, the predicted class is the same either way. That is why the code works without an explicit softmax step in the accuracy calculation.
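As a quick check of this behaviour, the sketch below feeds the same made-up logits to tf.keras.metrics.CategoricalAccuracy once as raw logits and once after tf.nn.softmax. The data and the helper name `categorical_accuracy` are illustrative assumptions, not code from the assignment.

```python
import numpy as np
import tensorflow as tf

# Hypothetical logits (4 examples, 3 classes) and one-hot labels.
logits = np.array([[ 2.0, 1.0, 0.1],
                   [ 0.5, 2.5, 0.3],
                   [ 1.2, 0.2, 3.0],
                   [-1.0, 0.9, 0.4]], dtype=np.float32)
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [1, 0, 0]], dtype=np.float32)

def categorical_accuracy(y_true, y_pred):
    """Run the built-in metric on one batch and return a Python float."""
    metric = tf.keras.metrics.CategoricalAccuracy()
    metric.update_state(y_true, y_pred)
    return float(metric.result())

acc_logits = categorical_accuracy(y_true, logits)
acc_probs = categorical_accuracy(y_true, tf.nn.softmax(logits, axis=-1))
print(acc_logits, acc_probs)
```

The two numbers should match, since the metric only looks at which class has the largest score in each row.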