Multiclass Lab: How is softmax implied in the loss function?

Multiclass Lab says:

This is done by the implied softmax function that is part of the loss function (SparseCategoricalCrossentropy). Unlike other activation functions, the softmax works across all the outputs.

Hi @Ankur_Agarwal1 great question!

In a multiclass classification problem, we typically use a softmax function as the final step of the neural network. Softmax takes the raw outputs of the model, known as logits, and transforms them into probabilities. Unlike other activations, it works across all of the outputs at once rather than on each one independently, which is why the results are non-negative and sum to 1.
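To make that concrete, here is a minimal sketch in TensorFlow; the logit values are made up for illustration:

```python
import tensorflow as tf

# Raw model outputs (logits) for one example with 4 classes.
logits = tf.constant([[2.0, 1.0, 0.1, -1.0]])

# Softmax works across all outputs at once: each logit is
# exponentiated, then divided by the sum of the exponentials,
# so the results are non-negative and sum to 1.
probs = tf.nn.softmax(logits)
print(probs.numpy())         # roughly [[0.64 0.23 0.10 0.03]]
print(tf.reduce_sum(probs))  # 1.0
```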

Now, we want to see how well our model is doing. That’s where the loss function comes in, and in this context we use `SparseCategoricalCrossentropy`. When it is constructed with `from_logits=True`, it does two things internally: it applies the softmax function to the logits, and then it computes the cross entropy. So when we talk about softmax being ‘implied’ in the loss function, it means the loss is already doing the softmax operation for us. We don’t need a softmax activation on the model’s output layer if we’re using the loss this way.
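A minimal sketch of this setup (the layer sizes here are just for illustration, not from the lab):

```python
import tensorflow as tf

# The "softmax implied in the loss" setup: the last layer is
# linear, so the model outputs raw logits.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(25, activation='relu'),
    tf.keras.layers.Dense(4)  # no activation: outputs are logits
])

# from_logits=True tells the loss to apply softmax internally
# before computing the cross entropy.
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
```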

Of course, if you want to include the softmax function in your model, you totally can: give the last layer a softmax activation and use the same `SparseCategoricalCrossentropy` loss with `from_logits=False` (the default), so the loss treats the model’s outputs as probabilities and doesn’t apply softmax a second time. (`CategoricalCrossentropy` is a separate choice: it expects one-hot encoded labels rather than integer labels, and it has the same `from_logits` option.)
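The same sketch with softmax moved inside the model:

```python
import tensorflow as tf

# The alternative setup: softmax lives inside the model, so the
# model's outputs are already probabilities.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(25, activation='relu'),
    tf.keras.layers.Dense(4, activation='softmax'),
])

# With from_logits=False (the default), the loss expects
# probabilities and does NOT apply softmax again.
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
)
```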

The choice between these two approaches depends on what you’re trying to achieve. If you want your model’s outputs to be probabilities directly, incorporating softmax in the model is convenient. However, if your goal is numerical stability, you’ll generally prefer to let the loss function handle the softmax: computing the softmax and the log of the cross entropy together avoids the rounding errors that can occur when a probability very close to 0 or 1 is produced first and its log is taken afterwards. Either way works; it’s just a matter of what fits your specific needs.
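One practical note on the `from_logits=True` setup: the model still outputs logits at prediction time, so you apply softmax yourself whenever you want probabilities. A quick sketch, where `x_new` is a hypothetical batch of inputs and `model` is the logits-output model from above:

```python
# The model outputs logits, so convert them to probabilities
# explicitly when you need them.
logits = model(x_new)
probs = tf.nn.softmax(logits)

# The most likely class for each example in the batch.
predicted_class = tf.argmax(probs, axis=-1)
```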

I hope this helps!