Softmax Loss function

Abhi08 · May 6, 2021, 6:10pm

Hi,

I needed some help in understanding the loss function while using multiclass classifiers.

If the loss function when using multiclass classifier (using softmax) does not take into account the predictions for the non-true classes for the training example, how are the values in the non-true classes for the training example tuned to lower values ?

Thanks!

neurogeek · May 6, 2021, 7:18pm

Hi @Abhi08,

I’m not sure if I’m understanding the question, so bear with me and please feel free to respond, of course!

The training set is a vector of n classes with just a 1 in the class that represented in the example. Even though there is only one class in the example, the error will be ‘global’, in the sense that whatever loss is there, it will be backpropagated and it will affect all weights in all layers. This in turn will affect the values of the non-true classes which are the inputs to the softmax.

So, in all, is not that the non-true values are being tuned down, it’s just that loss+backprop is increasing the probability of the true class as part of the learning process.

Hope that helped.

neurogeek · May 6, 2021, 7:34pm

Also, I just noticed that you talk about softmax as a loss function, and that is not quite right.

Softmax is an activation function. It takes a vector with n classes in it and it transforms it into a probability distribution (sum of the output is 1) over the predicted output classes.

For TF/Keras, the loss function used in these cases is the categorical_crossentropy, which calculates the loss across the classes. You can read more about it here: tf.keras.losses.categorical_crossentropy | TensorFlow Core v2.4.1

Abhi08 · May 6, 2021, 10:33pm

Thank you for your reply.

Apologies for not being clear with the question.

I meant to ask how are the non-true values tuned to lower values.

For example, suppose we need to classify a dataset into 3 classes- cat, dog and none of the above (nota)

So we have 3 output cases, y= [1 0 0] (cat), [0 1 0] (dog) and [0 0 1] (nota)

If we have an output from a cat’s image as-
(case 1)
y_pred=[0.92 0.04 0.03] - This can be confidently classified as a cat

(case 2)
y_pred=[0.92 0.04 0.94] - This cannot be classified as a cat.

However, while training with cat images, since the value of the 3rd element in the output vector is not a part of the cost function. How do we make sure that the training data does not end up giving case 2 ?

In binary classification, it is avoided by including the non-true element in the cost function. Thus ensuring it has a low value.

Abhi08 · May 6, 2021, 10:38pm

Thanks! Yes, I did not consider the softmax as a loss function. I meant the loss function associated with a multiclass classifier.

paulinpaloalto · May 7, 2021, 3:45am

It’s a good question. The way to look at this is that the point is that all the outputs of softmax add up to 1, right? All we really need to care about is the value that gets assigned to the correct label. How the “error” (wrong) values get assigned or distributed among the possible wrong answers doesn’t really matter. Say the model is still pretty bad and only gives 0.4 for the correct label: we don’t really care where the rest of the 0.6 gets distributed. We just know it’s pretty bad that the correct answer only got 0.4. If you think about it, that’s what the loss function is doing.

To make this a bit more concrete, suppose that we have 4 classes: cat, dog, kangaroo and “none of the above”. If a given sample is labelled as a cat and the model predicts 0.4 for cat, then we don’t really care whether the other 3 values are (0.6, 0.0, 0.0) or (0.2, 0.2, 0.2) or (0.1, 0.1, 0.4) or some permutation of those values. Those are all equally bad answers from our point of view.

Abhi08 · May 7, 2021, 8:38am

Ok, Thank you for the reply !

Topic		Replies	Views
Multiclass Lab: How is softmax implied in the loss function? Advanced Learning Algorithms week-2	1	453	June 4, 2023
Why softmax in last layer for multiclass NN? Improving Deep Neural Networks: Hyperparameter tun	5	563	January 7, 2022
C2_W3_multiclassification Improving Deep Neural Networks: Hyperparameter tun	3	516	September 5, 2022
Softmax Loss Function for single example Advanced Learning Algorithms week-2	18	587	December 30, 2022
Softmax layer at last layer Neural Networks and Deep Learning	1	523	April 15, 2022

Softmax Loss function

Related topics