C2_W2_SoftMax Lab - question about SparseCategoricalCrossentropy or CategoricalCrossentropy

In the C2_W2_SoftMax lab it says:


… and I thought I understood what it was saying, but in the lab we have (for the first two example vectors):

[[6.18e-03 1.51e-03 9.54e-01 3.84e-02]
 [9.93e-01 6.15e-03 3.59e-04 3.78e-04]]

… and that seems to me to be approximating the one-hot encodings:
[0, 0, 1, 0]
[1, 0, 0, 0]
and therefore CategoricalCrossentropy should be used, but in the lab SparseCategoricalCrossentropy is used instead, so I realized I don’t understand “SparseCategoricalCrossentropy vs CategoricalCrossentropy” at all.

If someone can explain the difference to me, especially how the output vectors relate to an example problem that the NN is trying to solve, that would be good. Examples would be appreciated, and maybe even a diagram?

The difference between categorical cross entropy and sparse categorical cross entropy is how you represent your labels in the dataset.
As shown in the screenshot attached, categorical cross-entropy uses a one-hot encoded label, while sparse categorical cross-entropy uses the index value of the class as the label.
The loss computation gives you identical results in both cases. For example, the loss formula for a single training example is:
loss = -sum_j y_true_j * log(y_pred_j), where y_true is the ground-truth label vector and y_pred is the vector of predicted probabilities.
Consider the first vector for the loss computation using both losses, where y_true = [0, 0, 1, 0] and y_pred = [6.18e-03 1.51e-03 9.54e-01 3.84e-02].
In the case of categorical cross-entropy the loss is: -[0 * log(6.18e-03) + 0 * log(1.51e-03) + 1 * log(9.54e-01) + 0 * log(3.84e-02)] = -log(9.54e-01).
In the case of sparse categorical cross-entropy the loss is just the negative logarithm of the output at the ground-truth index, i.e. at index 2: loss = -log(9.54e-01).
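
Here is a minimal sketch (not taken from the lab) that checks this numerically with the Keras loss classes. It assumes the values above are already softmax probabilities, so the default from_logits=False applies:

```python
import tensorflow as tf

# Softmax output for the first example from the printout above
y_pred = [[6.18e-03, 1.51e-03, 9.54e-01, 3.84e-02]]

# CategoricalCrossentropy expects a one-hot label
cce = tf.keras.losses.CategoricalCrossentropy()
print(cce([[0.0, 0.0, 1.0, 0.0]], y_pred).numpy())  # ≈ 0.047, i.e. -log(0.954)

# SparseCategoricalCrossentropy expects the class index (here, 2)
scce = tf.keras.losses.SparseCategoricalCrossentropy()
print(scce([2], y_pred).numpy())                    # same value, ≈ 0.047
```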

I finished week 2 to see if it would become clearer, and I’m still not clear on a small point.

For a digit-recognition NN (to recognize whether an image is a 0, 1, 2, 3, 4, 5, 6, 7, 8 or 9), we might want our final output layer to have 10 nodes, where the node with the highest output (for a particular training image) indicates which digit the NN thinks the image represents (e.g. if node 8 produces the highest output, the NN is saying the image is an ‘8’), and in this case we would use SparseCategoricalCrossentropy. Is that right?

Now if we wanted to use CategoricalCrossentropy instead, would we have only one output node, where that output is actually a vector of length 10? (so if that one output node produced [0, 0, 0, 0, 0, 0, 0, 0, 1, 0] then, since the 1 is at index 8, the NN is saying the image is an ‘8’).
That doesn’t seem right??

Please let me know where I’m wrong (and I’m probably not quite understanding a few things here!).

Hello @evoalg, if you have ten classes, your output layer needs 10 nodes, regardless of whether you are using CategoricalCrossentropy or SparseCategoricalCrossentropy.

If you choose to use SparseCategoricalCrossentropy, TensorFlow needs index labels. For a sample that is class 0, the label is 0.

If you choose to use CategoricalCrossentropy, TensorFlow needs one-hot labels. For a sample that is class 0, the label is [1, 0, 0, 0, 0, 0, 0, 0, 0, 0].
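
To make the two options concrete, here is a small sketch of a 10-class digit model; the hidden-layer size and the softmax output are illustrative assumptions, not the lab’s exact architecture:

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(25, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),  # 10 output nodes either way
])

y_index = np.array([8])                    # index label: "this image is an 8"
y_onehot = tf.one_hot(y_index, depth=10)   # one-hot label: [0,0,0,0,0,0,0,0,1,0]

# Option A: index labels go with SparseCategoricalCrossentropy
model.compile(optimizer='adam', loss=tf.keras.losses.SparseCategoricalCrossentropy())
# model.fit(X, y_index, ...)   # X is a placeholder for your image data

# Option B: one-hot labels go with CategoricalCrossentropy
model.compile(optimizer='adam', loss=tf.keras.losses.CategoricalCrossentropy())
# model.fit(X, y_onehot, ...)
```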

Cheers,
Raymond


Oh! … and here “labels” means what y (the ground truth) contains?

The label is y_true.

Thank you - I actually feel I understand it now!
