SparseCategoricalCrossentropy vs. CategoricalCrossentropy

In the Machine Learning Specialization taught by Andrew Ng, most neural network models for multiclass classification use SparseCategoricalCrossentropy. For example, in a neural network that recognizes the ten handwritten digits 0-9, the code is as follows:

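import tensorflow as tf
# Integer labels 0-9, so the sparse loss is used (loss object must be instantiated)
model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              optimizer=tf.keras.optimizers.Adam(0.001))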

I wanted to try CategoricalCrossentropy in this handwritten digit recognition example to compare the outcomes. However, I get an error that says ‘ValueError: Shapes (None, 1) and (None, 10) are incompatible’.

So, is it possible to use the CategoricalCrossentropy loss function in this case? What are the differences between SparseCategoricalCrossentropy and CategoricalCrossentropy in TensorFlow?

Remark: CategoricalCrossentropy expects the target value of an example to be one-hot encoded, where the value at the target index is 1 and the other N-1 entries are 0. For example, with 10 possible target values and a target of 2, the label would be [0,0,1,0,0,0,0,0,0,0].
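To see why the label encoding is what separates the two losses, here is a small NumPy sketch (not TensorFlow's actual implementation) of the cross-entropy for a single example, -Σ y_k log p_k. With a one-hot target, every term except the target class is multiplied by 0, so the sum collapses to -log p[target], which is exactly what the sparse version computes directly from the integer label. The probability vector `p` is made up for illustration.

```python
import numpy as np

num_classes = 10
target = 2  # integer label, the form SparseCategoricalCrossentropy expects

# One-hot form of the same label, the form CategoricalCrossentropy expects:
# [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
y_onehot = np.eye(num_classes)[target]

# Hypothetical predicted probabilities for this example (they sum to 1).
p = np.full(num_classes, 0.05)
p[target] = 0.55

# Categorical cross-entropy: sum over all classes of -y_k * log(p_k).
categorical_loss = -np.sum(y_onehot * np.log(p))

# Sparse categorical cross-entropy: -log of the target class probability only.
sparse_loss = -np.log(p[target])

print(y_onehot.astype(int).tolist())
print(categorical_loss, sparse_loss)  # the two values are the same
```

So, given the same predictions, switching the loss should not by itself change the outcome; only the label format changes.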


Hey @HongruNUS,
Welcome to the community. To answer this question, we don't need to go further than your own remark about CategoricalCrossentropy expecting one-hot encoded targets.

That is pretty much the only difference between the two loss functions. Put simply, if you have y as integers

y = [6, 3, 9]

(for 3 examples), then you use SparseCategoricalCrossentropy, and if you have y as one-hot encoded labels

y = [
   [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
   [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
   [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
]

then you use CategoricalCrossentropy. You can easily convert y from one representation to the other, so you can use either loss function as long as it matches the representation your labels are in.
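Converting between the two representations is a one-liner in NumPy. This is a minimal sketch using the three labels above; in Keras, `tf.keras.utils.to_categorical(y, num_classes=10)` performs the integer-to-one-hot step as well, and `np.argmax(..., axis=1)` goes back.

```python
import numpy as np

num_classes = 10
y_int = np.array([6, 3, 9])  # integer labels, shape (3,)

# Integer -> one-hot: select rows of the 10x10 identity matrix, shape (3, 10).
y_onehot = np.eye(num_classes, dtype=int)[y_int]

# One-hot -> integer: position of the 1 in each row, shape (3,).
y_back = np.argmax(y_onehot, axis=1)

print(y_onehot)
print(y_back)  # [6 3 9]
```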

So in your example, since you want to use CategoricalCrossentropy, you must first convert your integer labels into one-hot encoded labels. CategoricalCrossentropy expects labels of shape (number of examples, number of classes), but you are passing labels of shape (number of examples, 1), which is exactly what the error message is complaining about. I hope this helps.
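The error can be reproduced with shapes alone. Judging from the message, the labels are stored as a column of shape (m, 1), the `(None, 1)`, while the output layer has one unit per class, the `(None, 10)`. Below is a NumPy sketch with made-up labels; note that the column has to be flattened before one-hot encoding, otherwise the indexing trick adds an extra axis.

```python
import numpy as np

num_classes = 10
# Hypothetical labels stored as a column, shape (m, 1): the "(None, 1)" in the error.
y = np.array([[6], [3], [9]])
m = y.shape[0]

# The output layer has one unit per class: the "(None, 10)" in the error.
output_shape = (m, num_classes)
print(y.shape, output_shape)  # (3, 1) (3, 10): incompatible for CategoricalCrossentropy

# Pitfall: indexing with the (m, 1) column keeps the extra axis, giving (3, 1, 10).
print(np.eye(num_classes)[y].shape)

# Fix: flatten to shape (m,) first, then one-hot encode, giving (3, 10).
y_onehot = np.eye(num_classes)[y.ravel()]
print(y_onehot.shape)  # (3, 10): now matches the output shape
```

With labels in this form, the `model.compile` call should work with `loss=tf.keras.losses.CategoricalCrossentropy()`, passing `from_logits=True` to it if the output layer is linear, just as you would for the sparse version. Keeping integer labels with SparseCategoricalCrossentropy avoids the conversion entirely.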



Thanks for your reply. I appreciate it.