Hello,

In the exercise guide it says: *"It’s important to note that the "y_pred" and "y_true" inputs of tf.keras.losses.categorical_crossentropy are expected to be of shape (number of examples, num_classes)"*

Our logits and labels already have the same shape, (6, 2) = (num of classes, num of examples). So the point of the note above is not just that tf.keras.losses.categorical_crossentropy expects both inputs to have the same shape, but that it expects both to be (num of examples, num of classes).

Is there a particular reason for this? From what I understood from the video, the loss function applies the log element-wise to the forward-prop output (Yhat) and then multiplies the result element-wise by the ground-truth matrix (Y). So it seems to me that as long as the two have the same shape it should be fine, unless I'm missing something here.
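To make my understanding concrete, here is a minimal NumPy sketch of the computation as I described it above. The numbers are made up (2 examples, 3 classes), and I am assuming that after the element-wise product the terms are summed over the class axis to get one loss per example. This is my own illustration, not TensorFlow's code.

```python
import numpy as np

# Hypothetical data laid out as (num_examples, num_classes) = (2, 3).
Y = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])        # one-hot ground truth
Y_hat = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])    # predicted probabilities, each row sums to 1

# Element-wise log, element-wise multiply, then sum over the class axis (axis 1 here).
loss_rows = -np.sum(Y * np.log(Y_hat), axis=1)

# Same data transposed to (num_classes, num_examples): the class axis is now axis 0.
loss_cols = -np.sum(Y.T * np.log(Y_hat.T), axis=0)

print(loss_rows)  # one value per example
print(loss_cols)  # same values, because the sum still runs over the classes
```

In this sketch the orientation does not matter as long as I pick the summation axis by hand, so the shape requirement presumably has to do with which axis the library sums over.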

So my question is basically: why do they have to be (num of examples, num of classes), and not the other way around?

I tried reading the documentation page (tf.keras.metrics.categorical_crossentropy | TensorFlow v2.14.0) to see whether it gives a reason for the transpose we use there, but I couldn’t find or understand one.

I hope I managed to explain myself reasonably well. If someone can elaborate, shed some light, or correct me where I'm wrong, it would be much appreciated.

Edit:

Looking at the picture below, I'm pretty sure I'm missing something about how the loss is actually calculated (I don't see it as yj*log(yjhat) in either case).
Maybe it's easier if someone can explain how tf.keras.losses.categorical_crossentropy calculates the numbers in the output vectors in the picture below in both cases, and how to see them as yj*log(yjhat):
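For comparison, here is a rough NumPy imitation of what I think happens in the two cases. It rests on two assumptions: that the function is called with `from_logits=True` (so a softmax is applied first), and that both the softmax and the sum run over the last axis (the documented signature has an `axis=-1` argument). The helper `cce_from_logits_last_axis` is a hypothetical name of mine, and the logits are invented, not the numbers from the picture.

```python
import numpy as np

def cce_from_logits_last_axis(y_true, logits):
    """Softmax over the last axis, then -sum(y * log(p)) over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)   # for numerical stability
    log_p = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -(y_true * log_p).sum(axis=-1)

# Invented logits stored as (num_classes, num_examples) = (6, 2).
logits = np.array([[2.0, 0.5],
                   [1.0, 3.0],
                   [0.1, 0.2],
                   [0.3, 0.1],
                   [0.2, 0.4],
                   [0.5, 0.3]])
labels = np.zeros((6, 2))
labels[0, 0] = 1.0   # example 0 belongs to class 0
labels[1, 1] = 1.0   # example 1 belongs to class 1

as_given = cce_from_logits_last_axis(labels, logits)        # treats each class row as an "example"
transposed = cce_from_logits_last_axis(labels.T, logits.T)  # one loss per real example

print(as_given.shape, as_given)
print(transposed.shape, transposed)
```

Under these assumptions the untransposed call returns six values, with the softmax taken across the two examples instead of across the six classes, while the transposed call returns two values of the form -log(yjhat) for the true class j of each example.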