Week 3 Exercise 6 - compute_total_loss. Why transpose?

Hello,
In the exercise guide it says: "It’s important to note that the y_pred and y_true inputs of tf.keras.losses.categorical_crossentropy are expected to be of shape (number of examples, num_classes)."

We already have both logits and labels in the same shape, (6, 2) = (num_classes, num_examples), so the key point above is not that tf.keras.losses.categorical_crossentropy expects the two inputs to share a shape, but that it expects them both to be (num_examples, num_classes).

Is there a particular reason for this? From what I understood from the video, the loss function applies the log operator element-wise to the forward-prop output (Yhat) and then multiplies the result element-wise with the ground-truth matrix (Y), so it seems to me that as long as the two tensors have the same shape it should be fine, unless I'm missing something here.
So my question is basically: why do they have to be (num_examples, num_classes) and not the other way around?
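
For concreteness, here is roughly the call I mean, a small sketch with made-up tensors in the assignment's (num_classes, num_examples) layout (3 classes instead of 6 to keep it short):

```python
import tensorflow as tf

# Made-up tensors in the assignment's layout: (num_classes=3, num_examples=2);
# the real exercise has 6 classes, but 3 keeps the sketch short.
# Each COLUMN of labels is a one-hot vector for one example.
labels = tf.constant([[1., 0.],
                      [0., 1.],
                      [0., 0.]])
logits = tf.constant([[2.0, 0.1],
                      [0.5, 1.5],
                      [0.3, 0.2]])

# The transpose in question: both tensors get flipped to
# (num_examples, num_classes) before the call.
per_example_loss = tf.keras.losses.categorical_crossentropy(
    tf.transpose(labels), tf.transpose(logits), from_logits=True)
total_loss = tf.reduce_sum(per_example_loss)
```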

I tried reading the docs (tf.keras.metrics.categorical_crossentropy | TensorFlow v2.14.0) to see if anything there explains the reason for the transpose, but I couldn’t find or understand one.

I hope I managed to explain myself okay. If someone could elaborate, shed some light, and correct me where I’m wrong, it would be much appreciated.

Edit:

Looking at the picture below, I’m pretty sure I’m missing something about how the loss is actually calculated (I don’t see it as $y_j \log(\hat{y}_j)$ in either case).
Maybe it’s easier if someone can explain how tf.keras.losses.categorical_crossentropy computes the numbers in the output vectors in the picture below, in both cases, and how to see them as $-\sum_j y_j \log(\hat{y}_j)$:

That is just the definition of the TF APIs: they expect “samples first” as the arrangement of the dimensions. The element-wise log and multiply would indeed work either way, but the function then sums over the last axis to produce one loss per sample, so that last axis has to be the class dimension. Here’s a thread which discusses this whole question.
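
To see why the orientation matters even though the element-wise steps don't care, here's a quick sketch with made-up values:

```python
import tensorflow as tf

# Made-up probabilities: (2 examples, 3 classes), rows sum to 1.
y_true = tf.constant([[1., 0., 0.],
                      [0., 1., 0.]])
y_pred = tf.constant([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1]])

# Samples first: the reduction over the last axis sums over classes,
# giving one loss per example.
print(tf.keras.losses.categorical_crossentropy(y_true, y_pred).shape)  # (2,)

# Transposed: the same reduction now sums over examples instead,
# producing three meaningless per-class numbers -- hence the transpose.
print(tf.keras.losses.categorical_crossentropy(
    tf.transpose(y_true), tf.transpose(y_pred)).shape)                 # (3,)
```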


Thanks a lot.
One more thing, I guess, for anyone who can help: I tried to reconcile the tf.keras.losses.categorical_crossentropy function with the way we saw the loss calculated in the softmax training video.

Here in the picture I have 2 samples and 3 classes, so the shape should fit the TF function, but I don’t see how it gets the numbers in the output. I thought the first number should be -np.log(0.2), which is 1.6094379124341003, and that the second number should be -np.log(0.3), but it is not. I’m surely missing something about how the loss is calculated or about what the TF function is doing.

Edit: It actually works when I change to from_logits=False.
I think I got it after reading more about the from_logits=False/True parameter.

Basically, when from_logits=False, the function assumes that the input predictions (y_pred) are already probabilities (as in this example), meaning they have been passed through softmax or a similar function and represent valid probability distributions. With from_logits=True, it instead treats y_pred as raw, unnormalized logits and applies softmax internally before computing the loss.
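
Here's a small check of that understanding, with made-up values consistent with the numbers I mentioned above:

```python
import numpy as np
import tensorflow as tf

# Made-up values: 2 samples, 3 classes, rows of y_pred already sum to 1
# (probabilities). The true class of sample 0 has probability 0.2, and
# the true class of sample 1 has probability 0.3.
y_true = tf.constant([[1., 0., 0.],
                      [0., 0., 1.]])
y_pred = tf.constant([[0.2, 0.5, 0.3],
                      [0.4, 0.3, 0.3]])

# from_logits=False: y_pred is taken as probabilities, so each sample's
# loss is simply -log(probability assigned to its true class).
loss = tf.keras.losses.categorical_crossentropy(y_true, y_pred, from_logits=False)
print(loss.numpy())          # [1.6094379 1.2039728]
print(-np.log([0.2, 0.3]))   # matches

# With from_logits=True the same call would first push y_pred through
# softmax, which is why the numbers looked "wrong" before.
```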
You can close the post

Thank you

Editing to add the picture where it does make sense with the loss from the video.

Glad to hear that you figured out the from_logits issue. Here’s a thread that explains why it is done with the “True” setting in the assignment and “going forward” in the rest of the courses.
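
For anyone landing here later, the short version of that thread: with from_logits=True you pass the raw linear output straight into the loss, which is mathematically the same as applying softmax yourself and using from_logits=False, but more numerically stable. A minimal sketch with made-up numbers:

```python
import tensorflow as tf

z = tf.constant([[2.0, 1.0, 0.1]])  # raw linear output (logits): 1 sample, 3 classes
y = tf.constant([[1., 0., 0.]])

# Logits straight into the loss ...
a = tf.keras.losses.categorical_crossentropy(y, z, from_logits=True)
# ... equals softmax-then-loss, but avoids computing log(softmax(z))
# in two numerically risky steps.
b = tf.keras.losses.categorical_crossentropy(y, tf.nn.softmax(z), from_logits=False)
print(a.numpy(), b.numpy())  # both ~0.417
```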
