Week 3 Exercise 6 - compute_total_loss. Why transpose?

Hello,
In the exercise guide it says: "It’s important to note that the y_pred and y_true inputs of tf.keras.losses.categorical_crossentropy are expected to be of shape (number of examples, num_classes)."

We already have both logits and labels in the same shape, (6, 2) = (num_classes, num_examples), so the key point above is not that tf.keras.losses.categorical_crossentropy expects the two inputs to share a shape, but that it expects them both to be (num_examples, num_classes).

Is there a particular reason for this? From what I understood from the video, the loss function applies the log operator element-wise to the forward-prop output (Yhat) and then multiplies the result element-wise with the ground-truth matrix (Y), so it seems to me that as long as the two tensors have the same shape it should be fine, unless I'm missing something here.
So my question is basically: why do they have to be (num_examples, num_classes) and not the other way around?
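
For concreteness, here is roughly the call I mean, a small sketch with made-up tensors in the assignment's (num_classes, num_examples) layout (3 classes instead of 6 to keep it short):

```python
import tensorflow as tf

# Made-up tensors in the assignment's layout: (num_classes=3, num_examples=2);
# the real exercise has 6 classes, but 3 keeps the sketch short.
# Each COLUMN of labels is a one-hot vector for one example.
labels = tf.constant([[1., 0.],
                      [0., 1.],
                      [0., 0.]])
logits = tf.constant([[2.0, 0.1],
                      [0.5, 1.5],
                      [0.3, 0.2]])

# The transpose in question: both tensors get flipped to
# (num_examples, num_classes) before the call.
per_example_loss = tf.keras.losses.categorical_crossentropy(
    tf.transpose(labels), tf.transpose(logits), from_logits=True)
total_loss = tf.reduce_sum(per_example_loss)
```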

I tried reading the docs (tf.keras.metrics.categorical_crossentropy | TensorFlow v2.14.0) to see if anything there explains the reason for the transpose, but I couldn’t find or understand one.

I hope I managed to explain myself okay. If someone could elaborate, shed some light, and correct me where I’m wrong, it would be much appreciated.

Edit:

Looking at the picture below, I’m pretty sure I’m missing something about how the loss is actually calculated (I don’t see it as $y_j \log(\hat{y}_j)$ in either case).
Maybe it’s easier if someone can explain how tf.keras.losses.categorical_crossentropy computes the numbers in the output vectors in the picture below, in both cases, and how to see them as $-\sum_j y_j \log(\hat{y}_j)$:

That is just the definition of the TF APIs: they expect “samples first” as the arrangement of the dimensions. The element-wise log and multiply would indeed work either way, but the function then sums over the last axis to produce one loss per sample, so that last axis has to be the class dimension. Here’s a thread which discusses this whole question.
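
To see why the orientation matters even though the element-wise steps don't care, here's a quick sketch with made-up values:

```python
import tensorflow as tf

# Made-up probabilities: (2 examples, 3 classes), rows sum to 1.
y_true = tf.constant([[1., 0., 0.],
                      [0., 1., 0.]])
y_pred = tf.constant([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1]])

# Samples first: the reduction over the last axis sums over classes,
# giving one loss per example.
print(tf.keras.losses.categorical_crossentropy(y_true, y_pred).shape)  # (2,)

# Transposed: the same reduction now sums over examples instead,
# producing three meaningless per-class numbers -- hence the transpose.
print(tf.keras.losses.categorical_crossentropy(
    tf.transpose(y_true), tf.transpose(y_pred)).shape)                 # (3,)
```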


Thanks a lot.
One more thing, I guess, for anyone who can help: I tried to reconcile the tf.keras.losses.categorical_crossentropy function with the way we saw the loss calculated in the softmax training video.

Here in the picture I have 2 samples and 3 classes, so the shape should fit the TF function, but I don’t see how it gets the numbers in the output. I thought the first number should be -np.log(0.2), which is 1.6094379124341003, and that the second number should be -np.log(0.3), but it is not. I’m surely missing something about how the loss is calculated or about what the TF function is doing.

Edit: It actually works when I change to from_logits=False.
I think I got it after reading more about the from_logits=False/True parameter.

Basically, when from_logits=False, the function assumes that the input predictions (y_pred) are already probabilities (as in this example), meaning they have been passed through softmax or a similar function and represent valid probability distributions. With from_logits=True, it instead treats y_pred as raw, unnormalized logits and applies softmax internally before computing the loss.
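
Here's a small check of that understanding, with made-up values consistent with the numbers I mentioned above:

```python
import numpy as np
import tensorflow as tf

# Made-up values: 2 samples, 3 classes, rows of y_pred already sum to 1
# (probabilities). The true class of sample 0 has probability 0.2, and
# the true class of sample 1 has probability 0.3.
y_true = tf.constant([[1., 0., 0.],
                      [0., 0., 1.]])
y_pred = tf.constant([[0.2, 0.5, 0.3],
                      [0.4, 0.3, 0.3]])

# from_logits=False: y_pred is taken as probabilities, so each sample's
# loss is simply -log(probability assigned to its true class).
loss = tf.keras.losses.categorical_crossentropy(y_true, y_pred, from_logits=False)
print(loss.numpy())          # [1.6094379 1.2039728]
print(-np.log([0.2, 0.3]))   # matches

# With from_logits=True the same call would first push y_pred through
# softmax, which is why the numbers looked "wrong" before.
```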
You can close the post

Thank you

Editing to add the picture where it does make sense with the loss from the video.

Glad to hear that you figured out the from_logits issue. Here’s a thread that explains why it is done with the “True” setting in the assignment and “going forward” in the rest of the courses.
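
For anyone landing here later, the short version of that thread: with from_logits=True you pass the raw linear output straight into the loss, which is mathematically the same as applying softmax yourself and using from_logits=False, but more numerically stable. A minimal sketch with made-up numbers:

```python
import tensorflow as tf

z = tf.constant([[2.0, 1.0, 0.1]])  # raw linear output (logits): 1 sample, 3 classes
y = tf.constant([[1., 0., 0.]])

# Logits straight into the loss ...
a = tf.keras.losses.categorical_crossentropy(y, z, from_logits=True)
# ... equals softmax-then-loss, but avoids computing log(softmax(z))
# in two numerically risky steps.
b = tf.keras.losses.categorical_crossentropy(y, tf.nn.softmax(z), from_logits=False)
print(a.numpy(), b.numpy())  # both ~0.417
```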
