Course 2, Week 3, compute_total_loss(logits, labels)

Is the expected answer for compute_total_loss() correct? The instructions advise that the dimensions be (6, num_examples), and yet the only way the calculation passes is by transposing labels and logits to (num_examples, 6) before passing them into the cross-entropy call.

Are we supposed to realize this from the tf cross entropy docs or??

5 Likes

Yes, the answer is correct. You are right that by the time you pass the values to the loss function, they need to have the “samples” as the first dimension, because that is what the TF/Keras functions expect. That is what they mean in the instructions.
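
For anyone who lands here later, here is a minimal sketch of the kind of call being described (the variable names and the final reduction are my own illustration, not necessarily the exact graded code):

```python
import tensorflow as tf

def compute_total_loss_sketch(logits, labels):
    # logits and labels arrive as (6, num_examples) in the course's convention,
    # so transpose both to (num_examples, 6), which is what the Keras loss expects.
    # from_logits=True because no activation was applied at the output layer.
    per_example_loss = tf.keras.losses.categorical_crossentropy(
        tf.transpose(labels),   # y_true: (num_examples, 6)
        tf.transpose(logits),   # y_pred: (num_examples, 6)
        from_logits=True,
    )
    # "Total" loss over the minibatch: sum the per-example losses here; any
    # mean over the full training set would be taken later, outside this function.
    return tf.reduce_sum(per_example_loss)
```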

1 Like

This was probably the most frustrating / confusing exercise in the whole course due to the transpose. The necessity to transpose wasn’t, as far as I recall, covered in any other exercises in the course. Was it covered in the video lessons?

I only figured it out by looking through the forums and stumbling upon some example code you wrote where you had a tf.transpose for the inputs to the cross entropy call.

3 Likes

Well, this is your first encounter with TensorFlow, which is pretty deep waters. It just turns out that Prof Ng, for his own reasons, has chosen to use a representation for the “sample” data in which the first dimension is the “features” dimension and the second dimension is the “samples” dimension. He has consistently used that through Course 1 and Course 2 right up to exactly this point, at which we are calling a TF loss function for the first time.

It turns out that TF is a general platform, so it has to deal with lots more complex cases than the vector inputs to a Feed Forward Network, which is all we have seen up to this point. We are about to graduate to Convolutional Nets in Course 4 (Course 3 is not a programming course), in which the inputs are 3D image tensors. For those more complex inputs, everyone I’ve seen orients the data such that the first dimension is the “samples” dimension, and that is how TF works because it needs to support the general case.

They do explicitly mention in the instructions for that section that you need to be cognizant of that. Perhaps they should have made that section a bit more verbose. :nerd_face:

So you’re right that this has never been mentioned before, but it was never necessary until literally this moment. In terms of why Prof Ng chooses to do it this way, my guess is that when we’re writing the code for ourselves in python, the code is cleaner with the orientation that he has chosen up to this point. But we will soon graduate to doing things the way everyone does in “real” applications: using frameworks like TF or PyTorch to build things. Prof Ng does have an important pedagogical reason for showing us the details of how the algorithms work, because that gives us much better intuitions than we could get by starting with the frameworks and viewing all the algorithms as “black boxes”.
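
To make the two conventions concrete, here is a tiny shape check (the sizes and labels are made up, just to illustrate the orientation difference):

```python
import tensorflow as tf

num_features, num_classes, m = 12288, 6, 4   # made-up sizes, just for illustration

# The convention used in the numpy exercises so far: features first, samples second.
X_course = tf.random.normal((num_features, m))             # shape (12288, 4)
Y_course = tf.one_hot([0, 3, 5, 1], num_classes, axis=0)   # shape (6, 4)

# The orientation the TF/Keras loss functions expect: samples first.
Y_for_tf = tf.transpose(Y_course)                          # shape (4, 6)
print(X_course.shape, Y_course.shape, Y_for_tf.shape)
```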

6 Likes

There’s another issue that they don’t really discuss in much detail here which is arguably quite a bit more subtle: that’s the fact that they did not include the activation function at the output layer in forward propagation. If you were searching the forums for threads about this section, you probably also found some discussing the from_logits issue, e.g. this one.

Will we use this and batch normalization again in the rest of the Specialization?

Yes, from this point forward, we will always use from_logits = True mode when we use TF cross entropy loss functions and we will see Batch Normalization used as well, although not in every single case.
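
If you want to see what from_logits = True is doing, a quick sanity check like this (with made-up numbers) shows the two forms agree, although the from_logits = True path is the numerically safer one:

```python
import tensorflow as tf

# Made-up labels and raw logits, shape (num_examples, num_classes) = (2, 3)
labels = tf.constant([[0., 1., 0.],
                      [1., 0., 0.]])
logits = tf.constant([[1.0, 2.0, 0.5],
                      [0.3, -1.2, 0.8]])

# Pass raw logits and let the loss apply a numerically stable softmax internally.
loss_a = tf.keras.losses.categorical_crossentropy(labels, logits, from_logits=True)

# Apply softmax yourself and use the default from_logits=False.
loss_b = tf.keras.losses.categorical_crossentropy(labels, tf.nn.softmax(logits))

print(loss_a.numpy(), loss_b.numpy())  # these agree up to floating-point error
```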

[quote=“Chris_Morgan, post:3, topic:293215”]
transpose
[/quote]

Man oh man, this thread’s discussion on transpose made my day.
Echoing what others are saying: a tricky one, and I didn’t see that coming at all!

Thanks!!!

1 Like

It seems that if compute_total_loss is called with result = target(tf.transpose(pred), minibatch), we could avoid adding the additional transpose?

It would be great if the instructions could be updated to mention the matrix orientation issue, e.g. that tf.keras.losses.categorical_crossentropy expects (num_examples, 6) instead of (6, num_examples).

They do mention that in the instructions. Here is exactly the wording they give:

  • It's important to note that the "y_pred" and "y_true" inputs of tf.keras.losses.categorical_crossentropy are expected to be of shape (number of examples, num_classes).

Well, that depends on the orientation of pred, right? If it is defined in the expected (correct) orientation and compute_total_loss expects it to be in the wrong orientation, then we would need a transpose. They could define it in the wrong orientation, and then we wouldn’t.

If you look at the code, you can see that pred is defined to be 6 x 2, so we don’t need the transpose: it is already in the shape (num_classes, num_examples), which is what compute_total_loss expects. And in fact the code in the notebook does not transpose it, but does transpose minibatch. Your version of the code is the opposite of what is actually there in the notebook.

I agree; it was quite confusing, given that the tf.transpose method wasn’t covered or mentioned anywhere in the task descriptions or in the video lectures; it only appears in some code cells. I only understood what was wrong after reading the forum.

The mention of “It's important to note that the 'y_pred' and 'y_true' inputs of tf.keras.losses.categorical_crossentropy are expected to be of shape (number of examples, num_classes)” is appreciated, but given that every other method we need to use is explicitly mentioned in the notes, it’s a bit unexpected that one needs to figure out this particular method differently.

Well, it’s not like the idea of “transpose” is new: we have used that in numpy all over the place in DLS C1 and C2 already. So you know the concept, now all you have to do is figure out how to implement that in TensorFlow. I can name that google search in one guess! :laughing:
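
And the operation itself is a one-liner, the TF analogue of .T in numpy:

```python
import tensorflow as tf

Z = tf.constant([[1., 2., 3.],
                 [4., 5., 6.]])   # shape (2, 3)
print(tf.transpose(Z).shape)      # (3, 2), same idea as Z.T in numpy
```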

I grant you that they have gone out of their way to condition us to expect very solicitous hand-holding in the courses thus far, so perhaps this does constitute a legitimate violation of your expectations. But as we arrive in the land of TensorFlow the waters are pretty deep and perhaps they feel that it’s time to encourage us to develop our own “swimming muscles” a bit more! :nerd_face:

But seriously, it will be necessary to spend some time with the TF documentation and tutorials and also the forums as we get to Course 4 and Course 5 and get more seriously into the use of TensorFlow. There are also lots of very helpful threads on the Forum, e.g. this one (which is not relevant here in DLS C2 W3, but will be very useful when you hit DLS C4 W1 A2).

1 Like