DLS Course 2: Week 3 Exercise 6 (compute_total_loss method)

Hello!

I have been having trouble understanding what I need to do for Exercise 6 of Week 3 (computing the loss: compute_total_loss) using TensorFlow. I thought I would simply need to invoke the TF cross-entropy function, compute the reduce_sum, and return the total loss. Below is the code fragment:

{moderator edit - solution code removed}

That didn’t seem to help. Reading through the TF docs, it seemed like the cross-entropy function expected the logits to be predictions. Upon inspecting the inputs to compute_total_loss, which looked like plain numbers, I thought I could apply the sigmoid or softmax functions in TF and use that output as the input to the cross-entropy function (like below).

{moderator edit - solution code removed}

Neither passing in the softmax output nor the sigmoid output in place of the logits seemed to work. I also tried setting the from_logits property of the cross-entropy function to False/True, but that didn’t seem to help either. I keep getting a total_loss of
tf.Tensor(0.88275003, shape=(), dtype=float32)

while the expected value is
tf.Tensor(0.810287, shape=(), dtype=float32)

I am kind of stuck and am not sure how to proceed. Any help would be appreciated.

1 Like

The key point is that the output of layer 3 is the “logits” and not the softmax output, so you need to include the softmax somewhere in the computation. There are (as you say) two ways to do that:

  1. Manually add softmax.
  2. Use from_logits = True to include the softmax in the loss calculation.

Option 2) is the preferred method because it’s less code for you to write and more numerically stable.
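
For concreteness, here is a small toy sketch (made-up labels and logits, not the assignment code) showing that the two options agree:

```python
import tensorflow as tf

# Toy example: 2 samples, 3 classes, rows oriented (num_samples, num_classes).
labels = tf.constant([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]])
logits = tf.constant([[2.0, 1.0, 0.1],
                      [0.5, 1.5, 0.2]])

# Option 1: apply softmax manually, then pass probabilities to the loss.
probs = tf.nn.softmax(logits)
loss_manual = tf.keras.losses.categorical_crossentropy(labels, probs, from_logits=False)

# Option 2 (preferred): pass the raw logits and let TF fold softmax into the loss.
loss_logits = tf.keras.losses.categorical_crossentropy(labels, logits, from_logits=True)

print(tf.reduce_sum(loss_manual).numpy())  # same value as below (up to floating point)
print(tf.reduce_sum(loss_logits).numpy())
```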

There is one other thing you didn’t mention here: you also need to transpose the labels and logits to get the correct answer. This was discussed in the instructions.

5 Likes

Thank you so much for the response. I had totally missed the part that I needed to transpose both the labels and the logits. I followed the ideas you had provided and that did it! Thank you so much!

1 Like

Thanks, @paulinpaloalto! I was stuck on this one too.

Seems like a candidate for more than “1 line of code.” To be fair, it could be one line of code; it would just be a long line.

Also, in my lab the documentation link points to tf.keras.metrics.categorical_crossentropy while the text says tf.keras.losses.categorical_crossentropy … not sure if that makes any difference, though. It seems like metrics is actually the right one to use.

As to the number of code lines, that is a programming style point and those are always just suggestions. The grader never looks at your actual code: it just calls your functions and checks the output values. I totally agree that clarity and maintainability may well be better served by more rather than fewer lines, provided that they are appropriately “tasteful”. :laughing:

On the question of which loss function to use, I think they are just two APIs for the same underlying function. The OOP is getting pretty thick here with more subclasses than you can shake a proverbial stick at. Here’s the docpage for the other one. There is one additional argument provided by the “losses” version, but it’s not anything we care about. Either one should work.
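
If you want to convince yourself, here is a quick check (made-up tensors, assuming a reasonably recent TF where both names resolve to the same functional form):

```python
import tensorflow as tf

y_true = tf.constant([[0.0, 1.0], [1.0, 0.0]])
logits = tf.constant([[0.3, 1.2], [2.0, -0.5]])

# Same functional cross-entropy, exposed under two namespaces.
a = tf.keras.losses.categorical_crossentropy(y_true, logits, from_logits=True)
b = tf.keras.metrics.categorical_crossentropy(y_true, logits, from_logits=True)

print(tf.reduce_all(a == b).numpy())  # True
```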

1 Like

I tried exactly what you suggested here, and I did not get the correct answer.

In particular, I entered the following,

{moderator edit - solution code removed}

Am I misunderstanding something???

Where does it say to divide the result by 2? :grinning:

I already figured it out…

> Implement the total loss function below. You will use it to compute the total loss of a batch of samples. With this convenient function, you can sum the losses across many batches, and divide the sum by the total number of samples to get the cost value.

Here, “divide the sum by the total number of samples” is misleading.

What they say is correct if you think carefully about what is being said. We only divide by the total number of samples at the end of one full pass of training (all the minibatches). But the function we are writing here is computing the cost for one minibatch, so we only take the sum. The higher level logic will compute the running sum across all the minibatches and then compute the average when it is finished with the pass. You can’t compute the average at the minibatch level, because the math doesn’t work if all the minibatches are not the same size. That will happen if the minibatch size does not evenly divide the total batch size. So you can’t get the overall average by taking the average of the averages.
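
A tiny numerical illustration of why the average of per-minibatch averages goes wrong when the minibatch sizes differ (plain Python, not the assignment code):

```python
import numpy as np

# Per-sample losses, split into minibatches of unequal size.
batch1 = np.array([1.0, 2.0, 3.0])   # 3 samples
batch2 = np.array([10.0])            # 1 sample

# Correct: keep a running sum, divide by the total sample count at the end.
total = batch1.sum() + batch2.sum()
cost = total / 4                      # (1 + 2 + 3 + 10) / 4 = 4.0

# Wrong: average the per-minibatch averages.
avg_of_avgs = (batch1.mean() + batch2.mean()) / 2   # (2.0 + 10.0) / 2 = 6.0

print(cost, avg_of_avgs)
```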

If you were paying close attention, this is exactly how it worked when we first implemented minibatch gradient descent in the previous assignment (C2 W2 A1 Optimization). It’s the same here, but now we’re doing it in TF instead of straight numpy.

1 Like

And how was I supposed to know at that point that the tf.transpose() method exists? I didn’t see it mentioned in the instructions before.

Because the dimensions of the output of the forward propagation are features x samples and the TF loss function requires the orientation with samples as the first dimension. They do mention in the instructions that you need to be aware of that. It’s also never a bad idea to read the documentation for the TF functions they are advising you to use. This is the intro to TF so there is a lot to learn.
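
Here is a small shape sketch (hypothetical dimensions, not the assignment code) showing the orientation issue:

```python
import tensorflow as tf

# In this course the forward prop output is (num_classes, num_samples),
# e.g. 6 classes x 2 samples here.
logits = tf.random.normal((6, 2))               # features x samples
labels = tf.transpose(tf.one_hot([3, 0], 6))    # also features x samples

# categorical_crossentropy expects (num_samples, num_classes), so transpose both.
per_sample = tf.keras.losses.categorical_crossentropy(
    tf.transpose(labels), tf.transpose(logits), from_logits=True)
print(per_sample.shape)  # (2,): one loss value per sample
```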

Here’s an earlier thread that discusses this same point.

1 Like

It would be nice to see information about the tf.transpose() method in the instructions for this exercise.

1 Like

Sure, since this is the very first assignment using TF, it would be nice for them to mention it or show an example. But you know that you need a transpose operation and you’re using TF, so try googling “how do I transpose a tensor in TF”. We can thank Google for search and for creating TF.

I’ll file an enhancement request asking that they should at least say something like “Hint: you will need the tf.transpose function for this purpose” in the section of the instructions where they mention the dimension issue.

2 Likes

I got stuck here too. Thanks for posting this.

For anyone else stuck: The first bullet in Exercise 6 is key: “[inputs of categorical_crossentropy] are expected to be of shape (number of examples, num_classes).”

For feedback: It helps that some of the earlier code uses tf.transpose, but I agree that the rest of the assignment and other assignments are usually much clearer about what to do. Perhaps add to this bullet: “you can use tf.transpose() if you need it”?

PS: great class, this is just a nit

1 Like