Week 3 - Assignment - compute_total_loss - trying to set from_logits=False

When working on compute_total_loss, I can get it right when I call tf.keras.metrics.categorical_crossentropy with from_logits=True.

Just out of curiosity, I tried a different approach: I computed the softmax myself and set from_logits=False when calling categorical_crossentropy:

{moderator edit - solution code removed}

My output closely matches the expected output, but the test still fails. Why is that?

3 Likes

It’s an interesting point and a good experiment to run! We are operating in floating point here, so there are roughly 2^{32} or 2^{64} different numbers we can represent between -\infty and +\infty, depending on whether we use 32 bit or 64 bit floats. That’s pretty pathetic compared to the abstract beauty of \mathbb{R}. When we operate in a finite space like that, we have to deal with the issue of “numerical stability”: there can be different ways to express the same computation that are mathematically equivalent, but that behave differently with respect to the propagation of rounding errors in any finite representation like floating point.

The reason the from_logits = True mode is used is that it is more numerically stable, meaning it gives results closer to the actual correct answers we would get if we could compute in \mathbb{R}. It’s also less code to write, so that’s the way Prof Ng will always do it when we’re using TF loss functions: the output layer omits the activation, and the loss function computes both the activation (sigmoid or softmax) and the cross entropy loss as a single fused computation.
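To make “more numerically stable” a bit more concrete (this is background, not something the assignment asks you to derive): with logits z and true class y, the loss we want for one example is

-\log\big(\mathrm{softmax}(z)_y\big) \;=\; -z_y + \log\sum_j e^{z_j} \;=\; -z_y + m + \log\sum_j e^{z_j - m}, \quad \text{where } m = \max_j z_j

The fused from_logits = True path effectively computes the right-most form: every exponent is \le 0, so nothing can overflow, and the sum is \ge 1, so the log never sees a value that has underflowed to zero. If you build the softmax probabilities yourself and pass them in with from_logits = False, the probability of a poorly scored true class can round off or underflow before the log is ever applied, and that is where the extra error creeps in.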

BTW numerical stability may sound like a bunch of hand-waving, but it’s actually not. In the subfield of math called Numerical Analysis, there is a way to reason precisely about the error propagation properties of different computations.

They only show the expected value to 6 decimal places, and your answer rounds to the same value, but notice that the test uses 10^{-7} as the error threshold. Try it again with the from_logits = True mode: it must be that your from_logits = False answer differs from it by more than that, i.e. somewhere around the 7th decimal place. You can print your loss value at higher resolution than the default 6 decimal places to confirm this theory:

print("total_loss = {:0.10f}".format(total_loss))

68 Likes

Thanks so much for your explanation!

2 Likes

Why isn’t categorical cross entropy working for me? When I calculate everything from scratch I get a loss of ~0.17 instead of ~0.81, and I get the same result using categorical cross entropy with from_logits=True. Am I doing something wrong?

9 Likes

That probably means you forgot to transpose the labels and logits. Here’s a thread with a checklist of potential errors in this function.
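
In case it helps, here is a rough sketch of the shape issue with made-up numbers (hypothetical variable names, not the notebook code): the data in this assignment is laid out with examples as columns, i.e. (num_classes, num_examples), but categorical_crossentropy expects the class axis last, so both arguments need a transpose.

import tensorflow as tf

# Made-up example: 3 classes, 2 examples, stored column-per-example as in the assignment.
labels = tf.constant([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])    # shape (3, 2)
logits = tf.constant([[2.0, -1.0], [0.5, 3.0], [-0.3, 0.2]])  # shape (3, 2)

# Transpose to (num_examples, num_classes) before computing the per-example losses.
per_example = tf.keras.metrics.categorical_crossentropy(
    tf.transpose(labels), tf.transpose(logits), from_logits=True)
print(per_example)  # shape (2,), one loss per example

That per-example vector is what the rest of compute_total_loss then reduces.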

16 Likes