Week 3 - Assignment - compute_total_loss - trying to set from_logits=False

When working on compute_total_loss, I can get it right when I call tf.keras.metrics.categorical_crossentropy with from_logits=True.

Just out of curiosity, I tried a different approach: I computed the softmax myself and set from_logits=False when calling categorical_crossentropy:

{moderator edit - solution code removed}

My output closely matches the expected output, but the test still fails. Why is that?

3 Likes

It’s an interesting point and a good experiment to run! We are operating in floating point here, so there are roughly 2^{32} or 2^{64} different numbers we can represent between -\infty and +\infty, depending on whether we use 32 bit or 64 bit floats. That’s pretty pathetic compared to the abstract beauty of \mathbb{R}. When we operate in a finite space like that, we have to deal with the issue of “numerical stability”: there can be different ways to express the same computation that are mathematically equivalent, but that behave differently with respect to the propagation of rounding errors in any finite representation like floating point.

The reason the from_logits = True mode is used is that it is more numerically stable, meaning it gives results closer to the actual correct answers we would get if we could compute in \mathbb{R}. It’s also less code to write, so that’s the way Prof Ng will always do it when we’re using TF loss functions: the output layer omits the activation, and the loss function computes both the activation (sigmoid or softmax) and the cross entropy loss as a single fused computation.
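To make “more numerically stable” a bit more concrete (this is background, not something the assignment asks you to derive): with logits z and true class y, the loss we want for one example is

-\log\big(\mathrm{softmax}(z)_y\big) \;=\; -z_y + \log\sum_j e^{z_j} \;=\; -z_y + m + \log\sum_j e^{z_j - m}, \quad \text{where } m = \max_j z_j

The fused from_logits = True path effectively computes the right-most form: every exponent is \le 0, so nothing can overflow, and the sum is \ge 1, so the log never sees a value that has underflowed to zero. If you build the softmax probabilities yourself and pass them in with from_logits = False, the probability of a poorly scored true class can round off or underflow before the log is ever applied, and that is where the extra error creeps in.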

BTW numerical stability may sound like a bunch of hand-waving, but it’s actually not. In the subfield of math called Numerical Analysis, there is a way to reason precisely about the error propagation properties of different computations.

They only show the expected value to 6 decimal places, and your answer rounds to the same value, but notice that the test uses 10^{-7} as the error threshold. Try it again with the from_logits = True mode: it must be that your from_logits = False answer differs from it by more than that, i.e. somewhere around the 7th decimal place. You can print your loss value at higher resolution than the default 6 decimal places to confirm this theory:

print("total_loss = {:0.10f}".format(total_loss))

68 Likes

Thanks so much for your explanation!

2 Likes

Why isn’t categorical cross entropy working for me? When I calculate everything from scratch I get a loss of ~0.17 instead of ~0.81, and I get the same result using categorical cross entropy with from_logits=True. Am I doing something wrong?

9 Likes

That probably means you forgot to transpose the labels and logits. Here’s a thread with a checklist of potential errors in this function.
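
In case it helps, here is a rough sketch of the shape issue with made-up numbers (hypothetical variable names, not the notebook code): the data in this assignment is laid out with examples as columns, i.e. (num_classes, num_examples), but categorical_crossentropy expects the class axis last, so both arguments need a transpose.

import tensorflow as tf

# Made-up example: 3 classes, 2 examples, stored column-per-example as in the assignment.
labels = tf.constant([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])    # shape (3, 2)
logits = tf.constant([[2.0, -1.0], [0.5, 3.0], [-0.3, 0.2]])  # shape (3, 2)

# Transpose to (num_examples, num_classes) before computing the per-example losses.
per_example = tf.keras.metrics.categorical_crossentropy(
    tf.transpose(labels), tf.transpose(logits), from_logits=True)
print(per_example)  # shape (2,), one loss per example

That per-example vector is what the rest of compute_total_loss then reduces.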

16 Likes