Bug in TensorFlow project

In Exercise 6 (compute_cost), the instruction is as follows:
It's important to note that the "y_pred" and "y_true" inputs of tf.keras.losses.categorical_crossentropy are expected to be of shape (number of examples, num_classes).
When I write the code as expected using tf.keras.losses.categorical_crossentropy, it fails the test, but when I tried tf.nn.softmax_cross_entropy_with_logits instead, I got the correct answer.
Can you please tell me why this happens, or report/correct it if it is a bug?

Thanks,

Hi, @ahmedbakr.

Make sure you’re passing the right parameters to categorical_crossentropy.

Let me know if you need more hints.

Good luck :slight_smile:

1 Like

I gave it two parameters in the following order:

  • the transpose of the labels, using tf.transpose
  • the transpose of the logits, using tf.transpose

I get 0.8 as the output cost, but the correct one should be 0.4.
Can you please give me some hints in case I did something wrong?
Thanks in advance.

Of course. Check the description of the from_logits parameter. What are you passing as y_pred?

I checked the link and I am passing the logits as y_pred.

That’s correct, but if you are passing the logits value, then you also have to pass the correct value for the from_logits parameter. Did you read that section of the documentation as @nramon suggested? Using the default value for from_logits will not work.

Yes, I have read it. Actually, I am passing the transposed value of the logits.

It sounds like we are talking at cross purposes here. Here’s the key point that doesn’t seem to be getting across: there are two ways to invoke either the binary cross entropy loss function or the categorical cross entropy loss function. You can pass the logits as the prediction input, or you can pass the actual activation outputs (sigmoid or softmax). But the algorithm can’t tell which one you are doing, right? You have to tell it which form you are using as the predictions, and you do that with from_logits, an optional “named” parameter with a default value of False. If you are passing logits, then you need to set from_logits=True.
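Here’s a minimal sketch of the two forms, using hypothetical logits and labels (not the assignment’s values):

```python
import tensorflow as tf

# Hypothetical logits and one-hot labels, shape (number of examples, num_classes).
# In the assignment you may need tf.transpose first so examples are along axis 0.
logits = tf.constant([[2.0, 1.0, 0.1],
                      [0.5, 2.5, 0.3]])
labels = tf.constant([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]])

# Form 1: pass the raw logits and say so explicitly.
loss_from_logits = tf.keras.losses.categorical_crossentropy(
    labels, logits, from_logits=True)

# Form 2: apply the softmax yourself and keep the default from_logits=False.
probs = tf.nn.softmax(logits)
loss_from_probs = tf.keras.losses.categorical_crossentropy(labels, probs)

# Both calls return the same per-example losses (up to floating point noise).
print(loss_from_logits.numpy())
print(loss_from_probs.numpy())
```

If you pass logits but leave from_logits at its default of False, the loss silently treats them as probabilities, so you get a different (wrong) cost rather than an error, which would explain seeing 0.8 instead of 0.4.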

2 Likes

Thanks a lot for your help. I had understood from_logits in a totally different way, but I got it once you explained it.

Great! You may well ask why they go to all that trouble. It turns out that computing the activation and the loss together in one step allows them to get better numerical stability and handle some outlier cases like NaN values from saturated sigmoid outputs more simply and cleanly. It’s also one less TF function call, so you’ll see that we always use the “from_logits = True” mode in these courses. If it’s one less call and it works better, what is not to like about that? :nerd_face:
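To make the stability point concrete, here is a small sketch with a hypothetical saturated logit (the values are made up, not from the assignment):

```python
import tensorflow as tf

# A hypothetical saturated example: one very large and one very negative logit.
logits = tf.constant([[80.0, 0.0, -80.0]])
labels = tf.constant([[0.0, 0.0, 1.0]])   # the true class has a vanishing probability

# Two-step version: softmax first, then cross entropy by hand.
# The softmax underflows to exactly 0.0 for the true class, so log(0) gives -inf.
probs = tf.nn.softmax(logits)
manual_loss = -tf.reduce_sum(labels * tf.math.log(probs), axis=-1)
print(manual_loss.numpy())   # [inf]

# Fused version: the loss works on the logits directly (log-sum-exp style),
# so the result stays finite and accurate.
fused_loss = tf.keras.losses.categorical_crossentropy(
    labels, logits, from_logits=True)
print(fused_loss.numpy())    # roughly [160.]
```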

2 Likes

So numerical stability is the reason the activation and the loss are calculated in the same step!
I used to think it was weird that the softmax and the loss were not computed as separate steps, since the probability distribution over the classes is not visible. But it made sense once I read about the stability!
Also, if one wants the probability distribution, one can always apply a softmax to the model’s logits!

1 Like

Right! You need to have the logic to apply softmax to the logits when you’re using the model in “prediction” mode as opposed to “training” mode anyway.
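For example, a minimal sketch (with hypothetical layer sizes, not the assignment’s model):

```python
import tensorflow as tf

# Hypothetical model trained with from_logits=True: the last Dense layer has
# no activation, so it outputs raw logits rather than probabilities.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(25, activation="relu", input_shape=(12288,)),
    tf.keras.layers.Dense(6),            # logits, no softmax
])

x = tf.random.normal((4, 12288))         # a hypothetical batch of inputs

logits = model(x)                        # what the loss consumed during training
probs = tf.nn.softmax(logits, axis=-1)   # probability distribution for prediction
pred_classes = tf.argmax(probs, axis=-1) # hard class predictions

print(probs.shape)                       # (4, 6)
print(pred_classes.numpy())
```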