Math behind "tf.keras.metrics.categorical_crossentropy"


I accidentally deleted my previous post, so I am rewriting my question. If you find the old post, feel free to delete it.

I did the math my way, but it doesn't match the final result. Can someone guide me here? Thank you.

Q1: In those (2, 6) matrices there are 2 samples and 6 classes. Is my understanding right?

Q2: I think the cost is J = (1/2) * (-1 * log(6.148) - 1 * log(5.033)), but this does not equal the expected solution 0.810287. I think I do not understand this part.

Q3: The documentation for tf.keras.metrics.categorical_crossentropy says: "from_logits: Whether y_pred is expected to be a logits tensor. By default, we assume that y_pred encodes a probability distribution." What is the meaning of "encodes a probability distribution"? Thanks.

The problem here is precisely that our "logits" input is not a probability distribution; by "a probability distribution" they mean the output of the softmax activation function. Look at how we built the forward propagation in the earlier part of the exercise: there is no activation function at the output layer. That's why you get values like 6.xxx and 5.xxxx, which give you wildly wrong results if you apply the cross-entropy loss to them directly.
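Here's a minimal sketch of what "encodes a probability distribution" means, using made-up logit values in roughly the same range as yours (not the actual numbers from the exercise):

```python
import tensorflow as tf

# Made-up raw logits (no activation at the output layer), shape (2, 6):
# 2 samples, 6 classes. Values like 6.x are fine here because they are
# not meant to be probabilities yet.
logits = tf.constant([[6.1, 1.2, 0.3, -0.5, 2.0, 0.7],
                      [5.0, 0.1, 1.5,  0.2, 0.9, 3.1]])

# Softmax turns each row into a probability distribution: every entry is
# in (0, 1) and each row sums to 1. That is what the docs mean by
# "encodes a probability distribution".
probs = tf.nn.softmax(logits, axis=-1)

print(probs.numpy())
print(tf.reduce_sum(probs, axis=-1).numpy())  # ~[1. 1.]
```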

That's why we need to tell the loss function that the inputs are "logits" and not a probability distribution, and the from_logits argument is how you do that. Prof Ng does not really discuss this as I recall, but the reason for doing it this way is that it is both more efficient and more numerically stable to let the loss function do the softmax (or sigmoid in the binary case) and the log loss computation as a "bundled" operation. For example, it is easy to handle the "saturation" case in which the output rounds to exactly 0 or 1. That never happens mathematically, but in floating point it can, and it makes a mess, since the loss ends up being NaN in that case. Once we switch to using TensorFlow, Prof Ng always does it this way, meaning we never include the activation function at the output layer in a classification problem.
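As a rough illustration of the two ways of calling the loss (again with made-up labels and logits, not the exercise's values):

```python
import tensorflow as tf

# Made-up one-hot labels and raw logits, shape (samples, classes) = (2, 6).
labels = tf.constant([[1., 0., 0., 0., 0., 0.],
                      [0., 0., 0., 0., 0., 1.]])
logits = tf.constant([[6.1, 1.2, 0.3, -0.5, 2.0, 0.7],
                      [5.0, 0.1, 1.5,  0.2, 0.9, 3.1]])

# Bundled: the loss applies softmax internally (more numerically stable).
loss_bundled = tf.keras.losses.categorical_crossentropy(
    labels, logits, from_logits=True)

# Unbundled: apply softmax yourself and pass probabilities.
probs = tf.nn.softmax(logits, axis=-1)
loss_unbundled = tf.keras.losses.categorical_crossentropy(labels, probs)

# The two agree for moderate logits; the bundled form avoids log(0) -> NaN
# when the softmax saturates.
print(loss_bundled.numpy())
print(loss_unbundled.numpy())
```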

Hi Paul, after applying softmax I hand-calculated the loss and got loss = [0.25361034 0.5566767], which is exactly the same value I get from my code:

"tf.keras.losses.categorical_crossentropy(tf.transpose(labels), tf.transpose(logits), from_logits=True)",

However, in the picture below you see loss = [0.25361034 0.5566767], and then cost = tf.reduce_sum(loss) = 0.25361034 + 0.5566767 = 0.810287, which is the final result. I thought I should compute (1/m) * sum(loss) = (1/2) * sum(loss), but I don't see the 1/2 there.
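To make the difference concrete, here is a tiny check using just the two per-sample losses from above:

```python
import tensorflow as tf

# The two per-sample losses from above.
loss = tf.constant([0.25361034, 0.5566767])

cost_sum = tf.reduce_sum(loss)    # 0.25361034 + 0.5566767 ~= 0.810287
cost_mean = tf.reduce_mean(loss)  # includes the 1/m = 1/2 factor ~= 0.405144

print(cost_sum.numpy(), cost_mean.numpy())
```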

In the tester function I do see "assert (np.abs(result - (0.50722074 + 1.1133534) / 2.0) < 1e-7)". Does this tell me my code for the loss is still wrong? Thank you.


They updated this notebook recently so that it manages the cost in the way needed for handling minibatches, as we did in the optimization assignment in Week 2. The compute_cost function returns the sum of the cost across the given samples. You accumulate that total over all the minibatches and only divide by m at the end of the epoch. Check the logic in the model function that comes next to see what I mean.
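Roughly, the pattern looks like this (the names and data here are illustrative stand-ins for the notebook's actual functions, not the real code):

```python
import tensorflow as tf

# Hypothetical minibatches: each provides raw logits and one-hot labels
# of shape (batch, classes).
minibatches = [
    (tf.random.normal((2, 6)), tf.one_hot([0, 5], depth=6)),
    (tf.random.normal((2, 6)), tf.one_hot([1, 3], depth=6)),
]
m = 4  # total number of training samples across all minibatches

def compute_cost(logits, labels):
    # Sum (not mean) of the per-sample losses for this minibatch.
    return tf.reduce_sum(
        tf.keras.losses.categorical_crossentropy(labels, logits, from_logits=True))

epoch_total_cost = 0.0
for logits, labels in minibatches:
    epoch_total_cost += compute_cost(logits, labels)

# Divide by m only once, at the end of the epoch.
epoch_cost = epoch_total_cost / m
print(float(epoch_cost))
```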

So there must be something else wrong, although your code as shown looks right to me. I have not had time to update to the new notebook yet, so I probably can't really help until I get back from vacation. At least at the rate I'm going at this point …

No worries. Have a great vacation.