Hi. As many others seem to have been, I’m stumped on implementing compute_total_loss. Yes, I have made sure I’m addressing all four “common issues” listed in this thread. What I have noticed is:

My calculated cost is 0.4051435, which is precisely half of the expected value.

The output of calling CategoricalCrossentropy seems to be a single number, BEFORE using reduce_sum. In fact, it’s already 0.4051435, and reduce_sum leaves it unchanged.

Why would CategoricalCrossentropy return a number instead of a tensor that must be summed over?

I tried adding an axis=... argument to CategoricalCrossentropy, as that’s an optional argument listed in the API, but that just produces an “unexpected keyword argument” error.

That’s not correct. Before you call tf.reduce_sum(), you should have a tensor with two values for the test in the notebook: tf.Tensor([0.25361034 0.5566767 ], shape=(2,), dtype=float32)

I’m getting a single number back from CategoricalCrossentropy. All the examples in the TF documentation for that function also show a single real value as the return value, whether or not the reduction argument is specified.

@TMosh I hear you saying that I should be getting a two-element vector back from calling my CategoricalCrossentropy function, which I should then be passing to reduce_sum. Do I misunderstand you?

Thanks for weighing in. Yes, I have transposed both labels and logits. Yes, I am specifying from_logits=True.

It’s really a very simple function!

I call tf.losses.CategoricalCrossentropy with from_logits=True to get a function for calculating my CCE.

I call that function I just obtained, passing it transposed versions of the two input arguments (logits and labels).

I call reduce_sum on the result of that.

I could nest all that into a single line, of course, but splitting it into separate lines lets me inspect the results of each step.

It seems to me that I must be doing something wrong in my use of CategoricalCrossentropy, since I’m getting a single value from that BEFORE reducing. But what??

I am now very confused. According to the docs here, CategoricalCrossentropy does not take actual predictions or ground truth values as arguments. The usage examples show it being used to return a function, which is then called with one’s logits and labels.

…oh, crap. Apparently, that’s not the right function. I just figured out that y’all want us to use tf.keras.metrics.categorical_crossentropy, and not tf.keras.losses.CategoricalCrossentropy. The latter is what came up when I searched in the docs, and I never realized it’s not exactly the right name.

The problem is that you are using a different function than the one they told you to use. I did it both ways and instrumented the logic a bit to show what is happening:

vector_loss [0.25361034 0.5566767 ]
<class 'tensorflow.python.keras.losses.CategoricalCrossentropy'>
cce_loss 0.40514349937438965
mean(vector_loss) 0.40514349937438965
total_loss = 0.8102869987487793
tf.Tensor(0.810287, shape=(), dtype=float32)
All test passed

The quantity I call vector_loss is the output of the correct function before applying the “reduce sum”. As Tom points out, it has 2 elements: one for each sample. If I average those, you can see I get the same answer you got with your method.

The function you chose evidently is defined to compute the mean loss, but that is not what we want here.
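The relationship between the two results can be checked with plain arithmetic. A minimal sketch in plain Python, using the two per-sample losses printed above; the variable names simply mirror the printout and are otherwise illustrative:

```python
# Per-sample losses reported in the output above (one value per example).
vector_loss = [0.25361034, 0.5566767]

# A mean-reducing loss averages over the batch...
mean_loss = sum(vector_loss) / len(vector_loss)

# ...whereas the assignment asks for the sum over the batch.
total_loss = sum(vector_loss)

print(f"mean  = {mean_loss:.7f}")
print(f"total = {total_loss:.7f}")
```

With two samples, the mean is exactly half of the sum, which matches the "precisely half of the expected value" symptom.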

For the linear function grader cell,
please correct your code according to the instruction below, which is given in your notebook just before the grader cell:

Note that the difference between tf.constant and tf.Variable is that you can modify the state of a tf.Variable but cannot change the state of a tf.constant.

You do not need to use tf.constant in your code.

In your one-hot matrix, you do not need axis=0, because passing it creates the new axis at dimension 0.
In the same one-hot matrix grader cell, you are supposed to create a single-column matrix, so why have you wrapped that in a tuple, with an indentation error?

In this grader cell, the initializer is already defined by
initializer = tf.keras.initializers.GlorotNormal(seed=1)
so why are you calling tf.keras.initializers.GlorotNormal(seed=1) again for every parameter?

GRADED FUNCTION: initialize_parameters

GRADED FUNCTION: forward_propagation

For the activation, you need to use tf.nn.relu instead of tf.keras.activations.relu.

GRADED FUNCTION: compute_total_loss

#(1 line of code)
# remember to set from_logits=True
# total_loss = …

Your grader cell clearly states that the total loss is a single line of code, as @TMosh explained with the instructions in his previous comments, which come from the same notebook you are working on.

You need to rewrite your total loss based on the instructions below:
Implement the total loss function below. You will use it to compute the total loss of a batch of samples. With this convenient function, you can sum the losses across many batches, and divide the sum by the total number of samples to get the cost value.
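The bookkeeping described in that instruction can be sketched as follows. This is plain Python with made-up per-sample losses; batch_losses and total_loss_of_batch are hypothetical names, not part of the notebook:

```python
def total_loss_of_batch(per_sample_losses):
    """Sum (not average) the per-sample losses of one batch."""
    return sum(per_sample_losses)

# Hypothetical per-sample losses for three batches of unequal size.
batch_losses = [
    [0.2, 0.4, 0.6],
    [0.1, 0.3],
    [0.5],
]

running_total = 0.0
num_samples = 0
for losses in batch_losses:
    running_total += total_loss_of_batch(losses)
    num_samples += len(losses)

# Cost = mean loss per sample over the whole dataset.
cost = running_total / num_samples
print(cost)
```

Summing per batch and dividing once at the end weights every sample equally. Averaging the per-batch means instead would give about 0.367 here rather than 0.35, because the batches have different sizes.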

It’s important to note that the “y_pred” and “y_true” inputs of tf.keras.losses.categorical_crossentropy are expected to be of shape (number of examples, num_classes).

tf.reduce_sum does the summation over the examples.
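As a sketch of what that shape requirement implies, assume (as in the thread above) that logits and labels arrive as (num_classes, number of examples) and must be transposed first. Plain Python lists stand in for tensors here, the values are made up, and the helper names are hypothetical:

```python
import math

# Hypothetical data laid out as (num_classes, num_examples) = (3, 2).
logits = [[2.0, 0.5],
          [1.0, 2.5],
          [0.1, 0.3]]
labels = [[1, 0],
          [0, 1],
          [0, 0]]

def transpose(matrix):
    """Turn (num_classes, num_examples) into (num_examples, num_classes)."""
    return [list(row) for row in zip(*matrix)]

def cross_entropy_from_logits(y_true, z):
    """Loss for one example: -sum_k y_k * log(softmax(z)_k)."""
    log_norm = math.log(sum(math.exp(v) for v in z))
    return -sum(y * (v - log_norm) for y, v in zip(y_true, z))

# One loss value per example (row), then a sum over the examples.
per_example = [cross_entropy_from_logits(y, z)
               for y, z in zip(transpose(labels), transpose(logits))]
total_loss = sum(per_example)
print(per_example, total_loss)
```

The intermediate list has one entry per example, which is the pure-Python analogue of the two-element tensor expected before the sum.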

You skipped applying “softmax” in Exercise 5, which will now be taken care of by tf.keras.losses.categorical_crossentropy when you set its parameter from_logits=True. (You can read the response by one of our mentors here in the Community for the mathematical reasoning behind it. If you are not part of the Community already, you can join by going here.)
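The effect of from_logits=True can be illustrated without TensorFlow. The sketch below, in plain Python with made-up numbers, applies softmax explicitly and then takes the cross-entropy, and compares that with a version that works directly on the logits via log-sum-exp. It only illustrates the math and is not TensorFlow's actual implementation; all function names are hypothetical:

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def ce_from_probs(y_true, probs):
    """Cross-entropy when the inputs are already probabilities."""
    return -sum(y * math.log(p) for y, p in zip(y_true, probs))

def ce_from_logits(y_true, z):
    """Cross-entropy computed straight from logits (softmax folded in)."""
    m = max(z)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
    return -sum(y * (v - log_sum_exp) for y, v in zip(y_true, z))

z = [2.0, 1.0, 0.1]   # made-up logits for one example
y = [1, 0, 0]         # one-hot label

two_step = ce_from_probs(y, softmax(z))  # softmax first, then the loss
one_step = ce_from_logits(y, z)          # loss directly on the logits
print(two_step, one_step)
```

Both paths give the same value up to floating-point rounding, which is why the forward pass can stop at the linear output and leave the softmax to the loss.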

You probably got confused by the “batch of samples” wording and tried to handle the batch separately.