Stuck on C2W3 Assignment: Cost Function

Hi. As many others seem to have been, I’m stumped on implementing compute_total_loss. Yes, I have made sure I’m addressing all four “common issues” listed in this thread. What I have noticed is:

  1. My calculated cost is 0.4051435, which is precisely half of the expected value.
  2. The output of calling CategoricalCrossentropy seems to be a single number, BEFORE using reduce_sum. In fact, it’s already 0.4051435, and reduce_sum leaves it unchanged.

Why would CategoricalCrossentropy return a number instead of a tensor that must be summed over?

I tried adding an axis=... argument to CategoricalCrossentropy, as that’s an optional argument listed in the API, but that just produces an “unexpected keyword argument” error.

Hello ibeatty,

Can you share a screenshot of the error log you got, so we can see where the unexpected keyword argument error is coming from?


If you put all the hints together, you get this:

  • Use tf.keras.losses.categorical_crossentropy() to compute the loss for each example.
  • Pass it the parameters for the labels and logits, and from_logits = True (since you should not have used softmax() in Exercise 5).
  • Bonus hint: You may need a couple of transpositions - as tf.transpose(<var_name>).
  • Then use tf.reduce_sum() to sum up all of the losses.

That’s not correct. Before you call tf.reduce_sum(), you should have a tensor with two values for the test in the notebook:
tf.Tensor([0.25361034 0.5566767 ], shape=(2,), dtype=float32)

Happily! Attached. (BTW, I get the same error whether I specify axis=0 or axis=1.)

Thanks // i

As per your error log, the axis argument is not required in your code.

Also, as explained by @TMosh, I hope you have transposed the labels and logits; along with that, you need to set from_logits=True in your code line.

Let us know once your issue is resolved, or share any further issues you run into.

You can send the code for the particular grader cell via personal DM. Click on my name and then message.


I’m getting a single number back from CategoricalCrossentropy. All the examples in the TF documentation for that function also show a single real value as the return value, whether or not the reduction argument is specified.

@TMosh I hear you saying that I should be getting a two-element vector back from calling my CategoricalCrossentropy function, which I should then be passing to reduce_sum. Do I misunderstand you?

Thanks // i

The axis parameter is unnecessary.

TMosh’s hint about transposing is the key to this assignment.

Thanks for weighing in. Yes, I have transposed both labels and logits. Yes, I am specifying from_logits=True.

It’s really a very simple function!

  1. I call tf.losses.CategoricalCrossentropy with from_logits=True to get a function for calculating my CCE.
  2. I call that function I just obtained, passing it transposed versions of the two input arguments (logits and labels).
  3. I call reduce_sum on the result of that.

I could nest all that into a single line, of course, but splitting it into separate lines lets me inspect the results of each step.

It seems to me that I must be doing something wrong in my use of CategoricalCrossentropy, since I’m getting a single value from that BEFORE reducing. But what??

Any suggestions are welcome…

Please send your notebook via personal DM. The previous grader cells probably need a look as well.

I used separate lines for the transposes (though that is not necessary). Putting the parameters in the correct order inside the parentheses matters as well.

Your Step 1 is why you’re getting a scalar result. You didn’t pass it the logits and labels - so it doesn’t have any data to operate on.

In your Step 1, you should pass logits, labels, and from_logits=True all at once.

Then your Step 2 isn’t necessary.

That is correct.

Also, in your Step 1, leaving out “keras” may have caused an issue. I have not tested that possibility.

I am now very confused. According to the docs here, CategoricalCrossentropy does not take actual predictions or ground truth values as arguments. The usage examples show it being used to return a function, which is then called with one’s logits and labels.

…oh, crap. Apparently, that’s not the right function. I just figured out that y’all want us to use tf.keras.metrics.categorical_crossentropy, and not tf.keras.losses.CategoricalCrossentropy. The latter is what came up when I searched in the docs, and I never realized it’s not exactly the right name.


Thanks for being patient with me.

// i

The problem is that you are using a different function than the one they told you to use. I did it both ways and instrumented the logic a bit to show what is happening:

vector_loss [0.25361034 0.5566767 ]
<class 'tensorflow.python.keras.losses.CategoricalCrossentropy'>
cce_loss 0.40514349937438965
mean(vector_loss) 0.40514349937438965
total_loss = 0.8102869987487793
tf.Tensor(0.810287, shape=(), dtype=float32)
All test passed

The quantity I call vector_loss is the output of the correct function before applying the “reduce sum”. As Tom points out, it has 2 elements: one for each sample. If I average those, you can see I get the same answer you got with your method.
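As a quick sanity check on the numbers quoted in this thread, here is a tiny plain-Python sketch. It only does the arithmetic on the two per-example values shown above; it is not the notebook code, and the variable names are just illustrative.

```python
# Per-example losses quoted in the thread for the notebook's test case.
vector_loss = [0.25361034, 0.5566767]

total_loss = sum(vector_loss)               # the quantity a sum reduction gives
mean_loss = total_loss / len(vector_loss)   # the quantity a mean reduction gives

print(f"sum  = {total_loss:.6f}")
print(f"mean = {mean_loss:.7f}")
```

The sum should land on the expected 0.810287 and the mean on the 0.4051435 reported in the first post, which is why the wrong answer looked like exactly half of the right one.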

The function you chose evidently is defined to compute the mean loss, but that is not what we want here.

And if the question is why we want the sum instead of the mean loss, that is explained on this thread.

And to be fair here, they did give you a link in the instructions to the correct function.
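For anyone who lands here with the same confusion, the difference between the two similarly named APIs can be sketched like this. The tensors are made-up toy values already laid out as (number of examples, num_classes), not the notebook's test data, and this is only an illustration of the two call styles, assuming a TensorFlow 2.x install with default settings.

```python
import tensorflow as tf

# Toy data invented for illustration: 2 examples, 3 classes.
y_true = tf.constant([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]])
logits = tf.constant([[2.0, 1.0, 0.1],
                      [0.5, 2.5, 0.3]])

# Function form: takes the data directly and returns one loss per example.
per_example = tf.keras.losses.categorical_crossentropy(
    y_true, logits, from_logits=True)

# Class form: first builds a loss object, which is then called on the data.
# With its default reduction it averages over the batch and returns a scalar.
cce = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
reduced = cce(y_true, logits)

print("function output shape:", tuple(per_example.shape))
print("class output shape:   ", tuple(reduced.shape))
print("mean of per-example:  ", float(tf.reduce_mean(per_example)))
print("class output:         ", float(reduced))
```

The last two printed values should agree, which matches the behaviour described above: the class version has already averaged, so summing it afterwards changes nothing.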

Hello ibeatty,

  1. For the linear function grader cell,
    please correct your code according to the instruction below, which is given in your notebook before the grader cell:

Note that the difference between tf.constant and tf.Variable is that you can modify the state of a tf.Variable but cannot change the state of a tf.constant .

You do not need to use tf.constant in your code.

  2. In your one hot matrix, you do not need axis=0, as passing it would create the new axis at dimension 0.
    In the same one hot matrix grader cell, you are supposed to create a single-column matrix, so why have you wrapped that in a tuple, with an indentation error?

  3. GRADED FUNCTION: initialize_parameters
    In this grader cell the initializer is already created by
    initializer = tf.keras.initializers.GlorotNormal(seed=1)
    so why are you calling tf.keras.initializers.GlorotNormal(seed=1) again for every parameter?

  4. GRADED FUNCTION: forward_propagation
    For the activation you need to use tf.nn.relu instead of tf.keras.activations.relu.

  5. # GRADED FUNCTION: compute_total_loss

#(1 line of code)
# remember to set from_logits=True
# total_loss = …

Your grader cell clearly states that the total loss is a single line of code, as explained by @TMosh in his previous comments, whose instructions come from the same notebook you are working on.

You need to rewrite your total loss based on the instructions below:
Implement the total loss function below. You will use it to compute the total loss of a batch of samples. With this convenient function, you can sum the losses across many batches, and divide the sum by the total number of samples to get the cost value.

  • It’s important to note that the “y_pred” and “y_true” inputs of tf.keras.losses.categorical_crossentropy are expected to be of shape (number of examples, num_classes).
  • tf.reduce_sum does the summation over the examples.
  • You skipped applying “softmax” in Exercise 5 which will now be taken care by the tf.keras.losses.categorical_crossentropy by setting its parameter from_logits=True (You can read the response by one of our mentors here in the Community for the mathematical reasoning behind it. If you are not part of the Community already, you can do so by going here.)
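To make the "sum across batches, then divide by the total number of samples" idea in those instructions concrete, here is a small plain-Python sketch. It is not the assignment code: the helper names and the toy batches are hypothetical, and the per-example loss is written out by hand (softmax folded into the loss, which is my reading of what from_logits=True means) instead of calling TensorFlow.

```python
import math

def cross_entropy_from_logits(logits, one_hot):
    """Per-example loss: -sum(y * log_softmax(z)), using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(y * (z - log_sum_exp) for z, y in zip(logits, one_hot))

def batch_total_loss(batch_logits, batch_labels):
    """Sum of per-example losses for one batch (rows are examples)."""
    return sum(cross_entropy_from_logits(z, y)
               for z, y in zip(batch_logits, batch_labels))

# Hypothetical toy batches of unequal size, 3 classes each.
batches = [
    ([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3], [0.2, 0.1, 3.0]],
     [[1, 0, 0], [0, 1, 0], [0, 0, 1]]),
    ([[1.0, 1.0, 1.0]],
     [[0, 1, 0]]),
]

# Sum the batch totals, then divide once by the total number of samples.
total = sum(batch_total_loss(z, y) for z, y in batches)
num_samples = sum(len(z) for z, _ in batches)
cost = total / num_samples

# Averaging the per-batch means instead gives the small batch too much weight.
mean_of_means = sum(batch_total_loss(z, y) / len(z)
                    for z, y in batches) / len(batches)

print("cost from summed losses:", cost)
print("mean of batch means:    ", mean_of_means)
```

With unequal batch sizes the two numbers differ, which is one way to see why the function is asked to return a sum and leave the division for later.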

You probably got confused by the mention of a batch of samples and computed it separately.

Please make these corrections!


I am getting this error:
ValueError: Shapes (1, 6, 2) and (6, 2) are incompatible

I have been using total_loss = {moderator edit: code removed}; I also checked it with {code removed}.