DLS C2 Week 3, Exercise 6: compute_cost() ERROR

Hi Paulin,

Thank you very much for this comment! I am getting the apparently correct output tf.Tensor(0.4051435, shape=(), dtype=float32), but my notebook still tells me that the expected output is tf.Tensor(0.810287, shape=(), dtype=float32).
How can I get the new version of the notebook?

Thank you very much in advance!

Janine

Hi, Janine.

If the “expected value” is 0.81xxx, then you already have the new version of the notebook. They recently (early September 2022) changed the definition of the compute_cost function in this assignment to return the sum of the costs across all the samples, rather than the mean. (Notice that the new expected value is exactly twice the old one: the test case has two samples, so the sum is 2x the mean.) They made that change to be consistent with how the compute_cost function is used in the minibatch logic of the Optimization assignment in C2 W2.

When we are doing Minibatch Gradient Descent, it works better to keep a running sum of the costs across all the samples in all the minibatches and then compute the average cost only when we finish each epoch (a full pass through all the minibatches). The reason is that the minibatches will not all be the same size if the batch size does not evenly divide the total training set size, right? We saw an example of that in the minibatch assignment mentioned above, so the average of the per-minibatch averages would not be correct in that case. To get the detailed picture, please have a look at how that logic worked in that previous exercise.
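
To make that concrete, here is a toy sketch (hypothetical numbers, not from the assignment) showing why the average of per-minibatch means goes wrong when the last minibatch is smaller, while a running sum divided once per epoch stays exact:

```python
import numpy as np

# Five per-sample costs split into minibatches of size 2, 2, and 1
per_sample_costs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
batches = [per_sample_costs[0:2], per_sample_costs[2:4], per_sample_costs[4:5]]

# Running sum across all minibatches, averaged once at the end of the epoch
epoch_total = sum(batch.sum() for batch in batches)
true_mean = epoch_total / len(per_sample_costs)               # 15 / 5 = 3.0

# Averaging the per-minibatch means over-weights the short last batch
mean_of_means = np.mean([batch.mean() for batch in batches])  # (1.5 + 3.5 + 5.0) / 3 = 3.33...

print(true_mean, mean_of_means)
```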

Please check that you used reduce_sum and not reduce_mean in your compute_cost logic.
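
In code terms, the only difference is the final reduction. Here is a minimal sketch (the per-sample loss values are made up so that the mean and sum match the two expected outputs; the actual per-sample values in the test case may differ):

```python
import tensorflow as tf

# One cross-entropy loss value per sample (two samples in the test case)
per_sample_losses = tf.constant([0.4051435, 0.4051435])

old_cost = tf.reduce_mean(per_sample_losses)  # old notebook: 0.4051435
new_cost = tf.reduce_sum(per_sample_losses)   # new notebook: 0.810287
```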

Also, just as a general note: any time you want to get a clean copy of the latest version of any assignment, there is a topic about how to do that on the DLS FAQ Thread (it is the very first topic there).

Regards,
Paul

Hi Paul,

Thank you very much for your fast reply and your detailed explanation!
You were right: I changed reduce_mean to reduce_sum, and now everything is fine.

Best regards,

Janine

Hi Paulin,

I think I have the new version, since the expected value is the same as Janine’s. However, I am getting an output of 0.88x. Both logits and labels have the shape (6, 2), and I am sure that the argument order in tf.keras.losses.categorical_crossentropy() is correct (the other order gives NaN).

My code is as follows:

{moderator edit - solution code removed}

What could I possibly miss?

Foo

You are missing the fact that the inputs are logits, not activation outputs. Search the forums for from_logits for more info. I think the shape is also not correct: the loss function expects the samples dimension to come first, right? So transposes are required in order to achieve that.
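
Putting both fixes together, here is a minimal sketch with toy tensors in the (6, 2) layout you described (hypothetical values, not your actual code):

```python
import tensorflow as tf

# Toy tensors shaped (num_classes=6, num_samples=2), as in the assignment
logits = tf.random.normal((6, 2))                    # raw linear outputs, no softmax
labels = tf.transpose(tf.one_hot([2, 5], depth=6))   # one-hot labels, also (6, 2)

# Fix 1: transpose so the samples dimension comes first, giving shape (2, 6)
# Fix 2: pass from_logits=True, because no softmax has been applied yet
per_sample_losses = tf.keras.losses.categorical_crossentropy(
    tf.transpose(labels), tf.transpose(logits), from_logits=True)
cost = tf.reduce_sum(per_sample_losses)              # summed cost, per the new notebook
```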

Thank you so much, Paul! It is not obvious from the docs which dimension should come first in the input arguments, and it never occurred to me to transpose the data.

Best,
Foo