Course 2, Week 3: compute_total_loss fails with a really close result

My result is tf.Tensor(0.81028694, shape=(), dtype=float32)
The expected one is tf.Tensor(0.810287, shape=(), dtype=float32)
They are really close, but different enough to fail the test.

  • I used tf.transpose to transpose both tensors.
  • I used tf.nn.softmax to compute y_pred.
  • I used tf.keras.losses.categorical_crossentropy and tf.reduce_sum to compute the cost function.
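For reference, here is a minimal sketch of the approach in those bullets. It assumes the assignment's layout, where logits and labels both have shape (num_classes, num_examples) and labels are one-hot. The function name is hypothetical. This is a reconstruction from the description, not the poster's actual code.

```python
import tensorflow as tf

def compute_total_loss_softmax_first(logits, labels):
    """Hypothetical reconstruction: softmax by hand, then cross-entropy.

    Assumes logits and labels have shape (num_classes, num_examples),
    with labels one-hot encoded.
    """
    # Keras losses expect (num_examples, num_classes), so transpose both.
    y_true = tf.transpose(labels)
    # Turn the raw scores into probabilities manually (softmax over classes).
    y_pred = tf.nn.softmax(tf.transpose(logits))
    # from_logits defaults to False, so y_pred is treated as probabilities.
    per_example = tf.keras.losses.categorical_crossentropy(y_true, y_pred)
    # Sum the per-example losses to get the total loss.
    return tf.reduce_sum(per_example)

# Toy data (made up): 3 classes, 2 examples.
logits = tf.constant([[2.0, 0.5], [1.0, 0.1], [0.1, 3.0]])
labels = tf.constant([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
print(compute_total_loss_softmax_first(logits, labels))
```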

Instead of applying softmax manually, try using the from_logits argument to tell the cost function to do that internally. That is more efficient, and it is the reason softmax was not included when forward propagation was defined.
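A minimal sketch of the from_logits version, under the same assumptions as the assignment's layout: logits and labels are (num_classes, num_examples) and labels are one-hot. The name total_loss_from_logits is hypothetical. This illustrates the keyword and is not the assignment's graded code.

```python
import tensorflow as tf

def total_loss_from_logits(logits, labels):
    """Total cross-entropy computed directly from raw scores (logits).

    Assumes logits and labels have shape (num_classes, num_examples),
    with labels one-hot encoded.
    """
    per_example = tf.keras.losses.categorical_crossentropy(
        tf.transpose(labels),  # (num_examples, num_classes)
        tf.transpose(logits),  # raw scores: no tf.nn.softmax here
        from_logits=True,      # the loss applies softmax internally
    )
    # Sum over examples to get the total loss.
    return tf.reduce_sum(per_example)

# Toy data (made up): 3 classes, 2 examples.
logits = tf.constant([[2.0, 0.5], [1.0, 0.1], [0.1, 3.0]])
labels = tf.constant([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
print(total_loss_from_logits(logits, labels))
```

Mathematically this computes the same quantity as applying softmax first. Any difference in the result comes from how the intermediate rounding happens.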


It would help to explain the numerical difference between using the from_logits keyword and calling tf.nn.softmax yourself.

There should be no difference in principle if we were doing pure math. The problem is that we are living in the pathetically limited world of floating point. Even 64-bit floats can represent at most 2^{64} distinct values, and these tensors are float32 (note the dtype in the output), which gives at most 2^{32}. Compare that with the abstract beauty of \mathbb{R}, which has 2^{\aleph_0} values whether you take the whole number line or just the interval between 0.0001 and 0.0002. As a result, different ways of implementing a computation, or even doing the same operations in a slightly different order, can round differently. The grader presumably does an exact equality comparison, or uses too small a tolerance if it relies on numpy allclose.
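A small pure-Python illustration of both points. It shows that the grouping of additions changes the rounded result, and that the two loss values from the question fail an exact comparison but pass a relative-tolerance check. math.isclose is used here only as a stand-in for whatever comparison the grader actually does, which isn't known.

```python
import math

# Floating-point addition is not associative: the grouping changes rounding.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)  # the equality check prints False

# The two loss values from the question, as printed by TensorFlow.
mine = 0.81028694
expected = 0.810287

# An exact comparison fails...
print(mine == expected)  # False
# ...but a relative tolerance of 1e-5 accepts the difference.
print(math.isclose(mine, expected, rel_tol=1e-5))  # True
```

For comparison, numpy.allclose defaults to rtol=1e-05 and atol=1e-08. Under those defaults these two values would count as equal.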

At a high level, from_logits = True is preferred because it is more numerically stable, meaning that rounding errors propagate less through the computation.
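To make "numerically stable" concrete, here is a simplified NumPy illustration on a single example with float32 logits. It compares computing softmax first and then taking the log against computing log-softmax directly with the log-sum-exp trick. This shows the general idea behind working from logits. It is an assumption that TensorFlow's internal implementation looks like this, and it may differ in detail.

```python
import numpy as np

def naive_cross_entropy(logits, label_index):
    """Softmax first, then log: the order you get when applying softmax by hand."""
    exp = np.exp(logits)
    probs = exp / np.sum(exp)
    return -np.log(probs[label_index])

def stable_cross_entropy(logits, label_index):
    """Log-softmax via log-sum-exp, working directly on the logits."""
    shifted = logits - np.max(logits)  # largest shifted logit is 0, so exp cannot overflow
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[label_index]

# Made-up logits where the true class has a very negative score.
logits = np.array([0.0, -200.0], dtype=np.float32)
with np.errstate(divide="ignore"):  # silence the log(0) warning
    print(naive_cross_entropy(logits, 1))  # inf: exp(-200) underflowed to 0 in float32
print(stable_cross_entropy(logits, 1))     # 200.0
```

For ordinary logits both functions agree to within rounding. The naive version breaks down only when a probability underflows to zero. Even before that point, though, it has already accumulated extra rounding error.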


I am getting
Test 1: tf.Tensor(0.17102128, shape=(), dtype=float32)
instead of the expected output
Test 1: tf.Tensor(0.810287, shape=(), dtype=float32)

Here is a checklist of the most common problems with that function. I think the incorrect value you show is the result of the first mistake on that list.