My result is `tf.Tensor(0.81028694, shape=(), dtype=float32)`

The expected one is `tf.Tensor(0.810287, shape=(), dtype=float32)`

They are really close, but different enough not to pass the test.

- I used `tf.transpose` to transpose both tensors.
- I used `tf.nn.softmax` to compute `y_pred`.
- I used `tf.keras.losses.categorical_crossentropy` and `tf.reduce_sum` to compute the cost function.

Instead of manually including the softmax, try using the `from_logits` argument to tell the cost function to do that internally. That is more efficient, and it is the reason they did not already include softmax when they defined forward propagation.
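As a sketch of what that looks like (the logits and labels here are made-up stand-in values, not the assignment's data; one row per example, i.e. after the transpose step):

```python
import tensorflow as tf

# Hypothetical stand-in values: 2 examples, 3 classes.
# `logits` are the raw linear outputs -- no softmax applied.
logits = tf.constant([[2.0, 1.0, 0.1],
                      [0.5, 2.2, 0.3]])
labels = tf.constant([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]])

# from_logits=True applies a numerically stable softmax internally,
# so there is no separate tf.nn.softmax call.
per_example = tf.keras.losses.categorical_crossentropy(
    labels, logits, from_logits=True)
cost = tf.reduce_sum(per_example)
print(cost)
```

The only change from the "manual softmax" version is dropping the `tf.nn.softmax` call and passing `from_logits=True`.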


This answer could be improved by explaining the numeric difference between the `from_logits` keyword and `tf.nn.softmax`.

There should be no difference in principle, if we were doing pure math here. But the problem is that we are living in the pathetically limited world of 32 bit floating point (the tensors here are `float32`). There are literally only $2^{32}$ distinct representable values, as opposed to the abstract beauty of $\mathbb{R}$, in which you have $\aleph_1$ possible values either on the whole number line or between 0.0001 and 0.0002. As a result, different ways of implementing a computation, or even doing it in a slightly different order, can give different rounding behavior. They must have done an exact equality comparison in the grader test case, or used too small an error threshold if they used numpy `allclose`.
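A tiny made-up example of the "slightly different order" point, in NumPy: three `float32` values whose exact sum is 1.0 give different answers depending on the order of the additions.

```python
import numpy as np

# Three float32 values whose exact mathematical sum is 1.0
x = np.array([1e8, 1.0, -1e8], dtype=np.float32)

# Left-to-right: 1e8 + 1.0 rounds back to 1e8 (float32 spacing
# near 1e8 is 8.0), so the 1.0 is lost entirely.
left_to_right = (x[0] + x[1]) + x[2]

# Reordered: 1e8 - 1e8 cancels exactly first, so the 1.0 survives.
reordered = (x[0] + x[2]) + x[1]

print(left_to_right, reordered)  # 0.0 vs 1.0
```

Same inputs, same operations, different result — that is exactly the kind of discrepancy you see in the last digits of the loss value.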

The high-level way to state that: the reason the `from_logits = True` method is preferred is that it is more numerically stable, meaning it has better behavior with respect to the propagation of rounding errors.
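You can see the stability issue directly with a toy NumPy sketch (exaggerated logits, not the assignment's values): computing `log(softmax(z))` naively overflows, while the fused log-sum-exp form — conceptually what `from_logits=True` uses — stays finite.

```python
import numpy as np

# Large logits: mathematically fine, but exp() overflows in float32
z = np.array([1000.0, 1000.1], dtype=np.float32)

# Naive log(softmax(z)): exp overflows to inf, and inf/inf is nan
with np.errstate(over='ignore', invalid='ignore'):
    naive = np.log(np.exp(z) / np.exp(z).sum())

# Stable log-softmax via the log-sum-exp trick:
# subtract the max before exponentiating
m = z.max()
stable = (z - m) - np.log(np.exp(z - m).sum())

print(naive)   # [nan nan]
print(stable)  # finite, roughly [-0.744 -0.644]
```

With realistic logits both forms are finite, but they still round differently along the way, which is enough to produce a mismatch in the last printed digits.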


I am getting

`Test 1: tf.Tensor(0.17102128, shape=(), dtype=float32)`

instead of the expected output

`Test 1: tf.Tensor(0.810287, shape=(), dtype=float32)`

Here is a checklist of the most common problems on that function. I think the incorrect value you show is the result of the first mistake on that list.
