In the exercise it is mentioned that "It's important to note that the `y_pred` and `y_true` inputs of `tf.keras.losses.categorical_crossentropy` are expected to be of shape (number of examples, num_classes)."
Do I understand it correctly that I have to reshape the logits (being the `y_pred`), i.e. the output of forward propagation (the output of the last LINEAR unit), of shape (6, num_examples), as well as the labels, into matrices of shape (num_examples, 6) inside `tf.keras.losses.categorical_crossentropy()`?
If I pass the labels (`y_true`) and logits (`y_pred`) as they are, I get the following error (see the attached screenshot). Note that I did apply `tf.reduce_sum` to the cost function.
Yes, if you study how they defined the forward propagation logic, you can see that the input data is arranged the way that Prof Ng has used up to this point: the first dimension is "features" and the second dimension is "samples". That means by the output layer, we get "classes" as the first dimension and "samples" as the second. So you need to fix that, but using "reshape" is not the way to do that. You should use "transpose". Those are two different things. You can end up with the correct shape using "reshape", but the contents will not be correct. The definition of transpose is to flip the matrix about the main diagonal, which is what you need in this case.
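To make the difference concrete, here is a tiny sketch with made-up numbers (not the exercise data):

```python
import tensorflow as tf

# A (classes, samples) matrix: 2 classes, 3 samples.
logits = tf.constant([[1., 2., 3.],
                      [4., 5., 6.]])

# transpose flips about the main diagonal, so each row of the
# result is one sample's logits:
print(tf.transpose(logits))
# [[1. 4.]
#  [2. 5.]
#  [3. 6.]]

# reshape just re-reads the same values in row-major order into the
# new shape, scrambling which logit belongs to which sample:
print(tf.reshape(logits, (3, 2)))
# [[1. 2.]
#  [3. 4.]
#  [5. 6.]]
```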
Thanks, @paulinpaloalto! I am still getting a wrong answer (see the screenshot below). I did tf.math.reduce_sum( … ). I left the default axis=None. I can’t figure out where the bug is!?
Did you do the transpose on both labels and logits? Are you sure you did it using `tf.transpose` instead of `tf.reshape`? Did you include `from_logits=True` to take account of the fact that the output of the final layer is linear activation?
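For reference, a minimal sketch of how those pieces fit together; `logits` and `labels` here stand in for your forward-prop output and the one-hot labels, both of shape (6, num_examples):

```python
import tensorflow as tf

def compute_total_cost(logits, labels):
    """Sketch only: logits and labels arrive as (num_classes, num_examples)."""
    # Transpose so each row is one example, as the loss function expects.
    cost = tf.reduce_sum(
        tf.keras.losses.categorical_crossentropy(
            tf.transpose(labels),   # y_true
            tf.transpose(logits),   # y_pred
            from_logits=True))      # softmax is applied internally
    return cost
```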
Yes, I did `tf.linalg.matrix_transpose()` on both. I had missed the `from_logits=True`. It works now. Thanks a lot for your help!
I too am facing difficulty in passing this.
I used `tf.nn.softmax` on the logits → transpose → crossentropy with `from_logits=True`.
I get the following output:
tf.Tensor(10.425234, shape=(), dtype=float32)
I'm not sure where I am going wrong.
The point of `from_logits=True` is that it tells the cost function to do the softmax internally. So you've done softmax twice, which is why it doesn't work. Actually it's even a little worse than that: if you did the softmax before the transpose, then the softmax was also computed on the wrong axis. It matters, right?
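A quick toy check (assumed numbers) shows both effects:

```python
import tensorflow as tf

z = tf.constant([[1., 2., 3.],
                 [4., 5., 6.]])   # (classes, samples) layout

# Default softmax normalizes along the last axis, i.e. across
# samples here -- the wrong axis for this layout:
print(tf.reduce_sum(tf.nn.softmax(z), axis=-1))         # rows sum to 1
print(tf.reduce_sum(tf.nn.softmax(z, axis=0), axis=0))  # columns sum to 1

# Applying softmax twice also changes the values:
p = tf.nn.softmax(z, axis=0)
print(p)                         # a valid distribution per column
print(tf.nn.softmax(p, axis=0))  # not equal to p
```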
I tried without softmax as well.
The problem was that I was passing the predictions and labels in reverse order. Interchanging them made it work.
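For anyone else who hits this: `y_true` (the labels) is the first positional argument, so the call should look something like this (toy values, just to show the order):

```python
import tensorflow as tf

y_true = tf.constant([[0., 1.], [1., 0.]])       # one-hot labels, (m, classes)
y_pred = tf.constant([[0.2, 1.5], [2.0, -1.0]])  # logits, (m, classes)

# y_true (labels) comes first, y_pred (logits) second:
loss = tf.keras.losses.categorical_crossentropy(y_true, y_pred,
                                                from_logits=True)
print(loss)  # per-example losses, shape (m,)
```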
Thanks for helping me out.