Hey @muhammadahmad, the labels and logits you are passing are not in the shape the loss function expects. You need to transpose them before passing them in. Reading the documentation always helps.
Thanks, it worked. The expected output mentioned in the assignment is not correct, and after evaluation it gave me 80/100 even though I passed all the tests, so kindly check those bugs.
Also, I have removed your mark of my post as “solution”, because I gave you the answer directly. We encourage learners to read the documentation; we want them to figure this out on their own.
I just finished going through the new version of this assignment as well. There are a number of incorrect “expected values”. I can file a git issue on this if you want.
There is a new version of the notebook which has the correct “expected value” for the compute cost test case. It should show as:
Expected output
tf.Tensor(0.4051435, shape=(), dtype=float32)
If it doesn’t, then you don’t have the latest version. If the value is correct but your code produces a different value, then you have a bug that you need to find. Common mistakes (a short sketch follows this list) are:
- Using binary_crossentropy loss instead of categorical_crossentropy.
- Missing the instructions about the shapes that are needed by categorical cross entropy.
- Forgetting to specify the correct value of the from_logits parameter to the loss function.
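For concreteness, here is a minimal sketch that puts those three points together. It assumes (as the transpose hint above implies) that labels and logits arrive shaped (num_classes, num_examples) straight out of forward_propagation, and it uses a mean reduction; whether your notebook version wants a mean or a sum at the end is something to check against your own instructions.

```python
import tensorflow as tf

def compute_cost(logits, labels):
    # Both inputs are assumed to arrive shaped (num_classes, num_examples),
    # so they are transposed to (num_examples, num_classes), which is the
    # layout categorical_crossentropy expects (classes on the last axis).
    # from_logits=True tells the loss to apply softmax internally, because
    # forward_propagation stops at Z3 with no output activation.
    per_example_loss = tf.keras.losses.categorical_crossentropy(
        tf.transpose(labels), tf.transpose(logits), from_logits=True)
    # Collapse the per-example losses into a single scalar cost.
    cost = tf.reduce_mean(per_example_loss)
    return cost
```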
Thank you for the comment; it is a very useful and apt summary of the recent flurry of forum threads about this issue.
What does the third bullet point in that comment refer to? I do have the 0.405 expected value, have already reshaped the values (tried both tf.transpose and tf.reshape), and checked that I am using categorical cross entropy. Yet I am still not getting the right value.
I don’t see a “from_logits” parameter anywhere in the notebook. What are you referring to?
I guess they must have removed some text from the notebook. The point is that all the loss functions support the idea that you leave out the activation function on the output layer and then let the loss function do both the activation (sigmoid or softmax, depending on whether it’s a binary or multiclass classification) and the “log loss” calculation as a bundled operation. It turns out that doing it that way is both more efficient (one less TF call) and more numerically stable.
Notice that we are told not to include the output activation on the last layer in forward_propagation. So we need to specify from_logits = True to tell the loss function we are giving it inputs that are not the activation outputs. That is an optional parameter to the loss functions and the default value is False.
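If it helps to see what that bundling means in practice, here is a small standalone check; the logits and labels below are made up purely for illustration.

```python
import tensorflow as tf

# Made-up raw logits and one-hot labels: two examples, three classes.
logits = tf.constant([[2.0, 1.0, 0.1],
                      [0.3, 2.5, 0.2]])
labels = tf.constant([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]])

# Option 1: apply softmax yourself, then use the default from_logits=False.
loss_two_steps = tf.keras.losses.categorical_crossentropy(
    labels, tf.nn.softmax(logits))

# Option 2: pass the raw logits and let the loss apply softmax internally.
loss_bundled = tf.keras.losses.categorical_crossentropy(
    labels, logits, from_logits=True)

print(loss_two_steps.numpy())
print(loss_bundled.numpy())
```

Both calls give the same per-example losses (up to floating point), but the from_logits=True form is one less op and never takes the log of an already-computed softmax, which is where the extra numerical stability comes from.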
I see… just to see if I understand this correctly: in forward_propagation we only calculate Z3… because the categorical_crossentropy function includes the activation function AND loss function inside it?
It would probably also be useful to add that from_logits note in the notebook!
Yes, that’s the correct interpretation. You need to specify from_logits = True on the loss function to tell it to do the activation internally. That is optional, but it is the way Prof Ng always has us do things. As I mentioned above, it’s less code and it’s more numerically stable, so why wouldn’t you do it that way?
BTW the from_logits parameter is described on the documentation pages for categorical_crossentropy and binary_crossentropy. You have read those, right?
@andros: It’s good to hear that you found the solution under your own power. If you passed the test, I assume you also figured out that you have the order of the arguments backwards for labels and logits.
cost = tf.keras.losses.categorical_crossentropy(tf.transpose(labels), tf.transpose(logits), from_logits=True)
tf.reduce_mean(cost)
I had to transpose logits and labels before using tf.keras.losses.categorical_crossentropy, but it does not work. Can anyone help me?
The first line looks correct to me. But notice that you don’t assign the output of the reduce_mean to anything. I tried it that way and I end up with a 2 element 1D tensor, even if I also supply the argument axis = None (which the documentation says is the default). The documentation makes it sound like this should work, but my observation is that it does not. Perhaps this is because we are running in “Eager” mode …
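For what it’s worth, here is a quick standalone way to see that behaviour with made-up per-example losses: tf.reduce_mean returns a new scalar tensor rather than modifying its input, so unless you capture the result, cost stays a 2 element 1D tensor.

```python
import tensorflow as tf

# Made-up per-example losses, standing in for the output of the first line.
cost = tf.constant([0.3, 0.5])

tf.reduce_mean(cost)         # returns a new scalar tensor, discarded here
print(cost.shape)            # still (2,): cost itself is unchanged

cost = tf.reduce_mean(cost)  # capture the reduced value instead
print(cost)                  # a scalar tf.Tensor, roughly 0.4
```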
The test cell prints the value of cost. What do you see with your code?