@paulinpaloalto Can you please help me here
There is at least one problem with your code, but I don't see how it could cause that error. Remember that the forward propagation routine specifically does not include the activation function on the output layer. That means you need to add the argument from_logits = True to the loss function to tell it to do the softmax calculation internally.
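Here is a minimal sketch (my own illustration, not the graded solution) of what from_logits=True does: the loss applies softmax to the raw logits internally, so you get the same answer as if you had applied softmax yourself first.

```python
import tensorflow as tf

# Illustration only: raw logits go in, and from_logits=True makes the loss
# apply softmax internally before computing the cross-entropy.
logits = tf.constant([[2.0, 1.0, 0.1]])   # output of the linear layer, no softmax
labels = tf.constant([[1.0, 0.0, 0.0]])   # one-hot ground truth
loss = tf.keras.losses.categorical_crossentropy(
    y_true=labels, y_pred=logits, from_logits=True)
print(loss)  # ~0.417, identical to applying softmax and using from_logits=False
```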
The mismatching shapes must mean that either the code you are showing is not what you actually ran (e.g. you had not done "Shift-Enter" to run the cell since the last time you changed the code) or you have hacked on the test cell to change one of the parameters so that they don't match.
I don't think I have changed any of the parameters, and I even implemented it using from_logits = True, but that doesn't work out for me.
{moderator edit - solution code removed}
I am still seeing the same error.
I am unable to proceed further because of it.
@paulinpaloalto All the above test cases are passed successfully.
So you must have modified the inputs to the test case, right?
No, I haven't done that; I am not even able to edit the cell containing the definition of compute_cost_test.
The only change I have made is writing the three lines of code.
Ah, that's a good point. Well, notice that the "labels" input is a pre-defined variable new_y_train that was created by a much earlier cell in the notebook. Maybe there is something wrong with the logic in your notebook that creates new_y_train.
I added a cell right before the compute_cost test cell:
print(new_y_train)
mbs = new_y_train.batch(2)
for mb in mbs:
    print(mb)
    break
When I run that, here's what I see:
<MapDataset shapes: (6,), types: tf.float32>
tf.Tensor(
[[0. 0. 0. 0. 0. 1.]
[1. 0. 0. 0. 0. 0.]], shape=(2, 6), dtype=float32)
Please try that and see if you end up with an output that is 2 x 4 instead.
Ok, that's wrong. So why is it wrong? You have to examine the earlier logic that creates new_y_train. This is called "debugging", right? You reason from the evidence you see and work backwards. Things don't just happen for mysterious reasons: your job is to follow the evidence to understand why this happened.
Look at the logic of your "one hot" routine, since new_y_train is the output of that function. You probably hard-coded the number of classes, instead of using the depth parameter.
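To make the point concrete, here is a hedged sketch (the name one_hot_matrix and its signature are my assumption about the assignment's routine): the class count must flow from the depth parameter, never from a hard-coded literal like 4.

```python
import tensorflow as tf

# Sketch only: depth comes from the parameter, not a hard-coded literal.
def one_hot_matrix(label, depth=6):
    # tf.one_hot builds the one-hot vector; reshape flattens it to shape (depth,)
    return tf.reshape(tf.one_hot(label, depth, axis=0), shape=[depth])

print(one_hot_matrix(tf.constant(1), depth=6))
# tf.Tensor([0. 1. 0. 0. 0. 0.], shape=(6,), dtype=float32)
```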
Yes, I got it now. Thanks for all your support @paulinpaloalto
I had hard-coded the value as 4; I have changed it to 6.
I still have errors here after I read through and tried to debug my code.
I transposed the two inputs and added from_logits=True as follows:
cost = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(y_true=tf.transpose(labels), y_pred=tf.transpose(logits)，from_logits=True))
but the result I got is:
File "", line 20
cost = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(y_true=labels, y_pred=logits，from_logits=True))
^
SyntaxError: invalid character in identifier
My guess is that there is an unprintable character somewhere adjacent to one of the variable names, either on that line or perhaps on the line before. Try backspacing over all the names and retyping them and see if that helps. If that doesn't work, you can also try getting a clean copy of the notebook and then carefully "copy/pasting" over your completed code.
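One way (my own suggestion, not from the course) to hunt down such a character is to paste the suspect line as a string in a scratch cell and print any non-ASCII characters it contains:

```python
# Paste the suspect line between the quotes and run this cell.
line = "y_pred=logits，from_logits=True"  # example containing a fullwidth comma
for ch in line:
    if ord(ch) > 127:
        print(repr(ch), hex(ord(ch)))  # prints '，' 0xff0c for this example
```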
Hi @paulinpaloalto and @joduss
Would you please explain why we need to transpose BOTH labels and logits when using tf.keras.losses.categorical_crossentropy? What is the logic (math) behind this?
Thanks in advance!
It's not a matter of logic or math: it is the definition of the API. I also don't understand why you emphasize "both": why wouldn't you want the two tensors to be oriented the same way? Either transposed or not …
If I may continue @paulinpaloalto's reasoning:
I have seen similar posts like these, and I think the confusion arises because Prof Andrew Ng uses [features, batch_size], whereas TensorFlow prefers [batch_size, features]. However, the order is just a matter of taste, similar to whether num_channels should be the first or the last dimension for convolutions.
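A small sketch of that layout difference (the shapes are illustrative, assuming 6 classes and a batch of 2):

```python
import tensorflow as tf

# Course convention: (classes, batch); TF losses expect (batch, classes),
# so both tensors are transposed the same way before the call.
logits = tf.random.normal((6, 2))              # (classes, batch)
labels = tf.one_hot([5, 0], depth=6, axis=0)   # also (classes, batch)

loss = tf.keras.losses.categorical_crossentropy(
    y_true=tf.transpose(labels),               # now (batch, classes)
    y_pred=tf.transpose(logits),
    from_logits=True)
print(loss.shape)  # (2,): one loss per example, averaged later by reduce_mean
```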
Thank you so much @paulinpaloalto.
The reason I emphasize both is that I was thinking about the vectorized implementation logic, in which the shapes of the two matrices have to be aligned to get a proper result. If this is the case, I was wondering why we didn't just transpose one tensor (instead of both) to satisfy the logic?
In the API documentation link, categorical_crossentropy, it doesn't mention that we need to transpose y_true and y_pred. What am I missing?
@amin.pahlavani: What you're missing is the point that @jonaslalin made in his earlier reply: the TF APIs are defined to expect a particular ordering of the dimensions of the tensors and Prof Ng has chosen to use a different ordering, so we frequently need to adjust things.
Hi Paul,
I am currently having trouble with the compute cost function.
This is what I currently have. I believe it is consistent with what was said above (transpose both inputs and include from_logits=True), but I am still getting an error saying that it does not match the test case.
[removed code]
Oh hey sorry, just realized that I had binary_crossentropy in place instead of categorical_crossentropy. I can get rid of the code I put above if you'd like.
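For anyone else who hits this: both losses run without error on one-hot labels, but they compute different things, which is why the test value differs. A quick illustration (my own numbers, not the assignment's test):

```python
import tensorflow as tf

# binary_crossentropy scores each of the 6 outputs as an independent yes/no
# decision, while categorical_crossentropy scores one softmax over all 6,
# so the two return different values for the same inputs.
labels = tf.constant([[0., 0., 0., 0., 0., 1.]])
logits = tf.constant([[0.1, 0.2, 0.3, 0.4, 0.5, 3.0]])

print(tf.keras.losses.categorical_crossentropy(labels, logits, from_logits=True))
print(tf.keras.losses.binary_crossentropy(labels, logits, from_logits=True))
```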
edit: removed
Hi, Kenny.
Itâs great that you figured out the solution under your own power! Thanks for confirming and for removing the source code.
Cheers!
Paul