Cannot get compute_cost to work (Course 2 Week 3)

@paulinpaloalto Can you please help me here

1 Like

There is at least one problem with your code, but I don’t see how it could cause that error. Remember that the forward propagation routine specifically does not include the activation function on the output layer. That means you need to add the argument from_logits = True to the loss function to tell it to do the softmax calculation internally.
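
As a generic illustration (not the graded solution), here is a minimal runnable sketch, assuming tensors already in TF's (batch, classes) orientation:

import tensorflow as tf

# Sketch only: with from_logits=True, the loss applies softmax to the raw
# linear outputs internally, so forward prop can stop at the linear layer.
labels = tf.constant([[0., 1., 0.], [1., 0., 0.]])  # one-hot, shape (batch, classes)
logits = tf.constant([[1., 2., 0.], [3., 0., 1.]])  # raw outputs, no softmax applied
cost = tf.reduce_mean(
    tf.keras.losses.categorical_crossentropy(labels, logits, from_logits=True))
print(cost)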

The mismatching shapes must mean that either the code you are showing is not what you actually ran (e.g. you had not done “Shift-Enter” to run the cell since the last time you changed the code) or you have hacked on the test cell to change one of the parameters so that they don’t match.

3 Likes

I don’t think I have changed any of the parameters, and I even implemented it using from_logits = True, but that doesn’t work for me.

{moderator edit - solution code removed}

I am still seeing the same error and am unable to proceed further because of it.

1 Like

@paulinpaloalto All of the above test cases pass successfully.

1 Like

So you must have modified the inputs to the test case, right?

1 Like

No, I haven’t done that; I am not even able to edit the cell containing the definition of compute_cost_test.
The only change I have made is writing the three lines of code.

2 Likes

Ah, that’s a good point. Well, notice that the “labels” input is a pre-defined variable new_y_train that was created by a much earlier cell in the notebook. Maybe there is something wrong with the logic in your notebook that creates new_y_train.

I added a cell right before the compute_cost test cell:

print(new_y_train)
mbs = new_y_train.batch(2)
for mb in mbs:
    print(mb)
    break

When I run that, here’s what I see:

<MapDataset shapes: (6,), types: tf.float32>
tf.Tensor(
[[0. 0. 0. 0. 0. 1.]
 [1. 0. 0. 0. 0. 0.]], shape=(2, 6), dtype=float32)

Please try that and see if you end up with an output that is 2 x 4 instead.

1 Like

Yes, I am able to get an output of 2 x 4.

1 Like

Ok, that’s wrong. So why is it wrong? You have to examine the earlier logic that creates new_y_train. This is called “debugging”, right? You reason from the evidence you see and work backwards. Things don’t just happen for mysterious reasons: your job is to follow the evidence to understand why this happened.

Look at the logic of your “one hot” routine, since new_y_train is the output of that function. You probably hard-coded the number of classes, instead of using the depth parameter.
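
As a quick illustration (generic TF usage, not the assignment code), note how the depth argument controls the width of the output:

import tensorflow as tf

# tf.one_hot produces `depth` entries per label; hard-coding depth=4 where
# there are 6 classes is exactly what would produce 2 x 4 minibatches.
labels = tf.constant([5, 0])
print(tf.one_hot(labels, depth=6))
# [[0. 0. 0. 0. 0. 1.]
#  [1. 0. 0. 0. 0. 0.]], shape=(2, 6)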

1 Like

Yes, I got it now. Thanks for all your support, @paulinpaloalto.
I have changed the hard-coded value from 4 to 6.

1 Like

I still have errors after reading through the thread and trying to debug my code.
I transposed the two inputs and added from_logits=True as follows:

cost = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(y_true=tf.transpose(labels), y_pred=tf.transpose(logits), from_logits=True))

but the error I got is:
File "", line 20
cost = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(y_true=labels, y_pred=logits,from_logits=True))
^
SyntaxError: invalid character in identifier

@paulinpaloalto

1 Like

My guess is that there is an unprintable character somewhere adjacent to one of the variable names either on that line or perhaps on the line before. Try backspacing over all the names and retyping them and see if that helps. If that doesn’t work, you can also try getting a clean copy of the notebook and then carefully “copy/pasting” over your completed code.
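
If you would rather hunt for the stray character programmatically, here is one quick sketch (the string shown is hypothetical; paste your own suspect line in):

# Scan a line for non-ASCII characters, which often sneak in when
# copying code from web pages or PDFs.
line = 'cost = tf.reduce_mean(...)'  # replace with the suspect line
for i, ch in enumerate(line):
    if ord(ch) > 127:
        print(i, repr(ch), hex(ord(ch)))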

1 Like

Hi @paulinpaloalto and @joduss

Would you please explain why we need to transpose BOTH labels and logits when using tf.keras.losses.categorical_crossentropy? What is the logic (math) behind this?

Thanks in advance!

2 Likes

It’s not a matter of logic or math: it is the definition of the API. I also don’t understand why you emphasize “both”: why wouldn’t you want the two tensors to be oriented the same way? Either transposed or not …

4 Likes

If I may continue @paulinpaloalto reasoning:

I have seen similar posts like these, and I think the confusion arises because Prof Andrew Ng uses [features, batch_size], whereas TensorFlow prefers [batch_size, features]. However, the order is just a matter of taste. Similar to whether num_channels should be the first or the last dimension for convolutions.
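
To make that concrete, here is a small sketch with illustrative values only: categorical_crossentropy reduces over the last axis, so it expects (batch_size, num_classes), and with the course's (num_classes, batch_size) layout both tensors must be transposed first.

import tensorflow as tf

# Course layout: (num_classes, batch_size), so transpose both before the call.
labels = tf.constant([[0., 1.], [1., 0.], [0., 0.]])  # 3 classes x 2 examples
logits = tf.constant([[2., 0.], [0., 3.], [1., 1.]])
losses = tf.keras.losses.categorical_crossentropy(
    y_true=tf.transpose(labels), y_pred=tf.transpose(logits), from_logits=True)
print(losses.shape)  # (2,): one loss per example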

5 Likes

Thank you so much @paulinpaloalto.

The reason I emphasized both is that I was thinking about the vectorized implementation logic, in which the shapes of the two matrices have to be aligned to get a proper result. If that is the case, I was wondering why we don’t just transpose one tensor (instead of both) to satisfy that logic.

The API documentation for categorical_crossentropy doesn’t mention that we need to transpose y_true and y_pred. What am I missing?

1 Like

@amin.pahlavani: What you’re missing is the point that @jonaslalin made in his earlier reply: the TF APIs are defined to expect a particular ordering of the dimensions of the tensors and Prof Ng has chosen to use a different ordering, so we frequently need to adjust things.

5 Likes

Hi Paul,

I am currently having trouble with the compute cost function.

This is what I currently have. I believe it is consistent with what was said above (transpose the inputs and include from_logits=True), but I am still getting an error saying that it does not match the test case.

[removed code]

2 Likes

Oh hey, sorry, I just realized that I had binary_crossentropy instead of categorical_crossentropy. I can get rid of the code I posted above if you’d like.

edit: removed

1 Like

Hi, Kenny.

It’s great that you figured out the solution under your own power! Thanks for confirming and for removing the source code.

Cheers!
Paul

2 Likes