C4W2_Assignment - Exercise 4 Transformer

Hello,
I am getting an error in Exercise 4 when passing the tests, but I can't see where the mistake is; I am also getting values really close to the expected ones:

[screenshot: test output values close to, but not matching, the expected ones]
Then if I continue with Exercise 5, I pass all the tests again, even though in cell 30 I am not getting the expected output.
Later, in cell 33, I get this error:
```
There was a problem compiling the code from your notebook. Details:
Exception encountered when calling layer 'softmax_2' (type Softmax).

{{function_node __wrapped__AddV2_device_/job:localhost/replica:0/task:0/device:CPU:0}} Incompatible shapes: [1,2,150,150] vs. [1,1,2,2] [Op:AddV2] name:

Call arguments received by layer 'softmax_2' (type Softmax):
  • inputs=tf.Tensor(shape=(1, 2, 150, 150), dtype=float32)
  • mask=tf.Tensor(shape=(1, 1, 2, 2), dtype=float32)
```
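
For reference, the shape clash in that log is a plain broadcasting failure, and it can be reproduced in isolation (a minimal sketch, independent of the notebook's code):

```python
import tensorflow as tf

# The Softmax layer adds the mask to the attention logits; with these
# shapes the broadcast fails exactly as in the log above.
logits = tf.zeros((1, 2, 150, 150))   # (batch, heads, seq_len_q, seq_len_k)
mask = tf.zeros((1, 1, 2, 2))         # wrong shape: cannot broadcast to logits

try:
    _ = logits + mask
except tf.errors.InvalidArgumentError as e:
    print(e)  # Incompatible shapes: [1,2,150,150] vs. [1,1,2,2] [Op:AddV2]
```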

And if I grade the notebook, I get a 0 with the following error again for all the graded parts:

[grader error output not captured]

Also, across the whole notebook I've been facing these kinds of warnings, even after having deleted the files, rebooted, and fetched the latest version:

[screenshot: TensorFlow CUDA warnings]
Thank you for your time!

I would start by checking the outputs produced by the create_padding_mask and create_look_ahead_mask functions. As you can see,

the mask and the inputs do not have the same shapes!
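
For comparison, here is a minimal sketch of those two helpers following the TensorFlow tutorial's conventions (the notebook's exact code and mask polarity may differ); the thing to check is that the returned shapes broadcast against attention logits of shape (batch, num_heads, seq_len, seq_len):

```python
import tensorflow as tf

def create_padding_mask(seq):
    # 1.0 where the token id is 0 (padding), 0.0 elsewhere
    # (TF tutorial polarity: 1 = masked; some notebooks invert this)
    mask = tf.cast(tf.math.equal(seq, 0), tf.float32)
    return mask[:, tf.newaxis, tf.newaxis, :]  # (batch, 1, 1, seq_len)

def create_look_ahead_mask(size):
    # strictly upper-triangular ones mask out future positions
    return 1 - tf.linalg.band_part(tf.ones((size, size)), -1, 0)  # (size, size)

x = tf.constant([[7, 6, 0, 0], [1, 2, 3, 0]])
print(create_padding_mask(x).shape)     # (2, 1, 1, 4)
print(create_look_ahead_mask(4).shape)  # (4, 4)
```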

But this is a locked cell I am not supposed to overwrite, isn't it?

@JOSE_DANIEL_HERNANDE

@gent.spah's response is to the query asked by @dani7991. So kindly create your own topic, with a screenshot of your error or of the output that differs from the expected one.

What @gent.spah means is that the error log is telling you the issue lies somewhere in a previous graded cell.

From the error details, what I understand is that the inputs are not of the same shape. So the most probable reason would be that, while specifying the inputs, the post creator added an extra dimension by declaring the input as (None, 1) when it should have been (1,) with the correct tf.string datatype.
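
To illustrate the kind of extra dimension being described (a hypothetical example; the learner's actual code isn't shown here):

```python
import tensorflow as tf

# hypothetical inputs, just to show the shape difference
a = tf.constant(["hello world"])    # shape (1,), dtype=string - intended
b = tf.constant([["hello world"]])  # shape (1, 1) - accidental extra dimension
print(a.shape, a.dtype)             # (1,) <dtype: 'string'>
print(b.shape, b.dtype)             # (1, 1) <dtype: 'string'>
```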

Regards
DP


Sorry, it was me, but I didn't notice I was logged in on the other account.

And answering @Deepti_Prasad: I have not added any extra dimension. I can't post my code, but I haven't modified any input of the transformer function, nor of the decoder.

Oh okay :crazy_face: I didn't know people had two accounts in the Discourse community.

Can you share a screenshot of the code by personal DM, starting from the graded cell that showed the first error?

That CUDA screenshot is only a warning related to TensorFlow. But the copy-pasted error you mentioned does say your inputs are not of the same shape; that could also be related to the encoder or decoder code.

Regards
DP


hi @dani7991

1. In GRADED FUNCTION: scaled_dot_product_attention, for the code line commented "softmax is normalized on the last axis (seq_len_k) so that the scores add up to 1", you don't need to pass an axis argument, since the softmax already normalizes over the last axis by default (see the sketch after this list).

2. In Exercise 2, for BLOCK 1 and BLOCK 2, while calculating self-attention you added training=training, which is not required for these steps: as the instructions mention, dropout is already included in the multi-head attention layer.

3. In Exercise 4 Transformer, for the code line "pass decoder output through a linear layer and softmax", you have again included training=training, which is not required, because as per the instruction below:

"Finally, after the Nth Decoder layer, one dense layer and a softmax are applied to generate prediction for the next output in your sequence"

you already used it in the previous step when calling self.encoder.
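
Here is a minimal sketch of point 1, based on the standard TensorFlow tutorial version of scaled_dot_product_attention rather than the notebook's exact code (points 2 and 3 simply amount to not passing training=training to calls that don't need it):

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask=None):
    matmul_qk = tf.matmul(q, k, transpose_b=True)           # (..., seq_len_q, seq_len_k)
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)
    if mask is not None:
        # here mask == 1 means "attend"; masked positions get a large
        # negative value before the softmax (conventions differ by notebook)
        scaled_attention_logits += (1.0 - mask) * -1e9
    # point 1: no axis argument needed - softmax defaults to the last axis
    # (seq_len_k), so each row of weights already sums to 1
    attention_weights = tf.nn.softmax(scaled_attention_logits)
    output = tf.matmul(attention_weights, v)                # (..., seq_len_q, depth_v)
    return output, attention_weights

# quick shape check
q = k = v = tf.random.uniform((1, 3, 4))
out, w = scaled_dot_product_attention(q, k, v)
print(out.shape, w.shape)  # (1, 3, 4) (1, 3, 3)
```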

Let me know if the issue gets resolved.


Can you post here the error you got now? The error log you shared is related to the next_word graded cell code, as your current error clearly states that your code for prediction_id is incorrectly sequenced.

Kindly post your query clearly to avoid confusion. Your main error was an InvalidArgumentError, of which you only shared a part in the first comment where you asked about your issue.

Code is always interconnected. See the last image in the first comment you posted: in the screenshot you only included the header part, without sharing the whole error log.

Sure, this is the whole trace:

[screenshot: full error traceback]
Notice the prediction id for next_word; you have sequenced it incorrectly. Mention first the input, then the output, and lastly the model.

But this is the call of the function, and its cell is locked:

[screenshot: the locked cell that calls next_word]
And this is the next_word function, so the params are model, input, and output.

I am still not sure, but I think the problem is further up: the tests for the Transformer (Exercise 4) are still failing.

Maybe we can dive into that part in private so I can show you the code?

yes @dani7991

I want to see your code from the first graded cell up to where the error was thrown, after the correction you made.

Chances are the issue could be with the masking.

Also, I would advise getting a fresh copy and re-doing the assignment.

hi @dani7991

In Exercise 4, Transformer, under the def call statement, for the code line

"call self.decoder with the appropriate arguments to get the decoder output"

you have used an incorrect argument for dec_output: you are supposed to use the output sequence, not the input sequence.
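
A minimal, self-contained sketch of the data flow being described, with hypothetical stand-in layers (the real Encoder and Decoder come from the earlier exercises; argument names are assumptions, not the graded code):

```python
import tensorflow as tf

def transformer_call(encoder, decoder, final_layer,
                     input_sentence, output_sentence,
                     training, enc_padding_mask, look_ahead_mask, dec_padding_mask):
    # the encoder consumes the INPUT sequence
    enc_output = encoder(input_sentence, training, enc_padding_mask)
    # the decoder consumes the OUTPUT (shifted target) sequence - passing
    # input_sentence here was the bug in this thread
    dec_output, attn_weights = decoder(output_sentence, enc_output, training,
                                       look_ahead_mask, dec_padding_mask)
    # final dense layer (+ softmax) produces the next-token predictions
    return final_layer(dec_output), attn_weights

# quick smoke test with trivial stand-ins
enc = lambda x, training, mask: x
dec = lambda y, enc_out, training, la_mask, pad_mask: (y + enc_out, None)
fin = lambda d: d
out, _ = transformer_call(enc, dec, fin,
                          tf.ones((1, 3, 4)), tf.zeros((1, 3, 4)),
                          False, None, None, None)
print(out.shape)  # (1, 3, 4)
```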


omg!! Finally, it was that, so many thanks!!!