Compiling error C4W2

Hi everyone,
After submitting the C4W2 assignment for grading, I get the following compiling error for every exercise: “There was a problem compiling the code from your notebook. Details: name ‘maximum_position_encoding’ is not defined”. In the notebook itself I get “All tests passed” for every exercise up until “Exercise 4 Transformers”, where I get the following error. Thankful for any help on this!

week-2 NLP with Attention Models

Did you perhaps add some “import” statements to your notebook, which you were not supposed to?

If you have that kind of error in your notebook, then you get 0 points for everything because the grader can’t even get as far as running any of your code since it doesn’t compile. Are you sure that you scanned your local notebook and don’t have that error anywhere? Please do this sequence:

Kernel -> Restart and Clear Output
Cell -> Run All

Now scan through the entire notebook and make sure that error does not show up anyplace. If it does, then you know what you need to fix. If it doesn’t, then your error is more subtle: it must be something that works in the notebook, but fails the grader. For a “variable not defined” error, the thing to look for would be cases in which you accidentally reference global variables from the body of a function.
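
For example (just a hypothetical sketch, not the assignment code), this kind of thing works in the notebook but fails in the grader: a global defined in an earlier notebook cell makes the class work locally, but the grader runs it in a fresh environment where that global does not exist.

    import numpy as np

    maximum_position_encoding = 1024   # defined in some earlier notebook cell

    class EncoderBuggy:
        def __init__(self, embedding_dim):
            # BUG: silently depends on the notebook-level global above, so the
            # grader raises: name 'maximum_position_encoding' is not defined
            self.pos_encoding = np.zeros((maximum_position_encoding, embedding_dim))

    class EncoderFixed:
        def __init__(self, embedding_dim, maximum_position_encoding):
            # FIX: the value comes in through the constructor's own parameter list
            self.pos_encoding = np.zeros((maximum_position_encoding, embedding_dim))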

One other general note is that it doesn’t make sense to submit to the grader until you have completed all sections of the notebook and pass all the local unit tests in the notebook.

The other error is a bit more complicated. I added some print statements in my Transformer logic and here’s what I see for the arguments right before the call to self.encoder:

input_sentence.shape (1, 7)
enc_padding_mask.shape (1, 1, 7)
Using num_layers=3, target_vocab_size=350 and num_heads=17:

sentence_a has shape:(1, 7)
sentence_b has shape:(1, 7)

Output of transformer (summary) has shape:(1, 7, 350)

Attention weights:
decoder_layer1_block1_self_att has shape:(1, 17, 7, 7)
decoder_layer1_block2_decenc_att has shape:(1, 17, 7, 7)
decoder_layer2_block1_self_att has shape:(1, 17, 7, 7)
decoder_layer2_block2_decenc_att has shape:(1, 17, 7, 7)
decoder_layer3_block1_self_att has shape:(1, 17, 7, 7)
decoder_layer3_block2_decenc_att has shape:(1, 17, 7, 7)

So the shapes I get match the ones in the error message you show. My theory would be that this indicates the problem is not the shapes per se, but what you do with the operands in your actual Encoder logic. In my code the mask is only used as one of the arguments in the call to the MultiHeadAttention layer. Note that your error was thrown by an AddV2 operation, but I don’t have any addition operations in my Encoder logic involving the mask argument. Maybe the exception happens in a lower-level TF function, though. You don’t really show us the complete exception trace; seeing it might shed some more light on where the exception actually happened.
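
To illustrate where an AddV2 can come from even though you never write an addition with the mask yourself (a generic sketch of the usual masking pattern, not the assignment’s code): inside attention the padding mask typically gets added to the score tensor, and if the shapes cannot be broadcast, it is the low-level add op that raises.

    import tensorflow as tf

    scores = tf.random.normal((1, 17, 7, 7))   # (batch, heads, seq_len_q, seq_len_k)

    mask_ok = tf.ones((1, 1, 7))               # broadcasts against the scores: fine
    _ = scores + (1.0 - mask_ok) * -1e9        # assumes the 1 = keep, 0 = pad convention

    mask_bad = tf.ones((1, 1, 5))              # mismatched sequence length
    try:
        _ = scores + (1.0 - mask_bad) * -1e9   # the AddV2 op itself raises here
    except tf.errors.InvalidArgumentError as e:
        print("AddV2 failed:", e.message)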

Did you use maximum_position_encoding to add the positional encoding to the embedding in the Encoder graded cell?

You haven’t posted the complete error screenshot, but from what I understand you probably have not used the training argument in the call correctly, and in the Encoder class, where you pass the output through the stack of encoder layers, you have mixed up the code.

First, please post a screenshot of the complete error; if it is lengthy, take two separate screenshots.

For the next part of the error, I have to cut a bit so as not to post solutions to graded parts.

Many thanks! Unfortunately, rerunning the cells as you described didn’t show the error in the output, so the more subtle case is happening for me.

So is your issue resolved?

No, it’s not resolved yet. I was just commenting that I tried the suggested approach, which unfortunately didn’t give me more hints about the source of the error.

OK, based on the error screenshot, the error points to the call statement for the full Encoder, but the code in the screenshot seems correct. So I would like to see the encoder code from the Encoder layer graded cell.

Please DM me your encoder layer def call code: click on my name and then Message.

In the decoder layer, for self-attention block 1 and block 2, there was no mention of using training=training, as the argument description mentions that training is already applied for the self-attention.

So kindly remove training=training from the self-attention layers. You were supposed to apply it only in the dropout layer, which you did.

So if I look at it from the point of view of the error screenshot, this explains it: because training was False for the encoder layer but you passed training as True in the decoder layer, the error was encountered.
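
As a generic illustration of that pattern (a toy layer, not the graded DecoderLayer): the training flag is forwarded only to the explicit Dropout call, while the MultiHeadAttention layer is called without it.

    import tensorflow as tf

    class TinyAttentionBlock(tf.keras.layers.Layer):
        """Toy example of where `training` is (and is not) forwarded."""
        def __init__(self, embedding_dim=8, num_heads=2, dropout_rate=0.1):
            super().__init__()
            self.mha = tf.keras.layers.MultiHeadAttention(num_heads=num_heads,
                                                          key_dim=embedding_dim)
            self.dropout = tf.keras.layers.Dropout(dropout_rate)
            self.layernorm = tf.keras.layers.LayerNormalization(epsilon=1e-6)

        def call(self, x, training=False, mask=None):
            # no training=training here -- the attention layer is called as-is
            attn_out = self.mha(x, x, x, attention_mask=mask)
            # the flag only controls the explicit Dropout layer
            attn_out = self.dropout(attn_out, training=training)
            # residual connection followed by layer normalization
            return self.layernorm(attn_out + x)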

In the graded function scaled dot product attention, for the code line under the comment

“softmax is normalized on the last axis (seq_len_k) so that the scores add up to 1.”

you are supposed to use tf.keras.activations.softmax and not tf.nn.softmax.
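
To make the substitution concrete, here is a minimal sketch of that one step only (illustrative shapes, not the full graded function):

    import tensorflow as tf

    scaled_attention_logits = tf.random.normal((1, 17, 7, 7))

    # use tf.keras.activations.softmax over the last axis (seq_len_k) ...
    attention_weights = tf.keras.activations.softmax(scaled_attention_logits, axis=-1)
    # ... rather than tf.nn.softmax(scaled_attention_logits, axis=-1)

    # each row of attention weights now sums to 1 along the last axis
    print(tf.reduce_sum(attention_weights, axis=-1))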

Another difference I noticed between the instructions and your code is in the layer normalisation: in block 1 you are required to mention the multi-head attention output first and then the input, and in block 2 the multi-head attention output 2 first and then the output of the first block. You have written them in the reverse order.
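
For reference, a minimal sketch of the ordering the instructions describe, using placeholder tensors rather than the graded variables:

    import tensorflow as tf

    layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
    layernorm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)

    x = tf.random.normal((1, 7, 16))               # decoder input
    mult_attn_out1 = tf.random.normal((1, 7, 16))  # block-1 self-attention output
    mult_attn_out2 = tf.random.normal((1, 7, 16))  # block-2 cross-attention output

    # block 1: multi-head attention output first, then the input
    Q1 = layernorm1(mult_attn_out1 + x)
    # block 2: attention output 2 first, then the output of block 1
    out2 = layernorm2(mult_attn_out2 + Q1)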

Thank you so much. I have corrected these two issues. Nonetheless, after restarting and rerunning the kernel, I still receive the same errors. Still, I pass the tests for the Transformer exercise in the notebook.


Grading remains at 0 points.

I suspect there might have been some changes made in places where you weren’t supposed to make them.

Please first delete this file, get a fresh copy, and only write code where it is required.

Let me know if you are still encountering this issue.

You should have deleted your kernel output, then restarted the kernel, and then run each cell one by one.

Remember that restarting and rerunning the kernel still keeps the previous results in the saved output.

Whenever you restart, first clear the kernel output.

It’s resolved now! Thank you so much!
