I have read the first few comment lines for this exercise below:


    # enc_output.shape == (batch_size, input_seq_len, fully_connected_dim)        
    # BLOCK 1
    # calculate self-attention and return attention scores as attn_weights_block1.
    # Dropout will be applied during training (~1 line).

I have tried, but I cannot proceed when I call self.mha1 and pass it enc_output.shape and look_ahead_mask.

I have also read the EncoderLayer for reference. It contains the line self_mha_output = self.mha(x, x, x, mask), so I also passed x, x, x, look_ahead_mask to self.mha1 in Decoder for exercise 2. However, that does not work either.
Is there any advice?

What error are you encountering, @pongyuenlam? Please share a screenshot of the error.

The error screen is attached.

Why are you using enc_output.shape? What does the argument state? Check again! Shouldn't it be only the encoder output?

If you notice, just below START CODE HERE, enc_output.shape appears after a # sign, so it is a comment, not code that your implementation actually uses!

Update: this response is incorrect. The learner used enc_output.shape in block 1, but he had to use enc_output in block 2, along with the other correct arguments.

I see. Thanks. If I pass only the encoder output to mha1, I get new errors.

Hi @pongyuenlam

My response was based only on the error log you gave. I didn’t see the code you had written for block 1, which is incorrect.

Even if I go with your previous code, it is still incorrect.

  1. Block 1 is a multi-head attention layer with a residual connection, and look-ahead mask. Like in the EncoderLayer, Dropout is defined within the multi-head attention layer.

Block 1 is self-attention, so self.mha1 in the DecoderLayer class takes the same input three times, i.e. x, x, x, together with the correct mask (which you already used here). What you missed for block 1 is return_attention_scores. Remember that return_attention_scores needs to be passed with the right value, as you did in the other assignment you were working on.

Read the instructions again. What do they state?
calculate self-attention and return attention scores as attn_weights_block1.
Dropout will be applied during training
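
To make that instruction concrete, here is a minimal, framework-free sketch of what block 1 computes: self-attention in which the same input x serves as query, key, and value, restricted by a look-ahead mask, returning both the output and the attention weights. It is plain Python, not the assignment's TensorFlow code. The single head, the helper names (`look_ahead_mask`, `softmax`, `self_attention`), and the mask convention (1 = may attend, 0 = blocked) are assumptions made for illustration.

```python
import math

def look_ahead_mask(seq_len):
    # Assumed convention: 1 = may attend, 0 = blocked (a future position).
    return [[1 if j <= i else 0 for j in range(seq_len)] for i in range(seq_len)]

def softmax(row):
    m = max(row)                      # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, mask):
    """Single-head scaled dot-product attention with query = key = value = x.

    Returns (output, attention_weights), mirroring a layer that is asked
    to return its attention scores as a second value.
    """
    d = len(x[0])
    weights = []
    for i, q in enumerate(x):
        scores = []
        for j, k in enumerate(x):
            s = sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            # Blocked positions get a very negative score, so softmax sends them to 0.
            scores.append(s if mask[i][j] else -1e9)
        weights.append(softmax(scores))
    output = [[sum(w * v[c] for w, v in zip(row, x)) for c in range(d)]
              for row in weights]
    return output, weights

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]      # toy (seq_len=3, dim=2) input
out, attn = self_attention(x, look_ahead_mask(len(x)))
# The first position can only attend to itself, so attn[0] is [1.0, 0.0, 0.0].
```

In the assignment, self.mha1 does this work for you. The sketch only illustrates that the call takes x three times plus the mask, and that the attention weights come back as a second return value.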


Mistakes like this point to gaps in your basic understanding of deep learning.

The updated NLP course includes TensorFlow, which makes things more difficult to understand and implement.

The 5th course of the Deep Learning Specialisation, Sequence Models, covers the part where you are stuck right now.

It is my sincere suggestion to complete the Deep Learning Specialisation first, either before or after MLS, and only then switch to the TensorFlow or NLP specialisation. Otherwise, whatever I am explaining might just be going over your head, Lam.


@Deepti_Prasad Thanks for referring me to the 5th course, Sequence Models, in the Deep Learning Specialization. I have been studying it the whole afternoon to improve my understanding. I can now proceed with this exercise, but I get the following error. I passed padding_mask to mha2 and ffn. Did I miss anything?

Can you DM me the updated code after you have made the corrections?

Hi @pongyuenlam

First of all, thank you for putting in the effort to understand what you are trying to learn. It feels great as a mentor when we come across learners who put in sincere effort. Once you complete NLP, do complete the Deep Learning Specialisation. Believe me, you will not regret it.

Now, coming to your code, I will go step by step.

  1. After calculating self-attention for block 1 (your code was correct for that), you had to apply layer normalization (layernorm1) to the sum of the attention output and the input.
    You called the correct layer, but there are two mistakes: first, you did not need tf.add; second, you should sum the attention output and x with the plain addition operator, but you used ((mult_attn_out1, x)

  2. Read the instruction for applying layer normalisation. Here you had to apply layer normalization to the sum of the attention output and the output of the first block, but you summed attention output 1 and attention output 2, which is incorrect. The mistake from point 1 also applies: do not use tf.add; use the addition operator to add mult_attn_out2 and Q1, which is the output of the first block.

  3. BLOCK 3. The next instruction was to pass the output of the second block through a ffn, but you added a padding mask to block 3, which was not required here.

  4. Next, the code instructions say:
    apply a dropout layer to the ffn output
    use training=training
    But you did not pass training=training when applying the dropout layer to the ffn output.

  5. Again, to apply layer normalization (layernorm3) to the sum of the ffn output and the output of the second block, please remove tf.add and use the addition operator on the outputs named in the instruction, which you chose correctly.
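
The five points above describe a data flow that can be sketched without any framework. The following is a rough illustration in plain Python, not the assignment's TensorFlow code: the three sublayers are passed in as stand-in callables, layer normalization omits the learnable scale and offset that a real layer has, and all names (`layer_norm`, `add_and_norm`, `dropout`, `decoder_layer_flow`) as well as the dropout rate of 0.1 are assumptions for illustration.

```python
import math
import random

def layer_norm(vec, eps=1e-6):
    """Normalize one feature vector to zero mean and unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

def add_and_norm(sublayer_out, residual):
    """Residual connection, then layer norm: layernorm(sublayer_out + residual)."""
    return [layer_norm([a + b for a, b in zip(row_out, row_res)])
            for row_out, row_res in zip(sublayer_out, residual)]

def dropout(rows, rate, training, rng):
    """Inverted dropout; does nothing unless training is True."""
    if not training:
        return rows
    keep = 1.0 - rate
    return [[v / keep if rng.random() < keep else 0.0 for v in row] for row in rows]

def decoder_layer_flow(x, self_attn, cross_attn, ffn, training=False, rng=None):
    rng = rng or random.Random(0)
    attn1 = self_attn(x)                 # block 1: masked self-attention on x
    q1 = add_and_norm(attn1, x)          # point 1: norm of (attention output + input)
    attn2 = cross_attn(q1)               # block 2: attention over the encoder output
    out2 = add_and_norm(attn2, q1)       # point 2: add Q1, not the block 1 attention
    ffn_out = ffn(out2)                  # point 3: the ffn takes no mask
    ffn_out = dropout(ffn_out, 0.1, training, rng)   # point 4: pass training through
    return add_and_norm(ffn_out, out2)   # point 5: norm of (ffn output + block 2 output)

identity = lambda rows: rows             # stand-in for each real sublayer
x = [[1.0, 2.0, 3.0]]
out3 = decoder_layer_flow(x, identity, identity, identity, training=False)
```

Each residual sum is an ordinary elementwise addition of two same-shaped values, which is why the plain `+` is all that is needed at those three places.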


@Deepti_Prasad Great to have advice from you! I have revised my code and passed this exercise. I have completed some exercises in the Deep Learning Specialization so that I can learn and work on the exercises for NLP. Great content to learn!