Course 5 W4 Assignment - Wrong values for case 1 in EncoderTest

Hi

I am working on the Encoder part of the Transformer Architecture assignment and am getting the following error:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-24-edf0595e0478> in <module>
      1 # UNIT TEST
----> 2 Encoder_test(Encoder)
      3 
      4 # tf.random.set_seed(10)
      5 

~/work/W4A1/public_tests.py in Encoder_test(target)
    124                         [[-0.4612937 ,  1.0697356 , -1.4127715 ,  0.8043293 ],
    125                          [ 0.27027237,  0.28793618, -1.6370889 ,  1.0788803 ],
--> 126                          [ 1.2370994 , -1.0687275 , -0.8945037 ,  0.7261319 ]]]), "Wrong values case 1"
    127 
    128     encoderq_output = encoderq(x, True, np.array([[[[1., 1., 1.]]], [[[1., 1., 0.]]]]))

AssertionError: Wrong values case 1

All my previous unit tests (such as the one for EncoderLayer) pass fine. I have done quite a bit of troubleshooting but cannot spot where I might be making a mistake, so any help would be appreciated. I can share my notebook offline if required. Two things I noticed while troubleshooting: 1) I get the same assertion error for all the other cases too, and 2) in all three cases, the values mismatch only for the second batch.

Kindly help. Thank you

Karthik

Are you sure you are using training=training for a dropout layer?

Yes, I have used training=training in the Dropout layer. Please let me know if you would like me to DM you the notebook.
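For reference, a minimal sketch of how the dropout call is typically wired in a custom Keras layer (illustrative names, not the graded code): the `training` flag is forwarded into the Dropout call so that dropout is active only during training.

```python
import tensorflow as tf

class TinyBlock(tf.keras.layers.Layer):
    """Illustrative block: a dense layer followed by dropout."""
    def __init__(self, units=4, rate=0.5):
        super().__init__()
        self.dense = tf.keras.layers.Dense(units)
        self.dropout = tf.keras.layers.Dropout(rate)

    def call(self, x, training=False):
        x = self.dense(x)
        # Passing training=training makes dropout a no-op at inference
        # and only zeroes units when training=True.
        return self.dropout(x, training=training)

block = TinyBlock()
x = tf.ones((2, 3))
out_infer = block(x, training=False)  # deterministic, dropout disabled
out_train = block(x, training=True)   # stochastic, some units zeroed
print(out_infer.shape)  # (2, 4)
```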

Also, I noticed that the results do not match even when training is set to False.

Thanks again for the help.

Update:

I made a minor modification to the EncoderLayer class, since I suspected that layer might be broken even though all of its unit tests passed.

I had to set return_attention_scores to True in the call to self.mha to make it work; previously it was using the default value of this flag. I was under the impression that this flag merely decides whether the attention scores are returned alongside the attention output, and we do not use the scores in the subsequent steps anyway (only the attention output is used, as I understand it).

I do not see how returning the scores as well can produce a different set of values for the encoder in the downstream layers. Any information would help me understand this concept better.
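For what it's worth, the documented effect of the flag is on the return signature: with return_attention_scores=False (the default) the layer returns a single tensor, and with True it returns an (output, scores) tuple that must be unpacked. A small sketch of the difference, using a standalone `tf.keras.layers.MultiHeadAttention` (shapes are illustrative):

```python
import tensorflow as tf

mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=4)
x = tf.random.uniform((1, 3, 8))  # (batch, seq_len, d_model)

# Default: the call returns only the attention output tensor.
attn_output = mha(x, x)
print(attn_output.shape)  # (1, 3, 8)

# With the flag, the call returns a tuple and must be unpacked;
# scores have shape (batch, num_heads, query_len, key_len).
attn_output, attn_scores = mha(x, x, return_attention_scores=True)
print(attn_output.shape)  # (1, 3, 8)
print(attn_scores.shape)  # (1, 2, 3, 3)
```

Whether the flag should also change the numerical values of the output is a separate question; the example above only shows how the call signature differs.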

Encoder tests are passing now.

Thank you.

Karthik