Week 4 Assignment Encoder Wrong Values

Hi,
I have been having issues with the Encoder Layer. I can’t get the correct values in this layer. I have consulted the existing threads on here but they don’t seem to help.

Some observations:
- mask is not used anywhere in my code - I am not sure where to pass it

- I am confused by the instructions for the two sum layers. They seem to contradict what is shown in the video. For example: “# apply layer normalization on sum of the input and the attention output to get the output of the multi-head attention layer (~1 line)” - but in the video, layer norm appears to be applied only to the mha output, and the skip connection is then added.

This means that there are two possible ways to do this: either self.layernorm1(mha_output + x) or self.layernorm1(mha_output) + x. Neither works.

This also means that I don't know what to put in the second addition layer. Likewise, I tried both self.layernorm2(ffn_output + mha_output_after_activation) and self.layernorm2(ffn_output) + mha_output_after_activation, and neither works.

- The training boolean - I used it in if statements, something like:

if training:
    x = self.dropout_layer(x)   # i.e. only run a layer (e.g. dropout) while training

Not sure if this is the right way to go about it.
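I also noticed that Keras Dropout layers accept a training argument directly, so maybe the flag is just meant to be passed through rather than branched on. A rough sketch of what I mean (the layer name self.dropout_ffn is just a guess, not necessarily what the notebook uses):

    # dropout is only active when training=True; at inference it is a no-op
    ffn_output = self.dropout_ffn(ffn_output, training=training)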

While I have enjoyed the course so far, I must echo the others in saying that the hints for this assignment are the least helpful so far. I think they could be clearer, for example about what exactly should be summed in the addition layers.

Thank you!

OK, fixed it. It turns out you do need to pass the mask into the mha layer. I also had to take the sum of the skip connection and the new layer's output first, and then run the layer norm on that sum.
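In case it helps anyone else, here is a rough sketch of the pattern that ended up working for me (variable and layer names are illustrative and assume self.mha is a Keras MultiHeadAttention layer; they may not match the notebook exactly):

    # self-attention sublayer: the padding mask goes straight into mha
    self_attn_output = self.mha(x, x, x, mask)

    # Add & Norm: add the skip connection first, then layer-normalize the sum
    out1 = self.layernorm1(x + self_attn_output)

    # feed-forward sublayer, dropout (active only while training), then Add & Norm again
    ffn_output = self.ffn(out1)
    ffn_output = self.dropout_ffn(ffn_output, training=training)
    out2 = self.layernorm2(out1 + ffn_output)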

Hey, could you please help? I do not know if something like this is the correct syntax: out1 = self.layernorm1(self_attn_output + x). Everything seems OK, but I am still getting the "wrong values" error on both the EncoderLayer and the Encoder.
Regards,

For the grader, you need to put x first and then the attn_output - do give it a try!
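In other words, something like this (a sketch using the same names as the post above):

    out1 = self.layernorm1(x + self_attn_output)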