I’m having trouble with the EncoderLayer exercise. I am getting an assertion error that says “wrong values when training=True”. I’ve looked through some old threads on this issue, but they got me nowhere. I’ve also tried adding and removing ‘training=training’ in different places to test it.
I wonder whether something else is going wrong and this error is simply what gets reported.
First, you don’t need to pass training=training everywhere, only where the instructions mention it.
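As a rough illustration of why the placement of the training flag changes the values, here is a minimal sketch using a standalone Keras Dropout layer. The rate of 0.5 and the all-ones input are just assumptions chosen to make the effect easy to see; they are not taken from the exercise.

```python
import tensorflow as tf

# Hypothetical standalone layer, only to show what the flag does.
dropout = tf.keras.layers.Dropout(rate=0.5)
x = tf.ones((2, 4))

# With training=False, dropout is a no-op and x passes through unchanged.
inference_out = dropout(x, training=False)

# With training=True, some entries are zeroed and the rest are
# scaled by 1 / (1 - rate), so every value here is either 0.0 or 2.0.
training_out = dropout(x, training=True)

print(inference_out.numpy())
print(training_out.numpy())
```

So a call that receives the flag when it shouldn’t, or misses it when it should, tends to show up only in the training=True checks.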
Given that:
You will pass the Q, V, K matrices and a boolean mask to a multi-head attention layer. Remember that to compute self-attention Q, V and K should be the same.
So, in our code the first three arguments to the attention layer are all x, since Q, V and K are the same here, followed by the mask.
Also, ask yourself why we calculate skip_x_attention: it is the value that needs to be passed to self.ffn.
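To make the data flow concrete, here is a minimal sketch of a standard encoder layer in Keras. It is not the graded solution: the class name EncoderLayerSketch, the layer sizes, and the default arguments are all assumptions made for illustration, and the notebook’s own instructions take precedence on where training is passed.

```python
import tensorflow as tf


class EncoderLayerSketch(tf.keras.layers.Layer):
    """Illustrative encoder layer; names and sizes are assumptions."""

    def __init__(self, embedding_dim=8, num_heads=2, fully_connected_dim=16,
                 dropout_rate=0.1, layernorm_eps=1e-6):
        super().__init__()
        self.mha = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embedding_dim, dropout=dropout_rate)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(fully_connected_dim, activation="relu"),
            tf.keras.layers.Dense(embedding_dim),
        ])
        self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=layernorm_eps)
        self.layernorm2 = tf.keras.layers.LayerNormalization(epsilon=layernorm_eps)
        self.dropout_ffn = tf.keras.layers.Dropout(dropout_rate)

    def call(self, x, training=False, mask=None):
        # Self-attention: query, value and key are all x.
        # The mask is passed by keyword so the call does not depend on
        # the positional order of MultiHeadAttention's arguments.
        self_mha_output = self.mha(x, x, x, attention_mask=mask)

        # Residual connection plus layer normalization.
        skip_x_attention = self.layernorm1(x + self_mha_output)

        # The feed-forward network takes skip_x_attention, not x.
        ffn_output = self.ffn(skip_x_attention)

        # In this sketch, training is passed explicitly only here.
        ffn_output = self.dropout_ffn(ffn_output, training=training)

        # Second residual connection, again from skip_x_attention.
        return self.layernorm2(skip_x_attention + ffn_output)


layer = EncoderLayerSketch()
x = tf.random.uniform((1, 3, 8))  # (batch, sequence length, embedding_dim)
out = layer(x, training=False)
print(out.shape)
```

If the output shape matches the input but the values differ from the expected ones, the usual suspects are feeding x instead of skip_x_attention into self.ffn, or adding the wrong tensor in the second residual connection.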