Using embedding_dim=12 and num_heads=16:
q has shape:(1, 15, 12)
Output of encoder has shape:(1, 7, 8)
Output of decoder layer has shape:(1, 15, 12)
Att Weights Block 1 has shape:(1, 16, 15, 15)
Att Weights Block 2 has shape :(1, 16, 15, 7)
Expected Output
Output:
Using embedding_dim=12 and num_heads=16:
q has shape:(1, 15, 12)
Output of encoder has shape:(1, 7, 8)
Output of decoder layer has shape:(1, 15, 12)
Att Weights Block 1 has shape:(1, 16, 15, 15)
Att Weights Block 2 has shape:(1, 16, 15, 3)
Hi, I’m also facing the same issue: the shape of attn_weights_block2 is (1, 16, 15, 7), while it is supposed to be (1, 16, 15, 3).
The shapes of the other two values, out3 and attn_weights_block1, are still correct.
@lucas.coutinho, attn_weights_block2 should have shape (batch_size, num_heads, target_seq_len, input_seq_len), as per the Return comments in the call function, and input_seq_len is indeed 7.
Further, attn_weights_block1 and attn_weights_block2 should have different shapes, contrary to what is stated in the comment block of the call function.
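To illustrate why the two blocks give different shapes, here is a rough NumPy sketch of how multi-head attention weights get their shape. It is not the assignment's code: the function attention_weights, the random projection matrices, and key_dim=12 are assumptions made up for this example. The assignment uses a Keras attention layer, and block 2 is queried with the output of block 1 rather than q itself, though that output has the same sequence length. Only the input shapes (1, 15, 12) and (1, 7, 8) and num_heads=16 come from the output posted above.

```python
import numpy as np

def attention_weights(query, key, num_heads, key_dim, seed=0):
    """Return softmax attention weights of shape (batch, num_heads, q_len, k_len)."""
    rng = np.random.default_rng(seed)
    # Hypothetical random projections standing in for a layer's learned weights.
    w_q = rng.normal(size=(query.shape[-1], num_heads, key_dim))
    w_k = rng.normal(size=(key.shape[-1], num_heads, key_dim))
    q = np.einsum("bqe,ehd->bhqd", query, w_q)  # (batch, heads, q_len, key_dim)
    k = np.einsum("bke,ehd->bhkd", key, w_k)    # (batch, heads, k_len, key_dim)
    scores = np.einsum("bhqd,bhkd->bhqk", q, k) / np.sqrt(key_dim)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(1)
q = rng.normal(size=(1, 15, 12))          # target sequence, length 15
enc_output = rng.normal(size=(1, 7, 8))   # encoder output, length 7

# Block 1: self-attention, so keys come from the target sequence itself.
block1 = attention_weights(q, q, num_heads=16, key_dim=12)
# Block 2: keys come from the encoder output, so the last axis is input_seq_len.
block2 = attention_weights(q, enc_output, num_heads=16, key_dim=12)

print(block1.shape)  # (1, 16, 15, 15)
print(block2.shape)  # (1, 16, 15, 7)
```

The last axis of the weights always matches the length of the sequence supplying the keys, so with an encoder output of length 7, a last dimension of 3 cannot come out of block 2.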
Hi guys, thanks for your input on this. The issue is already on GitHub and I have assigned it to your Curriculum Engineer. As soon as I get an update, I will post here.
Hi all! The markdown has a mistake; this will be fixed today. (You might not see the change unless you refresh your assignment to get the latest version, but it will be fixed for future learners automatically.)
Also, this leads to the following failed test cases:
Failed test case: Wrong values in ‘attn_w_b2’. Check the call to self.mha2.
Expected: [0.34003818, 0.32569194, 0.33426988]
Got: [0.34083953 0.32673767 0.33242285]
Failed test case: Wrong values in ‘out’.
Expected: [1.1810006, -1.5600019, 0.41289005, -0.03388882]
Got: [ 1.3311304 -1.4207214 0.365438 -0.275847 ]
Failed test case: Wrong values in ‘out’ when we mask the last word. Are you passing the padding_mask to the inner functions?.
Expected: [1.1297308, -1.6106694, 0.32352272, 0.15741566]
Got: [ 1.3888907 -1.414115 0.2009444 -0.17572011]
@a-zarta @lucas.coutinho @jyadav202
Are you certain there isn’t a defect in your code? Because if the test case and shape were incorrect, I think we would have a lot more reports about the issue than just yours.