This is about the unit test function for the decoder layer. I initially had trouble figuring out how to get the attention weights, but after resolving that my sample test cases pass while the unit test function still fails. I have gone through the code but can't find what is wrong with it. I need help with this.
Can you share a screenshot of the failed unit test output you got? Please make sure not to post any graded cell code here, as that is against community guidelines.
Regards
DP
The sample output only checks the matrix shape, so I shouldn't rely on it for proper validation.
The unit test cases reported some errors:
Initially, the test case checking the weights was failing; I fixed that in the normalization layers, where the proper Q1 query had to be passed.
But this part is very hard, and I don't know which part of the output I'm getting wrong.
It tells me to check the padding mask, but in the decoder we aren't passing a padding mask anywhere; we only supply the look-ahead mask for future tokens.
Where do we need to add this padding mask?
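For reference, in the standard Transformer decoder layer the look-ahead mask goes to the first (self-attention) MHA block, while the padding mask for the encoder output goes to the second (cross-attention) MHA block. A minimal numpy sketch of how the two masks are typically built (helper names and the 1-means-blocked convention are my assumptions, not the assignment's API):

```python
import numpy as np

def create_padding_mask(token_ids, pad_id=0):
    # 1 where the token is padding, 0 elsewhere.
    # Broadcastable shape (batch, 1, 1, seq_len) for multi-head attention.
    mask = (token_ids == pad_id).astype(np.float32)
    return mask[:, np.newaxis, np.newaxis, :]

def create_look_ahead_mask(seq_len):
    # Upper-triangular 1s block attention to future positions.
    return np.triu(np.ones((seq_len, seq_len), dtype=np.float32), k=1)

ids = np.array([[7, 5, 3, 0, 0]])       # batch of one; last two tokens are padding
pad_mask = create_padding_mask(ids)      # used by the 2nd (cross-attention) MHA
la_mask = create_look_ahead_mask(5)      # used by the 1st (self-attention) MHA
print(pad_mask.squeeze())                # [0. 0. 0. 1. 1.]
print(la_mask)
```

The key point is that the padding mask describes the *encoder* sequence, so it belongs wherever the decoder attends over the encoder output.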
Failed test case: Wrong values in ‘out’.
Expected: [1.1810006, -1.5600019, 0.41289005, -0.03388882]
Got: [ 0.517524 1.1657772 -0.1501512 -1.53315 ]
Failed test case: Wrong values in ‘out’ when we mask the last word. Are you passing the padding_mask to the inner functions?.
Expected: [1.1297308, -1.6106694, 0.32352272, 0.15741566]
Got: [ 0.5381926 1.1552632 -0.16051239 -1.5329431 ]
TIA
Can you cross-check your scaled dot-product attention code against the comment at the link below?
If everything matches what is stated in the comment and you still get an error or a failed test, then kindly send a screenshot of the graded cell by personal DM.
Regards
DP
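For anyone following along, a self-contained numpy sketch of masked scaled dot-product attention in the style the course uses (the 1-means-blocked mask convention and the large negative bias are assumptions, not the grader's exact code):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    # q: (..., seq_q, d_k), k: (..., seq_k, d_k), v: (..., seq_k, d_v)
    dk = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(dk)
    if mask is not None:
        # Assumed convention: mask == 1 at positions to block.
        scores = scores + mask * -1e9
    # Numerically stable softmax over the key axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
q = rng.normal(size=(1, 4, 8))
k = rng.normal(size=(1, 4, 8))
v = rng.normal(size=(1, 4, 8))
look_ahead = np.triu(np.ones((4, 4)), k=1)   # block future tokens
out, w = scaled_dot_product_attention(q, k, v, mask=look_ahead)
print(w[0].round(2))  # row i attends only to positions <= i
```

Passing a look-ahead mask here covers the self-attention case; the same function takes the padding mask when called from the cross-attention block.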
My scaled dot-product attention has passed all test cases, and I've followed all the steps from the earlier scaled dot-product attention function.
Is there anything else I can check before sending the code?
I have figured this out: the final residual connection and the padding mask have to be added in the second MHA layer.
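For anyone hitting the same failure, the fix described above in sketch form: the second MHA (cross-attention) block takes the decoder activations as queries, the encoder output as keys/values, receives the padding mask, and its output is added back to its input (the residual connection) before layer normalization. A minimal numpy sketch; the helper names, shapes, and mask convention are illustrative, not the assignment's exact API:

```python
import numpy as np

def layernorm(x, eps=1e-6):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def attention(q, k, v, mask=None):
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = scores + mask * -1e9   # assumed: 1 = blocked position
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ v

def cross_attention_block(x, enc_output, padding_mask):
    # Second MHA: queries come from the decoder (x), keys/values from the
    # encoder output, and the encoder padding mask is applied HERE.
    attn_out = attention(x, enc_output, enc_output, mask=padding_mask)
    # Residual connection + layer norm: the piece the failed test points at.
    return layernorm(x + attn_out)

rng = np.random.default_rng(1)
x = rng.normal(size=(1, 3, 4))              # decoder activations
enc = rng.normal(size=(1, 5, 4))            # encoder output
pad = np.array([[[0., 0., 0., 1., 1.]]])    # last two encoder tokens are padding
out = cross_attention_block(x, enc, pad)
print(out.shape)  # (1, 3, 4)
```

Because the padded encoder positions are masked, changing those positions' values leaves the block's output unchanged, which is exactly what the "mask the last word" test case is probing.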