I ran the test cases and got this:
Failed test case: Wrong values in x.
Expected: [1.6461557, -0.7657816, -0.04255769, -0.8378165]
Got: [ 1.627879 -0.94895595 -0.00853747 -0.67038566]
Failed test case: Wrong values in outd when training=True.
Expected: [1.6286429, -0.7686589, 0.00983591, -0.86982]
Got: [ 1.5818391 -0.40260813 -0.02375562 -1.1554755 ]
Failed test case: Wrong values in outd when training=True and use padding mask.
Expected: [1.390952, 0.2794097, -0.2910638, -1.3792979]
Got: [ 1.601717 -0.23741467 -0.20914236 -1.1551601 ]
I followed the same format as the encoder function and passed the look-ahead mask and padding mask to the decoder layers in the loop as well. However, I am struggling to figure out what is wrong with my code.
Hi @Drew_Murray
I’m not sure I understood you correctly, so just in case: you do not use the look-ahead mask in the encoder layer, only the padding mask.
In the Decoder, though, you do use the look-ahead mask in the first multi-head attention block.
In the second multi-head attention block of the Decoder, you use the padding mask.
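To illustrate the mask placement described above, here is a minimal NumPy sketch (not the course's actual TensorFlow code) of scaled dot-product attention with the two masks. All shapes and the `1 = keep / 0 = block` mask convention are assumptions for the sake of the example:

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    """Plain NumPy attention; mask convention assumed: 1 = keep, 0 = block."""
    d_k = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)
    if mask is not None:
        scores += (1.0 - mask) * -1e9  # blocked positions get a huge negative score
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Hypothetical shapes: batch=1, target length=4, source length=5, d_model=8
rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4, 8))        # decoder input (queries)
enc_out = rng.normal(size=(1, 5, 8))  # encoder output (keys/values for block 2)

# Block 1: self-attention uses the look-ahead (causal) mask
look_ahead = np.tril(np.ones((1, 4, 4)))
attn1, w1 = scaled_dot_product_attention(x, x, x, mask=look_ahead)

# Block 2: cross-attention over encoder output uses the padding mask
padding = np.ones((1, 1, 5))
padding[..., -1] = 0.0  # pretend the last encoder token is padding
attn2, w2 = scaled_dot_product_attention(x, enc_out, enc_out, mask=padding)
```

The point is only where each mask goes: the look-ahead mask blocks future target positions in the first block, while the padding mask blocks padded encoder positions in the second.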
The masks may not be the only problem, since the calculation involves many steps, but I think the instructions and code hints are pretty clear for this exercise; you just need to pay very close attention to what they are saying.
Do you have any specific doubts about where your code could be failing?
So I have already passed all of the tests for Exercise 2, DecoderLayer(). For Decoder(), though, I have to initialize the decoder with the embedding layer, the positional encoding, and the multiple decoder layers. I used Encoder() as a reference to initialize the embedding layer and positional encoding, then passed the look-ahead and padding masks into the decoder layers in the loop. I am struggling to find which line or lines contain my mistake; it could be a really small one that I am overlooking.
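For reference, here is a minimal NumPy sketch of the forward-pass structure being described: embed, scale, add a slice of the positional encoding, then loop over the layers. This is an assumption about the expected structure, not the assignment's actual TensorFlow code; the `decoder_forward` name, the identity stand-in layers, and all dimensions are hypothetical (real decoder layers would also receive the encoder output, the `training` flag, and both masks):

```python
import numpy as np

d_model, num_layers, vocab = 8, 2, 50

def positional_encoding(length, d):
    """Standard sin/cos positional encoding, shape (1, length, d)."""
    pos = np.arange(length)[:, None]
    i = np.arange(d)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d)
    pe = np.zeros((length, d))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe[None, ...]

rng = np.random.default_rng(1)
embedding = rng.normal(size=(vocab, d_model))       # stand-in embedding table
pos_encoding = positional_encoding(100, d_model)    # precomputed to a max length

def decoder_forward(ids, dec_layers, training=False):
    x = embedding[ids]                              # (batch, seq_len, d_model)
    x = x * np.sqrt(d_model)                        # scale BEFORE adding positions
    x = x + pos_encoding[:, :ids.shape[1], :]       # slice to the actual seq length
    # (dropout would be applied here when training=True)
    for layer in dec_layers:                        # real layers would also take
        x = layer(x)                                # enc_output and BOTH masks
    return x

identity_layers = [lambda x: x] * num_layers        # stand-in for DecoderLayer calls
ids = np.array([[3, 7, 1, 0]])
out = decoder_forward(ids, identity_layers)
```

Two spots in this structure commonly produce "Wrong values in x"-style failures: forgetting to scale the embeddings by the square root of the model dimension, and slicing the positional encoding with the wrong length or axis.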