NMT with Attention Model

In C4W1_Assignment, Exercise 3 (Decoder), the input is the encoder output, which has shape (64, 14, 256). If I pass it to an Embedding layer with embedding dimension 256, the output shape becomes (64, 14, 256, 256), which the LSTM rejects with an error saying it expected a 3-dimensional input but got 4 dimensions. If I instead set the embedding dimension to 1 and pass x[:,:,:,0] to the pre-attention layer, the Cross Attention layer fails with “cannot compute Einsum as input #1(zero-based) was expected to be a int64 tensor but is a float tensor [Op:Einsum] name:”. None of this week's labs contain similar code, so I am at a loss as to how to handle this.

Hi @Meghshyam_Prasad

You should not do that. The encoder output is already a sequence of embedded vectors and should not be embedded again; the decoder should use it as is, when calculating cross attention and nowhere else.
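To make the shapes concrete, here is a minimal NumPy sketch of the idea. It is not the assignment's code or solution. The names `embed`, `embedding_table`, `tgt_len` and `vocab_size` are made up for illustration, the batch size is reduced from 64 to 2 to keep it light, and the LSTM between the embedding and the attention step is skipped. It treats an embedding as a plain table lookup and cross attention as scaled dot-product attention, which is an assumption about the general technique rather than a description of the course's layers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: src_len and units follow the thread; the rest are assumed.
batch, src_len, tgt_len, units, vocab_size = 2, 14, 15, 256, 1000

# An embedding is a lookup table: integer ids pick rows of a (vocab_size, units) matrix.
embedding_table = rng.normal(size=(vocab_size, units))

def embed(token_ids):
    """Look up one vector per id; output shape is token_ids.shape + (units,)."""
    return embedding_table[token_ids]

# Encoder output ("context"): already float vectors, one per source token.
context = rng.normal(size=(batch, src_len, units))

# Decoder input ("target"): integer token ids.
target = rng.integers(0, vocab_size, size=(batch, tgt_len))

# Correct: embed the integer target ids -> 3-D tensor, suitable for a recurrent layer.
embedded_target = embed(target)

# Mistake: treating the context as ids adds a fourth axis.
wrongly_embedded = embed(context.astype(np.int64) % vocab_size)

# Cross attention (scaled dot-product sketch): queries come from the decoder side,
# keys and values are the context, used as is.
scores = np.einsum("btd,bsd->bts", embedded_target, context) / np.sqrt(units)
scores -= scores.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)  # softmax over source positions
attended = np.einsum("bts,bsd->btd", weights, context)

print(embedded_target.shape)   # 3-D
print(wrongly_embedded.shape)  # 4-D
print(attended.shape)          # 3-D, one attended vector per target position
```

The point of the sketch is only the data flow: integer ids go through the embedding, while the context goes straight into cross attention as keys and values.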


Thank you. I realized it later: I have to send “target” to the Embedding layer, not “context”. A block diagram showing the flow would have made this easier.