NMT with Attention Model

week-1
In C4W1_Assignment, Exercise 3 (Decoder), the input is the encoder output, which has shape (64, 14, 256). If I pass it to an Embedding layer with embedding dimension 256, the output shape becomes (64, 14, 256, 256), which can't be fed to the LSTM; it raises an error: expected input dimension 3, got 4. If I instead keep the embedding dimension at 1 and pass x[:,:,:,0] to the pre-attention layer, there is a problem in the Cross Attention layer, which complains: "cannot compute Einsum as input #1(zero-based) was expected to be a int64 tensor but is a float tensor [Op:Einsum] name:". There are no labs this week with such code, so I'm totally clueless about how to handle this.
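
For reference, a minimal sketch (the vocabulary size and layer sizes here are placeholders, not the assignment's values) that reproduces the shape blow-up described above:

```python
import tensorflow as tf

# Placeholder sizes matching the shapes described above (not the assignment's values).
context = tf.random.uniform((64, 14, 256))  # encoder output: (batch, seq_len, units)

# Embedding treats every scalar in its input as a token id, so feeding it a
# 3-D tensor produces a 4-D output: (64, 14, 256, 256).
embedding = tf.keras.layers.Embedding(input_dim=12000, output_dim=256)
wrong = embedding(tf.cast(context, tf.int32))
print(wrong.shape)  # (64, 14, 256, 256)

# An LSTM expects a 3-D input (batch, time, features), so this would raise
# "expected ndim=3, found ndim=4".
# tf.keras.layers.LSTM(256, return_sequences=True)(wrong)
```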

Hi @Meghshyam_Prasad

You should not do that - the encoder output should not be embedded again; it is the output the decoder should use as is (when calculating cross attention, and nowhere else).
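
A rough sketch of the intended wiring, assuming a Keras-style decoder (layer choices and sizes are illustrative, not the assignment's exact code): the target token ids go through the Embedding and the pre-attention LSTM, while the encoder output is used unchanged as the keys/values in cross attention.

```python
import tensorflow as tf

units, vocab_size = 256, 12000  # placeholder sizes, not the assignment's values

target_ids = tf.zeros((64, 15), dtype=tf.int32)  # decoder input: target token ids
context = tf.random.uniform((64, 14, units))     # encoder output, used as-is

x = tf.keras.layers.Embedding(vocab_size, units)(target_ids)  # (64, 15, 256)
x = tf.keras.layers.LSTM(units, return_sequences=True)(x)     # pre-attention LSTM

# Cross attention: queries come from the target side, keys/values from the context.
x = tf.keras.layers.MultiHeadAttention(num_heads=1, key_dim=units)(
    query=x, value=context, key=context)                      # (64, 15, 256)

x = tf.keras.layers.LSTM(units, return_sequences=True)(x)     # post-attention LSTM
logits = tf.keras.layers.Dense(vocab_size)(x)                 # (64, 15, vocab_size)
print(logits.shape)
```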

Cheers

Thank you. I realized it later: I have to send "target" to the Embedding layer, not "context". If there had been a block diagram showing the flow, it would have been easier.