NMT Week-1 Assignment - Training

Hi!

You are right about how the decoder is used at training and prediction time.

In the code, during training, the hidden states of the pre-attention decoder's LSTM are not passed to the attention mechanism. This is because we use the shifted-right target sequence (teacher forcing), so the model learns to attend to and predict the correct next token given the ground-truth previous tokens. This target sequence passes through the pre-attention decoder, and only the output of its LSTM, together with the encoded context, is fed to the attention mechanism as the query and the value, respectively.
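Here is a minimal sketch of this wiring in TF/Keras, not the assignment's actual code; the layer sizes and variable names are hypothetical, and I use a single-head `MultiHeadAttention` layer to stand in for whatever attention layer the assignment uses:

```python
import tensorflow as tf

# Hypothetical sizes, for illustration only.
vocab_size, units, batch, enc_len, dec_len = 1000, 64, 2, 12, 10

# Stand-in for the encoder's output (the encoded context).
encoder_output = tf.random.normal((batch, enc_len, units))

# Shifted-right target sequence used for teacher forcing.
shifted_targets = tf.random.uniform(
    (batch, dec_len), maxval=vocab_size, dtype=tf.int32)

# Pre-attention decoder: embedding + LSTM over the shifted-right targets.
embed = tf.keras.layers.Embedding(vocab_size, units)
pre_attention_lstm = tf.keras.layers.LSTM(units, return_sequences=True)
decoder_output = pre_attention_lstm(embed(shifted_targets))

# Cross-attention: query = pre-attention decoder LSTM output,
# value (and key) = encoder output. Note the LSTM's hidden/cell
# states are not passed here, only its per-step outputs.
attention = tf.keras.layers.MultiHeadAttention(num_heads=1, key_dim=units)
context = attention(query=decoder_output,
                    value=encoder_output,
                    key=encoder_output)

print(context.shape)  # (batch, dec_len, units) -> (2, 10, 64)
```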

I am referring to a previous discussion on this. FYI, the implementation described in the paper is discussed there, and it is slightly different from the TF code.