Understanding the basic attention code

And a few additional questions.

  1. Do I understand correctly that in this lab we compute alignment and attention only for one predicted word of the translation (a single decoder_state)? Usually we need to compute this attention for every next word, so there will be multiple calls of attention(encoder_states, decoder_state), where decoder_state is different each time (a different output word), but encoder_states stay the same (see the second sketch after this list).
  2. We have hidden_size. If I understand correctly, it is the word-embedding size of a single word. Then we transform it through a linear layer and tanh, and we get activations with attention_size columns that contain some kind of classification. But what is that, really? How did we get a 5x10 matrix from 5 words with a 16x2 word-embedding size? And why didn't we multiply 5 by 2, by the way? (See the first sketch after this list.)
  3. As I understand it, we need tanh to determine which words of the input sentence are connected with the generated output word, and the alignment() function gives us approximate numbers for that. But after this we do a softmax… Why didn't we apply the softmax right after computing the activations inside alignment()? It looks like I don't fully understand the purpose of alignment() at all. (The first sketch below shows where the softmax sits relative to alignment().)
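
To make the shapes in questions 2 and 3 concrete, here is a minimal NumPy sketch of additive (Bahdanau-style) attention. It is not the lab's actual code: the names W_enc, W_dec, v and the sizes are my assumptions, in particular that the encoder is bidirectional with 16 units per direction (so each of the 5 encoder states has 16x2 = 32 features, while the number of words stays 5), and that attention_size = 10, which would give the 5x10 activation matrix.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Assumed sizes: 5 input words, encoder state size 32 (16 per direction x 2),
# decoder state size 32, attention (intermediate) size 10.
n_inp, enc_size, dec_size, attn_size = 5, 32, 32, 10

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(enc_size, attn_size))  # projects each encoder state
W_dec = rng.normal(size=(dec_size, attn_size))  # projects the decoder state
v = rng.normal(size=(attn_size,))               # collapses attn_size -> 1 score

def alignment(encoder_states, decoder_state):
    # encoder_states: [n_inp, enc_size], decoder_state: [dec_size]
    # tanh is just the nonlinearity of a small hidden layer; the output
    # is one unnormalized real-valued score per input word.
    activations = np.tanh(encoder_states @ W_enc + decoder_state @ W_dec)  # [n_inp, attn_size] = [5, 10]
    return activations @ v                                                 # [n_inp]

def attention(encoder_states, decoder_state):
    scores = alignment(encoder_states, decoder_state)  # one raw score per input word
    probs = softmax(scores)                            # softmax turns scores into weights summing to 1
    context = probs @ encoder_states                   # weighted sum of encoder states
    return context, probs
```

In this reading, alignment() only produces raw relevance scores; the softmax is applied afterwards, over all input words at once, so that the scores become a probability distribution used to average the encoder states.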
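And here is a schematic decoding loop for question 1, reusing the functions above: encoder_states are computed once and reused at every step, while decoder_state changes, so attention() is called once per generated word. The update of decoder_state is faked here just to show the call pattern.

```python
# encoder_states would come from running the encoder once over the input sentence
encoder_states = rng.normal(size=(n_inp, enc_size))
decoder_state = rng.normal(size=(dec_size,))

for step in range(3):  # pretend we generate 3 output words
    context, probs = attention(encoder_states, decoder_state)  # same encoder_states every time
    # a real decoder would combine `context` with the previous token embedding
    # and run one RNN step to get the next decoder_state; this is only a placeholder
    decoder_state = np.tanh(context)
```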