In the modified architecture, why are the inputs and outputs used again? The outputs seem to be there for the sake of the second-level decoder, but I couldn't figure out why the input is used again when preparing for attention. Is it used as K in one place and as V in another?
Yes, that's it. The inputs are used again in the preparation for attention because they act as the keys (K) and values (V) in the encoder-decoder attention, allowing the model to determine which parts of the input should be focused on for each generated output token.
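To make that concrete, here is a minimal sketch of encoder-decoder (cross) attention in NumPy, where the queries come from the decoder states and the keys and values come from the encoder outputs. The shapes, variable names, and projection matrices are illustrative assumptions, not the exact implementation from the lesson:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(decoder_states, encoder_outputs, Wq, Wk, Wv):
    # Q comes from the decoder; K and V come from the encoder outputs,
    # i.e. the "input used again" asked about above.
    Q = decoder_states @ Wq          # (tgt_len, d_k)
    K = encoder_outputs @ Wk         # (src_len, d_k)
    V = encoder_outputs @ Wv         # (src_len, d_v)
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (tgt_len, src_len): target-to-source relevance
    weights = softmax(scores, axis=-1)        # each target token's focus over the source tokens
    return weights @ V               # context vectors passed on in the decoder layer

# Toy example: 3 source tokens, 2 target tokens, model dimension 4 (made-up sizes)
rng = np.random.default_rng(0)
enc_out = rng.normal(size=(3, 4))
dec_states = rng.normal(size=(2, 4))
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
context = cross_attention(dec_states, enc_out, Wq, Wk, Wv)
print(context.shape)  # (2, 4): one context vector per target position
```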
Q, K, and V are matrices created in the encoder and decoder attention layers from the input that is fed in; they are projected into vectors that score each token against the other tokens in the sequence (the tokens other than the one currently being processed).
What gives the attention mechanism its added significance is that it focuses on the input fed from the encoder and masks the other inputs/tokens so that attention stays on the target. So when the input is fed to the decoder, the decoder refers back to this attention mechanism, which is focused on the provided target, giving a better translation (see the masking sketch below).
Such techniques are especially helpful for long sequences or long sentences, where the attention mechanism helps the decoder focus on the parts of the sequence it is targeting.
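As a rough illustration of the masking idea mentioned above, here is a small sketch of a causal mask applied before the softmax, as used in the decoder's self-attention so that each position can only attend to earlier target tokens. This is an assumption for illustration, not code from the course:

```python
import numpy as np

tgt_len = 4
rng = np.random.default_rng(1)
scores = rng.normal(size=(tgt_len, tgt_len))   # raw self-attention scores among target tokens

# Mask out "future" positions (upper triangle) with -inf so that, after the
# softmax, their attention weight becomes exactly 0.
mask = np.triu(np.ones((tgt_len, tgt_len), dtype=bool), k=1)
scores = np.where(mask, -np.inf, scores)

weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
print(np.round(weights, 2))   # each row sums to 1; entries above the diagonal are 0
```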
Feel free to ask if you have any doubts.
Regards
DP