I don't understand the transformer's decoder

First of all, the encoder as I understand it is like this.

The sentence “I am a boy” goes into the input of the encoder: tokenization → embedding vectors → multi-head attention (the vector values are updated with context) → feed-forward network → out come tokens with well-established context.

That is the encoder as I understand it; roughly, I picture one encoder layer like the sketch below.
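
To check my understanding, here is a minimal PyTorch-style sketch of one encoder layer (the layer sizes and names are just illustrative, not taken from the course):

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One encoder layer: self-attention -> add & norm -> feed-forward -> add & norm."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                    # x: (batch, seq_len, d_model) token embeddings
        attn_out, _ = self.attn(x, x, x)     # every token attends to every other token
        x = self.norm1(x + attn_out)         # residual connection + layer norm
        x = self.norm2(x + self.ff(x))       # position-wise feed-forward + residual + norm
        return x                             # contextualized vectors, one per input token

# "I am a boy" -> 4 tokens -> 4 embedding vectors of size d_model
tokens = torch.randn(1, 4, 512)              # stand-in for embedded + position-encoded input
context = EncoderLayer()(tokens)             # shape stays (1, 4, 512), values now carry context
```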

And now the decoder part is the problem.

The decoder initially receives an SOS token as input, and I don’t quite understand the masked multi-head attention here.

At the beginning of decoding, there is only a single SOS token in the masked multi-head attention layer. How does the masking work, and how is the next value predicted?

How does masked multi-head attention work with just one SOS token?

I think it’s a good idea to check the NLP Specialization, especially Course 4, which explains the transformer architecture.

Hi @Goomin

The decoder is also composed of a stack of identical layers.
In addition to the two sub-layers in each encoder layer, the decoder inserts a third sub-layer, which performs multi-head attention over the output of the encoder stack.
Similar to the encoder, we employ residual connections around each of the sub-layers, followed by layer normalization.
We also modify the self-attention sub-layer in the decoder stack to prevent positions from attending to subsequent positions.
This masking, combined with the fact that the output embeddings are offset by one position, ensures that the predictions for position i can depend only on the known outputs at positions less than i.
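
To make the masking concrete, here is a minimal sketch in PyTorch. The `decoder` function, `SOS_ID`, and `EOS_ID` are hypothetical placeholders just for illustration; they are not from the assignment code.

```python
import torch

def causal_mask(seq_len):
    """Lower-triangular mask: position i may only attend to positions <= i."""
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

print(causal_mask(1))   # step 1: only <SOS> is present -> a 1x1 mask [[True]]
                        # <SOS> simply attends to itself; there is nothing to hide yet.
print(causal_mask(3))   # step 3: <SOS>, y1, y2 -> each row sees only earlier positions
# tensor([[ True, False, False],
#         [ True,  True, False],
#         [ True,  True,  True]])

# Greedy decoding loop (inference): the decoder re-runs on the growing sequence.
# `decoder` is a hypothetical function taking (generated_ids, encoder_output, mask)
# and returning next-token logits; SOS_ID / EOS_ID are placeholder vocabulary ids.
def generate(decoder, encoder_output, SOS_ID=1, EOS_ID=2, max_len=20):
    generated = [SOS_ID]                              # step 1: sequence is just <SOS>
    for _ in range(max_len):
        mask = causal_mask(len(generated))            # grows as tokens are added
        logits = decoder(torch.tensor([generated]), encoder_output, mask)
        next_id = int(logits[0, -1].argmax())         # prediction for the next position
        generated.append(next_id)
        if next_id == EOS_ID:
            break
    return generated
```

During training, the whole (shifted) target sequence is fed in at once, and the mask stops position i from seeing later positions. At inference time, on the very first step only <SOS> exists, so the mask is trivially 1×1: <SOS> attends to itself, and the decoder’s output at that position is used to predict the first real token. Each predicted token is then appended, and the decoder runs again on the longer sequence.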

I am sharing the Transformer PDF. Kindly go through it; it explains the same part I explained to you here. In case you still have doubts, feel free to ask!

Transformer.pdf (2.1 MB)

Regards
DP