Link to the classroom item: https://www.coursera.org/learn/generative-ai-with-llms/lecture/R0xbD/generating-text-with-transformers

Description:
First, here is the encoder as I understand it. The sentence “i am a boy” goes into the encoder as input: tokenization → embedding vectors → multi-head attention (the vector values are updated with context from the other tokens) → feed-forward network → each token comes out with well-established context.
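To show what I mean, here is a minimal PyTorch sketch of that encoder flow (the token ids and layer sizes are made up, and I am leaving out positional encodings, residual connections, and layer norm for simplicity):

```python
import torch
import torch.nn as nn

# Hypothetical sizes, my own choice for illustration.
vocab_size, d_model, n_heads = 10000, 512, 8

embed = nn.Embedding(vocab_size, d_model)
attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
ffn = nn.Sequential(nn.Linear(d_model, 2048), nn.ReLU(), nn.Linear(2048, d_model))

# "i am a boy" -> made-up token ids, shape (batch=1, seq_len=4)
tokens = torch.tensor([[11, 42, 7, 99]])

x = embed(tokens)        # tokenization -> embedding vectors
x, _ = attn(x, x, x)     # multi-head attention: each vector mixes in context
x = ffn(x)               # feed-forward network
print(x.shape)           # torch.Size([1, 4, 512]): one context-aware vector per token
```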
The decoder part is where my problem is. The decoder initially receives only the SOS token as input, and I don’t quite understand the masked multi-head attention at that point. At the first decoding step there is just the single SOS token in the masked multi-head attention layer, so what is there to mask, and how is the next token predicted? In other words, how does masked multi-head attention work over just one SOS token?
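To make the question concrete, here is a minimal PyTorch sketch of what I think happens at the first two decoding steps (using nn.MultiheadAttention with a torch.triu boolean mask is my own illustration, not the lecture’s code, and the sizes are made up):

```python
import torch
import torch.nn as nn

d_model, n_heads = 512, 8
attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

# Step 1: the decoder input is only SOS, so seq_len = 1.
sos = torch.randn(1, 1, d_model)                    # (batch=1, seq_len=1, d_model)
mask1 = torch.triu(torch.ones(1, 1, dtype=torch.bool), diagonal=1)
print(mask1)                                        # tensor([[False]]): nothing is masked
out1, _ = attn(sos, sos, sos, attn_mask=mask1)      # SOS attends only to itself

# Step 2: the predicted token is appended, seq_len = 2, and the mask grows.
x2 = torch.randn(1, 2, d_model)
mask2 = torch.triu(torch.ones(2, 2, dtype=torch.bool), diagonal=1)
print(mask2)                                        # [[False, True], [False, False]]
out2, _ = attn(x2, x2, x2, attn_mask=mask2)         # position 0 cannot see position 1
```

If I read this right, the step-1 mask is just a 1×1 matrix that masks nothing, so SOS simply attends to itself and the next token is predicted from SOS’s output vector. Does that mean the mask only really does work during training, when the whole target sequence is fed in at once?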