Conceptual questions about encoder / decoder from the "Generating text with transformers" video

  1. What are the details of the token embeddings? When I previously used Word2Vec, the dimensions of the embedding conceptually captured the co-occurrence of words in the text, and the number of dimensions was dictated by the size of the vocabulary. This seems different: the number of dimensions doesn’t seem related to the tokenization. What are the dimensions, and how are the embeddings generated? Can someone explain or provide a pointer? (The first sketch below shows my current mental model.)

  2. What are the dimensions of the Encoder output that is fed into the Decoder? In the example they pass 3 tokens into the encoder; if the output of the encoder is the logits for each token, are the dimensions of the output “num tokens” x “vocabulary size”? Or something else? And how is that matrix used in the Decoder to influence the self-attention weights? (The second sketch below shows the shapes I’d expect.)
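
To make question 1 concrete, here is my rough mental model of the embedding step. This is just a sketch using a PyTorch-style nn.Embedding; the vocab_size and d_model values are placeholders I picked, not numbers from the video.

```python
import torch
import torch.nn as nn

vocab_size = 32000   # size of the tokenizer's vocabulary (placeholder value)
d_model = 512        # embedding dimension, a model hyperparameter (placeholder value)

# A learned lookup table: one d_model-sized vector per token id. The vectors are
# initialized randomly and trained along with the rest of the model, rather than
# being built from co-occurrence statistics the way Word2Vec vectors are.
embedding = nn.Embedding(vocab_size, d_model)

token_ids = torch.tensor([[17, 942, 5]])   # three example token ids (batch of 1)
x = embedding(token_ids)
print(x.shape)                             # torch.Size([1, 3, 512])
```

Is that roughly right, i.e. the embedding dimension is just a hyperparameter that is independent of the vocabulary size?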
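
Similarly for question 2, this is the shape I would expect the encoder output to have and how I imagine the decoder attending over it; again a rough sketch with placeholder sizes, using PyTorch's built-in layers rather than anything from the course.

```python
import torch
import torch.nn as nn

d_model, num_heads = 512, 8   # placeholder sizes

encoder_layer = nn.TransformerEncoderLayer(d_model, num_heads, batch_first=True)
src = torch.randn(1, 3, d_model)     # 3 source tokens, already embedded
memory = encoder_layer(src)          # shape (1, 3, d_model) -- per-token vectors, not logits?

# My guess: in the decoder, the encoder output supplies the keys and values for an
# attention block, while the decoder's own hidden states supply the queries.
cross_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
tgt = torch.randn(1, 2, d_model)     # 2 target-side tokens generated so far
out, attn_weights = cross_attn(query=tgt, key=memory, value=memory)
print(memory.shape, out.shape)       # torch.Size([1, 3, 512]) torch.Size([1, 2, 512])
```

Is this the mechanism by which the encoder output influences the decoder, or does it feed into the decoder’s self-attention weights in some other way?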

Thanks for your help.

You can check the Natural Language Specialization to understand this in more depth, or some other resource.