Hi there
I have just watched the video “Generating Text with Transformers” and I have a few questions about different stages of the process.
Let’s suppose the input is “What colour is the sky at sunset?”
I understand this sentence will be tokenized. Each token is mapped to a vector, which is then summed with its positional encoding and passed to the attention heads, each head being responsible for processing the vectors according to its own specialty.
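To check whether I'm picturing that step correctly, here is a tiny numpy sketch of my mental model (the vocabulary, sizes, and random weights are all made up for illustration, not taken from any real model):

```python
import numpy as np

# Toy vocabulary and embedding size, invented just for this example
vocab = {"What": 0, "colour": 1, "is": 2, "the": 3, "sky": 4, "at": 5, "sunset": 6, "?": 7}
d_model = 8

tokens = ["What", "colour", "is", "the", "sky", "at", "sunset", "?"]
ids = [vocab[t] for t in tokens]

rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), d_model))  # one learned vector per token
token_vectors = embedding_table[ids]                      # shape: (seq_len, d_model)

# Sinusoidal positional encoding, added element-wise to each token vector
positions = np.arange(len(ids))[:, None]
dims = np.arange(d_model)[None, :]
angle = positions / (10000 ** (2 * (dims // 2) / d_model))
pos_enc = np.where(dims % 2 == 0, np.sin(angle), np.cos(angle))

encoder_input = token_vectors + pos_enc                   # what I think the heads receive
print(encoder_input.shape)                                # (8, 8): one vector per token
```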
And here I’m unclear about what happens. The previous video says: “The output of this layer is a vector of logits proportional to the probability score for each and every token in the tokenizer dictionary.” Does this mean each head attaches a score to each vector, so that a vector can end up holding hundreds of scores? Also, do the heads somehow add metadata (or something like it) that instructs the model to translate the input into Spanish, rather than answering “orange”, for example?
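For the “logits” part specifically, this is what I currently picture the final layer doing, as a toy numpy sketch (the vocabulary size and the weights are invented):

```python
import numpy as np

vocab_size = 10_000      # assumed dictionary size, just for illustration
d_model = 8

rng = np.random.default_rng(1)
final_hidden = rng.normal(size=d_model)                   # vector for the position being predicted
output_projection = rng.normal(size=(d_model, vocab_size))

logits = final_hidden @ output_projection                 # one raw score per vocabulary entry
probs = np.exp(logits - logits.max())
probs /= probs.sum()                                      # softmax -> probabilities over the whole vocab

print(logits.shape, probs.sum())                          # (10000,) 1.0
```

Is that roughly right, or are the scores attached somewhere else entirely?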
Then, the video says:
At this point, the data that leaves the encoder is a deep representation of the structure and meaning of the input sequence.
What is this data like? Is it a collection of vectors, or a single, multidimensional vector?
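To make the question concrete: is it something like the first shape below (one contextualised vector per input token), or the second (everything collapsed into a single vector)? The sizes are just assumptions for illustration.

```python
import numpy as np

# Possibility 1: one contextualised vector per input token (a matrix)?
seq_len, d_model = 8, 512                      # 8 tokens from my example; 512 is an assumed size
encoder_output_a = np.zeros((seq_len, d_model))

# Possibility 2: everything squeezed into a single vector?
encoder_output_b = np.zeros(d_model)

print(encoder_output_a.shape, encoder_output_b.shape)     # (8, 512) vs (512,)
```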
Now, about the decoder:
This representation is inserted into the middle of the decoder to influence the decoder’s self-attention mechanisms.
What do you mean by that? Is it used to filter which heads will be used in decoding?
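Here is my current guess, as a toy numpy sketch, of what “inserted into the middle of the decoder” might mean: cross-attention where the decoder’s queries meet keys and values built from the encoder output. All sizes and weights below are invented, and I may be completely off:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

d_model = 8
rng = np.random.default_rng(2)
encoder_output = rng.normal(size=(8, d_model))   # 8 input tokens, one vector each
decoder_states = rng.normal(size=(3, d_model))   # 3 tokens generated so far

W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
Q = decoder_states @ W_q                         # queries from the decoder side
K = encoder_output @ W_k                         # keys from the encoder output
V = encoder_output @ W_v                         # values from the encoder output

attention = softmax(Q @ K.T / np.sqrt(d_model)) @ V   # (3, d_model): each decoder position
print(attention.shape)                                 # attends over the whole encoded input
```

Is it something like that, or is the encoder output used to filter/select heads in some other way?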
Another question about the decoder:
I can understand the decoder selecting the most probable next token in the case of a question-answer process, as it’s generating a brand-new answer. But in the case of a translation, will it do a sort of token-by-token mapping, picking the most probable equivalent in the target language? Or is it more like “hey, here’s the context provided by the encoder so you have an idea of what we’re talking about; now go and follow your gut to create an equivalent in Spanish”?
(sorry to use such non-technical phrasing)
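To show what I mean by the two alternatives, here is a toy sketch (the word mapping is invented and obviously far too naive):

```python
# Alternative 1: a word-by-word mapping into Spanish (a lookup per token)
word_map = {"what": "qué", "colour": "color", "is": "es", "the": "el", "sky": "cielo"}
literal = [word_map.get(w, w) for w in ["what", "colour", "is", "the", "sky"]]
print(literal)

# Alternative 2: free generation, where at every step the decoder scores the entire
# Spanish vocabulary given (a) the encoder's representation of the English sentence and
# (b) the Spanish tokens produced so far, then picks one and keeps going.
```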
Another question: how does the model know when to stop? Is it when the most probable next token is a period/question mark/etc.?
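For example, is the stopping rule something like the check in this toy loop, with a special end-of-sequence token rather than a literal period? (The token ids and random scores below are made up.)

```python
import numpy as np

EOS_ID = 2                         # assumed id of an <eos> token in the vocabulary
generated = [0]                    # assumed start token

rng = np.random.default_rng(3)
for step in range(20):             # hard cap so this toy loop always ends
    logits = rng.normal(size=100)  # stand-in for the decoder's scores over a 100-token vocab
    next_id = int(np.argmax(logits))
    generated.append(next_id)
    if next_id == EOS_ID:          # is this the real stopping condition?
        break

print(generated)
```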
Thank you!