Hi, I just finished the Week 4 lectures, and I have some questions about the decoder part of the transformer.
So I am wondering: when the Q vector is fed into the second multi-head attention block in the picture, are all the words in the vocabulary used at this stage, or only the tokens generated up to the current time step?
I am pretty confused about this.
For example, with Prof. Andrew's sentence "Jane visite l'Afrique en septembre," we have all of the tokens available at the encoder, so we can compute the dot products between Q and K.
But in the decoder, each token is only available once it has been generated: at the first time step we only have <SOS>. This makes me wonder whether the vocabulary words are being used somewhere here to predict "Jane".
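To make my confusion concrete, here is a rough NumPy sketch of what I think happens at the first decoder time step. The toy shapes, the little `attention` helper, the 5-token encoder output, and the made-up vocabulary size are all my own assumptions, not the course code, so please correct me if this picture is wrong:

```python
import numpy as np

d_model = 4          # toy embedding size (real models use e.g. 512)
np.random.seed(0)

def attention(Q, K, V):
    # scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Encoder output: one vector per source token of
# "Jane visite l'Afrique en septembre" (5 tokens, random toy values)
encoder_out = np.random.randn(5, d_model)

# Decoder input at the first time step: only <SOS>
decoder_in = np.random.randn(1, d_model)        # shape (1, d_model)

# First (masked) self-attention in the decoder: Q, K, V all come
# from the tokens generated so far -- just <SOS> at step 1
self_attn_out = attention(decoder_in, decoder_in, decoder_in)

# Second multi-head attention (cross-attention): Q comes from the
# decoder, while K and V come from the full encoder output
cross_attn_out = attention(self_attn_out, encoder_out, encoder_out)

# As far as I can tell, the vocabulary only shows up at the very end:
# a linear layer + softmax over vocab_size scores picks "Jane"
vocab_size = 10000                               # made-up size
W_out = np.random.randn(d_model, vocab_size)
logits = cross_attn_out @ W_out
probs = np.exp(logits) / np.exp(logits).sum()
print(probs.shape)                               # (1, vocab_size)
```

Is it right that the attention blocks themselves never touch the vocabulary, and only this final linear + softmax step does?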
I appreciate any help you can provide.