Questions about transformer architecture

Week 1
generative-ai-with-llms/lecture/R0xbD?t=220

  1. It was mentioned that models like GPT, Llama, etc. use a decoder-only architecture. (a) How does it work without the context provided by the encoder? (b) What is the context used for?

  2. We learnt that multi-head attention assigns (initially random) weights that are learned to capture token associations with some meaning/relevance. What is the difference between multi-head attention (in the encoder) and masked multi-head attention (in the decoder)?

For the first question:
Unlike encoder-decoder models, which use an encoder to process and understand the input context before generating output, decoder-only models build their context directly from the input sequence itself: the prompt plus any tokens already generated, processed through self-attention. That self-attended context is what conditions the prediction of each next token, so no separate encoder is needed. A rough sketch of this is below.
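
Here is a minimal sketch (my own, not from the course) of a greedy generation loop, assuming a hypothetical `model` that maps token ids of shape (1, seq_len) to logits of shape (1, seq_len, vocab_size). It shows how the "context" is just the sequence so far, which grows as each new token is appended:

```python
import torch

def generate(model, prompt_ids: torch.Tensor, max_new_tokens: int = 20) -> torch.Tensor:
    """Greedy autoregressive generation with a decoder-only model."""
    ids = prompt_ids
    for _ in range(max_new_tokens):
        logits = model(ids)                     # model attends over everything seen so far
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)  # the new token becomes part of the context
    return ids
```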

For the second question:
The key difference is the use of masking in the decoder's self-attention: each position can only attend to itself and earlier positions, so future tokens are not visible, which preserves the autoregressive nature of text generation. The encoder's multi-head attention has no such mask, so every token can attend to every other token in the input.
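
To make the difference concrete, here is a small sketch of scaled dot-product attention (the function name and flag are mine, not from the course): with `causal=False` you get the encoder-style attention, and with `causal=True` an upper-triangular mask hides future positions, which is what masked multi-head attention does per head in the decoder.

```python
import math
import torch

def attention(q, k, v, causal: bool = False):
    """Scaled dot-product attention over (batch, seq_len, d) tensors."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if causal:
        seq_len = q.size(-2)
        # True above the diagonal = positions in the future, which get masked out
        mask = torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool()
        scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```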

But I would suggest doing the Natural Language Processing Specialization to understand this in more depth.
