If GPT is a decoder only model, why is it good at tasks other than text generation?

As per the lesson “Pre-training large language models”, decoder-only LLMs are pre-trained with the objective of text generation, encoder-only models are pre-trained with objectives suited to tasks like sentiment analysis, and encoder-decoder LLMs are pre-trained with objectives suited to translation, summarization, etc.

My understanding is that this means decoder-only LLMs are good at text generation but bad at translation, summarization, sentiment analysis, and other tasks. Similarly, encoder-decoder LLMs are good at translation, summarization, etc., but bad at text generation, sentiment analysis, and so on — and the same logic would apply to encoder-only LLMs.

However, GPT, despite being a decoder-only LLM, performs excellently on all of these tasks: text generation, translation, summarization, question answering, sentiment analysis, etc. How is that possible for a decoder-only LLM?
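One way to see why this works: a decoder-only model has exactly one capability, "predict the next token", and any task that can be phrased as a text continuation fits that mold. Here is a minimal sketch of that framing — the prompt templates below are illustrative examples I made up, not templates from GPT's actual training data:

```python
# Sketch: to a decoder-only LM, every task is "continue this text".
# The task description lives in the prompt; the model only ever
# predicts the next tokens. (Templates below are illustrative.)

def as_generation_prompt(task: str, text: str) -> str:
    """Cast a task as a plain text-continuation problem."""
    templates = {
        "translation": "Translate English to French:\n{text}\n=>",
        "summarization": "Summarize:\n{text}\nSummary:",
        "sentiment": "Review: {text}\nSentiment (positive/negative):",
        "qa": "Question: {text}\nAnswer:",
    }
    return templates[task].format(text=text)

# Every task below has now been reduced to "generate the next tokens":
for task in ("translation", "summarization", "sentiment", "qa"):
    print(as_generation_prompt(task, "<input text>"))
    print("---")
```

So translation, summarization, and sentiment analysis are not separate capabilities bolted onto GPT — they are all instances of the one pre-training objective, text generation, conditioned on a suitable prompt.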

Link to lesson: https://www.coursera.org/learn/generative-ai-with-llms/lecture/2T3Au/pre-training-large-language-models


This is a very nice response to this question, which also helped me at least remember some of those things:


So basically, using a decoder-only architecture allowed them to train on a lot more unlabeled data, which gave the models a better ability to “understand” the inputs — am I correct?
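Right — the key point is that the causal language-modeling objective needs no human labels: the targets are just the same text shifted by one token, so any raw corpus supplies its own supervision. A toy sketch of how the (input, target) pairs fall out of plain text (whitespace tokenization here for illustration; real models use subword tokenizers and a neural network):

```python
# Sketch of the causal-LM objective: labels are the text itself,
# shifted by one position, so unlabeled text is enough to train on.

def next_token_pairs(tokens):
    """For each position, the input is the prefix so far and the
    target is the very next token."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

corpus = "the cat sat on the mat".split()
pairs = next_token_pairs(corpus)
# First pair: input ['the'], target 'cat'; and so on down the sentence.
for prefix, target in pairs:
    print(prefix, "->", target)
```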


Yes, and the transformer attention architecture also enables that!
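To make that concrete: decoder-only transformers use *causal* (masked) self-attention, so each token can attend only to itself and earlier positions. That is exactly what makes the next-token objective trainable at scale — the model can never “peek” at the answer. A minimal sketch of the mask pattern:

```python
# Sketch of the causal attention mask used by decoder-only models:
# entry [i][j] is 1 if token i may attend to position j (j <= i),
# and 0 otherwise. Real implementations apply this as -inf before
# the softmax over attention scores.

def causal_mask(n):
    """Lower-triangular allow/block mask for a sequence of length n."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

for row in causal_mask(4):
    print(row)
```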
