If GPT is a decoder only model, why is it good at tasks other than text generation?

As per the lesson “Pre-training large language models”, decoder-only LLMs are pre-trained with the objective of text generation, encoder-only models are pre-trained with objectives suited to tasks like sentiment analysis, and encoder-decoder LLMs are pre-trained with objectives suited to translation, summarization, etc.

My understanding is that this means decoder-only LLMs are good at text generation but bad at translation, summarization, sentiment analysis, and other tasks. Similarly, encoder-decoder LLMs are good at translation, summarization, etc., but bad at text generation, sentiment analysis, and so on — and the same logic would apply to encoder-only LLMs.

However, GPT, despite being a decoder-only LLM, performs excellently on all of these tasks: text generation, translation, summarization, question answering, sentiment analysis, etc. How is that possible for a decoder-only LLM?
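One way to see why this works: a decoder-only model has exactly one capability, "predict the next token", and any task that can be phrased as a text continuation fits that mold. Here is a minimal sketch of that framing — the prompt templates below are illustrative examples I made up, not templates from GPT's actual training data:

```python
# Sketch: to a decoder-only LM, every task is "continue this text".
# The task description lives in the prompt; the model only ever
# predicts the next tokens. (Templates below are illustrative.)

def as_generation_prompt(task: str, text: str) -> str:
    """Cast a task as a plain text-continuation problem."""
    templates = {
        "translation": "Translate English to French:\n{text}\n=>",
        "summarization": "Summarize:\n{text}\nSummary:",
        "sentiment": "Review: {text}\nSentiment (positive/negative):",
        "qa": "Question: {text}\nAnswer:",
    }
    return templates[task].format(text=text)

# Every task below has now been reduced to "generate the next tokens":
for task in ("translation", "summarization", "sentiment", "qa"):
    print(as_generation_prompt(task, "<input text>"))
    print("---")
```

So translation, summarization, and sentiment analysis are not separate capabilities bolted onto GPT — they are all instances of the one pre-training objective, text generation, conditioned on a suitable prompt.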

Link to lesson: https://www.coursera.org/learn/generative-ai-with-llms/lecture/2T3Au/pre-training-large-language-models


This is a very nice response to this question, which also helped me at least remember some of those things:


So basically, using a decoder-only architecture allowed them to train on a lot more unlabeled data, which gave the models a better ability to “understand” the inputs — am I correct?
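Right — the key point is that the causal language-modeling objective needs no human labels: the targets are just the same text shifted by one token, so any raw corpus supplies its own supervision. A toy sketch of how the (input, target) pairs fall out of plain text (whitespace tokenization here for illustration; real models use subword tokenizers and a neural network):

```python
# Sketch of the causal-LM objective: labels are the text itself,
# shifted by one position, so unlabeled text is enough to train on.

def next_token_pairs(tokens):
    """For each position, the input is the prefix so far and the
    target is the very next token."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

corpus = "the cat sat on the mat".split()
pairs = next_token_pairs(corpus)
# First pair: input ['the'], target 'cat'; and so on down the sentence.
for prefix, target in pairs:
    print(prefix, "->", target)
```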


Yes, and the transformer attention architecture also enables that!
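To make that concrete: decoder-only transformers use *causal* (masked) self-attention, so each token can attend only to itself and earlier positions. That is exactly what makes the next-token objective trainable at scale — the model can never “peek” at the answer. A minimal sketch of the mask pattern:

```python
# Sketch of the causal attention mask used by decoder-only models:
# entry [i][j] is 1 if token i may attend to position j (j <= i),
# and 0 otherwise. Real implementations apply this as -inf before
# the softmax over attention scores.

def causal_mask(n):
    """Lower-triangular allow/block mask for a sequence of length n."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

for row in causal_mask(4):
    print(row)
```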
