What makes decoder-only models (like ChatGPT, BARD) so successful, when an encoder-decoder model is generally considered to have a deeper understanding of the meaning of the input?
I am not sure I would compare them like that. They are two different models with two different missions, although some tasks may overlap.
The decoder-only model ‘guesses’ the next token. That’s all it does.
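To make ‘guessing the next token’ concrete, here is a minimal sketch using GPT-2 (a small decoder-only model) through the Hugging Face transformers library. The model name and prompt are just examples chosen for illustration:

```python
# Minimal sketch: a decoder-only model scores every vocabulary token as a
# possible continuation, and "guessing" means picking the most probable one.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits            # shape: (1, seq_len, vocab_size)

next_token_id = logits[0, -1].argmax().item()  # most probable next token
print(tokenizer.decode(next_token_id))         # likely " Paris"
```

Text generation is just this step repeated: append the guessed token to the input and guess again.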
The encoder-decoder model converts one sequence into another sequence.
Someone might say: Yes! But a decoder-only model can also translate from English to French, and that is seq-to-seq! And I would answer: it is seq-to-seq for us humans, but for the decoder-only model it was just ‘guessing the most probable next word’, as in the sketch below.
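A rough sketch of what that means in practice, again with GPT-2 purely for illustration (it is a poor translator, but the mechanism is the same one the large models use): the prompt ends where the translation should begin, and the model just keeps guessing the most probable next token.

```python
# Sketch: "translation" as plain next-token prediction on a prompt.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "English: I like coffee.\nFrench:"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding: repeatedly pick the most probable next token.
output_ids = model.generate(
    **inputs,
    max_new_tokens=20,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Nothing in the model ‘knows’ it is translating; the seq-to-seq framing exists only in how we read the prompt and its continuation.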
If I had to build a language translator, or handle any other task that goes from one sequence (audio, text, etc.) to another sequence (text, another language, etc.), I would do it with an encoder-decoder. That said, big decoder-only models like GPT can do a great job as well, but they have to be huge.
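For comparison, a minimal sketch of the encoder-decoder route, assuming a small pre-trained translation model such as Helsinki-NLP/opus-mt-en-fr from the Hugging Face hub:

```python
# Sketch: a dedicated encoder-decoder model maps the source sequence to a
# target sequence; the encoder reads the English sentence, the decoder
# writes the French one.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

name = "Helsinki-NLP/opus-mt-en-fr"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

inputs = tokenizer("I like coffee.", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
# Expected output: something like "J'aime le café."
```

A purpose-built encoder-decoder like this is tiny compared to GPT-scale decoder-only models, which is exactly the ‘they have to be huge’ point above.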