Why use Encoder-Decoder Models?

I’m watching the week 1 video on “pre-training large language models”.

She states that encoder-decoder models are good for translation or summarization. But can’t those tasks be done with a decoder-only model? Why use an encoder-decoder model then?

If you want to understand that level of detail, it might be a better idea to take DLS Course 5 or NLP Courses 3 and 4. That’s where you learn the technical underpinnings of LLMs. In the short courses they are just showing you how to apply them or build apps on top of them.

The TL;DR version is that the encoder “encodes” the input into an “embedding space” and the decoder then maps that result “outward” into the end result you want. So think of the encoding step as figuring out what the input says, distilling it into the meaning that is relevant to the output you eventually want. The decoding phase then takes that “distilled meaning” and generates your desired output from it, e.g. the translation or the summary.
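Here’s a minimal sketch of that flow, not from the course: it uses the Hugging Face transformers library with t5-small as a stand-in encoder-decoder model, both of which are my own choices for illustration.

```python
# Minimal encoder-decoder sketch (illustrative only; T5 via Hugging Face
# `transformers` is my choice of example, not something from the course).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Encoding step: the encoder reads the whole input and distills it into a
# sequence of hidden states (the "distilled meaning" described above).
inputs = tokenizer("translate English to German: Hello, how are you?",
                   return_tensors="pt")

# Decoding step: the decoder generates the target text token by token,
# attending to those encoder states via cross-attention at every step.
output_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```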

I understand that, but can’t decoder-only models accomplish the same tasks? GPT-4, for example, can do translation, question answering, and summarization. It just does it in a different way, by predicting the next token. E.g. given “how do you say hello in Spanish?”, the predicted next token in a well-trained decoder-only model would be “hola”.
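Something like this sketch is what I have in mind (using the Hugging Face transformers library with GPT-2 as a small public stand-in, since GPT-4’s weights aren’t available; a base model this size won’t answer reliably, but the mechanism is the same):

```python
# Decoder-only sketch: no separate encoder, just next-token prediction.
# GPT-2 stands in for GPT-4 here purely for illustration.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# The prompt and the answer live in one token stream; the model simply
# keeps extending that stream one predicted token at a time.
prompt = "Q: How do you say hello in Spanish?\nA:"
inputs = tokenizer(prompt, return_tensors="pt")

output_ids = model.generate(**inputs, max_new_tokens=10,
                            pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```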

Evidently the encoder/decoder strategy works better in some respects, otherwise people wouldn’t use it. Perhaps it’s easier to train, or perhaps it gives better results in general. One can also imagine that the dual architecture is more flexible: for example, the same encoded representation of an input could be decoded into several different target languages.

So maybe the real answer here is that I’m not the right person to be answering this question. Sorry, I should not have waded in on this thread, since I am not a mentor for this course.

Also note that GPT-4 is a recent LLM, so it is based on Transformers and attention. I have not studied the internals of any of the published GPT models specifically, but I have taken DLS Course 5, which covers Sequence and Attention Models. The original Transformer was an encoder/decoder design, and my understanding is that the GPT family keeps essentially the decoder half of that design and scales it up. So if you use GPT-4 through a chat interface, it does behave like a straight decoder-style model; the point is that what’s happening “under the covers” in both kinds of models is the same attention machinery that the encoder/decoder Transformer introduced.
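If you want to check that architectural difference for yourself, the model configs in the Hugging Face transformers library expose it directly; GPT-2 and T5 below are just small public stand-ins I picked for the two families:

```python
# Quick check of decoder-only vs. encoder-decoder (illustrative model choices).
from transformers import AutoConfig

gpt2_cfg = AutoConfig.from_pretrained("gpt2")    # GPT-style model
t5_cfg = AutoConfig.from_pretrained("t5-small")  # original Transformer style

print(gpt2_cfg.is_encoder_decoder)  # False: a single causal decoder stack
print(t5_cfg.is_encoder_decoder)    # True: separate encoder and decoder stacks
```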
