Sequence-to-sequence vs. autoregressive models

In the video 'Pre-training large language models', three types of LLMs are introduced: autoencoder, autoregressive, and sequence-to-sequence (seq2seq). It mentions that seq2seq models are best for translation tasks and autoregressive models are best for text generation.
But GPT-3/ChatGPT, as autoregressive models, perform well on many language tasks other than text generation, such as translation and sentiment analysis. Does the categorization in the video still make sense?
It seems to me that seq2seq models are no longer necessary in applications (except for training and language modelling), as autoregressive models can already cover their areas of expertise.
Please help clarify whether my assumption is correct. Thank you!

Hi @Claire_Gong ,

Thank you for your insight!

Autoregressive models are really the surprise of the moment in many regards. As the lecture states, these models not only generate text; the trainer also notes that new applications are still being discovered for them, and this remains an active field of research.

As you rightly point out, translation and sentiment analysis, among others, are some of these additional abilities.

Still, I would not yet support the idea of discarding encoder-decoder models for seq2seq tasks. Although decoder-only models are very good at them, the encoder-decoder architecture may have strengths over the decoder-only one. For example, a decoder-only model predicts the next token by looking only at past tokens, while in an encoder-decoder model the encoder attends to the entire input context and passes its keys and values to the decoder.
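To make that masking difference concrete, here is a minimal NumPy sketch (not any specific model's implementation): the same scaled dot-product attention is run once with a causal mask, as in a decoder-only model, and once unmasked, as in the encoder of an encoder-decoder model. The function names and toy dimensions are my own illustration.

```python
import numpy as np

def attention(q, k, v, mask=None):
    """Scaled dot-product attention over one sequence."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        # Block disallowed positions with a large negative score.
        scores = np.where(mask, scores, -1e9)
    # Numerically stable softmax over the last axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # 4 tokens, embedding dim 8 (toy sizes)

# Decoder-only (autoregressive): causal mask, each token sees only itself
# and the past.
causal = np.tril(np.ones((4, 4), dtype=bool))
_, w_causal = attention(x, x, x, mask=causal)

# Encoder side of an encoder-decoder model: no mask, every token attends
# to the full input context in both directions.
_, w_full = attention(x, x, x)

print(np.triu(w_causal, k=1).sum())  # no attention mass on future tokens
print((w_full > 0).all())            # full bidirectional attention
```

The upper triangle of `w_causal` carries no attention mass, while every entry of `w_full` is positive: that bidirectional view of the input is what the encoder hands to the decoder as keys and values.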

Yes, decoder-only models are doing an amazing job, but I would argue that the encoder-decoder architecture has strengths over the decoder-only model that may become visible in some cases.


One more thought:

What you mention about LLMs being able to translate and do other tasks that seq2seq models do is seen mainly (and I would say 'only') in the very large LLMs, think GPT and Claude.

Smaller LLMs are not that good at these tasks; however, small seq2seq models are good at them.

So the size of the LLM is a variable to consider. Not everyone can afford their own GPT-scale model, but a seq2seq model for translation, for instance, is doable at a much lower cost.

When model size is small, the differences make more sense. Thanks for your clarification!