What does seq2seq mean in Transformer?

There’s something I don’t quite understand yet.
I thought seq2seq was a pre-transformer model, so why does the lecture refer to the Transformer encoder-decoder model as seq2seq? Could someone explain?

Here is a nice article about seq2seq models:

Hello @Goomin

Your query comes from the GenAI with LLMs course, but it seems more relevant to the NLP Specialization, which explains that sequence-to-sequence models are trained to convert a sequence of input data (such as a sentence in one language) into a sequence of output data (such as a sentence in another language).

The architecture of a seq2seq model typically consists of two main parts: an encoder and a decoder.

The encoder takes the input sequence and converts it into a fixed-length vector representation, often referred to as the context vector or hidden state. The decoder then takes this context vector and generates the output sequence, step by step.
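Here is a minimal sketch (not from the course; the vocabulary sizes, dimensions, and token ids are made-up examples) of the classic RNN-based seq2seq pattern: the encoder compresses the input sequence into a context vector, and the decoder generates the output sequence one token at a time.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim, hid_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)

    def forward(self, src):                      # src: (batch, src_len)
        _, hidden = self.rnn(self.embed(src))    # hidden: (1, batch, hid_dim)
        return hidden                            # the fixed-length context vector

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim, hid_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, prev_token, hidden):       # prev_token: (batch, 1)
        output, hidden = self.rnn(self.embed(prev_token), hidden)
        return self.out(output), hidden          # logits over the target vocab

# Greedy decoding, one step at a time, starting from a <sos> token (id 1 here).
encoder, decoder = Encoder(100, 32, 64), Decoder(100, 32, 64)
src = torch.randint(0, 100, (1, 7))              # a dummy source sentence
hidden = encoder(src)
token = torch.tensor([[1]])                      # <sos>
for _ in range(5):
    logits, hidden = decoder(token, hidden)
    token = logits.argmax(-1)                    # pick the most likely next token
```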

Sequence-to-sequence models were classically built from Recurrent Neural Network architectures and are typically used (but not limited) to solve complex language problems like machine translation, question answering, chatbots, and text summarization. The Transformer encoder-decoder follows the same sequence-to-sequence pattern, replacing recurrence with attention, which is why the lecture still refers to it as a seq2seq model.

Whereas in the pre-transformer era, Word2Vec and GloVe were the two main methods for producing dense embeddings. These methods have a one-to-one mapping between a word and its embedding representation.
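A tiny illustration of that one-to-one mapping (the vectors below are made up, not real Word2Vec or GloVe values): every occurrence of a word gets the same vector, regardless of its context.

```python
# Static embeddings: one fixed vector per word, independent of context.
static_embeddings = {
    "bank":  [0.21, -0.53, 0.88],   # same vector for "river bank" and "bank loan"
    "river": [0.10,  0.47, -0.32],
    "loan":  [-0.61, 0.05, 0.72],
}

sentence = ["river", "bank", "loan", "bank"]
vectors = [static_embeddings[w] for w in sentence]
assert vectors[1] == vectors[3]     # both "bank" tokens share one embedding
```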

Feel free to ask if you have more doubts!

Regards
DP
