I am slightly confused about the distinction between encoder-only and decoder-only models.
If BERT is encoder-only, how does it produce summaries, and if GPT is decoder-only, how does it do classification tasks?
Also, an encoder-only architecture requires the input and output lengths to be the same. How can it do a classification task?
Thank you!
Hi @himsgpt,
Interesting questions.
In all cases you will see that to achieve these tasks you have to:
- Add additional layers to the standard model to accomplish the task.
- Make sure that the training is appropriate for the task at hand.
Let's go one by one:
How can an encoder-only model do summarization?
An encoder-only model is great at extracting the main features of an input. At the end of the encoder, we have a vector that has captured the “meaning” of the input. To turn this into a summary, you need to add more layers that process this vector into the summary text. Common choices are feed-forward layers followed by an auto-regressive model.
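To make this concrete, here is a minimal PyTorch sketch of that idea: a pretrained BERT encoder with a small auto-regressive decoder head added on top. The class name, layer sizes, and the two-layer decoder are illustrative assumptions, not a reference implementation:

```python
import torch
import torch.nn as nn
from transformers import BertModel


class BertSummarizer(nn.Module):
    """Illustrative sketch: a BERT encoder plus extra layers for summarization."""

    def __init__(self, vocab_size=30522, d_model=768):
        super().__init__()
        self.encoder = BertModel.from_pretrained("bert-base-uncased")
        # The "additional layers": a small auto-regressive decoder head.
        layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.token_embed = nn.Embedding(vocab_size, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, input_ids, attention_mask, summary_ids):
        # Encode the document; the encoder output captures its "meaning".
        memory = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        # Decode the summary auto-regressively, attending to the encoder output.
        causal_mask = nn.Transformer.generate_square_subsequent_mask(
            summary_ids.size(1)).to(summary_ids.device)
        out = self.decoder(self.token_embed(summary_ids), memory, tgt_mask=causal_mask)
        return self.lm_head(out)  # logits over the vocabulary at each position
```

The decoder head would then be trained on document/summary pairs, as per the second bullet above.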
How can a decoder-only model do classification?
Although decoder-only models are best suited to generating sequences, they can also be adapted to perform classification. To do this, you need to add a classification head to the end of the model, that is, some more layers that take the model's output, pass it through some feed-forward layers, and finish with a softmax layer. You also have to train the model for this task.
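A rough sketch of what that head could look like on top of GPT-2 (the head sizes and pooling choice are illustrative assumptions):

```python
import torch
import torch.nn as nn
from transformers import GPT2Model


class GPT2Classifier(nn.Module):
    """Illustrative sketch: a GPT-2 backbone with a classification head on top."""

    def __init__(self, num_labels=2):
        super().__init__()
        self.backbone = GPT2Model.from_pretrained("gpt2")
        hidden = self.backbone.config.n_embd  # 768 for the base model
        # The classification head: feed-forward layers ending in class logits.
        self.head = nn.Sequential(
            nn.Linear(hidden, hidden),
            nn.Tanh(),
            nn.Linear(hidden, num_labels),
        )

    def forward(self, input_ids, attention_mask):
        states = self.backbone(input_ids=input_ids,
                               attention_mask=attention_mask).last_hidden_state
        # Pool using the hidden state of the last real token (assumes right padding).
        last = attention_mask.sum(dim=1) - 1
        pooled = states[torch.arange(states.size(0)), last]
        return self.head(pooled)  # apply softmax (or CrossEntropyLoss) during training
```

This is the same general pattern Hugging Face's GPT2ForSequenceClassification follows: pool the last non-padding token, then apply a linear head.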
How can an encoder-only model, whose output length must match its input length, do classification?
Similar to the previous case, you can process the output of the encoder by passing it through a classification layer that will usually end with a softmax. Again, you need to train the model for the classification task.
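A minimal sketch, assuming the common BERT convention of pooling the [CLS] token (the class name and head are illustrative):

```python
import torch.nn as nn
from transformers import BertModel


class BertClassifier(nn.Module):
    """Illustrative sketch: a BERT encoder with a classification layer on top."""

    def __init__(self, num_labels=2):
        super().__init__()
        self.encoder = BertModel.from_pretrained("bert-base-uncased")
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        states = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        # The encoder emits one vector per input token; taking only the [CLS]
        # vector gives a fixed-size representation, so the "output length must
        # match input length" constraint no longer matters for classification.
        return self.classifier(states[:, 0])  # logits; softmax gives probabilities
```

This is essentially what BertForSequenceClassification in the transformers library does for you.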
You may say: But GPT-3.5/GPT-4 can do it without all of these extra layers and steps! And you are right. You can get GPT-3.5/GPT-4 to do classification and so many other things, and this is because of the size of the model. These models are very powerful and handle many tasks very well, and that's what has us all so in awe! But for models of 7B or 13B parameters, which you can manage, train, and fine-tune on “normal” computers, you need to consider the steps above.
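With such a model, classification is just a prompt. A minimal sketch using the openai Python client (the model name, labels, and prompt wording are illustrative):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def classify_sentiment(text: str) -> str:
    """Zero-shot classification by prompting alone: no extra layers, no fine-tuning."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model name
        messages=[
            {"role": "system",
             "content": "Classify the sentiment of the user's text as 'positive', "
                        "'negative', or 'neutral'. Reply with the label only."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content.strip()


print(classify_sentiment("The battery life on this laptop is fantastic."))
```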
Final comment:
As you can see, the models can be adapted to perform tasks that are, let's say, not natural for them. You can get the models to execute these tasks, but you may not get the best results; the best results come from using the ideal model for each task. But this shows the strength of transformers!
Thoughts?
Thanks @Juan_Olano for the comprehensive answer.
That does make sense. It seems any model architecture can be tweaked to perform any set of tasks, but as you suggested, we should use each model for the tasks it was designed for.
One thought: while I see a lot of leaderboards for LLM benchmarks, none of them (to my knowledge) covers a comprehensive task list that could act as a playbook. It would be good to include comprehensive tasks, ranging from classification to summarization to text generation, for each model/architecture.