Comparing the models for W2 and W3

Hi @PZ2004

That is correct, this assignment uses the decoder-only approach for summarization.

That is false. BERT is encoder-only.

Usually, it comes down to the performance you achieve. You can try both approaches and see which works best for you (computation-wise, accuracy, etc.).

Usually, summarization is done with encoder-decoder transformers (unlike the one in the assignment). In that case (encoder-decoder), the encoder takes the text as input, and the decoder outputs only the summary.
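For example, the data for an encoder-decoder summarizer could be laid out roughly like this (a minimal sketch with made-up token IDs and start/end tokens, not the course's code):

```python
article_tokens = [12, 847, 33, 9, 501]   # full document -> encoder input
summary_tokens = [77, 102, 5]            # summary -> decoder side

encoder_input  = article_tokens
decoder_input  = [0] + summary_tokens    # shifted right; 0 = hypothetical <start>
decoder_target = summary_tokens + [1]    # 1 = hypothetical <eos>
```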

In C4W1, we were presented with encoder-decoder translation, where English text was the input to the encoder and German text was the target (the decoder’s job). Similarly, we could have used the same architecture here: feed the text/document into the encoder and ask the decoder to output the summary. But, I guess, the course creators for this (next, C4W2) week wanted to introduce the decoder-only architecture (like in GPT) and the way to implement summarization with it.

In the C4W2 Assignment, a special token is used to separate the text from the summary, and a loss mask is used so the model is not penalized for its predictions on the text part, right or wrong; only the summary part contributes to the loss. As I mentioned, this is a decoder-only model.
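In code, that masking could look something like this (a minimal sketch with made-up token IDs and a hypothetical `SEP` id, not the assignment's exact preprocessing):

```python
import numpy as np

SEP = 2                                  # hypothetical separator token id
text_tokens    = [12, 847, 33, 9, 501]   # made-up ids for the article
summary_tokens = [77, 102, 5]            # made-up ids for the summary

# One sequence: article, separator, summary.
inputs = np.array(text_tokens + [SEP] + summary_tokens)

# 0 weight on the article part and the separator, 1 on the summary part.
weights = np.array([0] * (len(text_tokens) + 1) + [1] * len(summary_tokens))

# Pretend per-token cross-entropy coming out of the model:
per_token_loss = np.random.rand(len(inputs))

# Only the summary tokens contribute to the loss:
masked_loss = (per_token_loss * weights).sum() / weights.sum()
print(masked_loss)
```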

In the C4W3 Assignment we implement an encoder-only model (actually, one part of T5, the unsupervised denoising objective; T5 itself is an encoder-decoder model). So, in the Assignment, the model is trained to predict only the sentinel tokens. You can find out more about T5 in the paper, or maybe more concretely here.
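To make the sentinel idea concrete, here is a toy sketch of T5-style span corruption (the `<Z0>`, `<Z1>`, ... names and the `span_corrupt` helper are made up for illustration; the word example follows the one in the T5 paper, but this is not the assignment's code):

```python
def span_corrupt(words, masked_positions):
    """Toy T5-style span corruption: consecutive masked words collapse
    into one sentinel in the input, and the target holds the sentinels
    plus the words they hide."""
    inputs, targets = [], []
    sentinel, prev_masked = 0, False
    for i, w in enumerate(words):
        if i in masked_positions:
            if not prev_masked:             # open a new sentinel span
                tok = f"<Z{sentinel}>"
                inputs.append(tok)
                targets.append(tok)
                sentinel += 1
            targets.append(w)               # hidden word goes to the target
            prev_masked = True
        else:
            inputs.append(w)
            prev_masked = False
    targets.append(f"<Z{sentinel}>")        # final (closing) sentinel
    return " ".join(inputs), " ".join(targets)

words = "Thank you for inviting me to your party last week".split()
inp, tgt = span_corrupt(words, masked_positions={2, 3, 8})
print(inp)   # Thank you <Z0> me to your party <Z1> week
print(tgt)   # <Z0> for inviting <Z1> last <Z2>
```

The model sees the corrupted input and is trained to reproduce only the target, i.e. the sentinels and the spans they stand for, rather than the whole text.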

Cheers