“you will use a language model – Transformer Decoder – to solve an input-output problem. As you know, language models only predict the next word, they have no notion of inputs. To create a single input suitable for a language model, we concatenate inputs with targets putting a separator in between”
It is not clear how the summary is generated from the same input again and again. Why is the summary tagged along with the input? What is the intuition? I am unable to understand this. Thanks
Hi @Bharath_Mukundakrish
That is a good question. This week’s task (summarization) is accomplished with a decoder-only architecture, meaning that only the decoder is used (not the usual encoder-decoder setup, where the encoder would encode the input and the decoder would generate the summary).
To achieve summarization with a decoder-only architecture, a special token (<SEP>) is used to separate the input from the summary and to indicate to the model where the summary begins.
Let’s take a concrete trivial example:
“Very very long sentence with a lot of words <SEP> The summary <EOS>”.
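To make the concatenation concrete, here is a minimal sketch (not the assignment’s exact code) of how such a training example could be built; the function name and the plain-string tokens are just illustrative:

```python
# Minimal sketch (not the assignment's exact code) of how a single training
# example is built: the article and the summary are joined with special
# tokens so the decoder sees one continuous sequence.
SEP, EOS = "<SEP>", "<EOS>"

def build_example(article: str, summary: str) -> str:
    """Concatenate article and summary into one decoder input string."""
    return f"{article} {SEP} {summary} {EOS}"

print(build_example("Very very long sentence with a lot of words", "The summary"))
# -> Very very long sentence with a lot of words <SEP> The summary <EOS>
```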
Given this example, the model needs to learn to predict the next token from the preceding tokens. For example, the model should learn to behave something like this (input → output); a short code sketch of these prefix → next-token pairs follows after the list:
- <BOS> → “The” (wrong prediction/penalized)
- <BOS> Very → “very” (correct prediction/rewarded)
- …
- <BOS> Very very long → “time” (wrong prediction/penalized)
- <BOS> Very very long sentence → “with” (correct prediction/rewarded)
- … etc.
- <BOS> Very very long sentence with a lot of words <SEP> → “The” (correct prediction/rewarded)
- <BOS> Very very long sentence with a lot of words <SEP> The → “sentence” (wrong prediction/penalized - should have been the word “summary”)
- <BOS> Very very long sentence with a lot of words <SEP> The summary → “<EOS>” (correct prediction/rewarded)
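In code, this is just next-token prediction with teacher forcing: every prefix of the concatenated sequence is an input and the token that follows it is the target. A rough sketch (token strings instead of integer IDs, purely illustrative):

```python
# Rough sketch of the (prefix -> next token) pairs the model is trained on.
# Real implementations work on integer token IDs and score all positions in
# parallel under a causal mask; this explicit loop just makes the idea visible.
tokens = ["<BOS>", "Very", "very", "long", "sentence", "with", "a", "lot",
          "of", "words", "<SEP>", "The", "summary", "<EOS>"]

for i in range(1, len(tokens)):
    prefix, target = tokens[:i], tokens[i]
    print(" ".join(prefix), "->", target)
# <BOS> -> Very
# <BOS> Very -> very
# ...
# <BOS> Very very long sentence with a lot of words <SEP> The summary -> <EOS>
```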
So the intuition could be that, looping over a gazillion examples of different sentences, the model learns to predict the correct tokens (it “learns” the language and also learns to generate the summary after the special token).
Cheers
P.S. I should have also mentioned that in this week’s Assignment, the mask used is 0 for all the words before the <SEP> token and 1 after it, so the model is penalized/rewarded only on the summary part. This is not a must: a soft mask (for example, 0.01 for the tokens before the <SEP> token and 1s for the summary part) could be used so that the model converges faster and more efficiently and, maybe, gives even better results.
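For concreteness, here is a rough NumPy sketch of such a loss-weight mask (0 up to and including <SEP>, 1 for the summary tokens), with the soft mask as an optional argument; the helper name and token IDs are hypothetical, not the assignment’s code:

```python
import numpy as np

def loss_weights(token_ids, sep_id, soft=0.0):
    """Weight 0 (or a small soft value) up to and including <SEP>, 1 after it."""
    token_ids = np.asarray(token_ids)
    sep_pos = int(np.argmax(token_ids == sep_id))  # position of the first <SEP>
    weights = np.full(token_ids.shape, soft, dtype=np.float32)
    weights[sep_pos + 1:] = 1.0  # summary tokens contribute fully to the loss
    return weights

ids = [7, 7, 7, 7, 99, 3, 4, 2]                 # 99 stands in for <SEP>
print(loss_weights(ids, sep_id=99))              # [0. 0. 0. 0. 0. 1. 1. 1.]
print(loss_weights(ids, sep_id=99, soft=0.01))   # soft-mask variant
```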
Is there a relation between eigenvectors/eigenvalues and the notion of summarization of a given document (or documents)? If so, how does the relationship show up in summarization models?
Also, many of the concepts of attention et al. seem to me to have analogies to splines/basis functions, blossoming, and convolution with basis splines in the CAD (Computer Aided Design) world and the differential calculus world.
Thanks