Problem with transformer

nguyen1207 · May 28, 2023, 4:51am

I have watched two videos: Transformer Overview (the first video) and Transformer Decoder (the second video). The decoders in the two videos are completely different from each other.

This is the decoder from the first video:

This is the decoder from the second video:

Why are these decoders different? The decoder from the second video seems to be the same as the encoder, but with the linear and softmax layers at the end.

In the assignment, we try to build the transformer like this figure:

But the transformer mentioned in the first video is this one:

Again, why are these 2 transformers different?

arvyzukai · May 28, 2023, 6:21am

Hi @nguyen1207

This picture:

comes from the original paper, “Attention Is All You Need” (page 3). This architecture was originally created for translation (the left side takes the text in the language you want to translate from, and the right side takes the translation “so far”). This picture is often used when talking about transformers.

But the idea can be applied to more than translation: this week (C4 W2) the transformer is applied to summarization. As a result, we do not use the left side (the inputs in another language); we only use the right side (slightly modified for our purpose).
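To make the difference concrete, here is a minimal sketch in plain Python. It is not the assignment's code: the helper names (`attention`, `decoder_block`) are made up for illustration, there are no learned weights, and layer normalization and the feed-forward sublayer are left out. It assumes the "slight modification" is that the decoder block drops the attention over the encoder output when there is no encoder.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values, causal=False):
    # Scaled dot-product attention over lists of vectors (no learned weights).
    d = len(keys[0])
    out = []
    for i, q in enumerate(queries):
        # With causal=True, position i may only look at positions 0..i.
        limit = i + 1 if causal else len(keys)
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in keys[:limit]]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values[:limit]))
                    for j in range(len(values[0]))])
    return out

def add(xs, ys):
    # Residual connection: element-wise sum of two sequences of vectors.
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]

def decoder_block(x, encoder_output=None):
    # Hypothetical, simplified decoder block.
    # Both variants start with masked (causal) self-attention.
    x = add(x, attention(x, x, x, causal=True))
    if encoder_output is not None:
        # Encoder-decoder variant (the paper's picture): queries come from
        # the decoder, keys and values come from the encoder output.
        x = add(x, attention(x, encoder_output, encoder_output))
    return x

summary_so_far = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
source_encoding = [[0.5, 0.5], [2.0, -1.0]]

decoder_only = decoder_block(summary_so_far)                     # no left side
with_encoder = decoder_block(summary_so_far, source_encoding)    # uses left side
```

In this sketch the only difference between the two calls is the extra attention step over `source_encoding`. Without it, the block reduces to masked self-attention, which is why it looks so much like an encoder block.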

Cheers
