Problem with transformer

I have watched two videos: Transformer Overview (the first video) and Transformer Decoder (the second video). The decoders shown in the two videos are completely different from each other.

This is the decoder from the first video:

This is the decoder from the second video:

Why are these decoders different? The decoder from the second video seems to be the same as the encoder, but with a linear layer and a softmax layer at the end.

In the assignment, we build the transformer following this figure:

But the transformer mentioned in the first video is this one:

Again, why are these two transformers different?

Hi @nguyen1207

This picture:

comes from the original paper, "Attention Is All You Need" (page 3). This architecture was originally created for translation: the left side takes the input in the language you want to translate from, and the right side takes the translation "so far". This picture is often used when talking about transformers.

But the idea is not limited to translation. In this week (C4 W2) the transformer is applied to summarization, so there is no input in another language and the left side (the encoder) is not needed. We use only the right side, slightly modified for our purpose: without an encoder, the decoder's cross-attention block is dropped, leaving masked self-attention followed by the feed-forward layer, with the linear and softmax layers on top.
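To make the difference concrete, here is a minimal NumPy sketch (not the course's code; all sizes and weight names are made up for illustration) of one decoder-only layer: causal self-attention with no cross-attention, followed by the linear + softmax head that the second video adds on top of the otherwise encoder-like stack.

```python
import numpy as np

# Hypothetical sizes, for illustration only.
T, d = 4, 8           # sequence length, model dimension
V = 16                # vocabulary size
rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# One decoder-only layer: masked (causal) self-attention.
# There is no cross-attention, because there is no encoder output to attend to.
x = rng.standard_normal((T, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

Q, K, Val = x @ Wq, x @ Wk, x @ Wv
scores = Q @ K.T / np.sqrt(d)
causal_mask = np.triu(np.ones((T, T), dtype=bool), k=1)
scores[causal_mask] = -np.inf      # each position sees only itself and the past
attn_out = softmax(scores) @ Val

# Final linear projection to the vocabulary, then softmax -- the two layers
# added at the end of the (encoder-like) stack.
W_out = rng.standard_normal((d, V))
probs = softmax(attn_out @ W_out)
print(probs.shape)                 # (T, V): one distribution per position
```

In the full translation architecture there would be an extra attention step between the mask and the feed-forward layer, where Q comes from the decoder but K and V come from the encoder's output; removing that step is exactly what makes the summarization decoder look like "an encoder with a linear and softmax at the end".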