I have watched two videos: Transformer Overview (first video) and Transformer Decoder (second video). The decoders shown in the two videos are completely different from each other.
This is the decoder from the first video:
This is the decoder from the second video:
Why are these decoders different? The decoder from the second video seems to be the same as the encoder, but with a linear layer and a softmax layer at the end.
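To be concrete about what I mean by "the linear and the softmax layer at the end", here is a minimal sketch in plain Python. The names `output_head`, `hidden`, `weights`, and `bias`, as well as the toy numbers, are my own illustrative assumptions and are not taken from either video or the assignment. The repeated blocks that come before this head are omitted.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear(x, weights, bias):
    # weights has one row per vocabulary entry; each row has len(x) columns.
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def output_head(hidden, weights, bias):
    # Linear projection to vocabulary size, then softmax over the vocabulary.
    return softmax(linear(hidden, weights, bias))

# Hypothetical final hidden state for one position (model dimension 3).
hidden = [0.5, -1.0, 2.0]
# Hypothetical projection to a vocabulary of 4 tokens.
weights = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
]
bias = [0.0, 0.0, 0.0, 0.0]

probs = output_head(hidden, weights, bias)
print(probs)
```

The output is one probability per vocabulary entry, summing to 1. As far as I can tell, this head is the only part of the second video's decoder that the encoder does not also have.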
In the assignment, we build a transformer like the one in this figure:
But the transformer mentioned in the first video is this one:
Again, why are these two transformers different?