I can't quite understand the transformer structure

Hi @Kerem_Boyuk

I would suggest going through a very similar recent thread and, in particular, trying to understand this picture (of scaled dot-product multi-head attention):

In my head, it is somewhat similar to LSTM-based attention:

The main difference that I see is that in the LSTM-based encoder the hidden states are the output of the LSTM network, while the "hidden states" of the transformer's encoder come from dot products between each token's transformed embeddings.
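In code, the idea looks roughly like this. This is only a minimal single-head sketch (no masking, no multi-head split, no output projection); the projection matrices `W_q`, `W_k`, `W_v` and the toy shapes are placeholders I made up for illustration:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(x, W_q, W_k, W_v):
    # x: token embeddings of shape (batch, seq_len, d_model)
    q = x @ W_q  # queries: (batch, seq_len, d_k)
    k = x @ W_k  # keys:    (batch, seq_len, d_k)
    v = x @ W_v  # values:  (batch, seq_len, d_v)
    d_k = q.size(-1)
    # Dot product of every query with every key, scaled by sqrt(d_k)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # (batch, seq_len, seq_len)
    weights = F.softmax(scores, dim=-1)            # attention weights
    # Each output "hidden state" is a weighted sum of the value vectors
    return weights @ v                             # (batch, seq_len, d_v)

# Toy usage: 2 sequences of 5 tokens, embedding size 8, head size 4
x = torch.randn(2, 5, 8)
W_q, W_k, W_v = (torch.randn(8, 4) for _ in range(3))
out = scaled_dot_product_attention(x, W_q, W_k, W_v)
print(out.shape)  # torch.Size([2, 5, 4])
```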

Anyway, if you have a hard time understanding the first picture, feel free to ask questions about specific sections.

Cheers
