Transformer Architecture

Hi @NIHARIKA

Let me add to Deepti’s post.

That is true - we first need to represent the text data somehow, i.e. turn each token into an initial embedding vector.
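For example, here is a minimal sketch in PyTorch (the vocabulary size and embedding width are made-up numbers, just for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical sizes, purely for illustration.
vocab_size, d_model = 10_000, 512

# Each token id is mapped to a learned vector -- the "initial" embedding.
embed = nn.Embedding(vocab_size, d_model)

token_ids = torch.tensor([[12, 845, 3]])   # a made-up 3-token sentence
initial_embeddings = embed(token_ids)      # shape: (1, 3, 512)
```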

That is also true - we create richer embeddings on top of the initial ones.
In other words, we take the initial embeddings and “make them richer” by “looking” at the context surrounding them. The encoder block always adds “something” to the initial embeddings to arrive at the final embeddings.
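Roughly, and again only as a sketch in PyTorch (layer normalization and other details are omitted for clarity), that “adding” is a residual connection around self-attention:

```python
import torch
import torch.nn as nn

d_model, n_heads = 512, 8
attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

x = torch.randn(1, 3, d_model)   # initial embeddings for 3 tokens

# Self-attention: each token "looks at" the surrounding context.
attn_out, _ = attn(x, x, x)

# The residual connection: attention's contribution is added on top of
# the initial embeddings rather than replacing them.
richer = x + attn_out
```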

Also note that attention is not “everything” there is to it. The encoder block also contains a feed-forward network (FFN), which “decides” what else to add to the embeddings after attention has “finished” adding its part.
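Continuing the same sketch, the feed-forward part is a second residual step after attention (layer normalization still omitted; the 2048 width is just the conventional 4x factor from the original Transformer paper):

```python
import torch
import torch.nn as nn

d_model, d_ff = 512, 2048

# Position-wise feed-forward network: two linear layers with a
# non-linearity in between, applied to each token independently.
ffn = nn.Sequential(
    nn.Linear(d_model, d_ff),
    nn.ReLU(),
    nn.Linear(d_ff, d_model),
)

richer = torch.randn(1, 3, d_model)   # stand-in for the attention step's output

# Second residual step: the FFN adds its own contribution once
# attention has finished adding its part.
final_embeddings = richer + ffn(richer)
```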

Cheers

P.S. You might also find this more detailed post helpful.