Transformer Architecture

Hi @NIHARIKA

Let me add to Deepti’s post.

That is true - we first need to represent the text data somehow, i.e. turn each token into an initial embedding vector.
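For example, here is a minimal sketch in PyTorch (the vocabulary size and embedding width are made-up numbers, just for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical sizes, purely for illustration.
vocab_size, d_model = 10_000, 512

# Each token id is mapped to a learned vector -- the "initial" embedding.
embed = nn.Embedding(vocab_size, d_model)

token_ids = torch.tensor([[12, 845, 3]])   # a made-up 3-token sentence
initial_embeddings = embed(token_ids)      # shape: (1, 3, 512)
```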

That is also true - we create richer embeddings on top of the initial ones.
In other words, we take the initial embeddings and “make them richer” by “looking” at the context surrounding them. The encoder block always adds “something” to the initial embeddings to arrive at the final embeddings.
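Roughly, and again only as a sketch in PyTorch (layer normalization and other details are omitted for clarity), that “adding” is a residual connection around self-attention:

```python
import torch
import torch.nn as nn

d_model, n_heads = 512, 8
attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

x = torch.randn(1, 3, d_model)   # initial embeddings for 3 tokens

# Self-attention: each token "looks at" the surrounding context.
attn_out, _ = attn(x, x, x)

# The residual connection: attention's contribution is added on top of
# the initial embeddings rather than replacing them.
richer = x + attn_out
```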

Also note that attention is not “everything” there is to it. The encoder block also contains a feed-forward network (FFN), which “decides” what else to add to the embeddings after attention has “finished” adding its part.
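Continuing the same sketch, the feed-forward part is a second residual step after attention (layer normalization still omitted; the 2048 width is just the conventional 4x factor from the original Transformer paper):

```python
import torch
import torch.nn as nn

d_model, d_ff = 512, 2048

# Position-wise feed-forward network: two linear layers with a
# non-linearity in between, applied to each token independently.
ffn = nn.Sequential(
    nn.Linear(d_model, d_ff),
    nn.ReLU(),
    nn.Linear(d_ff, d_model),
)

richer = torch.randn(1, 3, d_model)   # stand-in for the attention step's output

# Second residual step: the FFN adds its own contribution once
# attention has finished adding its part.
final_embeddings = richer + ffn(richer)
```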

Cheers

P.S. You might also find this more detailed post helpful.