Can the input to self-dot-product attention contain anything other than embeddings?

Can you explain what you meant there?

The attend function receives Query and Key. As a reminder, they are produced by a matrix multiply of all the inputs with a single set of weights. We will describe the inputs as embeddings assuming an NLP application; however, this is not required.
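To make that passage concrete, here is a minimal NumPy sketch of how Query and Key are produced. All sizes (`seq_len`, `d_model`, `d_k`) and weight names are illustrative assumptions, not from the text; the point is that the inputs `X` are just vectors, embeddings or otherwise:

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len, d_model, d_k = 4, 8, 8  # illustrative sizes (assumed, not from the text)

# Inputs: one vector per position. In NLP these are token embeddings,
# but any vectors (e.g. image patch features) work just as well.
X = rng.standard_normal((seq_len, d_model))

# A single set of weights multiplies all the inputs to produce Query and Key.
W_q = rng.standard_normal((d_model, d_k))
W_k = rng.standard_normal((d_model, d_k))

Q = X @ W_q  # queries, shape (seq_len, d_k)
K = X @ W_k  # keys,    shape (seq_len, d_k)

# Dot-product attention scores: every query against every key.
scores = Q @ K.T / np.sqrt(d_k)  # shape (seq_len, seq_len)
```

Nothing in the computation depends on `X` coming from a word embedding table, which is why the book hedges with "this is not required".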

I believe this means the application might not be NLP; it could also be something like computer vision, for example.

Does computer vision use self-dot attention too? :open_mouth: