Transformer self-attention: what do the columns of Wq, Wk, Wv mean?


I understand that the rows of Wq, Wk, and Wv have the same dimension as the input word embedding.

However, I am not sure what the columns of Wq, Wv, and Wk mean.

I’ve googled hard, but I can’t find an answer.

References:
The Illustrated Transformer – Jay Alammar (jalammar.github.io)
Lecture Notes | Coursera

Can anyone help me?

As always in this course, the Wq, Wk, and Wv values are weights that are learned through training.
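As for what the columns mean: the rows of each matrix match the input embedding dimension, and the number of columns sets the dimension of the query, key, or value vector it produces (d_k and d_v in the "Attention Is All You Need" notation). Here is a minimal NumPy sketch of just the shapes, assuming d_model = 512 and d_k = d_v = 64 as in the original paper (illustrative numbers, not anything specific from the course):

```python
import numpy as np

d_model = 512  # dimension of each input word embedding -> rows of Wq/Wk/Wv
d_k = 64       # dimension of each query/key vector     -> columns of Wq/Wk
d_v = 64       # dimension of each value vector         -> columns of Wv

rng = np.random.default_rng(0)
Wq = rng.standard_normal((d_model, d_k))
Wk = rng.standard_normal((d_model, d_k))
Wv = rng.standard_normal((d_model, d_v))

# A toy "sentence" of 3 words, each a d_model-dimensional embedding.
X = rng.standard_normal((3, d_model))

Q = X @ Wq  # shape (3, 64): one query vector per word
K = X @ Wk  # shape (3, 64): one key vector per word
V = X @ Wv  # shape (3, 64): one value vector per word
print(Q.shape, K.shape, V.shape)
```

So each column of Wq is one learned direction in embedding space, and the number of columns is simply how many such directions you project onto, i.e. the size of the resulting query/key/value vectors. In multi-head attention, d_k is deliberately smaller than d_model so that each head works in a lower-dimensional subspace.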


I was confused about the concept of a linear transformation in linear algebra, so I think I asked a silly question about the weight matrices. Thank you for the answer.