The dimension of W

Should the dimensions of Wq, Wk, and Wv be n x n, where n is the sequence length?

Hi @Amazing_Patrick

No, they should be (emb_dim x emb_dim).

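It’s a common misconception with sequence models. Ask yourself what would happen to your model at inference time when you start generating the first word: the sequence length is 1 at that point and grows with every step, so (n x n) weights would have to change size, while (emb_dim x emb_dim) weights apply to each token regardless of n.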
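To make this concrete, here is a minimal NumPy sketch of single-head self-attention. It is only an illustration under assumptions: `emb_dim = 8`, the random weights, the helper name `self_attention`, and the lack of masking or multiple heads are all made up for the example. The thing to notice is that the only (n x n) object is the attention score matrix, which is computed from the input rather than learned.

```python
import numpy as np

emb_dim = 8  # hypothetical embedding size, chosen only for illustration
rng = np.random.default_rng(0)

# Projection weights depend only on emb_dim, never on the sequence length.
W_q = rng.standard_normal((emb_dim, emb_dim))
W_k = rng.standard_normal((emb_dim, emb_dim))
W_v = rng.standard_normal((emb_dim, emb_dim))

def self_attention(x):
    """Single-head scaled dot-product self-attention; x is (seq_len, emb_dim)."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.T / np.sqrt(emb_dim)           # (seq_len, seq_len), not learned
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v                            # (seq_len, emb_dim)

# The same weights handle any sequence length, including a single token.
out_shapes = {
    n: self_attention(rng.standard_normal((n, emb_dim))).shape
    for n in (1, 5, 12)
}
print(out_shapes)
```

The same three weight matrices are reused for sequences of length 1, 5, and 12, which would not be possible if their shape were tied to n.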