Should the dimensions of Wq, Wk, and Wv be n x n, where n is the length of the sequence?
No, they should be (emb_dim x emb_dim). Their shape does not depend on the sequence length.
It's a common misconception with sequence models. Ask yourself what would happen at inference time, when your model starts generating the first word. At that point the sequence has length 1, so weights whose shape depended on n would not fit the input.
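To make this concrete, here is a minimal NumPy sketch of single-head self-attention. It is only an illustration: the function name `self_attention`, the value `emb_dim = 8`, and the random weights are assumptions of mine, not something from the question. The same (emb_dim x emb_dim) matrices are reused for sequences of length 1, 5, and 20. The only n x n object is the attention score matrix, which is computed from the input rather than learned.

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    x has shape (n, emb_dim); Wq, Wk, Wv each have shape (emb_dim, emb_dim).
    """
    q = x @ Wq                                   # (n, emb_dim)
    k = x @ Wk                                   # (n, emb_dim)
    v = x @ Wv                                   # (n, emb_dim)
    scores = q @ k.T / np.sqrt(k.shape[-1])      # (n, n), depends on the input
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v                           # (n, emb_dim)

emb_dim = 8  # illustrative value
rng = np.random.default_rng(0)
# The learned parameters: their shape never mentions the sequence length n.
Wq, Wk, Wv = (rng.normal(size=(emb_dim, emb_dim)) for _ in range(3))

out_shapes = {}
for n in (1, 5, 20):  # n = 1 is the "generating the first word" case
    x = rng.normal(size=(n, emb_dim))
    out_shapes[n] = self_attention(x, Wq, Wk, Wv).shape
    print(n, out_shapes[n])
```

If the weights were n x n instead, the product `x @ Wq` would only be defined for one specific sequence length, and the model could not handle the length-1 input at the start of generation.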