In the video on multi-head attention, there seems to be a typo in the calculation of q, v and k at 1:49.
The slide shows that q2 = W1q * q2, while it should be q2 = W1q * x2, to be consistent with the previous video.

Could you confirm that this is indeed a typo in the slide?

My mistake. I confused the Wq, Wk and Wv matrices used to compute the q, k and v vectors (q = Wq.x, k = Wk.x and v = Wv.x) with the Wq1, Wk1 and Wv1 matrices used in multi-head attention.

I suppose that you compute q, k and v using the same matrices across all heads, and then multiply the result by a head-specific matrix to differentiate the heads.
Thus for head 1: q1 = Wq1.Wq.x, and for head 2: q2 = Wq2.Wq.x.
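To make my supposition concrete, here is a minimal NumPy sketch of that two-stage projection. The dimensions (d_model, d_head) and all matrix values are made-up assumptions for illustration, not taken from the course:

```python
import numpy as np

# Illustrative sizes (assumptions, not from the video)
d_model, d_head = 8, 4
rng = np.random.default_rng(0)

x = rng.standard_normal(d_model)          # input embedding x

# Shared projection, the same for every head: q = Wq . x
Wq = rng.standard_normal((d_model, d_model))
q = Wq @ x

# Head-specific matrices that differentiate the heads
Wq1 = rng.standard_normal((d_head, d_model))
Wq2 = rng.standard_normal((d_head, d_model))

q1 = Wq1 @ q                              # head 1: q1 = Wq1 . Wq . x
q2 = Wq2 @ q                              # head 2: q2 = Wq2 . Wq . x

print(q1.shape, q2.shape)                 # each head gets a d_head-sized query
```

The same pattern would apply to k and v with Wk, Wv and their head-specific counterparts.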