How is self-attention Q = Wx related to multi-head attention W^(Q)?

In the self-attention discussion, it is mentioned that Q = Wx, presumably the matrix product of W and x, where W is W superscript (Q).
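A minimal sketch of how the per-token form relates to the matrix form. It assumes NumPy, the column-vector convention q<i> = W^(Q) x<i> from the lecture, and hypothetical sizes and names (W_Q, X). Applying W^(Q) to each token separately gives the same queries as one matrix product over all tokens stacked as rows.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration
d_model, d_k, n_tokens = 8, 4, 5

rng = np.random.default_rng(0)
W_Q = rng.standard_normal((d_k, d_model))     # learned query projection W^(Q)
X = rng.standard_normal((n_tokens, d_model))  # one row per token x<i>

# Per-token form from the lecture: q<i> = W^(Q) x<i>
q_per_token = np.stack([W_Q @ x_i for x_i in X])

# Stacked form: all queries at once, one row per token
Q = X @ W_Q.T

print(q_per_token.shape, Q.shape)   # (5, 4) (5, 4)
print(np.allclose(q_per_token, Q))  # True
```

So "Q = Wx" and the matrix Q fed to attention describe the same learned projection, written per token or for all tokens at once.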
In the multi-head discussion, the term W^(Q) (W superscript (Q)) appears. Is the W in this presentation just the inverse of the W matrix presented in self-attention?
i.e. multiply both sides of Q = Wx on the left by Winv (often written as W superscript -1, i.e. W⁻¹),
leading to
WinvQ = x, since WinvW = I (the identity matrix)
i.e. more specifically, since W⁻¹W = I
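The algebra itself can be checked numerically. This minimal sketch assumes NumPy and a small hypothetical square matrix W chosen to be invertible. It shows that W⁻¹W is the identity matrix I (not the scalar 1) and that multiplying Q on the left by W⁻¹ recovers x. This only works because this particular W is square and nonsingular.

```python
import numpy as np

# Hypothetical 2 x 2 matrix with determinant 5, so it is invertible
W = np.array([[2.0, 1.0],
              [1.0, 3.0]])
x = np.array([1.0, -2.0])

Q = W @ x                 # Q = Wx, here [0, -5]
W_inv = np.linalg.inv(W)  # Winv, i.e. W^-1

# Winv W is the 2 x 2 identity matrix
print(np.allclose(W_inv @ W, np.eye(2)))  # True

# Multiplying Q on the left by Winv recovers x
x_recovered = W_inv @ Q
print(np.allclose(x_recovered, x))  # True

# Numerically, solving W x = Q is preferred to forming the inverse explicitly
print(np.allclose(np.linalg.solve(W, Q), x))  # True
```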

Thanks, David

Sorry, I don’t totally comprehend your notation.
Can you give a link or some screen captures to the part of the lecture you’re referring to?

I did not see a way to link to the slides, since the link just downloads them.
Also, Q = Wx appears only in the video, handwritten in blue ink on the slides.

Q = Wx, etc.: Self Attention video at 5:02, bottom right of the slide.
Attention(W^(Q)Q, W^(K)K, W^(V)V), etc.: Multi-Head Attention video at 3:05, middle of the slide.
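For reference, here is a minimal sketch of multi-head self-attention in the style of the original Transformer paper. It assumes NumPy and uses hypothetical sizes and function names. It follows the paper's row-vector convention (X W_i^(Q)) rather than the slide's left-multiplication (W^(Q) x); the two differ only by a transpose. Each head applies its own learned W_i^(Q), W_i^(K), W_i^(V) to the same input. None of these matrices is the inverse of another.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the row max for numerical stability
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores, axis=-1) @ V

def multi_head_self_attention(X, W_Q, W_K, W_V, W_O):
    # X: (n_tokens, d_model); W_Q, W_K, W_V: lists of per-head (d_model, d_k) matrices
    # Each head projects the SAME input X with its own W_i^(Q), W_i^(K), W_i^(V)
    heads = [attention(X @ wq, X @ wk, X @ wv) for wq, wk, wv in zip(W_Q, W_K, W_V)]
    # Concatenate the heads and mix them with the output projection W^(O)
    return np.concatenate(heads, axis=-1) @ W_O

# Hypothetical sizes for illustration only
n_tokens, d_model, n_heads = 5, 8, 2
d_k = d_model // n_heads

rng = np.random.default_rng(0)
X = rng.standard_normal((n_tokens, d_model))
W_Q = [rng.standard_normal((d_model, d_k)) for _ in range(n_heads)]
W_K = [rng.standard_normal((d_model, d_k)) for _ in range(n_heads)]
W_V = [rng.standard_normal((d_model, d_k)) for _ in range(n_heads)]
W_O = rng.standard_normal((n_heads * d_k, d_model))

out = multi_head_self_attention(X, W_Q, W_K, W_V, W_O)
print(out.shape)  # (5, 8)
```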


Thanks for the references.

It will be a couple of days before I am able to reply further on this topic (going offline unexpectedly).

OK. I will look forward to your feedback. Thanks.

No, the W matrices aren’t inverses. I don’t think Andrew says that in the lectures.
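Here is a minimal sketch of why the inverse idea breaks down. It assumes NumPy, the column-vector convention q = W x, and hypothetical sizes. Each head's W^(Q) typically maps d_model inputs down to a smaller d_k. The Transformer paper uses d_model = 512 and d_k = 64 with 8 heads. Such a matrix is not square, so it has no ordinary inverse. The name W_Q is illustrative.

```python
import numpy as np

# Hypothetical sizes for illustration; the Transformer paper uses d_model = 512, d_k = 64
d_model, d_k = 8, 4

rng = np.random.default_rng(0)
W_Q = rng.standard_normal((d_k, d_model))  # q = W_Q @ x maps 8 numbers down to 4
x = rng.standard_normal(d_model)
q = W_Q @ x

# np.linalg.inv only accepts square matrices and raises LinAlgError otherwise
try:
    np.linalg.inv(W_Q)
    has_inverse = True
except np.linalg.LinAlgError:
    has_inverse = False
print(has_inverse)  # False

# The Moore-Penrose pseudo-inverse does not undo the projection either:
# pinv(W_Q) @ W_Q is a rank-d_k projection, not the d_model x d_model identity
P = np.linalg.pinv(W_Q) @ W_Q
print(np.allclose(P, np.eye(d_model)))  # False
print(np.linalg.matrix_rank(P))         # 4

# So x cannot be recovered from q alone
x_hat = np.linalg.pinv(W_Q) @ q
print(np.allclose(x_hat, x))  # False
```

The projection discards information, which is one reason the W matrices are learned parameters rather than inverses of each other.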

Reading the original research paper might be helpful in expanding on Andrew’s intuitive explanation.

Hi Tom,
I just surmised they might be, since multiplying by the inverse would eliminate W from the right side of the equation, leaving the inverse multiplying Q on the left side.
Thanks, David

Hi Tom,
And thank you for the link to the original article.
I’ll dig into that tonight.