In the self-attention discussion it is mentioned that Q = Wx, presumably a matrix product of W and x, where W is W^Q (W superscript Q).
In the multi-head discussion the term W^Q (W superscript Q) comes up again. Is the W in that presentation just the inverse of the W matrix presented in self-attention?
i.e. multiply Q = Wx on the left by Winv (often written as W^-1),
leading to
Winv Q = x, since Winv W = I (the identity matrix),
i.e. more specifically since np.dot(Winv, W) gives the identity matrix np.eye(n).
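To make that concrete, here is a quick NumPy check of the inversion step I mean. It assumes W is square and invertible (the names and sizes are just illustrative; in the transformer the projection matrices such as W^Q are generally rectangular, so an exact inverse need not exist):

import numpy as np

rng = np.random.default_rng(0)
d = 4
W = rng.standard_normal((d, d))   # stand-in for W, assumed square and invertible
x = rng.standard_normal(d)        # stand-in for an input vector

Q = np.dot(W, x)                  # Q = Wx
W_inv = np.linalg.inv(W)          # Winv, i.e. W^-1

print(np.allclose(np.dot(W_inv, W), np.eye(d)))  # Winv W = I
print(np.allclose(np.dot(W_inv, Q), x))          # Winv Q = x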
Hi Tom,
I just surmised they might be, since multiplying by the inverse would eliminate W from the right-hand side of the equation, leaving the inverse simply multiplying Q on the left-hand side.
Thanks, David