The first instruction says:
- You will pass the Q, V, K matrices and a boolean mask to a multi-head attention layer. Remember that to compute self-attention, Q, V and K should be the same.
But how do I get the Q, V, K matrices? They are not included in the def call(self, x, …) parameters.
I keep calling self_attn_output = self.mha(…) but I keep getting error messages.
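To check my own understanding, here is a rough pure-Python sketch of what I think a single attention head computes when Q, V and K are all the same matrix x (no mask, no learned projections; just an illustration of "Q, V and K should be the same", not what self.mha does internally):

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention on x, a [seq_len][d] list of lists.
    The key point: Q, V and K are all the same matrix x."""
    q = k = v = x  # all three inputs are x -- this is what makes it "self"-attention
    d = len(x[0])
    out = []
    for qi in q:
        # score each key row against this query row, scaled by sqrt(d)
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        # output row is the attention-weighted sum of the value rows
        out.append([sum(w * vj[t] for w, vj in zip(weights, v)) for t in range(d)])
    return out
```

Is this the right mental model, and if so, is the fix just to pass x three times to the layer?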
Any help?