DLS Course 5 Week 4 Exercise 4

The first instruction says:

  1. You will pass the Q, V, K matrices and a boolean mask to a multi-head attention layer. Remember that to compute self-attention, Q, V and K should be the same.

But how do I get the Q, V, K matrices? They are not included in the `def call(self, x, …)` parameters.

I keep running `self_attn_output = self.mha(…)`, but I keep getting error messages.

Any help?

Since this is self-attention, you pass the `x` variable for all three of those.
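A minimal sketch of what that looks like with Keras's `MultiHeadAttention` layer (the shapes, mask, and layer settings here are hypothetical, not the assignment's exact values):

```python
import tensorflow as tf

# Hypothetical layer configuration for illustration only
mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=8)

batch, seq_len, d_model = 1, 5, 16
x = tf.random.uniform((batch, seq_len, d_model))  # stand-in for the layer input
mask = tf.ones((batch, seq_len, seq_len))         # stand-in attention mask

# Q, V, and K are all the same tensor x — that is what makes it *self*-attention
self_attn_output = mha(query=x, value=x, key=x, attention_mask=mask)
print(self_attn_output.shape)  # (1, 5, 16)
```

Inside the assignment's `call(self, x, …)`, the equivalent would be `self.mha(x, x, x, mask)`.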

It worked! Thank you!