I got the following error with this code:
# START CODE HERE
# calculate self-attention using mha (~1 line)
attn_output = self.mha(x)  # Self attention (batch_size, input_seq_len, embedding_dim)

Also, the call function has a ‘mask’ parameter that you need to use somewhere, and the place to use it is exactly in that layer. Check the parameters of MultiHeadAttention to find where it goes.

I am also having a hard time with this exercise. Passing the mask as a positional argument (attn_output = self.mha(x, mask)) gives me the following error:
“InvalidArgumentError: cannot compute Einsum as input #1 (zero-based) was expected to be a int64 tensor but is a float tensor [Op:Einsum]”

Reading the documentation, I see that we need to pass two parameters to the MultiHeadAttention constructor. I understand the number of heads should be 3, one each for Q, V, and K. Then the key dimension is the dimension of K? Just passing integers into this function also gives an error, so I am very confused about how to use it.

self.mha(…) requires four parameters: Q, V, K, and the mask.
All three of Q, V, and K are the ‘x’ variable, since this is self-attention.
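To make that concrete, here is a minimal sketch of how tf.keras.layers.MultiHeadAttention is typically constructed and called for self-attention. The shapes and values (batch size, sequence length, embedding_dim, num_heads) are illustrative assumptions, not the assignment's actual values; the key point is that the mask must go to the attention_mask keyword, because passing it positionally lands on the key argument and triggers the Einsum dtype error quoted above.

```python
import tensorflow as tf

# Illustrative shapes (assumptions, not the assignment's values)
batch_size, seq_len, embedding_dim = 2, 5, 8

# Constructor takes num_heads (number of parallel attention heads,
# not one per Q/K/V) and key_dim (the per-head size of Q and K)
mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=embedding_dim)

x = tf.random.uniform((batch_size, seq_len, embedding_dim))

# Attention mask of shape (batch, query_len, key_len); 1 = attend, 0 = masked
mask = tf.ones((batch_size, seq_len, seq_len))

# Self-attention: query, value, and key are all x; the mask goes to
# attention_mask, not a positional slot
attn_output = mha(query=x, value=x, key=x, attention_mask=mask)
print(attn_output.shape)  # (batch_size, seq_len, embedding_dim)
```

Note that num_heads controls how many attention heads run in parallel; it is unrelated to Q, V, and K, which every head computes internally from its inputs.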