C5_W4_A1_Transformer_Subclass_v1_Encoder

Hello,
I’m currently a bit lost trying to use the MultiHeadAttention layer.
My code currently looks like this:

# START CODE HERE
# mentor edit - code removed
# END CODE HERE

But I’m getting the error:
AssertionError: Wrong values when training=True

I tried reading the hints, but I don’t know where to get the values Q, V, and K to pass to the mha. I thought the mha was supposed to calculate these internally.

For self-attention, you use ‘x’ three times (once each for the Q, V, and K arguments), and you pass the “mask” parameter. Do not use the training argument there.

In encoder_layer_out, use “out1”, not “attn_output”.
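To illustrate why the same input is passed three times: in self-attention, the queries, values, and keys are all projections of the same tensor x, so a call like `self.mha(x, x, x, mask)` (as in this assignment's EncoderLayer, built on Keras' `MultiHeadAttention`) lets the layer compute its own Q, V, and K internally. Below is a minimal single-head NumPy sketch of the same idea — a hedged illustration, not the assignment's solution; the function name and shapes are assumptions for the example.

```python
import numpy as np

def self_attention(x, mask=None):
    # Self-attention: the same input x serves as query, value, and key.
    q, v, k = x, x, x
    d_k = q.shape[-1]
    # Scaled dot-product scores, shape (seq_len, seq_len)
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d_k)
    if mask is not None:
        # Mask out padded positions with a large negative value before softmax
        scores = np.where(mask == 0, -1e9, scores)
    # Softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

x = np.random.default_rng(0).normal(size=(4, 8))  # (seq_len, d_model)
out = self_attention(x)                            # same shape as x
```

Note that the layer already applies dropout internally when training; that is why the training flag is not forwarded in this call.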

Thank you. It works now.