Hello,

I would be grateful if you could please explain why the Q, V, and K matrices are all the same ‘x’ matrix. Thanks.

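That’s how self-attention is defined: the queries, keys, and values are all computed from the same input sequence, so the same ‘x’ is passed for all three.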

Thanks. I understand self-attention; I was just trying to get my head around the syntax. I guess the weight matrices are not exposed because they are trainable parameters held inside the layer. Also, having completed the assignment, I now understand that the inputs for Q, V, and K can differ, e.g. in the decoder.
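For anyone else puzzled by the same syntax, here is a minimal NumPy sketch of the idea, not the assignment's actual layer. The class `TinyAttention`, its attribute names, and the `(query, value, key)` argument order are all assumptions made for illustration. It shows the projection weights living inside the layer, so passing the same ‘x’ three times still yields different Q, K, and V internally, and it shows the decoder-style case where the inputs differ.

```python
import numpy as np

class TinyAttention:
    """Hypothetical single-head attention layer, for illustration only."""

    def __init__(self, d_model, d_k, seed=0):
        rng = np.random.default_rng(seed)
        # The trainable projections are stored in the layer,
        # which is why they never appear in the call signature.
        self.W_q = rng.normal(size=(d_model, d_k))
        self.W_k = rng.normal(size=(d_model, d_k))
        self.W_v = rng.normal(size=(d_model, d_k))

    def __call__(self, query, value, key):
        # Each input is projected with its own weight matrix.
        Q = query @ self.W_q
        K = key @ self.W_k
        V = value @ self.W_v
        # Scaled dot-product attention with a numerically stable softmax.
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        scores = scores - scores.max(axis=-1, keepdims=True)
        weights = np.exp(scores)
        weights = weights / weights.sum(axis=-1, keepdims=True)
        return weights @ V

# Made-up data: a 4-token sequence and a 6-token encoder output, both 8-dimensional.
x = np.random.default_rng(1).normal(size=(4, 8))
enc = np.random.default_rng(2).normal(size=(6, 8))

layer = TinyAttention(d_model=8, d_k=5)
self_out = layer(x, x, x)        # self-attention: same input for all three
cross_out = layer(x, enc, enc)   # decoder-style: queries from x, keys/values from enc
```

Both calls return one output row per query token, so the output length follows the query input even when the key and value inputs come from a longer sequence.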