C5_W4: How to retrieve K, V from the enc_output input parameter in DecoderLayer implementation?

Second MultiHeadAttention needs to retrieve K and V from the enc_output input parameter which has shape (1, 3, 4) - (batch_size, input_seq_len, embedding_dim). How to retrieve them? How to interpret this output shape of the Encoder? Where is K and V in that shape? Logically, I would expect it to have the first dimension with size 2 so that I could index into the output to retrieve K and V, say enc_output[0] = 'K' and enc_output[1] = 'V'?

The instructions are a bit ambiguous there. It turns out you don’t need to “parse” enc_output at all: you pass the full tensor as both the key and value arguments to the second MHA layer. The layer’s own learned projection matrices (W_K and W_V) turn that single input into K and V internally, which is why there is no separate K/V dimension in the shape.

One way to interpret that is to say that Q, K and V are not necessarily the same tensor in a fully general attention layer, but in the particular way Transformers invoke attention, K and V are frequently the same input — here, both come from enc_output, while Q comes from the decoder.
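To make that concrete, here is a minimal NumPy sketch (not the assignment’s TensorFlow code) of cross-attention. The shapes match the question (batch_size=1, input_seq_len=3, embedding_dim=4); the weight matrices `W_q`, `W_k`, `W_v` stand in for the projections the MHA layer learns. Note that K and V are both derived from the same enc_output — nothing is indexed or split out of its shape:

```python
import numpy as np

rng = np.random.default_rng(0)

batch, enc_len, dec_len, d = 1, 3, 2, 4
enc_output = rng.normal(size=(batch, enc_len, d))  # (batch, input_seq_len, embedding_dim)
dec_hidden = rng.normal(size=(batch, dec_len, d))  # decoder-side queries

# Separate learned projections (illustrative random weights here).
# K and V both come from the SAME enc_output tensor.
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
Q = dec_hidden @ W_q
K = enc_output @ W_k   # key   = projection of enc_output
V = enc_output @ W_v   # value = projection of enc_output

# Scaled dot-product attention.
scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d)          # (batch, dec_len, enc_len)
weights = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)
attn_out = weights @ V                                  # (batch, dec_len, embedding_dim)

print(attn_out.shape)  # (1, 2, 4)
```

In the assignment itself this corresponds to a call like `mha(query=dec_input, key=enc_output, value=enc_output)` — the same enc_output is passed twice, and the projections above happen inside the layer.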
