Understanding C4_W1_Ungraded_Lab_1_Basic_Attention (Part deux)

I am trying to understand the code in C4_W1_Ungraded_Lab_1_Basic_Attention, in particular the concatenation in `def alignment`.

I adapted a picture from the class to capture what is happening mathematically.
Is this picture correct? And what happened to the Q, K, and V things?

Hi @ajeancharles

As far as I understand, we are not really taking a tensor product; we are simply concatenating the two matrices to feed them as input to the attention mechanism's neural network.

The shape of `encoder_states` is `(input_length, hidden_size)`, i.e. one hidden output for each of the 5 input tokens, while `decoder_state` has shape `(1, hidden_size)`, i.e. the hidden vector from a single previous decoder time step. To merge the two matrices along the column axis (`axis=1`), we need the same number of rows in both, which we accomplish with `np.repeat()`.
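The shape bookkeeping above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the lab itself; the sizes (5 tokens, hidden size 4) are made up for the example:

```python
import numpy as np

# Hypothetical sizes for illustration (not taken from the lab)
input_length, hidden_size = 5, 4

encoder_states = np.random.rand(input_length, hidden_size)  # (5, 4)
decoder_state = np.random.rand(1, hidden_size)              # (1, 4)

# Repeat the single decoder state so it has one row per encoder state,
# then concatenate along the column axis (axis=1).
repeated = np.repeat(decoder_state, input_length, axis=0)   # (5, 4)
inputs = np.concatenate([encoder_states, repeated], axis=1) # (5, 8)

print(inputs.shape)  # (5, 8)
```

Each row of `inputs` now pairs one encoder hidden state with the (repeated) decoder state, which is exactly the form the alignment network expects.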

QKV attention is just another type of attention mechanism. It differs from this one in that it uses a simple dot product to compute the similarity between queries and keys, instead of a neural network :slight_smile:
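To make the contrast concrete, here is a rough sketch of scaled dot-product attention (the QKV flavor), where the similarity score is just `q @ k.T` rather than the output of a learned feed-forward network. This is an illustrative implementation with made-up shapes, not code from the course:

```python
import numpy as np

def dot_product_attention(q, k, v):
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)  # similarity via dot product, no NN
    # Numerically stable softmax over the keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v             # weighted sum of the values

q = np.random.rand(1, 4)   # one query
k = np.random.rand(5, 4)   # five keys
v = np.random.rand(5, 4)   # five values
out = dot_product_attention(q, k, v)
print(out.shape)  # (1, 4)
```

The output is a convex combination of the value rows, weighted by how similar each key is to the query.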

Hi @Rahul_Ohlan ,

From the point of view of abstract algebra and category theory, this concatenation can be viewed as one possible realization of a tensor product, as a special subset of the Cartesian product.

Thank you anyway. Your response is helping me understand the way the team thinks.

Best regards!