Course 5 Week 4 exercise 3: self-attention

mjcampbell · September 10, 2021, 7:25pm

For the programming assignment “Transformer Networks”, in exercise 3 “Self Attention”, can you provide more details for the function "scaled_dot_product_attention"?

In particular, what is the structure of the formula

softmax(Q K^T / sqrt(d_k) + M) V

Is the softmax taken across rows or across columns, i.e. over which dimension?
Are the queries rows or columns of Q?
What is V (rows or columns of vectors v_i)?
How is softmax() multiplied by V (matrix product or Kronecker product)? What are all of the dimensions?
What should the output look like? What are the dimensions of the output of the calculation?

It would help enormously to have a unit test below it with all of the details and vectors, as in previous weeks. Those tests made it much easier to decipher these important details, which are not specified in this exercise.

To that end, can you provide the actual code of the unit tests for exercise 3 and the exercises below it, instead of just the single function calls that are there?

mjcampbell · September 10, 2021, 9:21pm

I found the answer in this article:

TMosh · September 10, 2021, 10:04pm

You can see the unit tests by opening the “public_tests.py” file from your Notebook.
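
For anyone landing on this thread with the same shape questions, here is a minimal NumPy sketch of the formula from the first post. It is not the assignment's solution (the notebook uses TensorFlow) and it rests on assumptions that are the usual convention but are not stated in this thread: queries, keys and values are stored as rows of Q, K and V; the softmax is taken along the last axis, so each row of weights sums to 1 over the keys; softmax(...) times V is an ordinary matrix product; and M is an additive mask holding 0 for allowed positions and a large negative number for blocked ones. The names `additive_mask`, `softmax` and the toy sizes are made up for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the maximum along `axis` for numerical stability.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def scaled_dot_product_attention(q, k, v, additive_mask=None):
    """Assumed shapes: q (seq_len_q, d_k), k (seq_len_k, d_k), v (seq_len_k, d_v)."""
    d_k = k.shape[-1]
    # Row i of `scores` holds query i scored against every key.
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)  # (seq_len_q, seq_len_k)
    if additive_mask is not None:
        # Assumption: 0 keeps a position, a large negative value blocks it.
        scores = scores + additive_mask
    # Softmax over the last axis (the keys), so each row sums to 1.
    weights = softmax(scores, axis=-1)  # (seq_len_q, seq_len_k)
    # Ordinary matrix product: each output row is a weighted sum of rows of v.
    output = weights @ v  # (seq_len_q, d_v)
    return output, weights


# Toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
q = rng.normal(size=(3, 4))  # 3 queries, d_k = 4
k = rng.normal(size=(5, 4))  # 5 keys,    d_k = 4
v = rng.normal(size=(5, 2))  # 5 values,  d_v = 2

# Hypothetical mask that blocks the last key for every query.
mask = np.zeros((3, 5))
mask[:, -1] = -1e9

output, weights = scaled_dot_product_attention(q, k, v, mask)
print(output.shape)   # (3, 2)
print(weights.shape)  # (3, 5)
```

Under these assumptions the output has one row per query and one column per value dimension, i.e. shape (seq_len_q, d_v). Check the notebook's own mask convention before reusing this; the additive form here simply follows the "+ M" in the formula above.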
