Scaled_dot_product_attention

xharles · June 4, 2021, 10:31am

According to the attention definition:

$$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

the dot product of Q and K^{T} should be scaled down by the square root of d_k.
However, after getting d_k from k.ndim, I found that only dividing by d_k, rather than by its square root, passes the unit test. Could this be attributable to my implementation details, or to the unit test itself?

martink91 · June 4, 2021, 3:13pm

Hi xharles,

I think .ndim is not the right property for this exercise. It gives you the dimensionality of the np array in the sense of "how many axes does this tensor have". Here that value is 2, because it's a 2-dimensional array. What you need instead is "how long is that axis", so the .shape property might be helpful for you.
I hope this hint helps you solve it.

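P.S. By coincidence, the square root of the correct d_k is equal to .ndim in the unit test example (both are 2), which is why dividing by .ndim passes there. That only applies to the unit test example, not in general.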
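To make the difference concrete, here is a minimal NumPy sketch. It is not the assignment's reference solution: the function body, the omission of a mask, and the array shapes are all assumptions chosen for illustration. It reads d_k from the last axis of k via .shape and shows why .ndim only happens to work when d_k is 4.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Illustrative sketch of softmax(Q K^T / sqrt(d_k)) V, without a mask."""
    d_k = k.shape[-1]  # length of the last axis, not the number of axes
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)
    # Numerically stable softmax over the key axis
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Hypothetical shapes: 2 queries, 3 keys/values, depth d_k = 4
q = np.arange(8.0).reshape(2, 4)
k = np.ones((3, 4))
v = np.arange(6.0).reshape(3, 2)

print(k.ndim)       # number of axes of k
print(k.shape[-1])  # d_k, the length of the last axis

# With d_k = 4, sqrt(d_k) and k.ndim coincide, so the wrong scaling goes unnoticed.
coincides_at_4 = bool(np.isclose(np.sqrt(k.shape[-1]), k.ndim))

# With d_k = 16 the array is still 2-D, but sqrt(d_k) is no longer equal to .ndim.
k16 = np.ones((3, 16))
coincides_at_16 = bool(np.isclose(np.sqrt(k16.shape[-1]), k16.ndim))

out = scaled_dot_product_attention(q, k, v)
```

Because every key in this toy example is identical, the attention weights are uniform and each output row is just the mean of the rows of v, which makes the result easy to check by hand.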
