According to the attention definition, the dot product of `Q` and `K^{T}` should be scaled down by the square root of `d_k`.
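For reference, this is the scaled dot-product attention formula as it is usually written (I am assuming the course uses this standard form, with `V` as the value matrix):

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V
$$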

However, after getting `d_k` from `k.ndim`, I found that only dividing by `d_k` itself, instead of its square root, passes the unit test. Could this be attributed to my implementation details, or to the unit test itself?

Hi xharles,

I think `.ndim` is not the right property for this exercise, as it gives you the dimensionality of the NumPy array in terms of “how many axes does this tensor have”. → the value 2 → it’s a 2-dimensional array. What you need instead is “how long is that axis”, so the `.shape` property might be helpful for you.

I hope this hint helps you solve it.

P.S. The sqrt of the correct `d_k` is by coincidence equal to `.ndim`, but that only holds for the unit test example.
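To make the difference concrete, here is a minimal NumPy sketch. The array shapes and the helper name `scaled_scores` are made up for illustration and are not taken from the assignment; the point is only that `.ndim` counts axes while `.shape[-1]` gives the key dimension `d_k`.

```python
import numpy as np

# Hypothetical key matrix: 3 keys, each of dimension d_k = 4 (assumed shapes).
k = np.arange(12, dtype=float).reshape(3, 4)

print(k.ndim)   # 2       -> number of axes, not the key dimension
print(k.shape)  # (3, 4)  -> length of each axis

d_k = k.shape[-1]  # size of the last axis, i.e. the key dimension


def scaled_scores(q, k):
    """Return q @ k.T divided by sqrt(d_k), with d_k read from k.shape[-1]."""
    d_k = k.shape[-1]
    return (q @ k.T) / np.sqrt(d_k)


# With d_k = 4, sqrt(d_k) happens to equal k.ndim, so dividing by k.ndim
# gives the same scores here. With a wider key matrix it no longer does.
k_wide = np.ones((3, 16))  # still 2 axes, but d_k = 16
q_wide = np.ones((2, 16))
scores = scaled_scores(q_wide, k_wide)  # each entry is 16 / sqrt(16) = 4.0
```

In the second case `k_wide.ndim` is still 2 while `sqrt(d_k)` is 4, so dividing by `.ndim` would give scores twice as large as intended.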
