Question on the meaning of $d_k$

I have a quick question regarding the meaning of d_k. IIUC, d_k is the length of the key vector, and it is different from the embedding dimension. In UNQ_C1, I passed all tests with depth = query.shape[-1], but is it supposed to be query.shape[-2], since query.shape[-1] is the embedding dimension?

Thanks a lot!

Hi @wywang

That is not true: d_k is the depth (dimension) of the queries and keys.

So the “length of the key vector” is the embedding dimension. The length of the key sequence is key.shape[-2], but the “dimensionality” over which the scaling is done is key.shape[-1], which is why depth = query.shape[-1] is correct.
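To make the axes concrete, here is a minimal NumPy sketch of scaled dot-product attention. It is not the assignment's code: the function name, the shapes (batch of 2, 5 query positions, 7 key positions, depth 8), and the random inputs are all made up for illustration. The only point is that the scaling factor comes from the last axis, while the second-to-last axis is the sequence length.

```python
import numpy as np

def scaled_dot_product_attention(query, key, value):
    """Illustrative sketch; inputs are (batch, seq_len, depth) arrays."""
    # d_k is the depth of each query/key vector: the LAST axis.
    d_k = query.shape[-1]
    # (batch, q_len, depth) @ (batch, depth, k_len) -> (batch, q_len, k_len)
    scores = query @ np.swapaxes(key, -1, -2) / np.sqrt(d_k)
    # Numerically stable softmax over the key positions.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Weighted sum of the values: (batch, q_len, depth)
    return weights @ value

# Hypothetical shapes chosen only for this example.
rng = np.random.default_rng(0)
query = rng.normal(size=(2, 5, 8))   # 5 query positions, depth 8
key = rng.normal(size=(2, 7, 8))     # 7 key positions, depth 8
value = rng.normal(size=(2, 7, 8))

out = scaled_dot_product_attention(query, key, value)
print(query.shape[-1], key.shape[-2], out.shape)
```

Here query.shape[-1] and key.shape[-1] are both 8 (that is d_k), while key.shape[-2] is 7, the number of key positions. Scaling by the sequence length instead would make the result depend on how long the input is, which is not what the formula intends.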