d_k is the dimension of the keys. Does d_k mean the shape of the keys (K)?
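d_k is the dimension of each key vector within a single head of the multi-head attention block. It is a single number, not the full shape of the key matrix K: for one head, K has shape (sequence length, d_k). If the embedding dimension of the model is 512 and there are 8 attention heads, then $d_k = \frac{512}{8} = 64$.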
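To make the difference between d_k and the shape of the keys concrete, here is a minimal NumPy sketch using the numbers from the example above. The sequence length, the random inputs, and the names `x`, `W_k`, and `keys_per_head` are assumptions chosen only for illustration; they do not come from any particular library.

```python
import numpy as np

d_model = 512                # embedding dimension of the model
num_heads = 8                # number of attention heads
d_k = d_model // num_heads   # per-head key dimension

seq_len = 10                 # hypothetical sequence length, for illustration only
rng = np.random.default_rng(0)
x = rng.standard_normal((seq_len, d_model))    # stand-in token embeddings
W_k = rng.standard_normal((d_model, d_model))  # stand-in key projection matrix

# Project the embeddings to keys, then split the last axis into heads.
keys = x @ W_k                                        # (seq_len, d_model)
keys_per_head = keys.reshape(seq_len, num_heads, d_k)  # (seq_len, num_heads, d_k)
keys_per_head = keys_per_head.transpose(1, 0, 2)       # (num_heads, seq_len, d_k)

print(d_k)
print(keys_per_head.shape)
```

Under these assumptions, d_k is only the size of the last axis, while the shape of the keys also includes the number of heads and the sequence length.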