Need to understand scaled_dot_product_attention function in Transformer

I am not sure about this function and the steps involved in computing it. Can someone explain the dimensions of the vectors q, k, v so that I get a better grasp?

  • coursera-platform
  • dl-ai-learning-platform

Hi @Vikas_Sri

q (query), k (key), and v (value) are all derived from the input through learned linear projections. If your input has shape (batch_size, seq_len, d_model), then q, k, and v typically also have shape (batch_size, seq_len, d_model) after those projections, or (batch_size, seq_len, d_k) per head once the model splits them across attention heads. Scaled dot-product attention then computes softmax(q·kᵀ / √d_k) and uses the resulting weights to take a weighted sum of v, so keeping these shapes in mind makes the dot products and softmax much easier to follow; see the sketch below.
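
For reference, here is a minimal NumPy sketch of the computation. The function name, argument names, and the toy shapes are just for illustration, not the exact code from the assignment:

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    """Minimal scaled dot-product attention (illustrative sketch).

    q, k, v: arrays of shape (batch_size, seq_len, d_k)
    mask:    optional array broadcastable to (batch_size, seq_len, seq_len),
             with 1 for positions to keep and 0 for positions to block.
    """
    d_k = q.shape[-1]

    # (batch_size, seq_len, seq_len): similarity of every query with every key
    scores = np.matmul(q, np.swapaxes(k, -1, -2)) / np.sqrt(d_k)

    if mask is not None:
        # Push blocked positions toward -inf so softmax gives them ~0 weight
        scores = np.where(mask == 0, -1e9, scores)

    # Softmax over the last axis turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Weighted sum of values: (batch_size, seq_len, d_k)
    return np.matmul(weights, v), weights

# Toy example: batch of 1, sequence length 3, d_k = 4
rng = np.random.default_rng(0)
q = rng.standard_normal((1, 3, 4))
k = rng.standard_normal((1, 3, 4))
v = rng.standard_normal((1, 3, 4))

output, attn = scaled_dot_product_attention(q, k, v)
print(output.shape)  # (1, 3, 4)
print(attn.shape)    # (1, 3, 3)
```

The attention weights have shape (batch_size, seq_len, seq_len) because each query position gets one weight for every key position; the output keeps the same shape as v.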

Hope it helps! Feel free to ask if you need further assistance.
