Need to understand scaled_dot_product_attention function in Transformer

I am not sure about this function and the steps involved in computing it. Can someone explain the dimensions of the vectors q, k, v so that I get a better grasp?

  • coursera-platform
  • dl-ai-learning-platform

Hi @Vikas_Sri

q (query), k (key), and v (value) are all derived from the input through learned linear projections. If your input has shape (batch_size, seq_len, d_model), then q, k, and v typically also have shape (batch_size, seq_len, d_model) after those projections, or (batch_size, seq_len, d_k) per head once the model splits them across attention heads. Scaled dot-product attention then computes softmax(q·kᵀ / √d_k) and uses the resulting weights to take a weighted sum of v, so keeping these shapes in mind makes the dot products and softmax much easier to follow; see the sketch below.
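
For reference, here is a minimal NumPy sketch of the computation. The function name, argument names, and the toy shapes are just for illustration, not the exact code from the assignment:

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    """Minimal scaled dot-product attention (illustrative sketch).

    q, k, v: arrays of shape (batch_size, seq_len, d_k)
    mask:    optional array broadcastable to (batch_size, seq_len, seq_len),
             with 1 for positions to keep and 0 for positions to block.
    """
    d_k = q.shape[-1]

    # (batch_size, seq_len, seq_len): similarity of every query with every key
    scores = np.matmul(q, np.swapaxes(k, -1, -2)) / np.sqrt(d_k)

    if mask is not None:
        # Push blocked positions toward -inf so softmax gives them ~0 weight
        scores = np.where(mask == 0, -1e9, scores)

    # Softmax over the last axis turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Weighted sum of values: (batch_size, seq_len, d_k)
    return np.matmul(weights, v), weights

# Toy example: batch of 1, sequence length 3, d_k = 4
rng = np.random.default_rng(0)
q = rng.standard_normal((1, 3, 4))
k = rng.standard_normal((1, 3, 4))
v = rng.standard_normal((1, 3, 4))

output, attn = scaled_dot_product_attention(q, k, v)
print(output.shape)  # (1, 3, 4)
print(attn.shape)    # (1, 3, 3)
```

The attention weights have shape (batch_size, seq_len, seq_len) because each query position gets one weight for every key position; the output keeps the same shape as v.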

Hope it helps! Feel free to ask if you need further assistance.
