Short Course Q&A - Attention in Transformers: Concepts and Code in PyTorch
| Topic | Replies | Views | Activity |
|---|---|---|---|
| Why do we use square root of key dimension for scaling? | 2 | 27 | February 27, 2025 |
| The Matrix Math for self-attention | 4 | 53 | February 22, 2025 |
| In class MaskedSelfAttention -- don't understand python statement | 6 | 61 | February 16, 2025 |