Attention Model and QKV

I think I understand the Attention model as explained in the course video. But most other material I have found mentions a model with Q, K, and V matrices.

I see some similarities between the two models, but they don’t completely match up. Are they two different approaches to applying the attention concept?

If you’re asking about the DLS C5 W3 lecture that first introduces Attention, it would be worth just “holding that thought” and continuing into the W4 lectures. There, Professor Ng describes the full version of Attention that is used in implementing Transformer models, including the Q, K, and V inputs that you mention. Here’s a slide from the “Self-Attention” lecture in W4 where you can see that he makes the distinction between “RNN Attention” (which he described in W3) and “Transformers Attention”:
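In case it helps to see the W4 version concretely, here is a minimal NumPy sketch of scaled dot-product (“Transformers”) attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. The shapes and toy inputs below are my own illustration, not taken from the lecture:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # similarity of each query to each key
    weights = softmax(scores, axis=-1)  # attention weights; each row sums to 1
    return weights @ V                  # weighted sum of the value vectors

# Toy example (shapes are illustrative): 3 queries, 4 key/value pairs, d_k = 8
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 8)
```

In self-attention, Q, K, and V are all computed from the same input sequence via learned projection matrices, whereas the W3 “RNN Attention” computes its alignment scores from the decoder state and encoder activations instead.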
