I just finished C4_W1_Ungraded_Lab_2, but I don't fully understand it.
How come the keys and values are the same embedded_en in the function call at the end: attention_qkv(embedded_fr, embedded_en, embedded_en)?
If someone could give me a hint on how to get my thinking straight here, I would appreciate it.
Because attention is applied to the values (the last argument). In other words, you compare the first two arguments, then apply the result to the last one.
You can imagine the process like this:

1. Compare embedded_fr with embedded_en via the dot product Q \cdot K^\intercal to get the similarity scores (the alignment). If an embedding in French matches an embedding in English, you get a high score, and vice versa.
2. Apply the similarity scores to embedded_en via \text{score} \cdot V, the last matrix in the parentheses (see the sketch below).
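Here is a minimal NumPy sketch of those two steps, assuming embedded_fr and embedded_en are 2-D arrays of shape (sequence length, embedding dimension). The softmax normalization and the toy random embeddings are illustrative; the actual lab implementation also scales the scores and uses its own framework, so treat this as the idea, not the exact lab code:

```python
import numpy as np

def attention_qkv(queries, keys, values):
    """Dot-product attention: compare queries with keys, then weight values.

    queries: e.g. French embeddings, shape (len_fr, d_model)
    keys/values: e.g. English embeddings, shape (len_en, d_model)
    """
    # Step 1: similarity scores between the two embeddings (Q @ K^T)
    scores = queries @ keys.T                        # shape (len_fr, len_en)
    # Normalize each row into attention weights (softmax, numerically stable)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Step 2: apply the scores to the values (score @ V)
    return weights @ values                          # shape (len_fr, d_model)

# Hypothetical toy embeddings, just to show the shapes
embedded_fr = np.random.rand(5, 8)   # 5 French tokens, d_model = 8
embedded_en = np.random.rand(6, 8)   # 6 English tokens
out = attention_qkv(embedded_fr, embedded_en, embedded_en)
print(out.shape)                     # (5, 8): one attended vector per French token
```

Note that the keys and values are the same embedded_en here: the keys are what the French queries are compared against, and the values are what the resulting weights are applied to.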
In the lecture video (around 4:15) you can see that black-grey-white matrix. It is the result of the dot product Q \cdot K^\intercal of the first two arguments (embedded_fr, embedded_en). That matrix is then multiplied with the last argument (embedded_en), and the result is embedded_en with attention applied. This result is used to translate the sentence (usually there are more steps, depending on the architecture).