Hello everyone! I’m doing week 4 of the Sequence Models course, and I’m confused about where the W matrices are applied in the attention models.
In the second video (“Self-Attention”), the W matrices multiply the inputs x^&lt;t&gt;. But in the third video (“Multi-Head Attention”), each W matrix multiplies q, k, or v instead.
Which is the right one?
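To make my confusion concrete, here is a tiny NumPy sketch of how I currently read the two videos (the shapes and variable names are my own, not the course’s):

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_head = 8, 4

x = rng.standard_normal(d_x)                 # one input vector x^<t>

# Video 2 ("Self-Attention"): W^Q multiplies the input x directly
W_Q = rng.standard_normal((d_head, d_x))
q = W_Q @ x

# Video 3 ("Multi-Head Attention"): a head's W_1^Q multiplies q
W_1Q = rng.standard_normal((d_head, d_head))
q_head1 = W_1Q @ q

# Both steps are linear, so they compose into one matrix acting on x
q_head1_direct = (W_1Q @ W_Q) @ x
print(np.allclose(q_head1, q_head1_direct))  # → True
```

If that reading is right, the two multiplications just compose into a single linear map on x — but I’m not sure that’s what the videos intend.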
Thanks in advance!