Question about attention slides

Hello everyone! I'm working through Week 4 of the Sequence Models course, and I'm confused about where the W matrices are applied in the attention models.

In the second video ("Self-attention"), the W matrices are multiplied against the input x. But in the third video ("Multi-head attention"), each head's W matrices are multiplied against q, k, and v.

Which one is correct?

Thanks in advance!

This lesson is under review by the course staff. I believe the Self-attention material is more accurate than the Multi-head attention material.

Perhaps the staff will publish an update on this.
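For what it's worth, the two placements are mathematically equivalent in expressive power: applying a per-head matrix to q = Wx just composes two linear maps, which is itself a single linear map. Here is a minimal NumPy sketch (all weight names and sizes are my own, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_head = 8, 4

x = rng.normal(size=(d_model,))          # one input token embedding

# Convention A ("Self-attention" video): project x directly to a query
Wq = rng.normal(size=(d_head, d_model))  # hypothetical per-head weights
q_a = Wq @ x

# Convention B ("Multi-head attention" video): first form q from x,
# then apply a per-head matrix W_i to q
Wq_shared = rng.normal(size=(d_model, d_model))
Wi = rng.normal(size=(d_head, d_model))
q_b = Wi @ (Wq_shared @ x)

# The two matrices compose into one linear map, so convention B is
# just a re-parameterized version of convention A:
q_b_composed = (Wi @ Wq_shared) @ x
assert np.allclose(q_b, q_b_composed)
```

So whichever video the staff keeps, the trained model can represent the same set of functions; the difference is only in how the projections are factored on the slides.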