Hello everyone! I’m doing week 4 of the Sequence Models course, and I’m confused about where the W matrices are applied in the attention models.
In the second video (“Self-Attention”), the W matrices multiply the inputs x^&lt;t&gt;. But in the third video (“Multi-Head Attention”), each W matrix multiplies q, k, or v instead.
Which is the right one?
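To make my confusion concrete, here is a tiny NumPy sketch of how I currently read the two videos (the shapes and variable names are my own, not the course’s):

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_head = 8, 4

x = rng.standard_normal(d_x)                 # one input vector x^<t>

# Video 2 ("Self-Attention"): W^Q multiplies the input x directly
W_Q = rng.standard_normal((d_head, d_x))
q = W_Q @ x

# Video 3 ("Multi-Head Attention"): a head's W_1^Q multiplies q
W_1Q = rng.standard_normal((d_head, d_head))
q_head1 = W_1Q @ q

# Both steps are linear, so they compose into one matrix acting on x
q_head1_direct = (W_1Q @ W_Q) @ x
print(np.allclose(q_head1, q_head1_direct))  # → True
```

If that reading is right, the two multiplications just compose into a single linear map on x — but I’m not sure that’s what the videos intend.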
Thanks in advance!