I’m a bit confused by the explanation of the transformer model. Can someone clarify the following:
- How are the input words x represented? Are they one-hot vectors or learned word embeddings?
- Is there an error in the explanation of the multi-head attention? I thought Q = W^Q X. Why are the inputs to the multi-head attention written as just W^Q? I suspect this should be W^Q X, or just Q with a subscript denoting the head number. The same goes for K and V.
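For reference, here is my current understanding of the per-head projections as a minimal NumPy sketch (following the formulation head_i = Attention(X W_i^Q, X W_i^K, X W_i^V) from "Attention Is All You Need"; all shapes and weight names here are just illustrative, not from the explanation I'm asking about):

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax over the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, W_Q, W_K, W_V, W_O):
    # X: (seq_len, d_model); W_Q/W_K/W_V: per-head lists of (d_model, d_k) matrices.
    heads = []
    for W_Qi, W_Ki, W_Vi in zip(W_Q, W_K, W_V):
        Q = X @ W_Qi  # per-head query: Q_i = X W_i^Q — the projection is applied to X
        K = X @ W_Ki  # per-head key:   K_i = X W_i^K
        V = X @ W_Vi  # per-head value: V_i = X W_i^V
        d_k = Q.shape[-1]
        A = softmax(Q @ K.T / np.sqrt(d_k))  # scaled dot-product attention weights
        heads.append(A @ V)
    # Concatenate all heads and apply the output projection.
    return np.concatenate(heads, axis=-1) @ W_O

# Toy example: 4 tokens, d_model=8, 2 heads of size d_k=4.
rng = np.random.default_rng(0)
seq, d_model, h, d_k = 4, 8, 2, 4
X = rng.normal(size=(seq, d_model))
W_Q = [rng.normal(size=(d_model, d_k)) for _ in range(h)]
W_K = [rng.normal(size=(d_model, d_k)) for _ in range(h)]
W_V = [rng.normal(size=(d_model, d_k)) for _ in range(h)]
W_O = rng.normal(size=(h * d_k, d_model))
out = multi_head_attention(X, W_Q, W_K, W_V, W_O)
print(out.shape)  # (4, 8)
```

If this matches what the explanation intended, then writing the heads' inputs as W^Q alone would indeed seem to be shorthand (or a typo) for the projected queries Q_i = X W_i^Q.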