While watching the self-attention video in the transformers section, I had a question: how do we calculate the query (q), key (k) and value (v) for each word?
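They are not set by hand. Each word's embedding is multiplied by three weight matrices (usually written W_Q, W_K and W_V) to get its q, k and v. Those matrices are learned during training, along with the embeddings themselves.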
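If it helps, here is a rough NumPy sketch of where q, k and v come from for a 3-word sentence. The sizes (`d_model`, `d_k`), the random embeddings and the random weight matrices are all made up for illustration; in a real model the embeddings and `W_q`, `W_k`, `W_v` would be learned by backpropagation rather than drawn at random.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only to keep the example small.
seq_len, d_model, d_k = 3, 8, 4

# Stand-in for the embeddings of a 3-word sentence (one row per word).
X = rng.normal(size=(seq_len, d_model))

# Projection matrices. Random here; learned during training in a real model.
W_q = rng.normal(size=(d_model, d_k))
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_k))

# q, k, v for every word: one matrix multiplication each.
Q = X @ W_q  # row i is the query for word i
K = X @ W_k  # row i is the key for word i
V = X @ W_v  # row i is the value for word i


def softmax(z):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


# Scaled dot-product attention: how much each word attends to every other word.
weights = softmax(Q @ K.T / np.sqrt(d_k))
output = weights @ V  # new representation of each word
```

So the only per-word input is the embedding row in `X`; everything else comes from the shared matrices, which is why the same weights work for any sentence.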