Self-Attention Summation and Information Loss

I recommend reviewing the Self-Attention lecture in Week 4, starting around 4:40. The query, key, and value matrices are all computed from the same input ‘x’ examples, and each one has its own weight matrix. Those weight matrices are what get learned from the training set.
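To make that concrete, here is a minimal plain-Python sketch. It assumes the standard scaled dot-product formulation, softmax(QKᵀ/√d_k)·V, which may differ in detail from the lecture. The names `self_attention`, `w_q`, `w_k`, and `w_v` are hypothetical. The weight matrices are random stand-ins for the values training would learn. The sketch shows that Q, K, and V all come from the same `x` through separate weight matrices, and that each output row is a weighted sum (the "summation") of the value rows.

```python
import math
import random

def matmul(a, b):
    """Multiply matrix a (n x m) by matrix b (m x p); both are lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def softmax(row):
    # Subtract the max for numerical stability; the result still sums to 1.
    mx = max(row)
    exps = [math.exp(v - mx) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    # Hypothetical stand-in for learned weights: random values, not trained ones.
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def self_attention(x, w_q, w_k, w_v):
    # Q, K, and V are all computed from the same input x,
    # each through its own weight matrix.
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    d_k = len(w_k[0])
    # Scaled dot-product scores, then a softmax over each row.
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    # Each output row is a weighted sum of the value rows.
    return matmul(weights, v), weights

rng = random.Random(0)
seq_len, d_model, d_k = 3, 4, 2
x = random_matrix(seq_len, d_model, rng)    # 3 input positions, 4 features each
w_q = random_matrix(d_model, d_k, rng)
w_k = random_matrix(d_model, d_k, rng)
w_v = random_matrix(d_model, d_k, rng)
out, weights = self_attention(x, w_q, w_k, w_v)
```

During training, only `w_q`, `w_k`, and `w_v` would be updated. Q, K, and V are recomputed from each new `x`.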