Self-Attention Summation and Information Loss

In self-attention, the equation for calculating the attention of a word is: [image: screenshot of the attention equation] My question is: once this vector has been computed, hasn't it lost any information about which words it was actually weighting? The next block only receives the output of this one, so how would it be able t…
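The screenshot itself is not recoverable. Assuming it showed the per-word attention formula in its usual course form, the attention for word t is a softmax-weighted sum of the value vectors of every word in the sentence. The projection names W^Q, W^K and W^V are an illustrative assumption here, not copied from the screenshot:

```latex
A\left(q^{<t>}, K, V\right) = \sum_{i} \frac{\exp\left(q^{<t>} \cdot k^{<i>}\right)}{\sum_{j} \exp\left(q^{<t>} \cdot k^{<j>}\right)} \, v^{<i>},
\qquad q^{<i>} = W^{Q} x^{<i>}, \quad k^{<i>} = W^{K} x^{<i>}, \quad v^{<i>} = W^{V} x^{<i>}

\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```

Only the sum is passed on, not the weights themselves. However, each v^{<i>} is a different learned projection of a different word, so the result generally changes depending on which words receive the large weights. The vectorized Transformer form on the second line also divides q·k by sqrt(d_k) before the softmax.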

Course Q&A > Deep Learning Specialization > Sequence Models

TMosh, July 8, 2021, 5:19am

I recommend you review the Self-Attention lecture in Week 4, starting around 4:40. The query, key, and value vectors are all computed from the input ‘x’ examples. Each of them uses its own weight matrix, which is learned from the training set.
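The following is a minimal NumPy sketch of single-head scaled dot-product self-attention, offered as an illustration rather than the course notebook's code. Random matrices stand in for the learned weights, and the names `W_q`, `W_k`, `W_v` and `self_attention` are hypothetical. It shows that query, key and value each come from their own projection of `x`. It also shows that each output row is exactly the weighted sum of value vectors:

```python
import numpy as np


def softmax(z, axis=-1):
    # Subtract the max for numerical stability before exponentiating
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    X: (T, d_model) embeddings of the T words in the sentence.
    W_q, W_k, W_v: (d_model, d_k) projection matrices (learned in a real model,
    random here as a stand-in).
    Returns the (T, d_k) outputs, the (T, T) attention weights and the values.
    """
    Q = X @ W_q  # one query vector per word
    K = X @ W_k  # one key vector per word
    V = X @ W_v  # one value vector per word
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # similarity of each query to each key
    weights = softmax(scores, axis=-1)  # each row sums to 1
    A = weights @ V                     # row t = sum_i weights[t, i] * V[i]
    return A, weights, V


rng = np.random.default_rng(0)
T, d_model, d_k = 4, 8, 8
X = rng.normal(size=(T, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))

A, weights, V = self_attention(X, W_q, W_k, W_v)

# Row t of A is exactly the weighted sum of the value vectors for word t
t = 0
manual = sum(weights[t, i] * V[i] for i in range(T))
print(np.allclose(A[t], manual))  # True
print(A.shape, weights.shape)     # (4, 8) (4, 4)
```

Only `A` would be handed to the next block, and `weights` would not. Because every row of `V` comes from a different word through the learned `W_v`, shifting the weight onto different words generally produces a different output vector.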


Related topics

Topic	Category	Tags	Replies	Views	Activity
C5W4A1 Understanding Self-Attention	Sequence Models	week-module-4, coursera-platform	2	383	February 25, 2024
Summation in self-attention	Sequence Models	coursera-platform	3	580	September 17, 2021
C5W4 Quiz: Self-attention	Sequence Models	coursera-platform	5	851	July 12, 2022
Self-Attention formula	Sequence Models	week-module-4, coursera-platform	1	174	May 1, 2024
C5W4 - In need of attention regarding 'multi-head attention'	Sequence Models	week-module-4, coursera-platform	5	154	May 30, 2024