Attention model video formula error?

In the Attention model video, Andrew Ng shows the representation of alpha (the attention weights) and the context.


  • it's a French translation example
  • the French (source) sequence timestep is represented by t'
  • the translated (target) sequence timestep is represented by t.

But it doesn't add up to me: if the sum of all alpha over t' (t-prime) is 1, and all the alphas are used in the equation for c^<t>, then how would the context be different for words at other t's (c^<1>, c^<2>, etc.)?

I guess Andrew omitted mentioning a window limit on the t' chosen for the context, perhaps?

Although the sum of α<t,t'> over t' is 1, the context is composed of the sum of α<t,t'>·a<t'>. Intuitively, it's a weighted sum of the a<t'>. Each target word has a different α distribution, and thus a different context.

But as long as it is summed across all of t', the formula doesn’t make sense to me. It is just like passing in a^<t'> directly, without the alpha part. Right?

Because for every target step, the alphas are always going to sum up to 1 anyway. I hope I'm making sense to you.

Suppose we have an example as below.
Source sentence: Jane visite l'Afrique en septembre
Target sentence: Jane visits Africa in September

We calculate the attention context for each target word as below.

C<Jane> = α<1,1>a<1> + α<1,2>a<2> + ... + α<1,5>a<5>
C<visits> = α<2,1>a<1> + α<2,2>a<2> + ... + α<2,5>a<5>
C<Africa>= α<3,1>a<1> + α<3,2>a<2> + ... + α<3,5>a<5>
C<in> = α<4,1>a<1> + α<4,2>a<2> + ... + α<4,5>a<5>
C<September> = α<5,1>a<1> + α<5,2>a<2> + ... + α<5,5>a<5>

constraint: α<t,1> + α<t,2> + ... + α<t,5> = 1 for each context C<t>.

which also means
C<Jane> != C<visits> != C<Africa> != C<in> != C<September>

Do you see that each context C<t> has a different α sequence? For instance,

[α<3,1>, α<3,2>, ..., α<3,5>] is probably [0.01, 0.15, 0.82, 0.01, 0.01] for C<Africa>
[α<5,1>, α<5,2>, ..., α<5,5>] may be [0.002, 0.003, 0.005, 0.16, 0.83] for C<September>

It's because Africa might pay more attention to l'Afrique, and September might focus on septembre.
Does that convince you that the context of each target word is different, even though each set of alphas sums to 1?
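To see why this is not "just like passing in a<t'> directly": that would only be true if the alphas were uniform. A tiny sketch with made-up 2-D activations, contrasting uniform weights (every context collapses to the same average) with peaked weights (contexts differ, even though every row still sums to 1):

```python
import numpy as np

# Hypothetical 2-D activations for two source words (made-up values).
a = np.array([[1.0, 0.0],
              [0.0, 1.0]])

# Uniform attention: each row sums to 1, and every context collapses
# to the same plain average of the a<t'> vectors.
uniform = np.array([[0.5, 0.5],
                    [0.5, 0.5]])
print(uniform @ a)  # both rows are [0.5, 0.5]

# Peaked attention: rows still sum to 1, yet the contexts differ,
# because the *distribution* of each row differs.
peaked = np.array([[0.9, 0.1],
                   [0.1, 0.9]])
print(peaked @ a)  # row 0 is [0.9, 0.1], row 1 is [0.1, 0.9]
```

The sum-to-1 constraint fixes the total weight, not where the weight goes; that is what makes each context distinct.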