Attention model video formula error?

In the Attention model video, Andrew Ng shows the representation of alpha (the attention weights) and the context.


  • it's a French translation example
  • the French (source) sequence timestep is represented by t'
  • the translated (target) sequence timestep is represented by t.

But it doesn't add up to me: if the sum of all alpha over t' (t-prime) is 1, and all the alphas are used in the equation for c^<t>, then how would the context be different for words at other t's (c^<1>, c^<2>, etc.)?

I guess Andrew omitted mentioning a window limit on the t' chosen for the context, perhaps?

Although the sum of α<t,t'> over t' is 1, the context is composed of the sum of α<t,t'>·a<t'>. Intuitively, it's a weighted sum of the a<t'>. Each target word has a different α distribution, and thus a different context.

But as long as it is summed across all of t', the formula doesn’t make sense to me. It is just like passing in a^<t'> directly, without the alpha part. Right?

Because for every target step, the alphas are always going to sum up to 1 anyway. I hope I'm making sense to you.

Suppose we have an example as below.
Source sentence: Jane visite l'Afrique en septembre
Target sentence: Jane visits Africa in September

We calculate the attention context for each target word as below.

C<Jane> = α<1,1>a<1> + α<1,2>a<2> + ... + α<1,5>a<5>
C<visits> = α<2,1>a<1> + α<2,2>a<2> + ... + α<2,5>a<5>
C<Africa>= α<3,1>a<1> + α<3,2>a<2> + ... + α<3,5>a<5>
C<in> = α<4,1>a<1> + α<4,2>a<2> + ... + α<4,5>a<5>
C<September> = α<5,1>a<1> + α<5,2>a<2> + ... + α<5,5>a<5>

constraint: α<t,1> + α<t,2> + ... + α<t,5> = 1 for each context C<t>.

which also means
C<Jane> != C<visits> != C<Africa> != C<in> != C<September>

Do you see that each context C<t> has a different α sequence? For instance,

[α<3,1>, α<3,2>, ..., α<3,5>] is probably [0.01, 0.15, 0.82, 0.01, 0.01] for C<Africa>
[α<5,1>, α<5,2>, ..., α<5,5>] may be [0.002, 0.003, 0.005, 0.16, 0.83] for C<September>

It's because Africa might pay more attention to l'Afrique, and September might focus on septembre.
Does that convince you that the context of each target word is different, even though each set of alphas sums to 1?
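To see why this is not "just like passing in a<t'> directly": that would only be true if the alphas were uniform. A tiny sketch with made-up 2-D activations, contrasting uniform weights (every context collapses to the same average) with peaked weights (contexts differ, even though every row still sums to 1):

```python
import numpy as np

# Hypothetical 2-D activations for two source words (made-up values).
a = np.array([[1.0, 0.0],
              [0.0, 1.0]])

# Uniform attention: each row sums to 1, and every context collapses
# to the same plain average of the a<t'> vectors.
uniform = np.array([[0.5, 0.5],
                    [0.5, 0.5]])
print(uniform @ a)  # both rows are [0.5, 0.5]

# Peaked attention: rows still sum to 1, yet the contexts differ,
# because the *distribution* of each row differs.
peaked = np.array([[0.9, 0.1],
                   [0.1, 0.9]])
print(peaked @ a)  # row 0 is [0.9, 0.1], row 1 is [0.1, 0.9]
```

The sum-to-1 constraint fixes the total weight, not where the weight goes; that is what makes each context distinct.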