Attention model video formula error?

In the Attention Model video, Andrew Ng shows the formulas for alpha (the attention weights) and the context.

Background:

• it's a French-to-English translation
• the French (input) sequence timestep is represented by `t'`
• the translated (output) sequence timestep is represented by `t`.

But it doesn't add up for me: if the alphas summed over all `t'` (t-prime) equal 1, and all of those alphas are used in the equation for c^<t>, then how would the context be different for words at other `t`'s (c^<1>, c^<2>, etc.)?

I guess Andrew forgot to mention a window limit on the `t'` values used for the context, perhaps?

Although $\sum_{t'} \alpha^{<t,t'>} = 1$, the context is $c^{<t>} = \sum_{t'} \alpha^{<t,t'>} a^{<t'>}$. Intuitively, it's a weighted sum of the $a^{<t'>}$. Each target word has a different distribution of weights, and thus a different context.
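A minimal sketch of this in plain Python, assuming the weights come from a softmax over made-up scores. The activations, scores, and the helpers `softmax` and `context` are hypothetical illustrations, not course code. Both alpha rows sum to 1, but the two contexts come out different:

```python
import math

def softmax(scores):
    """Normalize a list of scores so they are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def context(alphas, activations):
    """Weighted sum c = sum over t' of alpha[t'] * a[t'] (one value per dimension)."""
    dim = len(activations[0])
    return [sum(w * vec[k] for w, vec in zip(alphas, activations)) for k in range(dim)]

# Hypothetical encoder activations a<1>..a<3> (2-dimensional for readability).
a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Hypothetical attention scores for two different output steps t.
scores_t1 = [2.0, 0.1, 0.1]  # step 1 mostly attends to t' = 1
scores_t2 = [0.1, 0.1, 2.0]  # step 2 mostly attends to t' = 3

alpha_t1 = softmax(scores_t1)
alpha_t2 = softmax(scores_t2)

c1 = context(alpha_t1, a)
c2 = context(alpha_t2, a)

print(sum(alpha_t1), sum(alpha_t2))  # both approximately 1.0
print(c1, c2)                        # two different context vectors
```

The normalization only fixes the total weight; which `t'` receives most of it is what changes from one output step to the next.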

But as long as it is summed over all `t'`, the formula doesn't make sense to me. It is just like passing in `a^<t'>` directly, without the `alpha` part. Right?

Because for every `t`, the `alpha` values always sum to 1 anyway. I hope I'm making sense.

Suppose we have the following example.
Source sentence: `Jane visite l'Afrique en septembre`
Target sentence: `Jane visits Africa in September`

We calculate the attention context for each target word as follows:

```
C<Jane>      = α<1,1>a<1> + α<1,2>a<2> + ... + α<1,5>a<5>
C<visits>    = α<2,1>a<1> + α<2,2>a<2> + ... + α<2,5>a<5>
C<Africa>    = α<3,1>a<1> + α<3,2>a<2> + ... + α<3,5>a<5>
C<in>        = α<4,1>a<1> + α<4,2>a<2> + ... + α<4,5>a<5>
C<September> = α<5,1>a<1> + α<5,2>a<2> + ... + α<5,5>a<5>
```
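The five equations above are the rows of a single matrix product. In my own notation (treating each $a^{<t'>}$ as a row vector, which is an assumption for the sake of the picture):

$$
\begin{bmatrix} C^{<1>} \\ \vdots \\ C^{<5>} \end{bmatrix}
=
\begin{bmatrix}
\alpha^{<1,1>} & \cdots & \alpha^{<1,5>} \\
\vdots & \ddots & \vdots \\
\alpha^{<5,1>} & \cdots & \alpha^{<5,5>}
\end{bmatrix}
\begin{bmatrix} a^{<1>} \\ \vdots \\ a^{<5>} \end{bmatrix}
$$

Every row of the α matrix sums to 1, but the rows are not the same, so the resulting contexts are not the same either.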

Constraint: $\sum_{t'=1}^{5} \alpha^{<t,t'>} = 1$ for each context `C<t>`.

The constraint applies to each context separately, which also means that in general:

```
C<Jane> != C<visits> != C<Africa> != C<in> != C<September>
```
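For reference, the lecture gets this sum-to-1 property by computing the weights with a softmax over `t'`. The scores $e^{<t,t'>}$ come from a small network whose inputs are the previous decoder state $s^{<t-1>}$ and $a^{<t'>}$, so they change with `t`:

$$
\alpha^{<t,t'>} = \frac{\exp\left(e^{<t,t'>}\right)}{\sum_{t'=1}^{T_x} \exp\left(e^{<t,t'>}\right)}
$$

Because $s^{<t-1>}$ is different at each output step, each target word gets its own scores and therefore its own α row, even though every row is normalized to 1.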

Notice that each context `C<t>` has a different `α` sequence. For instance:

```
[α<3,1>, α<3,2>, ..., α<3,5>] is probably [0.01, 0.15, 0.82, 0.01, 0.01] for C<Africa>
[α<5,1>, α<5,2>, ..., α<5,5>] may be [0.002, 0.003, 0.005, 0.16, 0.83] for C<September>
```
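To make this concrete, here is a sketch that plugs the two alpha rows above into the context formula. The encoder activations are hypothetical one-hot vectors, chosen only so that each context reads back where the attention went; in a real model they are dense learned vectors.

```python
# The two alpha rows quoted above (illustrative values, not model output).
alpha_africa = [0.01, 0.15, 0.82, 0.01, 0.01]
alpha_september = [0.002, 0.003, 0.005, 0.16, 0.83]

# Hypothetical encoder activations a<1>..a<5>: one-hot vectors, so component k
# of a context equals the attention weight on source word k.
source = ["Jane", "visite", "l'Afrique", "en", "septembre"]
a = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]

def context(alphas, activations):
    """C<t> = sum over t' of alpha<t,t'> * a<t'>."""
    return [sum(w * vec[k] for w, vec in zip(alphas, activations))
            for k in range(len(activations[0]))]

for name, alphas in [("Africa", alpha_africa), ("September", alpha_september)]:
    c = context(alphas, a)
    focus = source[max(range(5), key=lambda k: c[k])]  # most-attended source word
    print(name, "sum(alpha) =", round(sum(alphas), 6), "focus:", focus)
```

Both rows sum to 1, yet the most-attended source word is `l'Afrique` for `Africa` and `septembre` for `September`, so the two contexts differ.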

That's because `Africa` might pay more attention to `l'Afrique`, and `September` might focus on `septembre`.
Does this convince you that the context of each target word is different, even though each set of alphas sums to 1?
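On the "just like passing in `a^<t'>` directly" point, a quick numeric check with made-up 1-D activations (the values of `a`, the alpha vectors, and the helper `weighted_sum` are hypothetical). Every alpha vector sums to 1, yet each gives a different weighted sum, and none of them equals the plain sum:

```python
# Hypothetical 1-D encoder activations a<1>..a<4>.
a = [4.0, -2.0, 10.0, 0.0]

def weighted_sum(alphas, values):
    """c = sum over t' of alpha<t'> * a<t'>."""
    return sum(w * v for w, v in zip(alphas, values))

plain_sum = sum(a)                   # "passing a<t'> directly", no weights
uniform = [0.25, 0.25, 0.25, 0.25]   # sums to 1, attends equally
peaked_1 = [0.97, 0.01, 0.01, 0.01]  # sums to 1, attends to t' = 1
peaked_3 = [0.01, 0.01, 0.97, 0.01]  # sums to 1, attends to t' = 3

print("plain sum:", plain_sum)
for alphas in (uniform, peaked_1, peaked_3):
    print(round(sum(alphas), 6), weighted_sum(alphas, a))
```

The plain sum is 12, while the uniform weights give 3 (the mean) and the two peaked weightings give about 3.96 and 9.72. Summing to 1 only fixes the total amount of weight; where that weight lands is what makes each context different.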
