[week 4] Transformer Network - get_angles

zer2 · June 22, 2021, 2:10am

I also found this confusing and was quite frustrated before I found this post. The instructions do not tell us to repeat the columns to use one for sine and one for cosine, so I had no idea what was going wrong.

In addition, IMO, the //2 structure is inelegant because it stores redundant data and is more complicated than it needs to be. It seems to me that a simpler way to implement this would be:

make an angle_rads table without duplicate columns
use
np.stack([np.sin(angle_rads), np.cos(angle_rads)], axis=-1).reshape(len(angle_rads), -1)
to get the sines and cosines next to each other
(this is pseudocode; I didn’t test it)

But perhaps I am misunderstanding part of the logic behind the //2 idea?
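A minimal sketch of the idea above as a complete function, assuming d is even. The name positional_encoding_no_duplicates is hypothetical and is not part of the assignment; the sketch returns a plain NumPy array of shape (positions, d).

```python
import numpy as np

def positional_encoding_no_duplicates(positions, d):
    """Hypothetical helper: interleave sin/cos from a half-width angle table (d even)."""
    pos = np.arange(positions)[:, np.newaxis]        # (positions, 1)
    i = np.arange(d // 2)[np.newaxis, :]             # (1, d/2), one entry per (sin, cos) pair
    angle_rads = pos / np.power(10000, 2 * i / d)    # (positions, d/2), no repeated columns
    # Stack sin and cos on a new last axis, then flatten that axis so the
    # columns alternate: sin, cos, sin, cos, ...
    pairs = np.stack([np.sin(angle_rads), np.cos(angle_rads)], axis=-1)  # (positions, d/2, 2)
    return pairs.reshape(positions, d)

pe = positional_encoding_no_duplicates(4, 8)
print(pe.shape)  # prints (4, 8)
```

The grader in the assignment may still expect the full-width angle table from get_angles, so this is an alternative formulation rather than a drop-in answer.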

Damon · June 23, 2021, 6:37am

Thanks for providing such elaborate information @manifest!
I’ll update it into my post above.

Damon · June 23, 2021, 6:47am

Hi @zer2, actually there is a reminder under <1.2 - Sine and Cosine Positional Encodings>, saying
“Reminder: Use the sine equation when 𝑖 is an even number, and the cosine equation when 𝑖 is an odd number.”

Please take a look at it.

tungvs · June 23, 2021, 3:09pm

IMHO, the confusion comes from this slide

Notice that in this slide, the “i”s in the upper part (i=0, i=1…) and in the lower part (PE(pos, 2i), PE(pos, 2i+1)) are actually different. Like many confused students, I plugged the “i” from the upper part into the lower-part computation and wasted about an hour without understanding what was going on.

Clearly we can do better by separating the notation in this slide.

zer2 · June 24, 2021, 2:27am

@Damon true, I did not notice that! However, it comes after “get_angles” in the assignment, and I did not think to read ahead for hints about earlier cells. Perhaps that comment, or something like it, could be moved up under the heading for 1.1?

Damon · June 24, 2021, 6:48am

@zer2, yes, it may be better to put it under 1.1.

And it would be much better if we could see the connection between get_angles and positional_encoding, since that’s exactly why we need get_angles.

liangyuantong · June 24, 2021, 6:20pm

I was confused for a while too. What misleads you is the i, which is only there to tell odd and even indices apart: even indices get sines and odd indices get cosines. Because sines and cosines alternate, we divide by 2, which gives the number of (sine, cosine) pairs. For example, let’s say d = 512 and pos = 1.
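A small sketch of that example, assuming d = 512 and pos = 1 as stated above. The name k for the encoding column is an assumption introduced here to keep it distinct from the pair index i.

```python
import numpy as np

d, pos = 512, 1
k = np.arange(6)                 # first six encoding columns: 0, 1, 2, 3, 4, 5
i = k // 2                       # pair index for each column: 0, 0, 1, 1, 2, 2
angles = pos / np.power(10000, 2 * i / d)

# Columns 2i and 2i+1 share one angle; the even column takes its sine,
# the odd column takes its cosine.
for col, pair, angle in zip(k, i, angles):
    fn = "sin" if col % 2 == 0 else "cos"
    print(f"column {col}: i = {pair}, {fn}({angle:.6f})")
```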

sogolgolafshan · June 24, 2021, 7:42pm

Hi, thanks for your helpful tip! I wrote the steps you just mentioned in code, but I’m getting this error:
Wrong shape. We expected: (1, 8, 16)
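A shape like (1, 8, 16) is what you get from 8 positions and an encoding size of 16 plus a leading batch axis. The sketch below shows one way the shapes can line up, assuming positions are passed as a column vector and encoding columns as a row vector. It is not a diagnosis of the code that produced the error, the argument name k is an assumption, and it returns a NumPy array without any TensorFlow conversion.

```python
import numpy as np

def get_angles(pos, k, d):
    """Angles for positions pos (column vector) and encoding columns k (row vector)."""
    i = k // 2                                   # columns 2i and 2i+1 share the same i
    return pos / np.power(10000, 2 * i / d)

def positional_encoding(positions, d):
    """Return a NumPy array of shape (1, positions, d)."""
    angle_rads = get_angles(np.arange(positions)[:, np.newaxis],   # (positions, 1)
                            np.arange(d)[np.newaxis, :],           # (1, d)
                            d)                                     # broadcasts to (positions, d)
    angle_rads[:, 0::2] = np.sin(angle_rads[:, 0::2])   # even columns: sine
    angle_rads[:, 1::2] = np.cos(angle_rads[:, 1::2])   # odd columns: cosine
    return angle_rads[np.newaxis, ...]                  # add the leading batch axis

print(positional_encoding(8, 16).shape)  # prints (1, 8, 16)
```

If the leading axis is missing, or the angle table has d/2 columns instead of d, the shape check fails even when the individual values are right.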

jincy-p-janardhanan · June 25, 2021, 7:55pm

This is what we mean by explaining in simple terms!

jincy-p-janardhanan · June 25, 2021, 8:35pm

I was confused at first too; I didn’t see what was wrong with an approach like the one @zer2 mentioned above. But this comment by @liangyuantong helped to clarify.
10000 must be raised to exponents that follow a sequence like 0, 2, 4, etc., which is possible but may not be straightforward.

Still, it’s easy to make an array of angles like that, and then make an array of sine and cosine values (as pair-wise columns) for the angles.

Or am I still missing something about why the redundant columns are necessary?

Damon · June 25, 2021, 10:22pm

Hi @jincy-p-janardhanan, I don’t really get what your question is.

Could you please state it more directly?
Thanks.

jincy-p-janardhanan · June 25, 2021, 10:53pm

What exactly is the need for having redundant pair-wise columns in the angles?
Why not make a matrix of angles with rows like:
1/10000^{0/512}, 1/10000^{2/512}, ...
And then form the matrix that has the sin and cos of each angle column as adjacent columns?
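A quick check, as a sketch with assumed small sizes (4 positions, d = 8), that the full-width angle table built with // 2 is the half-width table with every column repeated twice. The variable names are hypothetical.

```python
import numpy as np

positions, d = 4, 8
pos = np.arange(positions)[:, np.newaxis]             # (positions, 1)

# Half-width table: one column per (sin, cos) pair, exponents 0/d, 2/d, 4/d, ...
i = np.arange(d // 2)[np.newaxis, :]
half = pos / np.power(10000, 2 * i / d)               # (positions, d/2)

# Full-width table built with k // 2: each angle column appears twice.
k = np.arange(d)[np.newaxis, :]
full = pos / np.power(10000, 2 * (k // 2) / d)        # (positions, d)

same_angles = np.allclose(np.repeat(half, 2, axis=1), full)
print(same_angles)  # prints True
```

So the two tables carry the same information. The duplicated layout makes it possible to apply sine to the even columns and cosine to the odd columns in place with slices like [:, 0::2] and [:, 1::2].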

Damon · June 25, 2021, 11:32pm

@jincy-p-janardhanan Check this:

morningdew · June 27, 2021, 8:26am

I totally agree! I have to admit that this put me off. I was used to having very clear explanations of things in this specialization, despite the heavy notation. This is the most confusing thing I have faced so far.
That being said, I realize it’s not easy to simplify these kinds of complex ideas. I think it would help a lot to separate the i in the angle formula from the i that is the coordinate of the encoding vectors.

morningdew · June 27, 2021, 9:07am

Thanks @Damon your answer helped clarify on this.
Maybe it would be even clearer to use the actual formula that explains the origin of that otherwise “magical” i // 2:

where is the coordinate of the encoding, and is the floor of the number.
That’s where the i we see in the formula actually comes from :

Then the i which is given as argument to the get_angles function would actually be in the formula above
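One way to write the relationship described above, as a sketch: here k is a hypothetical name for the encoding coordinate (the column index passed to get_angles) and d is the encoding dimension. The symbols are assumptions and may not match the notation used in the post.

```latex
\theta(pos, k) = \frac{pos}{10000^{\,2\lfloor k/2 \rfloor / d}},
\qquad i = \left\lfloor \frac{k}{2} \right\rfloor
\quad\Longrightarrow\quad
\theta(pos, k) = \frac{pos}{10000^{\,2i/d}}
```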

Damon · June 27, 2021, 10:35am

Good job @morningdew, that’s more direct and precise.

For the sake of consistency of notation with the lecture and code lab, we can push it a little further:

The team (@manifest) may want to consider updating this formula if that’s practical.

RTIGADOLI · July 3, 2021, 2:22pm

@manifest @morningdew Sheesh! This is a really good starting point for beginners who have no idea what is going on under the hood, since it gives at least a small hint to start off with. I was racking my brain over this for at least half an hour.

Sourav_Ganguly · July 4, 2021, 6:21am

thanks for the explanation

gugger · July 14, 2021, 9:02am

I think formula (3) of this exercise should be updated or removed from the exercise altogether. The fact that on the left-hand side of formulas (1) and (2) the indices are 2i and 2i+1 respectively changes the whole meaning of how to interpret and code the inner part. Taking it out of context in (3) and asking students to implement it as written there is not just misleading but outright incorrect.

jcardona · July 15, 2021, 7:58pm

The answer to that question comes from these equations:

Note that the subscript of PE in each case corresponds to an even or an odd number, while the i in the argument of each function is the same. So for the first pair (\sin, \cos) i is 0, for the second pair it is 1, and so on.
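For reference, the sinusoidal positional-encoding equations as they are usually written, with d the encoding dimension. That these are the equations meant is an assumption, based on the PE(pos, 2i) and PE(pos, 2i+1) notation used earlier in this thread.

```latex
PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d}}\right), \qquad
PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d}}\right)
```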
