[week 4] Transformer Network - get_angles

I also found this confusing and was quite frustrated before I found this post. The instructions do not tell us to repeat the columns to use one for sine and one for cosine, so I had no idea what was going wrong.

In addition, IMO, the //2 structure is inelegant because it requires redundant data storage and is generally complicated. It seems to me that a simpler way to implement this would be:

  1. make an angle_rads table without duplicate columns
  2. use
    np.concatenate([[np.sin(x),np.cos(x)] for x in angle_rads]).T
    to get the sines and cosines next to each other
    (this is pseudocode; I didn’t test it)
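
The two steps above can be sketched with NumPy as follows. This is only an illustration, not the assignment's solution: the helper name `interleave_sin_cos` and the tiny `angle_rads` table are made up for the example. Note that the one-liner as posted appears to stack whole rows and then transpose, giving a `(d/2, 2 * positions)` array, so the sketch uses `np.stack` plus `reshape` to put each sine next to its cosine instead.

```python
import numpy as np

def interleave_sin_cos(angle_rads):
    """Hypothetical helper: angle_rads has shape (positions, d // 2), no duplicate columns.

    Returns shape (positions, d): sines in even columns, cosines in odd columns.
    """
    # Stack on a new last axis -> (positions, d // 2, 2), then flatten the last
    # two axes so that every sine is immediately followed by its cosine.
    pairs = np.stack([np.sin(angle_rads), np.cos(angle_rads)], axis=-1)
    return pairs.reshape(angle_rads.shape[0], -1)

# Made-up table: 2 positions, 2 distinct angles per position.
angle_rads = np.array([[0.0, 0.5],
                       [1.0, 1.5]])
encoded = interleave_sin_cos(angle_rads)
print(encoded.shape)  # (2, 4)
```

The result still needs whatever extra axes or dtype the grader expects; those details are not covered here.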

But perhaps I am misunderstanding part of the logic behind the //2 idea?

Thanks for providing such detailed information, @manifest!
I’ll add it to my post above.

Hi @zer2, actually there is a reminder under <1.2 - Sine and Cosine Positional Encodings>, saying:
Reminder: Use the sine equation when i is an even number, and the cosine equation when i is an odd number.

Please take a look at it.

IMHO, the confusion comes from this slide

Notice that in this slide, the "i" in the upper part (i=0, i=1…) and the "i" in the lower part (PE(pos, 2i), PE(pos, 2i+1)) are actually different things. Like many confused students, I plugged the “i” from the upper part into the lower-part computation and wasted about an hour without understanding what was going on.

Clearly we could do better by separating the two notations in this slide.

@Damon true, I did not notice that! However, it comes after “get_angles” in the assignment, and I did not think to read ahead for hints about how to do earlier cells. Perhaps that comment, or something like it, could be moved up to the heading under 1.1?

@zer2, yes it may be better to put it under 1.1.

And it would be even better if we could see the connection between get_angles and positional_encoding, since that’s exactly why we need get_angles.

I was confused for a while too, but you’re being misled by the i, which only distinguishes even and odd indices: even indices use sine and odd indices use cosine. Because sines and cosines are interleaved, we divide by 2, and that gives the number of (sine, cosine) groups. For example, let’s say d = 512 and pos = 1.
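
To make the d = 512, pos = 1 example concrete, here is a small sketch. The variable name `k` for the column index is an assumption introduced here to keep it apart from the pair index `i`; the rest follows the `// 2` idea discussed in this thread.

```python
import numpy as np

d = 512    # encoding dimension, from the example above
pos = 1    # position, from the example above

k = np.arange(d)   # column index of the encoding: 0, 1, ..., 511 (name assumed)
i = k // 2         # pair index: 0, 0, 1, 1, ..., 255, 255

# Both columns of a pair share one angle; the even column later takes its
# sine and the odd column its cosine.
angles = pos / np.power(10000.0, 2 * i / d)

print(i[:6])                   # [0 0 1 1 2 2]
print(len(np.unique(i)))       # 256
print(angles[0] == angles[1])  # True
```

So with d = 512 there are 256 groups, and dividing the column index by 2 is what maps two neighbouring columns onto the same group.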

Hi, thanks for your helpful tip! I wrote the steps you just mentioned in code, but I’m getting this error:
Wrong shape. We expected: (1, 8, 16)

This is what we mean by explaining in simple terms! :clap:

I was confused originally; I didn’t see what was wrong with an approach like the one @zer2 mentioned above. But this comment by @liangyuantong helped to clarify.
10000 must be raised to exponents built from a sequence like 0, 2, 4, etc., which is possible but may not be direct.

Still, it’s easy to make an array of angles like that, and then make an array of sine and cosine values (like pair-wise columns) for the angles.

Or did I still miss something about the necessity of having redundant columns?

Hi @jincy-p-janardhanan, I don’t really get what your question is.

Could you please state it more directly?
Thanks.

What exactly is the need for having redundant pair-wise columns in angles?
Why not make a matrix of angles with rows like:
1/10000^{0/512}, 1/10000^{2/512}, ...
And then form the matrix that has the sin and cos of each angles column as adjacent columns?
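
A quick way to check this question numerically is to build the encoding both ways and compare. The sketch below is written under assumptions: both function names are hypothetical, and the "duplicate columns" version is only a guess at the shape of the assignment's approach (one angle per column via `k // 2`, then sine on even columns and cosine on odd columns). It also assumes d is even.

```python
import numpy as np

def pe_with_duplicate_columns(positions, d):
    # Assumed shape of the assignment's approach: every column k gets the
    # angle of pair k // 2, so each angle is stored twice.
    pos = np.arange(positions)[:, np.newaxis]
    k = np.arange(d)[np.newaxis, :]
    angle_rads = pos / np.power(10000.0, 2 * (k // 2) / d)
    angle_rads[:, 0::2] = np.sin(angle_rads[:, 0::2])
    angle_rads[:, 1::2] = np.cos(angle_rads[:, 1::2])
    return angle_rads

def pe_without_duplicates(positions, d):
    # One angle per (sin, cos) pair: exponents 0/d, 2/d, 4/d, ...
    pos = np.arange(positions)[:, np.newaxis]
    two_i = np.arange(0, d, 2)[np.newaxis, :]
    angle_rads = pos / np.power(10000.0, two_i / d)
    pe = np.empty((positions, d))
    pe[:, 0::2] = np.sin(angle_rads)
    pe[:, 1::2] = np.cos(angle_rads)
    return pe

a = pe_with_duplicate_columns(8, 16)
b = pe_without_duplicates(8, 16)
print(np.allclose(a, b))  # True
```

Under these assumptions the two constructions agree, which suggests the duplicated columns are a convenience of the exercise's layout rather than a mathematical necessity.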

@jincy-p-janardhanan Check this:

I totally agree! I have to admit that this has put me off. I was used to having very clear explanations in this specialization, despite the heavy notation. This is the most confusing thing I have faced.
That being said, I realize it’s not easy to simplify these kinds of complex ideas. I think it would help a lot to separate the i in the angle formula from the i that is the coordinate of the encoding vectors.

Thanks @Damon, your answer helped clarify this.
Maybe it would be even clearer to use the actual formula that explains the origin of that otherwise “magical” i // 2:

[image]

where [image] is the coordinate of the encoding, and [image] is the floor of the number.
That’s where the i we see in the formula actually comes from:

[image]

Then the i which is given as argument to the get_angles function would actually be [image] in the formula above.
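
One way to write the formula described in this post is sketched below. The symbol names are assumptions, not a copy of the post's own notation: `k` stands for the coordinate of the encoding, `d` for the encoding dimension, and the floor brackets give the i of the lecture formula.

```latex
PE(pos, k) =
\begin{cases}
\sin\!\left(\dfrac{pos}{10000^{\,2\lfloor k/2 \rfloor / d}}\right) & \text{if } k \text{ is even} \\[2ex]
\cos\!\left(\dfrac{pos}{10000^{\,2\lfloor k/2 \rfloor / d}}\right) & \text{if } k \text{ is odd}
\end{cases}
\qquad \text{with } i = \left\lfloor \frac{k}{2} \right\rfloor
```

Read this way, the argument passed to get_angles plays the role of k, and `k // 2` in code is just the floor in the exponent.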

Good job @morningdew, that’s more direct and precise.

For the sake of consistency of notation with the lecture and code lab, we can push it a little further:

pe

The team (@manifest) may want to consider updating this formula if it’s practical to do so.

@manifest @morningdew. Sheesh! This is a really good start for beginners who have no idea what is going on under the hood, with at least a small hint to start off with. I was racking my brain over this for at least half an hour.

thanks for the explanation :yum:

I think formula (3) of this exercise should be updated or removed from the exercise altogether. The fact that on the left-hand side of formulas (1) and (2) the indices are 2i and 2i+1 respectively changes the whole meaning of how to interpret and code the inner part. Taking it out of context in (3) and asking students to implement it as written there is not just misleading but outright incorrect.

The answer to that question comes from these equations:
image
Note that the subscript of PE in each case corresponds to an even or an odd number, while the i in the argument of each function is the same: for the first (\sin, \cos) pair i is 0, for the second pair it is 1, and so on.
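
For reference, the standard sinusoidal positional-encoding equations, which are presumably what the equations mentioned above show, can be written as follows (here `d` is the encoding dimension; the exact symbols used in the course material may differ):

```latex
\begin{aligned}
PE_{(pos,\,2i)}   &= \sin\!\left(\frac{pos}{10000^{2i/d}}\right) \\
PE_{(pos,\,2i+1)} &= \cos\!\left(\frac{pos}{10000^{2i/d}}\right)
\end{aligned}
```

With i = 0 these fill columns 0 and 1, with i = 1 columns 2 and 3, and so on, so each value of i is used exactly twice.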