Hi
I just finished the ungraded lab about positional encoding and I have a question regarding the weights that multiply the embedding vector and the positional encoding vector.
So, in the lab we have
embedding * W1 + pos_encoding[:,:,:] * W2
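For context, here is a minimal NumPy sketch of what I mean (the sinusoidal positional_encoding helper and the toy shapes below are my own simplification, not the exact lab code):

```python
import numpy as np

def positional_encoding(positions, d_model):
    # Standard sinusoidal encoding: sin on even indices, cos on odd indices
    pos = np.arange(positions)[:, np.newaxis]              # (positions, 1)
    i = np.arange(d_model)[np.newaxis, :]                  # (1, d_model)
    angle_rates = 1 / np.power(10000, (2 * (i // 2)) / np.float32(d_model))
    angle_rads = pos * angle_rates                          # (positions, d_model)
    angle_rads[:, 0::2] = np.sin(angle_rads[:, 0::2])
    angle_rads[:, 1::2] = np.cos(angle_rads[:, 1::2])
    return angle_rads[np.newaxis, ...]                      # (1, positions, d_model)

# Toy embeddings: batch of 1, sequence of 4 tokens, d_model = 8
embedding = np.random.randn(1, 4, 8)
pos_encoding = positional_encoding(4, 8)

# The lab's experiment: scale each component before adding them.
# For example, W2 >> W1 makes the positional part dominate the sum.
W1, W2 = 1.0, 10.0
combined = embedding * W1 + pos_encoding[:, :, :] * W2
print(combined.shape)  # (1, 4, 8)
```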
I would like to confirm something
So, we should not make the weight on the positional encoding too large, right? Otherwise the positional values would dominate the embedding vectors and they would lose their semantic quality?
I am also curious about W1 and W2 because I have recently been using a HuggingFace transformer, and I have not encountered any hyperparameters that let me adjust the weight of the positional encoding vector.
Does anyone have a reference for this?
I have not looked for any external references about positional encodings and have never worked with the HuggingFace code, but I think the W1 and W2 values used in this ungraded notebook are just an experiment to give you a sense of the relative contributions of the word embeddings versus the positional encodings. By overemphasizing one or the other, you get to see the effect each has on the combined values.
But now look at what happens when they actually apply positional encodings in the real graded Transformers Lab assignment this week: they just add the values with no weight factor in both the Encoder and the Decoder.
So my conclusion is that the experiments shown here in this ungraded lab have nothing to do with how positional encodings are actually used.
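As a rough sketch of the idea (this is my own simplified NumPy version reusing the positional_encoding helper from the sketch above, not the exact assignment code):

```python
import numpy as np

d_model, seq_len = 8, 4
embedding = np.random.randn(1, seq_len, d_model)      # toy stand-in for the Embedding layer output
pos_encoding = positional_encoding(seq_len, d_model)  # sinusoidal helper sketched earlier in this thread

# Scale the embeddings by sqrt(d_model), as in "Attention Is All You Need",
# then simply add the positional encoding for the current sequence length.
# There is no W1 or W2 anywhere.
x = embedding * np.sqrt(d_model)
x = x + pos_encoding[:, :seq_len, :]
print(x.shape)  # (1, 4, 8)
```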
Agree with Paul. W1 and W2 are there in this ungraded lab to show us the effect, but they don't exist as hyperparameters in real transformers.
Thanks to both of you for the explanation. This clears up a lot of things for me.