Help! I still don't understand how transformers work!

WONG_Lik_Hang_Kenny · August 3, 2023, 8:02am

This is the last slide of Deep Learning Specialization Course 5, Week 4. I still do not understand a lot of the details, including how the positional embedding works and how the equations make it work. Can anyone give me a more detailed explanation, please? Thanks!

Nydia · August 3, 2023, 8:50am

Hi @WONG_Lik_Hang_Kenny
A transformer has no built-in knowledge of the order of its inputs. It relies on self-attention (as in Vaswani et al., 2017), which on its own does not take the order of the data into account: if you shuffle the tokens, the outputs are simply shuffled the same way. Positional embeddings are needed to tell the model where each token sits in the sequence.
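To make the point about order concrete, here is a small sketch of single-head self-attention in NumPy with no positional information. The weights `wq`, `wk`, `wv` and the input `x` are random stand-ins I made up for illustration, not anything from the course notebook. Reordering the input rows just reorders the output rows, so the layer by itself cannot tell which token came first.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(x, wq, wk, wv):
    # Plain scaled dot-product self-attention, one head, no masking.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(0)
seq_len, d = 4, 6                       # hypothetical sizes
x = rng.normal(size=(seq_len, d))       # stand-in token embeddings
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))

perm = np.array([2, 0, 3, 1])           # shuffle the token order
out = self_attention(x, wq, wk, wv)
out_perm = self_attention(x[perm], wq, wk, wv)

# Shuffled input gives the same outputs, just in shuffled order.
same_up_to_reordering = np.allclose(out[perm], out_perm)
print(same_up_to_reordering)
```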

Step 1. Form the embedding layer: this layer converts tokens (words) into high-dimensional vectors. Each token is mapped to a vector in an embedding space.
Step 2. Form the positional encoding: in this step, a positional embedding is computed for each position in the input sequence, so that the sequential information of each token’s position is captured.
Step 3. Combine token and positional embeddings: the token embeddings and the corresponding positional embeddings are summed element-wise to produce the final embeddings, which carry both the token’s semantic meaning and its positional information.
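The three steps above can be sketched roughly as follows. I am assuming the slide uses the sinusoidal encoding from Vaswani et al. (2017), where for position `pos` and pair index `i`, the even dimension is sin(pos / 10000^(2i/d)) and the odd dimension is cos(pos / 10000^(2i/d)). The embedding table here is random and only stands in for a learned one; `vocab_size`, `d_model` and `token_ids` are made-up values for illustration.

```python
import numpy as np

def positional_encoding(num_positions, d_model):
    # Row `pos` of the result is the encoding vector for position `pos`.
    pos = np.arange(num_positions)[:, np.newaxis]   # shape (num_positions, 1)
    k = np.arange(d_model)[np.newaxis, :]           # shape (1, d_model)
    i = k // 2                                      # dimensions 2i and 2i+1 share a frequency
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])           # even dimensions use sin
    pe[:, 1::2] = np.cos(angles[:, 1::2])           # odd dimensions use cos
    return pe

# Step 1: token embeddings (random table as a stand-in for a learned layer).
rng = np.random.default_rng(0)
vocab_size, d_model = 10, 8                         # hypothetical sizes
embedding_table = rng.normal(size=(vocab_size, d_model))
token_ids = np.array([3, 7, 3])                     # the same token appears at positions 0 and 2
token_emb = embedding_table[token_ids]

# Step 2: one positional vector per position in the sequence.
pe = positional_encoding(len(token_ids), d_model)

# Step 3: element-wise sum.
x = token_emb + pe
```

Note that the encoding depends only on the position, not on the word: the word at position 0 always gets the same positional vector, whatever the word is. In this sketch, token 3 appears twice with identical token embeddings, but after the sum its two rows in `x` differ because the positions differ.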

WONG_Lik_Hang_Kenny · August 3, 2023, 9:17am

Thanks for your reply!!! But I still don’t get how the sin and cos equations work. How do they map each word to its corresponding positional encoding vector?

carlosrl · August 4, 2023, 5:10am

Hi @WONG_Lik_Hang_Kenny, the comment from Nydia is correct, and I would like to help you with more details.
Please take a look at this video, which shows how transformers work piece by piece.
Keep learning!
