Help! I still don't understand how the transformer works!

This is the last slide for Deep Learning Specialization Course 5 Week 4, and I still don't understand a lot of the details, including how the positional embedding works and how the equations make it work… Can anyone give me a more detailed explanation on this, please? Thanks…

Hi @WONG_Lik_Hang_Kenny
In transformers we have no built-in knowledge of the order of the inputs, because they operate on self-attention mechanisms (as in Vaswani et al., 2017) that don't inherently take the order of the tokens into account. Positional embeddings are necessary to tell the model where each token is located in the sequence.
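
To make that concrete, here is a minimal NumPy sketch (my own illustration, not course code) of a single self-attention step with no learned weights, i.e. Q = K = V = X. Shuffling the input tokens simply shuffles the output rows the same way, which shows that attention by itself has no notion of word order:

```python
import numpy as np

def self_attention(X):
    # Plain scaled dot-product self-attention with Q = K = V = X
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                       # (tokens, tokens)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over each row
    return weights @ X

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))        # 4 "token embeddings", 8 dimensions each
perm = [2, 0, 3, 1]                # reorder the tokens

out = self_attention(X)
out_shuffled = self_attention(X[perm])

# The shuffled input gives exactly the same vectors, just in shuffled order,
# so without positional information the model cannot tell word order apart.
print(np.allclose(out[perm], out_shuffled))   # True
```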

  • Step 1. Form the embedding layer; this layer converts tokens (words) into high-dimensional vectors, so each token gets mapped to a vector in an embedding space.
  • Step 2. Form the positional encoding; a positional encoding vector is computed for each position in the input sequence, so that the sequential information (where each token sits in the sequence) can be captured.
  • Step 3. Combine token and positional embeddings; the token embeddings and the corresponding positional encodings are element-wise summed to produce the final embeddings, which carry both the token's semantic meaning and its positional information (see the sketch right after this list).
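
Here is a small sketch of Steps 2 and 3 in NumPy (a toy stand-in, not the assignment code), using the sinusoidal formulas from Vaswani et al. (2017): PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). The encodings are simply added to the token embeddings:

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """Sinusoidal positional encoding (Vaswani et al., 2017):
       PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
       PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
       Assumes d_model is even."""
    pos = np.arange(max_len)[:, None]                   # (max_len, 1)
    i = np.arange(d_model // 2)[None, :]                # (1, d_model/2)
    angle = pos / np.power(10000, (2 * i) / d_model)    # (max_len, d_model/2)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angle)                         # even dimensions
    pe[:, 1::2] = np.cos(angle)                         # odd dimensions
    return pe

# Toy example: 4 tokens, 8-dimensional embeddings
# (random stand-ins for what a learned embedding layer would output).
seq_len, d_model = 4, 8
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(seq_len, d_model))  # Step 1

pe = positional_encoding(seq_len, d_model)              # Step 2
final_embeddings = token_embeddings + pe                # Step 3: element-wise sum
print(final_embeddings.shape)                           # (4, 8)
```

Each row of `pe` is the positional encoding vector for one position. Because every dimension oscillates at a different frequency, every position ends up with a distinct pattern of sines and cosines, which is what those equations are producing.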

Thanks for your reply!!! But I still don't get how the sin and cos equations work. How do they map each word to its corresponding positional encoding vector?

Hi @WONG_Lik_Hang_Kenny, the comment from Nydia is correct, and I would like to help you with more details.
Please take a look at this video, which shows how transformers work piece by piece.
Keep learning!
