- Please see the next point.
- The encoder output we want to pass to the decoder is the set of modified embeddings of the encoder terms. Look at `class Encoder` for more details. Here's a recap:
a. Perform dot-product self-attention to measure the similarity between encoder terms.
b. Use this information to update the embeddings before passing them to the decoder (see the sketch after this recap).
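For intuition, here is a minimal sketch of that recap in TensorFlow. Note the assumptions: the function name is made up, and the queries, keys, and values are all the raw embeddings themselves, whereas the real `Encoder` layer uses learned Q/K/V projections and multiple heads.

```python
import tensorflow as tf

def scaled_dot_product_self_attention(x, mask=None):
    """Illustrative self-attention where Q, K, and V are all x itself.

    x: (batch, seq_len, d_model) encoder embeddings. The actual layer
    projects x into separate query/key/value spaces first.
    """
    d_k = tf.cast(tf.shape(x)[-1], tf.float32)
    # Pairwise similarity between encoder terms: (batch, seq_len, seq_len)
    scores = tf.matmul(x, x, transpose_b=True) / tf.math.sqrt(d_k)
    if mask is not None:
        # Convention here: mask == 1 means "may attend";
        # masked positions are pushed toward -inf before the softmax
        scores += (1.0 - mask) * -1e9
    weights = tf.nn.softmax(scores, axis=-1)
    # Each updated embedding is a similarity-weighted mix of all embeddings
    return tf.matmul(weights, x)
```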
1. We need to pass the translated sentence as decoder input during training. The expected output is the same sentence shifted by one step, so that each position is trained to predict the next token; this is done so that we can calculate the loss for all positions in parallel. Please go back to the lecture(s) on look-ahead masking for the role it plays in decoder attention (a sketch of the shift and mask follows this list).
- The weights are used in grading. See `scaled_dot_product_attention_test` for details.
- Decoder weights are also used for grading. See `Decoder_test` and `Transformer_test`.
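To make the shifting and masking concrete, here is a minimal sketch. The token IDs are hypothetical, and the mask convention (1 meaning "may attend") matches the sketch above; the assignment's own helpers may use the opposite convention.

```python
import tensorflow as tf

def create_look_ahead_mask(seq_len):
    # Lower-triangular matrix of ones: position i may attend to 0..i only
    # (1 = "may attend", matching the earlier sketch)
    return tf.linalg.band_part(tf.ones((seq_len, seq_len)), -1, 0)

# Hypothetical token IDs for one translated sentence: <start> w1 w2 w3 <end>
translated = tf.constant([[1, 7, 9, 4, 2]])

decoder_input = translated[:, :-1]    # [1, 7, 9, 4] fed to the decoder
expected_output = translated[:, 1:]   # [7, 9, 4, 2] shifted by one step

# One mask covers every position's attention in a single forward pass
mask = create_look_ahead_mask(tf.shape(decoder_input)[1])
```

Slicing the same sentence twice is what lets the loss for every position be computed in one forward pass, instead of generating tokens one at a time as at inference.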