Week 4 Encoder Layer

I am having trouble understanding the code for the encoder layer. To compute the self-attention, the instructions say: "You will pass the Q, V, K matrices and a boolean mask to a multi-head attention layer. Remember that to compute self-attention, Q, V and K should be the same." How do I calculate Q, V and K?

General tips for this function:

  • For self-attention, Q, K, and V are all the same: the `x` variable. You need to pass it three times.
    `mask` is provided as a function parameter; you need to pass it along as well.
  • For dropout1, you also need to pass `training=training`.
  • For out2, you need to use out1, not attn_output.
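To show the data flow the tips above describe without posting the assignment's TensorFlow solution, here is a rough plain-NumPy sketch. It is illustrative only: inference mode (so the dropout calls are omitted), an unparameterized layer norm, and made-up weight shapes. The point is that Q, K, and V are all the same `x`, and that the second residual connection uses `out1`, not `attn_output`.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-6):
    # Simplified: no learned scale/shift parameters.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def self_attention(x, mask=None):
    q, k, v = x, x, x                            # Q = K = V = x for self-attention
    d_k = x.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d_k)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)    # hide padded positions
    return softmax(scores) @ v

def encoder_layer(x, w1, w2, mask=None):
    attn_output = self_attention(x, mask)        # dropout1 would go here (training only)
    out1 = layer_norm(x + attn_output)           # residual connection + layer norm
    ffn_output = np.maximum(out1 @ w1, 0) @ w2   # two-layer feed-forward net with ReLU
    out2 = layer_norm(out1 + ffn_output)         # uses out1, NOT attn_output
    return out2

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 5, 8))                   # (batch, seq_len, d_model)
w1 = rng.normal(size=(8, 32))                    # hypothetical FFN weights
w2 = rng.normal(size=(32, 8))
print(encoder_layer(x, w1, w2).shape)            # (2, 5, 8) — same shape as the input
```

Note that the output keeps the input's shape, which is what lets encoder layers be stacked.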

Also, please edit your message to remove the code. Posting your code isn’t allowed by the course Honor Code.


Note: this post may no longer be correct, since the assignment has been modified recently.