C5_W4_A1_Transformer_Subclass_v1 -- Use case for V != K?

The code works fine, but I would like to understand one point better.

In

UNQ_C6 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)

self.mha1(x, x, x, attention_mask=…

  • This makes sense, as “Remember that to compute self-attention Q, V and K should be the same.”

self.mha2(Q1, enc_output, enc_output, attention_mask=…

  • Here enc_output comes from class Encoder.call.
  • V and K are the same tensor (enc_output).
  • Is there a use case or architecture where V and K are different?

I am not aware of one.
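
To make the question concrete, here is a minimal sketch of what V != K would mean mechanically. It is plain-Python scaled dot-product attention, not the notebook's or Keras's implementation. The toy data, the `pos_signal` term added to the keys only, and the helper names are all made up for illustration. As far as I understand, the only structural requirement is that K and V have the same number of rows (one per source position): K must match Q in feature size, while V may have any feature size.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention on nested lists.

    Q: (n_q, d_k), K: (n_kv, d_k), V: (n_kv, d_v).
    K decides *where* to look; V decides *what* is returned.
    """
    d_k = len(K[0])
    d_v = len(V[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(d_v)])
    return out


# Hypothetical toy data: two source positions, one query.
enc_output = [[1.0, 0.0], [0.0, 1.0]]
pos_signal = [[0.5, 0.5], [-0.5, 0.5]]  # made-up term, added to the keys only
Q = [[1.0, 0.0]]

# Case 1: K and V are the same tensor, as in self.mha2(Q1, enc_output, enc_output, ...).
same = attention(Q, enc_output, enc_output)

# Case 2: K differs from V; the attention weights change, the returned content does not.
K = [[e + p for e, p in zip(e_row, p_row)] for e_row, p_row in zip(enc_output, pos_signal)]
diff = attention(Q, K, enc_output)

# Case 3: V has a different feature size (d_v = 3) than K (d_k = 2).
V_wide = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
wide = attention(Q, enc_output, V_wide)

print(same, diff, wide)
```

So the computation itself allows K and V to differ; my question is whether any real architecture takes advantage of that.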