Question about MultiHeadAttention layer

Let me think about this one more time 🙂