I passed Q1, enc_output, and padding_mask to mha2 and set return_attention_scores=True. I might be wrong here; I got the error below. Any advice?
Well, the shapes of the variables being added at multi_attn_out2 are not compatible, so you have to see where they originate from.
In the comments it says:
# apply layer normalization (layernorm1) to the sum of the attention output and the input (~1 line)
# BLOCK 2
# calculate self-attention using the Q from the first block and K and V from the encoder output.
# Dropout will be applied during training
# Return attention scores as attn_weights_block2 (~1 line)
# apply layer normalization (layernorm2) to the sum of the attention output and the output of the first block (~1 line)
So you have to use block1 and block2 attention outputs, not the weights!
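To make that concrete, here is a minimal sketch of what BLOCK 2 can look like with Keras' MultiHeadAttention. The layer names (mha2, layernorm2) follow the assignment comments; the wrapper function and the variable names Q1, enc_output, padding_mask, and training are assumptions for illustration, not the graded code.

```python
import tensorflow as tf

def decoder_block2(Q1, enc_output, padding_mask, training, mha2, layernorm2):
    # Cross-attention: queries come from block 1's output (Q1),
    # keys and values come from the encoder output.
    mult_attn_out2, attn_weights_block2 = mha2(
        query=Q1,
        value=enc_output,
        key=enc_output,
        attention_mask=padding_mask,
        return_attention_scores=True,
        training=training,  # dropout inside mha2 is applied only during training
    )
    # Add the attention OUTPUT (not the attention weights) to Q1, then normalize.
    out2 = layernorm2(mult_attn_out2 + Q1)
    return out2, attn_weights_block2
```

The key point is in the last two lines: the residual connection and layernorm2 use mult_attn_out2, while attn_weights_block2 is only returned for inspection. Adding the weights instead is what produces the shape mismatch.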
Thanks. I was able to solve it and can proceed now.