This is my code:
def call(self, x, enc_output, training, look_ahead_mask, padding_mask):
    """
    Forward pass for the Decoder Layer

    Arguments:
        x -- Tensor of shape (batch_size, target_seq_len, fully_connected_dim)
        enc_output -- Tensor of shape (batch_size, input_seq_len, fully_connected_dim)
        training -- Boolean, set to true to activate the training mode for dropout layers
        look_ahead_mask -- Boolean mask for the target_input
        padding_mask -- Boolean mask for the second multihead attention layer

    Returns:
        out3 -- Tensor of shape (batch_size, target_seq_len, fully_connected_dim)
        attn_weights_block1 -- Tensor of shape (batch_size, num_heads, target_seq_len, input_seq_len)
        attn_weights_block2 -- Tensor of shape (batch_size, num_heads, target_seq_len, input_seq_len)
    """
    # START CODE HERE
    # enc_output.shape == (batch_size, input_seq_len, fully_connected_dim)

    # BLOCK 1
    # calculate self-attention and return attention scores as attn_weights_block1.
    # Dropout will be applied during training (~1 line).
    mult_attn_out1, attn_weights_block1 = self.mha1(x, x, x, look_ahead_mask, return_attention_scores=True)  # (batch_size, target_seq_len, d_model)

    # apply layer normalization (layernorm1) to the sum of the attention output and the input (~1 line)
    Q1 = self.layernorm1(mult_attn_out1 + x)

    # BLOCK 2
    # calculate attention using the Q from the first block and K and V from the encoder output.
    # Dropout will be applied during training.
    # Return attention scores as attn_weights_block2 (~1 line)
    mult_attn_out2, attn_weights_block2 = self.mha2(Q1, enc_output, enc_output, padding_mask, return_attention_scores=True)  # (batch_size, target_seq_len, d_model)

    # apply layer normalization (layernorm2) to the sum of the attention output and the output of the first block (~1 line)
    mult_attn_out2 = self.layernorm2(mult_attn_out2 + Q1)  # (batch_size, target_seq_len, fully_connected_dim)

    # BLOCK 3
    # pass the output of the second block through a ffn
    ffn_output = self.ffn(mult_attn_out2)  # (batch_size, target_seq_len, fully_connected_dim)

    # apply a dropout layer to the ffn output
    ffn_output = self.dropout_ffn(mult_attn_out2, training=training)

    # apply layer normalization (layernorm3) to the sum of the ffn output and the output of the second block
    out3 = self.layernorm3(ffn_output + mult_attn_out2)  # (batch_size, target_seq_len, fully_connected_dim)
    # END CODE HERE

    return out3, attn_weights_block1, attn_weights_block2
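For reference, this is how I understand the multi-head attention call used in both blocks. It is a minimal standalone sketch, assuming self.mha1 and self.mha2 are tf.keras.layers.MultiHeadAttention layers (as in the assignment scaffold); the layer sizes and random input here are illustrative only:

import tensorflow as tf

# Standalone check of the (output, attention_scores) tuple returned when
# return_attention_scores=True (num_heads=2, key_dim=4 are illustrative values).
mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=4)
x = tf.random.uniform((1, 3, 8))  # (batch_size, target_seq_len, fully_connected_dim)
attn_out, attn_scores = mha(x, x, x, return_attention_scores=True)
print(attn_out.shape)     # (1, 3, 8)   -> same shape as the query input
print(attn_scores.shape)  # (1, 2, 3, 3) -> (batch_size, num_heads, target_seq_len, source_seq_len)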
I’m getting the following error:
AssertionError                            Traceback (most recent call last)
<ipython-input-...> in <module>
      1 # UNIT TEST
----> 2 DecoderLayer_test(DecoderLayer, create_look_ahead_mask)

~/work/W4A1/public_tests.py in DecoderLayer_test(target, create_look_ahead_mask)
    180     assert np.allclose(attn_w_b1[0, 0, 1], [0.5271505, 0.47284946, 0.], atol=1e-2), "Wrong values in attn_w_b1. Check the call to self.mha1"
    181     assert np.allclose(attn_w_b2[0, 0, 1], [0.32048798, 0.390301, 0.28921106]), "Wrong values in attn_w_b2. Check the call to self.mha2"
--> 182     assert np.allclose(out[0, 0], [-0.22109576, -1.5455486, 0.852692, 0.9139523]), "Wrong values in out"
    183
    184

AssertionError: Wrong values in out
What am I doing wrong?