DLS 5, Week 4, Exercise C6 "Wrong values in out"

I’m working through the Week 4 programming exercise, but I’m stuck on the following error:

182 assert np.allclose(out[0, 0], [-0.22109576, -1.5455486, 0.852692, 0.9139523]), "Wrong values in out"

AssertionError: Wrong values in out

Here’s the code I’m trying to execute, but something’s wrong and I can’t figure out what:

# apply layer normalization (layernorm3) to the sum of the ffn output and the output of the second block
out3 = self.layernorm3(ffn_output + mult_attn_out2)  # (batch_size, target_seq_len, fully_connected_dim)
# END CODE HERE

Can someone please help me out here?

That line of code is OK.
So probably the issue is with your calculation of ffn_output or mult_attn_out2.

Here’s how I’m calculating mult_attn_out2 and ffn_output:

# MHA > DROPOUT > NORMALIZATION
mult_attn_out2, attn_weights_block2 = self.mha2(Q1, enc_output, enc_output, padding_mask, return_attention_scores=True)
mult_attn_out2 = self.dropout_ffn(mult_attn_out2, training=training)
mult_attn_out2 = self.layernorm2(mult_attn_out2 + mult_attn_out1)

# Fully Connected > Dropout
ffn_output = self.ffn(mult_attn_out2)
ffn_output = self.dropout_ffn(ffn_output, training=training)

Isn’t this right?

No, that’s not correct.
There is no dropout layer applied to mult_attn_out2.
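For orientation, here is a minimal standalone sketch of where dropout_ffn is meant to act: on the feed-forward output in block 3, not on mult_attn_out2. The layer names, dimensions, and random inputs below are illustrative stand-ins, not the assignment’s exact class attributes.

import tensorflow as tf

embedding_dim, fully_connected_dim = 4, 8

# Illustrative standalone layers (not the assignment's class attributes)
ffn = tf.keras.Sequential([
    tf.keras.layers.Dense(fully_connected_dim, activation='relu'),
    tf.keras.layers.Dense(embedding_dim),
])
dropout_ffn = tf.keras.layers.Dropout(0.1)
layernorm3 = tf.keras.layers.LayerNormalization(epsilon=1e-6)

# Stand-in for the (already normalized) output of block 2
mult_attn_out2 = tf.random.uniform((1, 3, embedding_dim))

# Block 3: feed-forward -> dropout -> residual connection + layer norm
ffn_output = ffn(mult_attn_out2)
ffn_output = dropout_ffn(ffn_output, training=False)  # dropout goes here, not in block 2
out3 = layernorm3(ffn_output + mult_attn_out2)
print(out3.shape)  # (1, 3, 4)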

I’m getting a similar error, and my code seems to match yours, except I didn’t include a dropout layer in mult_attn_out2 (as already noted by TMosh). Any suggestions? My error, by the way, is:

180 assert np.allclose(attn_w_b1[0, 0, 1], [0.5271505, 0.47284946, 0.], atol=1e-2), "Wrong values in attn_w_b1. Check the call to self.mha1"
→ 181 assert np.allclose(attn_w_b2[0, 0, 1], [0.32048798, 0.390301, 0.28921106]), "Wrong values in attn_w_b2. Check the call to self.mha2"
182 assert np.allclose(out[0, 0], [-0.22109576, -1.5455486, 0.852692, 0.9139523]), "Wrong values in out"

I can’t say without looking at your full implementation of the decoder.

I was having the same problem, and I also did not use a “dropout” layer in mult_attn_out2.
In my case, the important part was:

“# apply layer normalization (layernorm2) to the sum of the attention output and the output of the first block (~1 line)”
mult_attn_out2 = self.layernorm2( ? + ? )

In that sum, “the output of the first block” does not refer to mult_attn_out1.
It refers to the final output of the whole block, i.e., the value produced after its layer normalization.
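To make that concrete, here is a standalone sketch of blocks 1 and 2 under that reading. The layer names and shapes are illustrative and the masks are omitted for brevity, so this is not the assignment’s code, just the pattern: the skip connection in block 2 adds the block-1 result that has already been layer-normalized.

import tensorflow as tf

embedding_dim, num_heads = 4, 2

mha1 = tf.keras.layers.MultiHeadAttention(num_heads=num_heads, key_dim=embedding_dim)
mha2 = tf.keras.layers.MultiHeadAttention(num_heads=num_heads, key_dim=embedding_dim)
layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
layernorm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)

x = tf.random.uniform((1, 3, embedding_dim))           # decoder input (masks omitted for brevity)
enc_output = tf.random.uniform((1, 3, embedding_dim))  # encoder output

# Block 1: self-attention -> residual + layer norm; Q1 is the block's final output
mult_attn_out1, attn_weights_block1 = mha1(x, x, x, return_attention_scores=True)
Q1 = layernorm1(mult_attn_out1 + x)

# Block 2: attention over the encoder output, queried by Q1
mult_attn_out2, attn_weights_block2 = mha2(Q1, enc_output, enc_output, return_attention_scores=True)
# The skip connection adds Q1 (the output of block 1), not the raw mult_attn_out1
mult_attn_out2 = layernorm2(mult_attn_out2 + Q1)
print(mult_attn_out2.shape)  # (1, 3, 4)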

I hope this helps.

Dear Nuwanda7,

For Block 2, there’s a clear instruction to calculate the attention using ‘Q’ from the first block. You pass Q1 into self.mha2 (the call that returns the attention scores), but it is missing from the sum you normalize for mult_attn_out2. That is what throws off the value of ffn_output.

I struggled with the same issue. This helped!

Glad to hear that, Daniel!

Happy Learning!

This helps tremendously!
