C4W2 Exercise 2 DecoderLayer: output is correct, error in the unit test

Hello, I have an error in exercise 2, DecoderLayer. The unit test reports two failures, even though the printed output of the exercise matches the expected output.

Output:
Using embedding_dim=12 and num_heads=16:
q has shape:(1, 15, 12)
Output of encoder has shape:(1, 7, 8)
Output of decoder layer has shape:(1, 15, 12)
Att Weights Block 1 has shape:(1, 16, 15, 15)
Att Weights Block 2 has shape:(1, 16, 15, 7)

Expected Output
Output:
Using embedding_dim=12 and num_heads=16:
q has shape:(1, 15, 12)
Output of encoder has shape:(1, 7, 8)
Output of decoder layer has shape:(1, 15, 12)
Att Weights Block 1 has shape:(1, 16, 15, 15)
Att Weights Block 2 has shape:(1, 16, 15, 7)

UnitTest:
Failed test case: Wrong values in 'out'.
Expected: [1.1810006, -1.5600019, 0.41289005, -0.03388882]
Got: [-0.61175704 -0.9107513 -0.14352934 1.6660377 ]

Failed test case: Wrong values in 'out' when we mask the last word. Are you passing the padding_mask to the inner functions?
Expected: [1.1297308, -1.6106694, 0.32352272, 0.15741566]
Got: [-0.5599833  -1.0828896   0.05846525  1.5844076 ]

Any suggestions as to why this error might be occurring?

In advance, thank you very much.

Hi @LevValenzuela

The Expected output shows only the dimensions of the output (in that regard your implementation is correct).

What the unit test tells you is that the actual values are not the same as expected (in both cases): one test checks the values without padding, the other with padding. In other words, you passed the unit tests for data types and output shapes, but not for the actual final output values.

You should very carefully check the code hints and see if your implementation does what you’re asked. Check whether you’re using self.mha1 vs. self.mha2 (and what arguments each receives), whether you apply the layer norms (1, 2, 3) where appropriate, etc.
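To make the two-attention-block structure concrete, here is a minimal sketch using Keras' built-in MultiHeadAttention as a stand-in for the assignment's own layers (the variable names, mask shapes, and the Keras mask convention of 1 = attend are illustrative assumptions, not the assignment's exact API):

```python
import tensorflow as tf

embedding_dim, num_heads = 12, 16

# Stand-in layers; the assignment defines its own versions of these
mha1 = tf.keras.layers.MultiHeadAttention(num_heads=num_heads, key_dim=embedding_dim)
mha2 = tf.keras.layers.MultiHeadAttention(num_heads=num_heads, key_dim=embedding_dim)

x = tf.random.normal((1, 15, embedding_dim))  # decoder input (target sequence)
enc_output = tf.random.normal((1, 7, 8))      # encoder output, as in the thread

# Block 1: self-attention over the target sequence with the look-ahead
# (causal) mask. Keras convention: 1 = attend, 0 = mask out.
look_ahead_mask = tf.linalg.band_part(tf.ones((1, 15, 15)), -1, 0)
mult_attn_out1, attn_weights_block1 = mha1(
    query=x, value=x, key=x,
    attention_mask=look_ahead_mask,
    return_attention_scores=True)

# Block 2: cross-attention. Queries come from the decoder side;
# keys/values are the *entire* enc_output, masked by the padding mask.
padding_mask = tf.ones((1, 1, 7))  # broadcasts over all 15 query positions
mult_attn_out2, attn_weights_block2 = mha2(
    query=mult_attn_out1, value=enc_output, key=enc_output,
    attention_mask=padding_mask,
    return_attention_scores=True)

print(attn_weights_block1.shape)  # (1, 16, 15, 15)
print(attn_weights_block2.shape)  # (1, 16, 15, 7)
```

Note that matching shapes alone (as in the sketch) does not guarantee the values are right; forgetting to pass either mask still produces correctly shaped tensors, which is exactly why the shape checks pass while the value checks fail.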

Let us know if you find your mistake.
Cheers

Hi @arvyzukai

In the first multi-head attention layer (mha1), I pass the input tensor x as query, key, and value, together with the look-ahead mask, and return the scores.

In the second multi-head attention layer (mha2), I pass Q, enc_output, enc_output, together with the padding mask, and return the scores.

The normalization and attention weights appear to be correct.

I’m uncertain whether, for enc_output, I should pass the entire tensor or only a specific part of it.

Thank you.

Hi @LevValenzuela

As far as I understand, you’re doing everything correctly.

It’s the entire enc_output (the padding_mask is what masks out the specific parts of it).

Ok… another point of failure could be forgetting or misapplying the skip/residual connections (before normalization). For example, the input to the first layernorm should be the sum of mult_attn_out1 and x; to the second, the sum of mult_attn_out2 and Q1; and to the third, the sum of ffn_output and mult_attn_out2.
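The three residual-then-normalize steps above can be sketched like this (a minimal illustration with Keras layers; the variable names follow the post, the hidden width and random stand-in tensors are assumptions, not the assignment's exact code):

```python
import tensorflow as tf

embedding_dim = 12
layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
layernorm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
layernorm3 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
ffn = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),  # hidden width is illustrative
    tf.keras.layers.Dense(embedding_dim),
])

x = tf.random.normal((1, 15, embedding_dim))               # decoder-layer input
mult_attn_out1 = tf.random.normal((1, 15, embedding_dim))  # stand-in for mha1 output

# Residual 1: add the layer input x, *then* normalize
Q1 = layernorm1(mult_attn_out1 + x)

mult_attn_out2 = tf.random.normal((1, 15, embedding_dim))  # stand-in for mha2 output

# Residual 2: add Q1 (the first block's normalized output), not x
mult_attn_out2 = layernorm2(mult_attn_out2 + Q1)

# Residual 3: add the second block's output to the feed-forward output
ffn_output = ffn(mult_attn_out2)
out = layernorm3(ffn_output + mult_attn_out2)

print(out.shape)  # (1, 15, 12)
```

Mixing up which tensor is added in each residual (e.g. adding x instead of Q1 at layernorm2) still yields the correct output shape, so only the value checks in the unit test catch it.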

Is this the way you applied normalization?

Thank you, @arvyzukai. The issue occurred in layernorm2, specifically with the output from the first block.