UNQ_C3: AssertionError: Wrong masked weights
UNQ_C6: AssertionError: Wrong values in attn_w_b2. Check the call to self.mha2
UNQ_C7: AssertionError: Wrong values in outd when training=True
UNQ_C8: AssertionError: Wrong values in translation
Hi @souravmodi22,
You have posted another query about not getting 100% from the autograder. Had you sorted these problems out before submitting your assignment for grading?
Yes, I updated the unit tests and now the assertion errors are gone and all tests pass in the Jupyter notebook. But when I submit the assignment, I still get 50/100.
Can you post a screenshot of the submission summary? As I said when answering your other post, the unit tests are not the full test, so there must be some problem in your code. To diagnose your problem, we need more information; there should be error messages indicating where in the code the grader failed.
Hi @Kic, please find the code snapshot below for the scaled_dot_product_attention function (UNQ_C3).
When I ran the unit test for this code, the mask "mask = np.array([[[1, 1, 0, 1], [1, 1, 0, 1], [1, 1, 0, 1]]])" in the unit test gave an error (assertion error attached below).
But when I changed the mask to "mask = np.array([0, 0, 1, 0])" during unit testing, the error went away. Please investigate whether the issue is in the code or in the unit test and advise.
Hi @souravmodi22,
Your dk, the dimension of the keys, is incorrectly extracted. It should be taken from the rows of the k matrix, that is, the number of keys. You can do it like this:
dk = np.shape(k)[0]
Also, your code is incorrect when calculating the scaled tensor with the mask on. Here is a reminder of the implementation instruction:
Multiply (1.0 - mask) by -1e9 before applying the softmax.
We are supposed to use Keras to implement the code, and that is how I did mine, so I cannot comment on the line where softmax() is called.
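To make the masking instruction above concrete, here is a minimal, standalone NumPy sketch (my own function and variable names, not the assignment code), using the same 3-D mask as the unit test:

import numpy as np

def mask_scores(scaled_scores, mask):
    # Course convention assumed: mask == 1 means "attend", mask == 0 means "block".
    # Add a large negative number where mask == 0 so the softmax drives
    # those positions to ~0 weight.
    return scaled_scores + (1.0 - mask) * -1e9

scores = np.zeros((1, 3, 4))                                    # (batch, seq_len_q, seq_len_k)
mask = np.array([[[1, 1, 0, 1], [1, 1, 0, 1], [1, 1, 0, 1]]])   # same mask as in the unit test

masked = mask_scores(scores, mask)
weights = np.exp(masked) / np.exp(masked).sum(axis=-1, keepdims=True)
print(weights.round(3))   # column 2 is ~0; the other three columns are ~0.333

If the code instead multiplies mask itself by -1e9, the semantics are inverted (0 then means "attend"), which is exactly why it only appears to pass after flipping the test mask to [0, 0, 1, 0]. The fix belongs in the code, not in the test.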
@Kic okay, let me try correcting the error in UNQ_C3 based on the above inputs. Thank you.
@Kic I resolved the issue in UNQ_C3. The issue was not in dk; I had to "Multiply (1.0 - mask) by -1e9 before applying the softmax". Thank you. Now the next error is in UNQ_C6 (DecoderLayer): AssertionError: Wrong values in attn_w_b2. Check the call to self.mha2.
Please find code below and advise:
class DecoderLayer(tf.keras.layers.Layer):
    def __init__(self, embedding_dim, num_heads, fully_connected_dim, dropout_rate=0.1, layernorm_eps=1e-6):
        super(DecoderLayer, self).__init__()

        self.mha1 = MultiHeadAttention(num_heads=num_heads,
                                       key_dim=embedding_dim)

        self.mha2 = MultiHeadAttention(num_heads=num_heads,
                                       key_dim=embedding_dim)

        self.ffn = FullyConnected(embedding_dim=embedding_dim,
                                  fully_connected_dim=fully_connected_dim)

        self.layernorm1 = LayerNormalization(epsilon=layernorm_eps)
        self.layernorm2 = LayerNormalization(epsilon=layernorm_eps)
        self.layernorm3 = LayerNormalization(epsilon=layernorm_eps)

        self.dropout_ffn = Dropout(dropout_rate)

    def call(self, x, enc_output, training, look_ahead_mask, padding_mask):
        # START CODE HERE
        # enc_output.shape == (batch_size, input_seq_len, embedding_dim)

        # BLOCK 1
        # calculate self-attention and return attention scores as attn_weights_block1 (~1 line)
        mult_attn_out1, attn_weights_block1 = self.mha1(x, x, x, look_ahead_mask, return_attention_scores=True)

        # apply dropout layer on the attention output (~1 line)
        mult_attn_out1 = self.dropout_ffn(mult_attn_out1, training=training)

        # apply layer normalization to the sum of the attention output and the input (~1 line)
        Q1 = self.layernorm1(mult_attn_out1 + x)

        # BLOCK 2
        # calculate self-attention using the Q from the first block and K and V from the encoder output.
        # MultiHeadAttention's call takes input (Query, Value, Key, attention_mask, return_attention_scores, training)
        # Return attention scores as attn_weights_block2 (~1 line)
        mult_attn_out2, attn_weights_block2 = self.mha2(Q1, enc_output, enc_output, padding_mask, return_attention_scores=True)

        # apply dropout layer on the attention output (~1 line)
        mult_attn_out2 = self.dropout_ffn(mult_attn_out2, training=training)

        # apply layer normalization to the sum of the attention output and the output of the first block (~1 line)
        mult_attn_out2 = self.layernorm2(mult_attn_out2 + Q1)  # (batch_size, target_seq_len, embedding_dim)

        # BLOCK 3
        # pass the output of the second block through a ffn
        ffn_output = self.ffn(mult_attn_out2)  # (batch_size, target_seq_len, embedding_dim)

        # apply a dropout layer to the ffn output
        ffn_output = self.dropout_ffn(ffn_output, training=training)

        # apply layer normalization to the sum of the ffn output and the output of the second block
        out3 = self.layernorm3(ffn_output + mult_attn_out2)  # (batch_size, target_seq_len, embedding_dim)
        # END CODE HERE

        return out3, attn_weights_block1, attn_weights_block2
Hi @souravmodi22,
Here are a few observations:
You call MultiHeadAttention() without setting the dropout rate. MultiHeadAttention() has a default dropout rate of 0.0, whereas the dropout rate here should be 0.1 (see the sketch after this post).
Adding a dropout layer after calculating the self-attention only changes mult_attn_out1. If you set the dropout rate in MultiHeadAttention() instead, the dropout is taken care of during training, and both mult_attn_out1 and attn_weights_block1 will have the correct values. The same applies to Block 2.
To sum the attention output and the output of the previous layer, an element-wise addition of the two tensors is all that is needed.
Hope these few points help.
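To illustrate the first point, here is a small, standalone sketch with toy shapes (not the assignment code) showing tf.keras.layers.MultiHeadAttention constructed with its built-in dropout argument, so the attention itself is regularized during training:

import tensorflow as tf

embedding_dim, num_heads, dropout_rate = 12, 3, 0.1

# dropout is a keyword argument of tf.keras.layers.MultiHeadAttention;
# it defaults to 0.0, so it has to be set explicitly to match dropout_rate.
mha = tf.keras.layers.MultiHeadAttention(num_heads=num_heads,
                                         key_dim=embedding_dim,
                                         dropout=dropout_rate)

q = tf.random.uniform((1, 5, embedding_dim))    # (batch, target_seq_len, embedding_dim)
kv = tf.random.uniform((1, 7, embedding_dim))   # (batch, input_seq_len, embedding_dim)

# call signature is (query, value, key, ...); the dropout is applied to the
# attention scores only when training=True.
out, attn_weights = mha(q, kv, kv,
                        return_attention_scores=True,
                        training=True)
print(out.shape, attn_weights.shape)            # (1, 5, 12) (1, 3, 5, 7)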
@Kic yes, the problem was the dropout only changing mult_attn_out1. Thank you. I now get 100/100 from the grader, but in the Transformer test (UNQ_C8) I am getting an error in the unit test. Please advise.
class Transformer(tf.keras.Model):
    """
    Complete transformer with an Encoder and a Decoder
    """
    def __init__(self, num_layers, embedding_dim, num_heads, fully_connected_dim, input_vocab_size,
                 target_vocab_size, max_positional_encoding_input,
                 max_positional_encoding_target, dropout_rate=0.1, layernorm_eps=1e-6):
        super(Transformer, self).__init__()

        self.encoder = Encoder(num_layers=num_layers,
                               embedding_dim=embedding_dim,
                               num_heads=num_heads,
                               fully_connected_dim=fully_connected_dim,
                               input_vocab_size=input_vocab_size,
                               maximum_position_encoding=max_positional_encoding_input,
                               dropout_rate=dropout_rate,
                               layernorm_eps=layernorm_eps)

        self.decoder = Decoder(num_layers=num_layers,
                               embedding_dim=embedding_dim,
                               num_heads=num_heads,
                               fully_connected_dim=fully_connected_dim,
                               target_vocab_size=target_vocab_size,
                               maximum_position_encoding=max_positional_encoding_target,
                               dropout_rate=dropout_rate,
                               layernorm_eps=layernorm_eps)

        self.final_layer = Dense(target_vocab_size, activation='softmax')

    def call(self, input_sentence, output_sentence, training, enc_padding_mask, look_ahead_mask, dec_padding_mask):
        # START CODE HERE
        # call self.encoder with the appropriate arguments to get the encoder output
        enc_output = self.encoder(input_sentence, training, enc_padding_mask)  # (batch_size, inp_seq_len, fully_connected_dim)

        # call self.decoder with the appropriate arguments to get the decoder output
        # dec_output.shape == (batch_size, tar_seq_len, fully_connected_dim)
        dec_output, attention_weights = self.decoder(output_sentence, enc_output, training, look_ahead_mask, dec_padding_mask)

        # pass decoder output through a linear layer and softmax (~2 lines)
        final_output = self.final_layer(dec_output)  # (batch_size, tar_seq_len, target_vocab_size)
        # END CODE HERE

        return final_output, attention_weights
Transformer_test(Transformer, create_look_ahead_mask, create_padding_mask)

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input> in <module>
     86
     87 # print("\033[92mAll tests passed")
---> 88 Transformer_test(Transformer, create_look_ahead_mask, create_padding_mask)

<ipython-input> in Transformer_test(target, create_look_ahead_mask, create_padding_mask)
     53     assert np.allclose(translation[0, 0, 0:8],
     54                        [0.01660176, 0.01909315, 0.02999433, 0.01405528, 0.01979068, 0.02224632,
---> 55                         0.01541351, 0.03147632]), "Wrong values in translation"
     56
     57     keys = list(weights.keys())

AssertionError: Wrong values in translation
The Transformer class looks fine. You may like to take a look at your Encoder code to see if there is something not quite right there; a couple of easy-to-miss steps are illustrated in the sketch below.
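For reference, if your Encoder follows the usual layout for this architecture, one step that is easy to miss and that shifts every downstream value (hence "Wrong values in translation") is scaling the token embeddings by sqrt(embedding_dim) before adding the positional encoding and applying dropout. Here is a toy, standalone illustration of that ordering (my own names and shapes, not the assignment code):

import tensorflow as tf

embedding_dim, vocab_size, seq_len = 8, 50, 5

embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
pos_encoding = tf.random.uniform((1, 20, embedding_dim))  # stand-in for the precomputed table
dropout = tf.keras.layers.Dropout(0.1)

tokens = tf.constant([[3, 7, 1, 0, 0]])                   # (batch, seq_len)

x = embedding(tokens)                                     # (1, seq_len, embedding_dim)
x *= tf.math.sqrt(tf.cast(embedding_dim, tf.float32))     # the easy-to-forget scaling step
x += pos_encoding[:, :seq_len, :]                         # slice the table to the actual length
x = dropout(x, training=True)                             # dropout only active in training
print(x.shape)                                            # (1, 5, 8)

It is also worth checking that training and the padding mask are actually passed into every encoder layer inside the loop.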