UNQ_C3: AssertionError: Wrong masked weights
UNQ_C6: AssertionError: Wrong values in attn_w_b2. Check the call to self.mha2
UNQ_C7: AssertionError: Wrong values in outd when training=True
UNQ_C8: AssertionError: Wrong values in translation
Hi @souravmodi22,
You have posted another query about not getting 100% from the autograder. Had you sorted these problems out before submitting your assignment for grading?
Yes, I updated the unit tests and now the assertion errors are gone and all tests pass in the Jupyter notebook. But when I submit the assignment, I still get 50/100.
Can you post a screenshot of the submission summary? As I said when answering your other post, the unit tests are not the full test, so there must be some problem in your code. To diagnose your problem, we need more information; there should be error messages indicating where in the code the grader failed.
Hi @Kic, please find the code snapshot below for the scaled_dot_product_attention function (UNQ_C3).
When I ran the unit test for this code, the mask "mask = np.array([[[1, 1, 0, 1], [1, 1, 0, 1], [1, 1, 0, 1]]])" in the unit test gave an error (assertion error attached below).
But when I changed the mask to "mask = np.array([0, 0, 1, 0])" during unit testing, the error went away. Please investigate whether the issue is in the code or in the unit test and advise.
Hi @souravmodi22,
Your dk, the dimension of the keys, is incorrectly extracted. It should be taken from the rows of the k matrix, that is, the number of keys. You can do it like this:
dk = np.shape(k)[0]
Also, your code is incorrect when calculating the scaled tensor with the mask on. Here is a reminder of the implementation instruction:
Multiply (1.0 - mask) by -1e9 before applying the softmax.
We are supposed to use Keras to implement the code, and that is how I did mine, so I cannot comment on the line where softmax() is called.
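To make the masking instruction above concrete, here is a minimal, standalone NumPy sketch (my own function and variable names, not the assignment code), using the same 3-D mask as the unit test:

import numpy as np

def mask_scores(scaled_scores, mask):
    # Course convention assumed: mask == 1 means "attend", mask == 0 means "block".
    # Add a large negative number where mask == 0 so the softmax drives
    # those positions to ~0 weight.
    return scaled_scores + (1.0 - mask) * -1e9

scores = np.zeros((1, 3, 4))                                    # (batch, seq_len_q, seq_len_k)
mask = np.array([[[1, 1, 0, 1], [1, 1, 0, 1], [1, 1, 0, 1]]])   # same mask as in the unit test

masked = mask_scores(scores, mask)
weights = np.exp(masked) / np.exp(masked).sum(axis=-1, keepdims=True)
print(weights.round(3))   # column 2 is ~0; the other three columns are ~0.333

If the code instead multiplies mask itself by -1e9, the semantics are inverted (0 then means "attend"), which is exactly why it only appears to pass after flipping the test mask to [0, 0, 1, 0]. The fix belongs in the code, not in the test.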
@Kic okay, let me try correcting the error in UNQ_C3 based on the above inputs. Thank you.
@Kic I resolved the issue in UNQ_C3. The issue was not in dk; I had to "Multiply (1.0 - mask) by -1e9 before applying the softmax". Thank you. Now the next error is in UNQ_C6 (DecoderLayer): AssertionError: Wrong values in attn_w_b2. Check the call to self.mha2.
Please find code below and advise:
class DecoderLayer(tf.keras.layers.Layer):
    def __init__(self, embedding_dim, num_heads, fully_connected_dim, dropout_rate=0.1, layernorm_eps=1e-6):
        super(DecoderLayer, self).__init__()

        self.mha1 = MultiHeadAttention(num_heads=num_heads,
                                       key_dim=embedding_dim)

        self.mha2 = MultiHeadAttention(num_heads=num_heads,
                                       key_dim=embedding_dim)

        self.ffn = FullyConnected(embedding_dim=embedding_dim,
                                  fully_connected_dim=fully_connected_dim)

        self.layernorm1 = LayerNormalization(epsilon=layernorm_eps)
        self.layernorm2 = LayerNormalization(epsilon=layernorm_eps)
        self.layernorm3 = LayerNormalization(epsilon=layernorm_eps)

        self.dropout_ffn = Dropout(dropout_rate)

    def call(self, x, enc_output, training, look_ahead_mask, padding_mask):
        # START CODE HERE
        # enc_output.shape == (batch_size, input_seq_len, embedding_dim)

        # BLOCK 1
        # calculate self-attention and return attention scores as attn_weights_block1 (~1 line)
        mult_attn_out1, attn_weights_block1 = self.mha1(x, x, x, look_ahead_mask, return_attention_scores=True)

        # apply dropout layer on the attention output (~1 line)
        mult_attn_out1 = self.dropout_ffn(mult_attn_out1, training=training)

        # apply layer normalization to the sum of the attention output and the input (~1 line)
        Q1 = self.layernorm1(mult_attn_out1 + x)

        # BLOCK 2
        # calculate self-attention using the Q from the first block and K and V from the encoder output.
        # MultiHeadAttention's call takes input (Query, Value, Key, attention_mask, return_attention_scores, training)
        # Return attention scores as attn_weights_block2 (~1 line)
        mult_attn_out2, attn_weights_block2 = self.mha2(Q1, enc_output, enc_output, padding_mask, return_attention_scores=True)

        # apply dropout layer on the attention output (~1 line)
        mult_attn_out2 = self.dropout_ffn(mult_attn_out2, training=training)

        # apply layer normalization to the sum of the attention output and the output of the first block (~1 line)
        mult_attn_out2 = self.layernorm2(mult_attn_out2 + Q1)  # (batch_size, target_seq_len, embedding_dim)

        # BLOCK 3
        # pass the output of the second block through a ffn
        ffn_output = self.ffn(mult_attn_out2)  # (batch_size, target_seq_len, embedding_dim)

        # apply a dropout layer to the ffn output
        ffn_output = self.dropout_ffn(ffn_output, training=training)

        # apply layer normalization to the sum of the ffn output and the output of the second block
        out3 = self.layernorm3(ffn_output + mult_attn_out2)  # (batch_size, target_seq_len, embedding_dim)
        # END CODE HERE

        return out3, attn_weights_block1, attn_weights_block2
Hi @souravmodi22,
Here are a few observations:
You call MultiHeadAttention() without setting the dropout rate. MultiHeadAttention() has a default dropout rate of 0.0, whereas the dropout rate here should be 0.1 (see the sketch after this post).
Adding a dropout layer after calculating the self-attention only changes mult_attn_out1. If you set the dropout rate in MultiHeadAttention() instead, the dropout is taken care of during training, and both mult_attn_out1 and attn_weights_block1 will have the correct values. The same applies to Block 2.
To sum the attention output and the output of the previous layer, an element-wise addition of the two tensors is all that is needed.
Hope these few points help.
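To illustrate the first point, here is a small, standalone sketch with toy shapes (not the assignment code) showing tf.keras.layers.MultiHeadAttention constructed with its built-in dropout argument, so the attention itself is regularized during training:

import tensorflow as tf

embedding_dim, num_heads, dropout_rate = 12, 3, 0.1

# dropout is a keyword argument of tf.keras.layers.MultiHeadAttention;
# it defaults to 0.0, so it has to be set explicitly to match dropout_rate.
mha = tf.keras.layers.MultiHeadAttention(num_heads=num_heads,
                                         key_dim=embedding_dim,
                                         dropout=dropout_rate)

q = tf.random.uniform((1, 5, embedding_dim))    # (batch, target_seq_len, embedding_dim)
kv = tf.random.uniform((1, 7, embedding_dim))   # (batch, input_seq_len, embedding_dim)

# call signature is (query, value, key, ...); the dropout is applied to the
# attention scores only when training=True.
out, attn_weights = mha(q, kv, kv,
                        return_attention_scores=True,
                        training=True)
print(out.shape, attn_weights.shape)            # (1, 5, 12) (1, 3, 5, 7)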
@Kic yes, the problem was the dropout only changing mult_attn_out1. Thank you. I now get 100/100 from the grader, but in the Transformer test (UNQ_C8) I am getting an error in the unit test. Please advise.
class Transformer(tf.keras.Model):
    """
    Complete transformer with an Encoder and a Decoder
    """
    def __init__(self, num_layers, embedding_dim, num_heads, fully_connected_dim, input_vocab_size,
                 target_vocab_size, max_positional_encoding_input,
                 max_positional_encoding_target, dropout_rate=0.1, layernorm_eps=1e-6):
        super(Transformer, self).__init__()

        self.encoder = Encoder(num_layers=num_layers,
                               embedding_dim=embedding_dim,
                               num_heads=num_heads,
                               fully_connected_dim=fully_connected_dim,
                               input_vocab_size=input_vocab_size,
                               maximum_position_encoding=max_positional_encoding_input,
                               dropout_rate=dropout_rate,
                               layernorm_eps=layernorm_eps)

        self.decoder = Decoder(num_layers=num_layers,
                               embedding_dim=embedding_dim,
                               num_heads=num_heads,
                               fully_connected_dim=fully_connected_dim,
                               target_vocab_size=target_vocab_size,
                               maximum_position_encoding=max_positional_encoding_target,
                               dropout_rate=dropout_rate,
                               layernorm_eps=layernorm_eps)

        self.final_layer = Dense(target_vocab_size, activation='softmax')

    def call(self, input_sentence, output_sentence, training, enc_padding_mask, look_ahead_mask, dec_padding_mask):
        # START CODE HERE
        # call self.encoder with the appropriate arguments to get the encoder output
        enc_output = self.encoder(input_sentence, training, enc_padding_mask)  # (batch_size, inp_seq_len, fully_connected_dim)

        # call self.decoder with the appropriate arguments to get the decoder output
        # dec_output.shape == (batch_size, tar_seq_len, fully_connected_dim)
        dec_output, attention_weights = self.decoder(output_sentence, enc_output, training, look_ahead_mask, dec_padding_mask)

        # pass decoder output through a linear layer and softmax (~2 lines)
        final_output = self.final_layer(dec_output)  # (batch_size, tar_seq_len, target_vocab_size)
        # END CODE HERE

        return final_output, attention_weights
Transformer_test(Transformer, create_look_ahead_mask, create_padding_mask)

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input> in <module>
     86
     87 # print("\033[92mAll tests passed")
---> 88 Transformer_test(Transformer, create_look_ahead_mask, create_padding_mask)

<ipython-input> in Transformer_test(target, create_look_ahead_mask, create_padding_mask)
     53     assert np.allclose(translation[0, 0, 0:8],
     54                        [0.01660176, 0.01909315, 0.02999433, 0.01405528, 0.01979068, 0.02224632,
---> 55                         0.01541351, 0.03147632]), "Wrong values in translation"
     56
     57     keys = list(weights.keys())

AssertionError: Wrong values in translation
The Transformer class looks fine. You may like to take a look at your Encoder code to see if there is something not quite right there; a couple of easy-to-miss steps are illustrated in the sketch below.
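For reference, if your Encoder follows the usual layout for this architecture, one step that is easy to miss and that shifts every downstream value (hence "Wrong values in translation") is scaling the token embeddings by sqrt(embedding_dim) before adding the positional encoding and applying dropout. Here is a toy, standalone illustration of that ordering (my own names and shapes, not the assignment code):

import tensorflow as tf

embedding_dim, vocab_size, seq_len = 8, 50, 5

embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
pos_encoding = tf.random.uniform((1, 20, embedding_dim))  # stand-in for the precomputed table
dropout = tf.keras.layers.Dropout(0.1)

tokens = tf.constant([[3, 7, 1, 0, 0]])                   # (batch, seq_len)

x = embedding(tokens)                                     # (1, seq_len, embedding_dim)
x *= tf.math.sqrt(tf.cast(embedding_dim, tf.float32))     # the easy-to-forget scaling step
x += pos_encoding[:, :seq_len, :]                         # slice the table to the actual length
x = dropout(x, training=True)                             # dropout only active in training
print(x.shape)                                            # (1, 5, 8)

It is also worth checking that training and the padding mask are actually passed into every encoder layer inside the loop.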