Hi,
I have a problem when running the EncoderLayer class.
I get the assertion error "Wrong values when training=True".
I tried a lot of things from different similar topics, but nothing worked.
Thanks for helping!
# UNQ_C4 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED FUNCTION EncoderLayer
class EncoderLayer(tf.keras.layers.Layer):
    """
    The encoder layer is composed by a multi-head self-attention mechanism,
    followed by a simple, positionwise fully connected feed-forward network.
    This architecture includes a residual connection around each of the two
    sub-layers, followed by layer normalization.
    """
    def __init__(self, embedding_dim, num_heads, fully_connected_dim, dropout_rate=0.1, layernorm_eps=1e-6):
        super(EncoderLayer, self).__init__()

        self.mha = MultiHeadAttention(num_heads=num_heads,
                                      key_dim=embedding_dim)

        self.ffn = FullyConnected(embedding_dim=embedding_dim,
                                  fully_connected_dim=fully_connected_dim)

        self.layernorm1 = LayerNormalization(epsilon=layernorm_eps)
        self.layernorm2 = LayerNormalization(epsilon=layernorm_eps)

        self.dropout_ffn = Dropout(dropout_rate)
    def call(self, x, training, mask):
        """
        Forward pass for the Encoder Layer

        Arguments:
            x -- Tensor of shape (batch_size, input_seq_len, embedding_dim)
            training -- Boolean, set to true to activate
                        the training mode for dropout layers
            mask -- Boolean mask to ensure that the padding is not
                    treated as part of the input

        Returns:
            encoder_layer_out -- Tensor of shape (batch_size, input_seq_len, embedding_dim)
        """
        # START CODE HERE
        # calculate self-attention using mha (~1 line)
        self_attn_output = self.mha(x, x, x, mask)  # Self attention (batch_size, input_seq_len, embedding_dim)

        # apply dropout layer to the self-attention output (~1 line)
        #self_attn_output = self.dropout_ffn(self_attn_output)

        # apply layer normalization on sum of the input and the attention output to get the
        # output of the multi-head attention layer (~1 line)
        mult_attn_out = self.layernorm1(x + self_attn_output)  # (batch_size, input_seq_len, embedding_dim)

        # pass the output of the multi-head attention layer through a ffn (~1 line)
        ffn_output = self.ffn(mult_attn_out)  # (batch_size, input_seq_len, embedding_dim)

        # apply dropout layer to ffn output (~1 line)
        ffn_output = self.dropout_ffn(ffn_output, training=training)

        # apply layer normalization on sum of the output from multi-head attention and ffn output to get the
        # output of the encoder layer (~1 line)
        encoder_layer_out = self.layernorm2(mult_attn_out + ffn_output)  # (batch_size, input_seq_len, embedding_dim)
        # END CODE HERE

        return encoder_layer_out
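As a side note on what the training flag controls, here is a minimal sanity check one could run in the notebook environment (it assumes EncoderLayer and its helper layers are already defined as above; it is not the grader's actual test):

import numpy as np
import tensorflow as tf

# Tiny layer and input, with a high dropout rate so the effect is visible.
layer = EncoderLayer(embedding_dim=16, num_heads=2,
                     fully_connected_dim=32, dropout_rate=0.5)
x = tf.random.uniform((1, 5, 16))

# With training=True the dropout layers are active, so two forward passes
# on the same input will almost surely differ.
out_a = layer(x, training=True, mask=None)
out_b = layer(x, training=True, mask=None)
print("training=True outputs differ:", not np.allclose(out_a.numpy(), out_b.numpy()))

# With training=False dropout is skipped, so repeated passes are identical.
out_c = layer(x, training=False, mask=None)
out_d = layer(x, training=False, mask=None)
print("training=False outputs match:", np.allclose(out_c.numpy(), out_d.numpy()))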
Also, why do the comments in your call function look different from these? Follow the instructions and you should be good to go.
    def call(self, x, training, mask):
        """
        Forward pass for the Encoder Layer

        Arguments:
            x -- Tensor of shape (batch_size, input_seq_len, fully_connected_dim)
            training -- Boolean, set to true to activate
                        the training mode for dropout layers
            mask -- Boolean mask to ensure that the padding is not
                    treated as part of the input

        Returns:
            encoder_layer_out -- Tensor of shape (batch_size, input_seq_len, fully_connected_dim)
        """
        # START CODE HERE
        # calculate self-attention using mha (~1 line). Dropout will be applied during training
        attn_output = None  # Self attention (batch_size, input_seq_len, fully_connected_dim)

        # apply layer normalization on sum of the input and the attention output to get the
        # output of the multi-head attention layer (~1 line)
        out1 = None  # (batch_size, input_seq_len, fully_connected_dim)

        # pass the output of the multi-head attention layer through a ffn (~1 line)
        ffn_output = None  # (batch_size, input_seq_len, fully_connected_dim)

        # apply dropout layer to ffn output during training (~1 line)
        ffn_output = None
@TMosh I worked a lot on that question, so maybe I copied that from another topic on the discourse or something like that.
But I don't have the dropout parameter; the comment says it is added after.
The problem is that nobody understood the problem with the code…
@balaji.ambresh
That's the dropout rate, not the training argument. Perhaps that's where I'm confused.
There is no reason to change the dropout rate when the mha() layer is used; it's pre-defined in the constructor.
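To illustrate the distinction with the stock Keras layer (just a sketch for clarity, not the assignment code):

import tensorflow as tf

# The dropout *rate* is a constructor argument, fixed when the layer is built.
mha = tf.keras.layers.MultiHeadAttention(num_heads=2,
                                         key_dim=4,
                                         dropout=0.1)  # rate set once, here

# The *training* flag is a call-time argument; it only toggles whether that
# dropout is applied on a given forward pass.
x = tf.random.uniform((1, 3, 4))
out_train = mha(x, x, x, training=True)   # dropout active
out_infer = mha(x, x, x, training=False)  # dropout skipped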
@pierrickrichard, regarding your code:
It appears you have modified the constructor for self.mha(). You’re missing a parameter.
Or you’re using an obsolete copy of the notebook that doesn’t have that parameter.
The correct notebook has this for the constructor:
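Along these lines, with the dropout rate passed through to MultiHeadAttention (reconstructed here, since the original snippet wasn't quoted in this thread):

self.mha = MultiHeadAttention(num_heads=num_heads,
                              key_dim=embedding_dim,
                              dropout=dropout_rate)  # the dropout rate is the missing parameter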
Other than the problem with the constructor, the code you implemented seems correct.