Good morning, Juan,
I am sending you the code for the Encoder Layer and the Encoder; I suspect the bug may not lie in the Encoder alone.
The additional print statements did not really help me, but perhaps their output will give you some hints.
I have fought a few battles here in the Specialisation, but this material is also new to me.
Thank you for your support,
Michael
ENCODER_LAYER:
def call(self, x, training, mask):
    """
    Forward pass for the Encoder Layer

    Arguments:
        x -- Tensor of shape (batch_size, input_seq_len, fully_connected_dim)
        training -- Boolean, set to true to activate
                    the training mode for dropout layers
        mask -- Boolean mask to ensure that the padding is not
                treated as part of the input
    Returns:
        encoder_layer_out -- Tensor of shape (batch_size, input_seq_len, embedding_dim)
    """
    # START CODE HERE
    # calculate self-attention using mha (~1 line).
    # Dropout is added by Keras automatically if the dropout parameter is non-zero during training
    attn_output = self.mha(x, x, training=training, attention_mask=mask)  # Self-attention (batch_size, input_seq_len, fully_connected_dim)

    # apply layer normalization on the sum of the input and the attention output to get the
    # output of the multi-head attention layer (~1 line)
    out1 = self.layernorm1(tf.keras.layers.add([x, attn_output]))  # (batch_size, input_seq_len, fully_connected_dim)

    # pass the output of the multi-head attention layer through a ffn (~1 line)
    ffn_output = self.ffn(out1)  # (batch_size, input_seq_len, fully_connected_dim)

    # apply dropout layer to ffn output during training (~1 line)
    ffn_output = self.dropout_ffn(ffn_output, training=training)

    # apply layer normalization on the sum of the output from multi-head attention and ffn output to get the
    # output of the encoder layer (~1 line)
    encoder_layer_out = self.layernorm2(tf.keras.layers.add([out1, ffn_output]))  # (batch_size, input_seq_len, embedding_dim)
    # END CODE HERE

    return encoder_layer_out
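As a quick sanity check on what self.mha expects, I also ran the small standalone snippet below. It is not part of the assignment code; it only assumes TF 2.x and the built-in tf.keras.layers.MultiHeadAttention, and the toy sizes are made up. It just confirms the mask shape the layer accepts and that the attention output keeps the input shape:

import tensorflow as tf

# Toy sizes chosen just for the check, not taken from the notebook.
batch_size, seq_len, embedding_dim, num_heads = 2, 5, 8, 2

x = tf.random.uniform((batch_size, seq_len, embedding_dim))
# Boolean padding mask broadcastable to (batch_size, num_heads, seq_len, seq_len);
# True means "attend to this position".
mask = tf.ones((batch_size, seq_len, seq_len), dtype=tf.bool)

mha = tf.keras.layers.MultiHeadAttention(num_heads=num_heads, key_dim=embedding_dim)
attn_output = mha(x, x, attention_mask=mask, training=False)
print(attn_output.shape)  # (2, 5, 8) -- same shape as the query input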
ENCODER (with additional print statements):
def call(self, x, training, mask):
    """
    Forward pass for the Encoder

    Arguments:
        x -- Tensor of shape (batch_size, input_seq_len)
        training -- Boolean, set to true to activate
                    the training mode for dropout layers
        mask -- Boolean mask to ensure that the padding is not
                treated as part of the input
    Returns:
        out2 -- Tensor of shape (batch_size, input_seq_len, embedding_dim)
    """
    # mask = create_padding_mask(x)
    seq_len = tf.shape(x)[1]

    # START CODE HERE
    # Pass input through the Embedding layer
    x = self.embedding(x)  # (batch_size, input_seq_len, embedding_dim)
    print(x)

    # Scale embedding by multiplying it by the square root of the embedding dimension
    x *= tf.math.sqrt(tf.cast(self.embedding_dim, tf.float32))
    print(x)

    # Add the position encoding to embedding
    # x += self.pos_encoding[0, :x.shape[1], :]
    x += self.pos_encoding[:, :seq_len, :]
    print("After position encoding = ", x)

    # Pass the encoded embedding through a dropout layer
    x = self.dropout(x, training=training)
    print(x)

    print("num_layers = ", self.num_layers)
    # Debugging only: each iteration here passes the same pre-stack x into the layer
    # (x is not updated); the actual forward pass happens in the loop below.
    for i in range(self.num_layers):
        print("i = ", i)
        print("x = ", x)
        print(self.enc_layers[i](x, training, mask))

    # Pass the output through the stack of encoding layers
    for i in range(self.num_layers):
        x = self.enc_layers[i](x, training, mask)
    # END CODE HERE

    return x  # (batch_size, input_seq_len, embedding_dim)
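In case the raw print(x) output is too noisy to compare, here is a small helper I put together (my own sketch, not from the notebook): it prints only the shape and summary statistics of each intermediate tensor. I would call it after the embedding, scaling, positional-encoding and dropout steps instead of the plain prints, so it is easier to see at which step the values start to diverge from the expected output:

import tensorflow as tf

def describe(name, t):
    """Print the shape and summary statistics of a tensor instead of its full contents."""
    t = tf.convert_to_tensor(t)
    tf.print(name,
             "shape =", tf.shape(t),
             "mean =", tf.reduce_mean(t),
             "std =", tf.math.reduce_std(t))

# Hypothetical usage inside Encoder.call:
#   describe("after embedding", x)
#   describe("after scaling", x)
#   describe("after position encoding", x)
#   describe("after dropout", x)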