C5 W4 A1: Wrong values when training=True

Hi All:
I'm having problems with EncoderLayer_test(EncoderLayer), which fails with "Wrong values when training=True". This issue seems to come up frequently for others, and I've tried all the suggestions in related threads, but I still can't get it to work. My implementation of def call(self, x, training, mask) is below.
Can anyone suggest a fix?

Thank you,
E

def call(self, x, training, mask):
    """
    Forward pass for the Encoder Layer
    
    Arguments:
        x -- Tensor of shape (batch_size, input_seq_len, fully_connected_dim)
        training -- Boolean, set to true to activate
                    the training mode for dropout layers
        mask -- Boolean mask to ensure that the padding is not 
                treated as part of the input
    Returns:
        encoder_layer_out -- Tensor of shape (batch_size, input_seq_len, fully_connected_dim)
    """
    # START CODE HERE
    # calculate self-attention using mha(~1 line). Dropout will be applied during training
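    # Note: tf.keras.layers.MultiHeadAttention is called as (query, value, key, attention_mask),
    # so mha(x, x, x, mask) computes self-attention over x with the padding mask. If the layer was
    # built with a nonzero dropout rate, it applies its own dropout internally when training=True.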
    attn_output = self.mha(x, x, x, mask)  # Self attention (batch_size, input_seq_len, fully_connected_dim)
    attn_output = self.dropout_ffn(attn_output, training=training)
    
    # apply layer normalization on sum of the input and the attention output to get the  
    # output of the multi-head attention layer (~1 line)
    out1 = self.layernorm1(x + attn_output)  # (batch_size, input_seq_len, fully_connected_dim)

    # pass the output of the multi-head attention layer through a ffn (~1 line)
    ffn_output = self.ffn(out1)  # (batch_size, input_seq_len, fully_connected_dim)
    
    # apply dropout layer to ffn output during training (~1 line)
    ffn_output = self.dropout_ffn(ffn_output, training=training)
    
    # apply layer normalization on sum of the output from multi-head attention and ffn output to get the
    # output of the encoder layer (~1 line)
    encoder_layer_out = self.layernorm2(out1 + ffn_output)  # (batch_size, input_seq_len, fully_connected_dim)
    # END CODE HERE
    
    return encoder_layer_out

Your second line, “attn_output = self.dropout_ffn(…)”, is not necessary. The instructions tell you that dropout will be applied later; you don't add it at this step.
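For reference, here is a minimal self-contained sketch of the intended flow, with the only explicit dropout applied after the FFN. This is just an illustration under my own assumptions (the layer names, dimensions, and FFN definition here are not the assignment's exact scaffold):

import tensorflow as tf

class EncoderLayerSketch(tf.keras.layers.Layer):
    """Sketch of one Transformer encoder layer: MHA -> add & norm -> FFN -> dropout -> add & norm."""
    def __init__(self, embedding_dim=128, num_heads=4, fully_connected_dim=256, dropout_rate=0.1):
        super().__init__()
        # MultiHeadAttention applies dropout to the attention weights itself when training=True.
        self.mha = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embedding_dim, dropout=dropout_rate)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(fully_connected_dim, activation='relu'),
            tf.keras.layers.Dense(embedding_dim),
        ])
        self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.dropout_ffn = tf.keras.layers.Dropout(dropout_rate)

    def call(self, x, training, mask):
        # Self-attention; no explicit dropout here, the mha layer already handles it.
        attn_output = self.mha(x, x, x, attention_mask=mask, training=training)
        out1 = self.layernorm1(x + attn_output)                        # add & norm
        ffn_output = self.ffn(out1)                                    # position-wise feed-forward
        ffn_output = self.dropout_ffn(ffn_output, training=training)   # the only explicit dropout
        return self.layernorm2(out1 + ffn_output)                      # add & norm

A quick sanity check that dropout only fires in training mode: with training=False the layer is deterministic, so two calls on the same input match exactly, while with training=True repeated calls generally differ.

layer = EncoderLayerSketch()
x = tf.random.uniform((2, 5, 128))
print(tf.reduce_all(layer(x, training=False, mask=None) == layer(x, training=False, mask=None)).numpy())  # True
print(tf.reduce_all(layer(x, training=True, mask=None) == layer(x, training=True, mask=None)).numpy())    # usually False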

Hi TMosh:

Thank you, I deeply appreciate it.
You are correct; that was my mistake.

Again, thank you,
E