C5_W4_A1_Transformer_Subclass_v1 UNQ4

Hi,
I have a problem when running the EncoderLayer class.
I get the assertion error "Wrong values when training=True".
I have tried many suggestions from similar topics, but nothing has worked.
Thanks for your help!

# UNQ_C4 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED FUNCTION EncoderLayer

class EncoderLayer(tf.keras.layers.Layer):
    """
    The encoder layer is composed of a multi-head self-attention mechanism,
    followed by a simple, positionwise fully connected feed-forward network.
    This architecture includes a residual connection around each of the two
    sub-layers, followed by layer normalization.
    """
    def __init__(self, embedding_dim, num_heads, fully_connected_dim, dropout_rate=0.1, layernorm_eps=1e-6):
        super(EncoderLayer, self).__init__()

        self.mha = MultiHeadAttention(num_heads=num_heads,
                                      key_dim=embedding_dim)

        self.ffn = FullyConnected(embedding_dim=embedding_dim,
                                  fully_connected_dim=fully_connected_dim)

        self.layernorm1 = LayerNormalization(epsilon=layernorm_eps)
        self.layernorm2 = LayerNormalization(epsilon=layernorm_eps)

        self.dropout_ffn = Dropout(dropout_rate)

    def call(self, x, training, mask):
        """
        Forward pass for the Encoder Layer

        Arguments:
            x -- Tensor of shape (batch_size, input_seq_len, embedding_dim)
            training -- Boolean, set to true to activate
                        the training mode for dropout layers
            mask -- Boolean mask to ensure that the padding is not
                    treated as part of the input
        Returns:
            encoder_layer_out -- Tensor of shape (batch_size, input_seq_len, embedding_dim)
        """
        # START CODE HERE
        # calculate self-attention using mha (~1 line)
        self_attn_output = self.mha(x, x, x, mask)  # Self attention (batch_size, input_seq_len, embedding_dim)

        # apply dropout layer to the self-attention output (~1 line)
        #self_attn_output = self.dropout_ffn(self_attn_output)

        # apply layer normalization on sum of the input and the attention output to get the
        # output of the multi-head attention layer (~1 line)
        mult_attn_out = self.layernorm1(x + self_attn_output)  # (batch_size, input_seq_len, embedding_dim)

        # pass the output of the multi-head attention layer through a ffn (~1 line)
        ffn_output = self.ffn(mult_attn_out)  # (batch_size, input_seq_len, embedding_dim)

        # apply dropout layer to ffn output (~1 line)
        ffn_output = self.dropout_ffn(ffn_output, training=training)

        # apply layer normalization on sum of the output from multi-head attention and ffn output to get the
        # output of the encoder layer (~1 line)
        encoder_layer_out = self.layernorm2(mult_attn_out + ffn_output)  # (batch_size, input_seq_len, embedding_dim)
        # END CODE HERE

        return encoder_layer_out

self.mha is missing the dropout parameter.

Also, why do the comments in your call function look different from the ones below? Follow the instructions and you should be good to go.

    def call(self, x, training, mask):
        """
        Forward pass for the Encoder Layer
        
        Arguments:
            x -- Tensor of shape (batch_size, input_seq_len, fully_connected_dim)
            training -- Boolean, set to true to activate
                        the training mode for dropout layers
            mask -- Boolean mask to ensure that the padding is not 
                    treated as part of the input
        Returns:
            encoder_layer_out -- Tensor of shape (batch_size, input_seq_len, fully_connected_dim)
        """
        # START CODE HERE
        # calculate self-attention using mha(~1 line). Dropout will be applied during training
        attn_output = None # Self attention (batch_size, input_seq_len, fully_connected_dim)
        
        # apply layer normalization on sum of the input and the attention output to get the  
        # output of the multi-head attention layer (~1 line)
        out1 = None  # (batch_size, input_seq_len, fully_connected_dim)

        # pass the output of the multi-head attention layer through a ffn (~1 line)
        ffn_output = None  # (batch_size, input_seq_len, fully_connected_dim)
        
        # apply dropout layer to ffn output during training (~1 line)
        ffn_output =  None

@balaji.ambresh, I don’t think we use dropout in the self.mha() layer. The comment says it’s added later.
It’s added at the ffn_output layer.

@pierrickrichard:
I agree that your code template appears to be incorrect. Is it an old version of the notebook?
Where did you get it from?

@TMosh Please see the starter code. It contains self.mha with the dropout parameter.

@balaji.ambresh, no I don’t think so.

@balaji.ambresh, not in the DLS course at least. That’s the course this forum thread is about.

@TMosh Here’s the link to the notebook:
https://github.com/https-deeplearning-ai/deep-learning-specialization/blob/master/C5/W4/assignment/A1/C5_W4_A1_Transformer_Subclass_v1.ipynb

I have the notebook already.

@TMosh Do you not see self.mha with the dropout parameter inside the __init__ method?

@TMosh I worked a lot on that question, so maybe I copied that part from another topic on the forum or something like that.
But I don’t have the dropout parameter, and the comment says it is applied later.
The problem is that nobody understood what was wrong with the code…

Maybe I could try the latest version of the notebook if you have it, but I’m afraid I’ll have the same problem.

@pierrickrichard You might find the "Refresh your notebook" section in the link below helpful:
https://www.coursera.support/s/article/360004995312-Solve-problems-with-Jupyter-Notebooks#-5

@balaji.ambresh
That’s the dropout rate - not the training argument. Perhaps that’s where I’m confused.
There is no reason to change the dropout rate when the mha() layer is used; it’s pre-defined in the constructor.
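
To illustrate the distinction, here is a minimal standalone sketch (not the notebook’s code; the shapes and the 0.5 rate are made up for illustration): the dropout rate is fixed when tf.keras.layers.MultiHeadAttention is constructed, and it only has an effect when the layer is called with training=True.

    import tensorflow as tf

    tf.random.set_seed(0)
    x = tf.random.uniform((1, 3, 4))  # toy (batch_size, seq_len, embedding_dim)

    # The dropout *rate* is a constructor argument of the Keras layer.
    mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=4, dropout=0.5)

    out_infer = mha(x, x, x, training=False)  # dropout inactive
    out_train = mha(x, x, x, training=True)   # dropout applied to the attention weights

    # The two outputs generally differ, which is why a missing dropout argument
    # surfaces as "Wrong values when training=True".
    print(tf.reduce_any(tf.not_equal(out_infer, out_train)).numpy())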

Also, note that this notebook was updated earlier today, and it’s still being revised, so you can expect another update soon.

The revisions are updates to the instructions - not any functional changes.

That doesn’t address where the student’s old version of the notebook came from.

@TMosh That’s what I’m referring to as well.
If you look at the original post, self.mha doesn’t have the dropout parameter.

Those are the instructions that are being updated in the new notebook version.

@pierrickrichard , regarding your code:
It appears you have modified the constructor for self.mha(). You’re missing a parameter.
Or you’re using an obsolete copy of the notebook that doesn’t have that parameter.

The correct notebook has this for the constructor:
[screenshot of the constructor cell from the current notebook, showing self.mha created with the dropout parameter]
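
Based on the discussion, the constructor in the current notebook presumably looks something like the snippet below, the key difference from the original post being the extra dropout argument (a sketch, not a verbatim copy of the notebook cell):

    self.mha = MultiHeadAttention(num_heads=num_heads,
                                  key_dim=embedding_dim,
                                  dropout=dropout_rate)  # dropout rate passed through from the constructor argument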

Other than the problem with the constructor, the code you implemented seems correct.
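
Once the constructor is fixed, a quick smoke test along these lines can help confirm the layer behaves as expected (this assumes the notebook’s EncoderLayer and its helpers such as FullyConnected are already defined; the toy sizes are made up):

    import tensorflow as tf

    # Hypothetical sanity check, not the grader's test.
    layer = EncoderLayer(embedding_dim=4, num_heads=2, fully_connected_dim=8)
    x = tf.random.uniform((1, 3, 4))  # (batch_size, input_seq_len, embedding_dim)

    out_infer = layer(x, training=False, mask=None)
    out_train = layer(x, training=True, mask=None)

    print(out_infer.shape)  # (1, 3, 4): output shape matches the input
    print(tf.reduce_any(tf.not_equal(out_infer, out_train)).numpy())  # usually True once dropout is wired in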

Yes, I understand what you’re referring to now.

Thanks a lot! Problem solved 🙂