C5_W4_A1_Transformer_Subclass_v1 UNQ4

Hi,
I have a problem when running the EncoderLayer class.
I get the assertion error "Wrong values when training=True".
I have tried many suggestions from similar topics, but nothing has worked.
Thanks for your help!

# UNQ_C4 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED FUNCTION EncoderLayer

class EncoderLayer(tf.keras.layers.Layer):
    """
    The encoder layer is composed of a multi-head self-attention mechanism,
    followed by a simple, positionwise fully connected feed-forward network.
    This architecture includes a residual connection around each of the two
    sub-layers, followed by layer normalization.
    """
    def __init__(self, embedding_dim, num_heads, fully_connected_dim, dropout_rate=0.1, layernorm_eps=1e-6):
        super(EncoderLayer, self).__init__()

        self.mha = MultiHeadAttention(num_heads=num_heads,
                                      key_dim=embedding_dim)

        self.ffn = FullyConnected(embedding_dim=embedding_dim,
                                  fully_connected_dim=fully_connected_dim)

        self.layernorm1 = LayerNormalization(epsilon=layernorm_eps)
        self.layernorm2 = LayerNormalization(epsilon=layernorm_eps)

        self.dropout_ffn = Dropout(dropout_rate)

    def call(self, x, training, mask):
        """
        Forward pass for the Encoder Layer

        Arguments:
            x -- Tensor of shape (batch_size, input_seq_len, embedding_dim)
            training -- Boolean, set to true to activate
                        the training mode for dropout layers
            mask -- Boolean mask to ensure that the padding is not
                    treated as part of the input
        Returns:
            encoder_layer_out -- Tensor of shape (batch_size, input_seq_len, embedding_dim)
        """
        # START CODE HERE
        # calculate self-attention using mha (~1 line)
        self_attn_output = self.mha(x, x, x, mask)  # Self attention (batch_size, input_seq_len, embedding_dim)

        # apply dropout layer to the self-attention output (~1 line)
        #self_attn_output = self.dropout_ffn(self_attn_output)

        # apply layer normalization on sum of the input and the attention output to get the
        # output of the multi-head attention layer (~1 line)
        mult_attn_out = self.layernorm1(x + self_attn_output)  # (batch_size, input_seq_len, embedding_dim)

        # pass the output of the multi-head attention layer through a ffn (~1 line)
        ffn_output = self.ffn(mult_attn_out)  # (batch_size, input_seq_len, embedding_dim)

        # apply dropout layer to ffn output (~1 line)
        ffn_output = self.dropout_ffn(ffn_output, training=training)

        # apply layer normalization on sum of the output from multi-head attention and ffn output to get the
        # output of the encoder layer (~1 line)
        encoder_layer_out = self.layernorm2(mult_attn_out + ffn_output)  # (batch_size, input_seq_len, embedding_dim)
        # END CODE HERE

        return encoder_layer_out

self.mha is missing the dropout parameter.

Also, why do the comments in your call function look different from the ones below? Follow the instructions and you should be good to go.

    def call(self, x, training, mask):
        """
        Forward pass for the Encoder Layer
        
        Arguments:
            x -- Tensor of shape (batch_size, input_seq_len, fully_connected_dim)
            training -- Boolean, set to true to activate
                        the training mode for dropout layers
            mask -- Boolean mask to ensure that the padding is not 
                    treated as part of the input
        Returns:
            encoder_layer_out -- Tensor of shape (batch_size, input_seq_len, fully_connected_dim)
        """
        # START CODE HERE
        # calculate self-attention using mha(~1 line). Dropout will be applied during training
        attn_output = None # Self attention (batch_size, input_seq_len, fully_connected_dim)
        
        # apply layer normalization on sum of the input and the attention output to get the  
        # output of the multi-head attention layer (~1 line)
        out1 = None  # (batch_size, input_seq_len, fully_connected_dim)

        # pass the output of the multi-head attention layer through a ffn (~1 line)
        ffn_output = None  # (batch_size, input_seq_len, fully_connected_dim)
        
        # apply dropout layer to ffn output during training (~1 line)
        ffn_output =  None

@balaji.ambresh, I don’t think we use dropout in the self.mha() layer. The comment says it’s added later.
It’s added at the ffn_output layer.

@pierrickrichard:
I agree that your code template appears to be incorrect. Is it an old version of the notebook?
Where did you get it from?

@TMosh Please see the starter code. It contains self.mha with the dropout parameter.

@balaji.ambresh, no I don’t think so.

@balaji.ambresh, not in the DLS course at least. That’s the course this forum thread is about.

@TMosh Here’s the link to the notebook:
https://github.com/https-deeplearning-ai/deep-learning-specialization/blob/master/C5/W4/assignment/A1/C5_W4_A1_Transformer_Subclass_v1.ipynb

I have the notebook already.

@TMosh Do you not see self.mha with the dropout parameter inside the __init__ method?

@TMosh I worked a lot on that question, so maybe I copied that part from another topic on the forum or something like that.
But I don’t have the dropout parameter, and the comment says it is applied later.
The problem is that nobody understood what was wrong with the code…

Maybe I could try the latest version of the notebook if you have it, but I’m afraid I’ll have the same problem.

@pierrickrichard You might find the "Refresh your notebook" section in the link below helpful:
https://www.coursera.support/s/article/360004995312-Solve-problems-with-Jupyter-Notebooks#-5

@balaji.ambresh
That’s the dropout rate - not the training argument. Perhaps that’s where I’m confused.
There is no reason to change the dropout rate when the mha() layer is used; it’s pre-defined in the constructor.
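
To illustrate the distinction, here is a minimal standalone sketch (not the notebook’s code; the shapes and the 0.5 rate are made up for illustration): the dropout rate is fixed when tf.keras.layers.MultiHeadAttention is constructed, and it only has an effect when the layer is called with training=True.

    import tensorflow as tf

    tf.random.set_seed(0)
    x = tf.random.uniform((1, 3, 4))  # toy (batch_size, seq_len, embedding_dim)

    # The dropout *rate* is a constructor argument of the Keras layer.
    mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=4, dropout=0.5)

    out_infer = mha(x, x, x, training=False)  # dropout inactive
    out_train = mha(x, x, x, training=True)   # dropout applied to the attention weights

    # The two outputs generally differ, which is why a missing dropout argument
    # surfaces as "Wrong values when training=True".
    print(tf.reduce_any(tf.not_equal(out_infer, out_train)).numpy())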

Also, note that this notebook was updated earlier today, and it’s still being revised, so you can expect another update soon.

The revisions are updates to the instructions - not any functional changes.

That doesn’t address where the student’s old version of the notebook came from.

@TMosh That’s what I’m referring to as well.
If you look at the original post, self.mha doesn’t have the dropout parameter.

Those are the instructions that are being updated in the new notebook version.

@pierrickrichard , regarding your code:
It appears you have modified the constructor for self.mha(). You’re missing a parameter.
Or you’re using an obsolete copy of the notebook that doesn’t have that parameter.

The correct notebook has this for the constructor:
[screenshot of the constructor cell from the current notebook, showing self.mha created with the dropout parameter]
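
Based on the discussion, the constructor in the current notebook presumably looks something like the snippet below, the key difference from the original post being the extra dropout argument (a sketch, not a verbatim copy of the notebook cell):

    self.mha = MultiHeadAttention(num_heads=num_heads,
                                  key_dim=embedding_dim,
                                  dropout=dropout_rate)  # dropout rate passed through from the constructor argument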

Other than the problem with the constructor, the code you implemented seems correct.
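
Once the constructor is fixed, a quick smoke test along these lines can help confirm the layer behaves as expected (this assumes the notebook’s EncoderLayer and its helpers such as FullyConnected are already defined; the toy sizes are made up):

    import tensorflow as tf

    # Hypothetical sanity check, not the grader's test.
    layer = EncoderLayer(embedding_dim=4, num_heads=2, fully_connected_dim=8)
    x = tf.random.uniform((1, 3, 4))  # (batch_size, input_seq_len, embedding_dim)

    out_infer = layer(x, training=False, mask=None)
    out_train = layer(x, training=True, mask=None)

    print(out_infer.shape)  # (1, 3, 4): output shape matches the input
    print(tf.reduce_any(tf.not_equal(out_infer, out_train)).numpy())  # usually True once dropout is wired in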

Yes, I understand what you’re referring to now.

Thanks a lot! Problem solved 🙂