Ex 4 fails in my local PC environment, while the same code passes in the cloud Jupyter notebook

Hello. I am working on the W4A1 assignment. For my convenience, I installed everything on my local PC in order to use VS Code.
When I run the automatic check for Ex 4, EncoderLayer_test(EncoderLayer), I get the following error:

AssertionError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_37688\518892524.py in
1 # UNIT TEST
----> 2 EncoderLayer_test(EncoderLayer)

c:\gilad\my courses\coursera\Reccurent Neural Networks\W4A1\public_tests.py in EncoderLayer_test(target)
92 [[ 0.23017104, -0.98100424, -0.78707516, 1.5379084 ],
93 [-1.2280797 , 0.76477575, -0.7169283 , 1.1802323 ],
---> 94 [ 0.14880152, -0.48318022, -1.1908402 , 1.5252188 ]]), "Wrong values when training=True"
95
96 encoded = encoder_layer1(q, False, np.array([[1, 1, 0]]))

AssertionError: Wrong values when training=True
However, when I run the same code in the cloud Jupyter notebook, the test passes.
I verified the Python version: 3.7.6 in both environments.

The reason might be that your versions of the libraries generate different random numbers than the versions used in the assignments. Installing all the libraries is not enough; you need the same versions as well. Of course, this is not a simple task, as each assignment might use a different version.

There are multiple guides on managing the local environment on this forum like this one and two.
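
For example, a quick way to compare the installed versions is to run the same cell locally and in the Coursera notebook (a minimal sketch; the packages listed are assumptions based on what the assignment imports):

import sys
import numpy as np
import tensorflow as tf

# Run this same cell locally and on Coursera, then compare the output line by line.
print("Python     :", sys.version)
print("NumPy      :", np.__version__)
print("TensorFlow :", tf.__version__)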

OK Thx

AssertionError when training=True in VS Code but not Jupyter?

  • Double-check inputs: Verify that the input shapes, values, and data types are EXACTLY the same in both environments.
  • Randomness: Set a fixed random seed before calling your EncoderLayer in both places to ensure consistent results (see the sketch after this list).
  • Dependencies: Check that the versions of your libraries (TensorFlow, NumPy, etc.) match in both environments. Use virtual environments to isolate dependencies.
  • Debugging: Print intermediate values or manually calculate the expected output to pinpoint the discrepancy.
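
For the randomness point, here is a minimal sketch of seeding both generators the same way before building and calling the layer (it assumes the TensorFlow/NumPy stack the assignment uses):

import numpy as np
import tensorflow as tf

# Seed both generators in the same order in both environments before
# constructing the layer, so weight initialization and dropout line up.
np.random.seed(1)
tf.random.set_seed(1)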

OK. So now I continued to Ex 5. I am running in the cloud Jupyter notebook, and I get the following error:
AssertionError Traceback (most recent call last)
in
2 # gilad
3 if (not VS):
----> 4 Encoder_test(Encoder)

~/work/W4A1/public_tests.py in Encoder_test(target)
133 [[ 0.01838917, 1.038109 , -1.6154225 , 0.55892444],
134 [ 0.3872563 , -0.40960154, -1.3456631 , 1.3680083 ],
---> 135 [ 0.534565 , -0.70262754, -1.18215 , 1.3502126 ]]]), "Wrong values case 2"
136
137 encoderq_output = encoderq(x, False, np.array([[[[1., 1., 1.]]], [[[1., 1., 0.]]]]))

AssertionError: Wrong values case 2

Please use your debugging skills to play with it. There is no official guide for replicating the assignments in a local or cloud environment. If you find any interesting solutions, please share them with us.

Right! If you are running in the real course website and you fail the tests, then the excuse about versions of the packages does not apply and it means your code is not correct. As Saif says, it’s time for some debugging. But the first step might be to take a few deep calming breaths or go for a walk and then come back and start by reading the instructions for that section carefully with “fresh eyes” and then compare that to your code.

Thanks for the calming tip…
After doing the walk and breathing you suggested, I noticed that neither the skeleton code nor the instructions mention using the normalization layers, unlike in Ex 6.
Thus, my questions are:

  1. Is that on purpose, and if so, why?
  2. If not, could it be the reason for the problem I encounter?
    Thx for the support, Gilad

I don’t understand your question. What do you mean by saying “on purpose”? Which Exercise are you referring to? And, are you facing any errors in Coursera Environment? If so, then the blame is on your code. Please share your error in that case.

Sorry, I am away from my computer and on a bus using only my phone, so I won’t be able to look at that notebook for 8 or 10 hours to remember the details of that function. But the instructions are generally pretty thorough here. Are you saying that you included the normalization even though they don’t explicitly call for it? If so, then it would be worth just trying it without the normalization and see if you then pass the tests.

I mean: should I add the normalization layers even though this is not hinted at in your skeleton code? And yes, I am using the Coursera Environment.
My code:
class EncoderLayer(tf.keras.layers.Layer):
    """
    The encoder layer is composed by a multi-head self-attention mechanism,
    followed by a simple, positionwise fully connected feed-forward network.
    This architecture includes a residual connection around each of the two
    sub-layers, followed by layer normalization.
    """
    def __init__(self, embedding_dim, num_heads, fully_connected_dim,
                 dropout_rate=0.1, layernorm_eps=1e-6):
        super(EncoderLayer, self).__init__()

        self.mha = MultiHeadAttention(num_heads=num_heads,
                                      key_dim=embedding_dim,
                                      dropout=dropout_rate)

        self.ffn = FullyConnected(embedding_dim=embedding_dim,
                                  fully_connected_dim=fully_connected_dim)

        self.layernorm1 = LayerNormalization(epsilon=layernorm_eps)
        self.layernorm2 = LayerNormalization(epsilon=layernorm_eps)

        self.dropout_ffn = Dropout(dropout_rate)

    def call(self, x, training, mask=None):
        """
        Forward pass for the Encoder Layer

        Arguments:
            x -- Tensor of shape (batch_size, input_seq_len, embedding_dim)
            training -- Boolean, set to true to activate
                        the training mode for dropout layers
            mask -- Boolean mask to ensure that the padding is not
                    treated as part of the input
        Returns:
            encoder_layer_out -- Tensor of shape (batch_size, input_seq_len, embedding_dim)
        """
        # START CODE HERE
        # calculate self-attention using mha (~1 line).
        # Dropout is added by Keras automatically if the dropout parameter is non-zero during training
        self_mha_output = self.mha(x, x, x, mask)   # Self attention (batch_size, input_seq_len, embedding_dim)

        # skip connection
        # apply layer normalization on sum of the input and the attention output to get the
        # output of the multi-head attention layer (~1 line)
        skip_x_attention = self.layernorm1(x + self_mha_output)  # (batch_size, input_seq_len, embedding_dim)

        # pass the output of the multi-head attention layer through a ffn (~1 line)
        ffn_output = self.ffn(skip_x_attention)  # (batch_size, input_seq_len, embedding_dim)

        # apply dropout layer to ffn output during training (~1 line)
        # use `training=training`
        ffn_output = self.dropout_ffn(ffn_output, training=training)

        # apply layer normalization on sum of the output from multi-head attention (skip connection) and ffn output to get the
        # output of the encoder layer (~1 line)
        encoder_layer_out = self.layernorm1(ffn_output + skip_x_attention)  # (batch_size, input_seq_len, embedding_dim)
        # END CODE HERE

        return encoder_layer_out

Alright, I got your point. Please use the 2nd normalization layer for encoder_layer_out.

I don't think they did this on purpose. We learn as we go on this journey. I will open a Git issue to add an instruction for this step.

I think, yes, that is the reason for the problem you encountered. Does using the 2nd normalization layer resolve your problem?

No, it does not. Here is my corrected code:

# UNQ_C4 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED FUNCTION EncoderLayer
class EncoderLayer(tf.keras.layers.Layer):
    """
    The encoder layer is composed by a multi-head self-attention mechanism,
    followed by a simple, positionwise fully connected feed-forward network. 
    This architecture includes a residual connection around each of the two 
    sub-layers, followed by layer normalization.
    """
    def __init__(self, embedding_dim, num_heads, fully_connected_dim,
                 dropout_rate=0.1, layernorm_eps=1e-6):
        super(EncoderLayer, self).__init__()

        self.mha = MultiHeadAttention(num_heads=num_heads,
                                      key_dim=embedding_dim,
                                      dropout=dropout_rate)

        self.ffn = FullyConnected(embedding_dim=embedding_dim,
                                  fully_connected_dim=fully_connected_dim)

        self.layernorm1 = LayerNormalization(epsilon=layernorm_eps)
        self.layernorm2 = LayerNormalization(epsilon=layernorm_eps)

        self.dropout_ffn = Dropout(dropout_rate)
    
    def call(self, x, training, mask=None):
        """
        Forward pass for the Encoder Layer
        
        Arguments:
            x -- Tensor of shape (batch_size, input_seq_len, embedding_dim)
            training -- Boolean, set to true to activate
                        the training mode for dropout layers
            mask -- Boolean mask to ensure that the padding is not 
                    treated as part of the input
        Returns:
            encoder_layer_out -- Tensor of shape (batch_size, input_seq_len, embedding_dim)
        """
        # START CODE HERE
        # calculate self-attention using mha(~1 line).
        # Dropout is added by Keras automatically if the dropout parameter is non-zero during training
        self_mha_output = self.mha(x, x, x, mask)   # Self attention (batch_size, input_seq_len, embedding_dim)
        
        # skip connection
        # apply layer normalization on sum of the input and the attention output to get the  
        # output of the multi-head attention layer (~1 line)
        skip_x_attention = self.layernorm1(x+self_mha_output)  # (batch_size, input_seq_len, embedding_dim)

        # pass the output of the multi-head attention layer through a ffn (~1 line)
        ffn_output = self.ffn(skip_x_attention)  # (batch_size, input_seq_len, embedding_dim)
        
        # apply dropout layer to ffn output during training (~1 line)
        # use `training=training` 
        ffn_output = self.dropout_ffn(ffn_output,training=training)
        
        # apply layer normalization on sum of the output from multi-head attention (skip connection) and ffn output to get the
        # output of the encoder layer (~1 line)
        encoder_layer_out = self.layernorm2(ffn_output+skip_x_attention)  # (batch_size, input_seq_len, embedding_dim)
        # END CODE HERE
        
        return encoder_layer_out
    
# UNQ_C5 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED FUNCTION
class Encoder(tf.keras.layers.Layer):
    """
    The entire Encoder starts by passing the input to an embedding layer 
    and using positional encoding to then pass the output through a stack of
    encoder Layers
        
    """  
    def __init__(self, num_layers, embedding_dim, num_heads, fully_connected_dim, input_vocab_size,
               maximum_position_encoding, dropout_rate=0.1, layernorm_eps=1e-6):
        super(Encoder, self).__init__()

        self.embedding_dim = embedding_dim
        self.num_layers = num_layers

        self.embedding = Embedding(input_vocab_size, self.embedding_dim)
        self.pos_encoding = positional_encoding(maximum_position_encoding, 
                                                self.embedding_dim)


        self.enc_layers = [EncoderLayer(embedding_dim=self.embedding_dim,
                                        num_heads=num_heads,
                                        fully_connected_dim=fully_connected_dim,
                                        dropout_rate=dropout_rate,
                                        layernorm_eps=layernorm_eps) 
                           for _ in range(self.num_layers)]

        self.dropout = Dropout(dropout_rate)
        
    def call(self, x, training, mask):
        """
        Forward pass for the Encoder
        
        Arguments:
            x -- Tensor of shape (batch_size, input_seq_len)
            training -- Boolean, set to true to activate
                        the training mode for dropout layers
            mask -- Boolean mask to ensure that the padding is not 
                    treated as part of the input
        Returns:
            x -- Tensor of shape (batch_size, input_seq_len, embedding_dim)
        """
        seq_len = tf.shape(x)[1]
        # START CODE HERE
        # Pass input through the Embedding layer
        x = self.embedding(x)  # (batch_size, input_seq_len, embedding_dim)
        # Scale embedding by multiplying it by the square root of the embedding dimension
        x *= tf.sqrt(tf.cast(x.shape[-1],dtype=tf.float32))
        # Add the position encoding to embedding
        x += self.pos_encoding[:, :seq_len, :]
        # Pass the encoded embedding through a dropout layer
        # use `training=training`
        x = self.dropout(x,training=training)
        # Pass the output through the stack of encoding layers 
        for i in range(self.num_layers):
            x = self.enc_layers[i](x)
        # END CODE HERE

        return x  # (batch_size, input_seq_len, embedding_dim)

And I get:


AssertionError Traceback (most recent call last)
in
2 # gilad
3 if (not VS):
----> 4 Encoder_test(Encoder)

~/work/W4A1/public_tests.py in Encoder_test(target)
133 [[ 0.01838917, 1.038109 , -1.6154225 , 0.55892444],
134 [ 0.3872563 , -0.40960154, -1.3456631 , 1.3680083 ],
---> 135 [ 0.534565 , -0.70262754, -1.18215 , 1.3502126 ]]]), "Wrong values case 2"
136
137 encoderq_output = encoderq(x, False, np.array([[[[1., 1., 1.]]], [[[1., 1., 0.]]]]))

AssertionError: Wrong values case 2

Hi @gilad.danini ,

I could not see the mask being used when looping through the stack of encoder layers. Could that be the problem?

Oh, all this time I thought you were facing an issue with Ex 4 (as your title says). But it is Ex 5, not 4, that is troubling you.
Indeed, your code for Ex 5 is incorrect.

Your scaling line is incorrect. Please read the instructions again. They say "embedding dimension":

  1. Scale your embedding by multiplying it by the square root of your embedding dimension. Remember to cast the embedding dimension to data type tf.float32 before computing the square root.

The loop over the encoder layers is also incorrect, as highlighted by Kin. It needs the training flag and the mask.
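
To make those two points concrete, here is a minimal sketch of the lines being discussed, written against the names in the posted Encoder code (a fragment of Encoder.call, not the official solution, so double-check it against the notebook's instructions):

# Fragment of Encoder.call(self, x, training, mask) -- only the two lines under discussion.
# Scale by the square root of the embedding dimension, cast to tf.float32:
x *= tf.sqrt(tf.cast(self.embedding_dim, tf.float32))

# (positional encoding and the dropout layer stay as in the posted code)

# Pass both the training flag and the mask into every encoder layer:
for i in range(self.num_layers):
    x = self.enc_layers[i](x, training, mask)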

PS: Providing incomplete or incorrect information makes it difficult to offer the appropriate assistance; in this case, your title suggests the issue is related to the local PC environment and Ex 4. Accurate details help us assist you more effectively.

Hi guys.
It works!
Thanks for the help, and apologies for my mistake in the title.
Gilad
