C5W4: Transformer Network

AssertionError                            Traceback (most recent call last)
<ipython-input-...> in <module>
      1 # UNIT TEST
----> 2 EncoderLayer_test(EncoderLayer)

~/work/W4A1/public_tests.py in EncoderLayer_test(target)
     92     [[ 0.23017104, -0.98100424, -0.78707516,  1.5379084 ],
     93     [-1.2280797 ,  0.76477575, -0.7169283 ,  1.1802323 ],
---> 94     [ 0.14880152, -0.48318022, -1.1908402 ,  1.5252188 ]]), "Wrong values when training=True"
     95
     96     encoded = encoder_layer1(q, False, np.array([[1, 1, 0]]))

AssertionError: Wrong values when training=True

I have added the parameters for self.mha as x, x, x, and mask, and passed training=training to both of the dropout layers correctly, and yet I am still getting this error.
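That is, something along these lines (following the Keras MultiHeadAttention call signature of query, value, key, attention_mask):

self_mha_output = self.mha(x, x, x, mask)  # query, value, key, attention_mask
ffn_output = self.dropout_ffn(ffn_output, training=training)  # dropout with the training flag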

There is not enough documentation here, and the hints are not helping me either; I have been stuck for hours trying to debug this code cell.

Can anyone please help me with what needs to be done here, ASAP?
This error has been really frustrating.

But we have only one dropout. Check the code again:

# START CODE HERE
# calculate self-attention using mha(~1 line).
# Dropout is added by Keras automatically if the dropout parameter is non-zero during training
self_mha_output = None  # Self attention (batch_size, input_seq_len, fully_connected_dim)
  
# skip connection
# apply layer normalization on sum of the input and the attention output to get the  
# output of the multi-head attention layer (~1 line)
skip_x_attention = None  # (batch_size, input_seq_len, fully_connected_dim)

# pass the output of the multi-head attention layer through a ffn (~1 line)
ffn_output = None  # (batch_size, input_seq_len, fully_connected_dim)
  
# apply dropout layer to ffn output during training (~1 line)
# use `training=training` 
ffn_output = None
  
# apply layer normalization on sum of the output from multi-head attention (skip connection) and ffn output to get the
# output of the encoder layer (~1 line)
encoder_layer_out = None  # (batch_size, input_seq_len, embedding_dim)
# END CODE HERE
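To make the structure concrete, the five steps fill in roughly like this. This is a sketch of the shape of each step, assuming the Keras MultiHeadAttention call signature mha(query, value, key, attention_mask), not the graded solution verbatim; notice there is exactly one explicit dropout, applied only to the FFN output:

self_mha_output = self.mha(x, x, x, mask)  # query, value, key, attention_mask; Keras applies its own dropout internally when training
skip_x_attention = self.layernorm1(x + self_mha_output)  # residual connection + layer norm
ffn_output = self.ffn(skip_x_attention)  # position-wise feed-forward network
ffn_output = self.dropout_ffn(ffn_output, training=training)  # the single explicit dropout
encoder_layer_out = self.layernorm2(skip_x_attention + ffn_output)  # second residual connection + layer norm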

Hey, can you send the entire code cell, that is, the whole class including the __init__ function?

I am getting other errors now:

AssertionError                            Traceback (most recent call last)
<ipython-input-...> in <module>
      1 # UNIT TEST
----> 2 EncoderLayer_test(EncoderLayer)

~/work/W4A1/public_tests.py in EncoderLayer_test(target)
     86     encoded = encoder_layer1(q, True, np.array([[1, 0, 1]]))
     87
---> 88     assert tf.is_tensor(encoded), "Wrong type. Output must be a tensor"
     89     assert tuple(tf.shape(encoded).numpy()) == (1, q.shape[1], q.shape[2]), f"Wrong shape. We expected ((1, {q.shape[1]}, {q.shape[2]}))"
     90

AssertionError: Wrong type. Output must be a tensor
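That assertion fails whenever the returned value is not a tensor at all. If the placeholders in call are still set to None, the function returns None, and tf.is_tensor(None) is False. A quick check:

import tensorflow as tf
print(tf.is_tensor(None))  # False -> "Wrong type. Output must be a tensor"
print(tf.is_tensor(tf.constant(1.0)))  # True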

# UNQ_C4 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED FUNCTION EncoderLayer
class EncoderLayer(tf.keras.layers.Layer):
    """
    The encoder layer is composed of a multi-head self-attention mechanism,
    followed by a simple, position-wise fully connected feed-forward network.
    This architecture includes a residual connection around each of the two
    sub-layers, followed by layer normalization.
    """
    def __init__(self, embedding_dim, num_heads, fully_connected_dim,
                 dropout_rate=0.1, layernorm_eps=1e-6):
        super(EncoderLayer, self).__init__()

        self.mha = MultiHeadAttention(num_heads=num_heads,
                                      key_dim=embedding_dim,
                                      dropout=dropout_rate)

        self.ffn = FullyConnected(embedding_dim=embedding_dim,
                                  fully_connected_dim=fully_connected_dim)

        self.layernorm1 = LayerNormalization(epsilon=layernorm_eps)
        self.layernorm2 = LayerNormalization(epsilon=layernorm_eps)

        self.dropout_ffn = Dropout(dropout_rate)
    
    def call(self, x, training, mask):
        """
        Forward pass for the Encoder Layer
        
        Arguments:
            x -- Tensor of shape (batch_size, input_seq_len, fully_connected_dim)
            training -- Boolean, set to true to activate
                        the training mode for dropout layers
            mask -- Boolean mask to ensure that the padding is not 
                    treated as part of the input
        Returns:
            encoder_layer_out -- Tensor of shape (batch_size, input_seq_len, embedding_dim)
        """
        # START CODE HERE
        # calculate self-attention using mha(~1 line).
        # Dropout is added by Keras automatically if the dropout parameter is non-zero during training
        self_mha_output = None  # Self attention (batch_size, input_seq_len, fully_connected_dim)
        
        # skip connection
        # apply layer normalization on sum of the input and the attention output to get the  
        # output of the multi-head attention layer (~1 line)
        skip_x_attention = None  # (batch_size, input_seq_len, fully_connected_dim)

        # pass the output of the multi-head attention layer through a ffn (~1 line)
        ffn_output = None  # (batch_size, input_seq_len, fully_connected_dim)
        
        # apply dropout layer to ffn output during training (~1 line)
        # use `training=training` 
        ffn_output = None
        
        # apply layer normalization on sum of the output from multi-head attention (skip connection) and ffn output to get the
        # output of the encoder layer (~1 line)
        encoder_layer_out = None  # (batch_size, input_seq_len, embedding_dim)
        # END CODE HERE
        
        return encoder_layer_out
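
Note that in the cell as posted, every placeholder in call is still None, so the layer returns None, which is exactly what the "Wrong type" assertion catches. Once the body is filled in, a quick local check along the lines of the unit test might look like this (the hyperparameters are illustrative, and EncoderLayer, FullyConnected, and the Keras layers must already be defined in the notebook):

import numpy as np
import tensorflow as tf

encoder_layer1 = EncoderLayer(embedding_dim=4, num_heads=2, fully_connected_dim=8)  # illustrative sizes
q = tf.random.uniform((1, 3, 4))  # (batch_size, input_seq_len, embedding_dim)
encoded = encoder_layer1(q, True, np.array([[1, 0, 1]]))

print(tf.is_tensor(encoded))  # should be True
print(tuple(tf.shape(encoded).numpy()))  # should be (1, 3, 4)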
    

If you want to get a fresh copy of your assignment, read this.