C5_W4_A1 UNQ_C4 Encoder Layer Mask

It would be helpful to add an additional hint about the "mask". It took me a while to notice that the mask needs to be passed into self.mha, since no error is raised if it is left out.

Thanks for the suggestion. I’ll submit an issue for that.
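For context, the reason no error shows up is that attention_mask is an optional argument of the Keras MultiHeadAttention layer (its call signature, visible in the tracebacks below, is query, value, key, attention_mask, …). If the mask is not passed, the layer simply computes attention over every position, including padding. A minimal sketch of the call the hint is about, using the variable names from this assignment:

# self-attention: query, value and key are all x; the padding mask must be
# passed explicitly, otherwise attention silently includes padded tokens
attn_output = self.mha(x, x, x, mask)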

HELP! - I have followed your syntax suggestions and carefully reviewed the Transformer tutorial, but even following the format there does not fix the error below.
Is it how the mask is passed, as the previous post says? I am following both your recommendation and the tutorial below, but to no avail! For 3 days I have been trying various combinations/permutations - this course shouldn't require so much effort over a subtlety of syntax!

        # START CODE HERE
        # calculate self-attention using mha(~1 line). Dropout will be applied during training
        attn_output = self.mha(x, x, x,mask) # Self attention (batch_size, input_seq_len, fully_connected_dim)
        # apply layer normalization on sum of the input and the attention output to get the  
        # output of the multi-head attention layer (~1 line)
        out1 = self.layernorm1(x + attn_output)  # (batch_size, input_seq_len, fully_connected_dim)

        # pass the output of the multi-head attention layer through a ffn (~1 line)
        ffn_output = self.ffn(out1)  # (batch_size, input_seq_len, fully_connected_dim)
        
        # apply dropout layer to ffn output during training (~1 line)
        ffn_output = self.dropout_ffn(ffn_output,training=training)
        
        # apply layer normalization on sum of the output from multi-head attention and ffn output to get the
        # output of the encoder layer (~1 line)
        encoder_layer_out = self.layernorm2(out1 + ffn_output)  # (batch_size, input_seq_len, fully_connected_dim)
        # END CODE HERE
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-00617004b1af> in <module>
      1 # UNIT TEST
----> 2 EncoderLayer_test(EncoderLayer)

~/work/W4A1/public_tests.py in EncoderLayer_test(target)
     84     encoder_layer1 = target(4, 2, 8)
     85     tf.random.set_seed(10)
---> 86     encoded = encoder_layer1(q, True, np.array([[1, 0, 1]]))
     87 
     88     assert tf.is_tensor(encoded), "Wrong type. Output must be a tensor"

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
   1010         with autocast_variable.enable_auto_cast_variables(
   1011             self._compute_dtype_object):
-> 1012           outputs = call_fn(inputs, *args, **kwargs)
   1013 
   1014         if self._activity_regularizer:

<ipython-input-15-77db2b2e670d> in call(self, x, training, mask)
     39         # START CODE HERE
     40         # calculate self-attention using mha(~1 line). Dropout will be applied during training
---> 41         attn_output = self.mha(x, x, x,mask) # Self attention (batch_size, input_seq_len, fully_connected_dim)
     42         # apply layer normalization on sum of the input and the attention output to get the
     43         # output of the multi-head attention layer (~1 line)

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
   1010         with autocast_variable.enable_auto_cast_variables(
   1011             self._compute_dtype_object):
-> 1012           outputs = call_fn(inputs, *args, **kwargs)
   1013 
   1014         if self._activity_regularizer:

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/layers/multi_head_attention.py in call(self, query, value, key, attention_mask, return_attention_scores, training)
    463     #   H = `size_per_head`
    464     # `query` = [B, T, N ,H]
--> 465     query = self._query_dense(query)
    466 
    467     # `key` = [B, S, N, H]

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
   1006       with ops.name_scope_v2(name_scope):
   1007         if not self.built:
-> 1008           self._maybe_build(inputs)
   1009 
   1010         with autocast_variable.enable_auto_cast_variables(

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in _maybe_build(self, inputs)
   2708         # operations.
   2709         with tf_utils.maybe_init_scope(self):
-> 2710           self.build(input_shapes)  # pylint:disable=not-callable
   2711       # We must set also ensure that the layer is marked as built, and the build
   2712       # shape is stored since user defined build functions may not be calling

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/layers/einsum_dense.py in build(self, input_shape)
    152         constraint=self.kernel_constraint,
    153         dtype=self.dtype,
--> 154         trainable=True)
    155 
    156     if bias_shape is not None:

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in add_weight(self, name, shape, dtype, initializer, regularizer, trainable, constraint, use_resource, synchronization, aggregation, **kwargs)
    637         synchronization=synchronization,
    638         aggregation=aggregation,
--> 639         caching_device=caching_device)
    640     if regularizer is not None:
    641       # TODO(fchollet): in the future, this should be handled at the

/opt/conda/lib/python3.7/site-packages/tensorflow/python/training/tracking/base.py in _add_variable_with_custom_getter(self, name, shape, dtype, initializer, getter, overwrite, **kwargs_for_getter)
    808         dtype=dtype,
    809         initializer=initializer,
--> 810         **kwargs_for_getter)
    811 
    812     # If we set an initializer and the variable processed it, tracking will not

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer_utils.py in make_variable(name, shape, dtype, initializer, trainable, caching_device, validate_shape, constraint, use_resource, collections, synchronization, aggregation, partitioner)
    127   # TODO(apassos,rohanj) figure out how to remove collections from here so we
    128   # can remove the V1.
--> 129   variable_shape = tensor_shape.TensorShape(shape)
    130   return tf_variables.VariableV1(
    131       initial_value=init_val,

/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/tensor_shape.py in __init__(self, dims)
    756     """
    757     if isinstance(dims, (tuple, list)):  # Most common case.
--> 758       self._dims = [Dimension(d) for d in dims]
    759     elif dims is None:
    760       self._dims = None

/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/tensor_shape.py in <listcomp>(.0)
    756     """
    757     if isinstance(dims, (tuple, list)):  # Most common case.
--> 758       self._dims = [Dimension(d) for d in dims]
    759     elif dims is None:
    760       self._dims = None

/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/tensor_shape.py in __init__(self, value)
    204             TypeError("Dimension value must be integer or None or have "
    205                       "an __index__ method, got value '{0!r}' with type '{1!r}'"
--> 206                       .format(value, type(value))), None)
    207       if self._value < 0:
    208         raise ValueError("Dimension %d must be >= 0" % self._value)

/opt/conda/lib/python3.7/site-packages/six.py in raise_from(value, from_value)

TypeError: Dimension value must be integer or None or have an __index__ method, got value '<__main__.EncoderLayer object at 0x7f17205d0a90>' with type '<class '__main__.EncoderLayer'>'

@rameshgopalan
I do not see any problems with the code you posted.

Tom Mosher / Dear Mentor:

PLEASE HELP review the code, and if there are no errors that a human can find, then override the autograder.

In particular, I have closely followed the open-source Transformer application, and completed ALL exercises through Exercise 8, but because of this autograder issue with Exercise 4, it gives me NO CREDIT for ANY EXERCISE, even Exercises 1-3 which are 'All tests passed'.

PLEASE HELP RESOLVE this, because otherwise it is a WASTE OF YOUR TIME, as well as mine, to keep posting these messages with no one learning anything new.

Many thanks in advance!

Sorry, we can’t override the autograder.
There is apparently some error in a different part of your code.
Passing the unit tests does not prove your code is perfect. The unit tests don’t catch every error.

Can you please try calling it with the arguments named explicitly, like this?

attn_output = self.mha(query=x, value=x, key=x, attention_mask=mask)

@whitehat: I don’t think that’s strictly necessary; passing (x, x, x, mask) positionally seems to work fine.
But it would not hurt to give it a try.
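One thing worth noting (it also shows up in the traceback above): the layer's call signature is (query, value, key, attention_mask, …), so value comes before key, unlike the usual Q, K, V ordering. For self-attention it makes no difference because all three inputs are x, but the keyword form makes the intent explicit and guards against mixing up the order:

# equivalent to self.mha(x, x, x, mask), just with the arguments named
attn_output = self.mha(query=x, value=x, key=x, attention_mask=mask)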

@rameshgopalan: I would not lean on the Transformer application that you mentioned. I have no idea how closely it applies to this course or to this set of tools.

When I copy your code into my notebook, it works fine and says “All tests passed”.

When you modify the notebook, be sure that you save it and then re-run all of the cells. The notebooks use a lot of global variables, and the values can easily get out of sequence when you edit some of the cells.

If you want me to look at your entire notebook, please download it and send it to me in a private message.

@whitehat
When I tested this - removing the “mask” argument from the call to self.mha(…) - I got an assert:
AssertionError: Wrong values when training=True

So I think the error was detected.

Oh, right I got that assert too.

TMosh, I have already tried training=training and other combinations, but A GRADER IS ONLY USEFUL IF IT ACCURATELY TELLS YOU WHAT YOU DID WRONG - so I am going to have to keep posting the error until the grader either tells me what part of the code is failing, especially if a live mentor is not able to find anything wrong, or else the autograder is overridden.
I realize it is a waste of time all around, but it is what it is.

(The same TypeError traceback as above was posted again, ending in: TypeError: Dimension value must be integer or None or have an __index__ method, got value '<__main__.EncoderLayer object ...>' with type '<class '__main__.EncoderLayer'>')

Tom Mosher – I have sent my code to you in a private message. I have followed every format suggestion you have rightly made.

Since you say the code works fine in your copy of the notebook, any tips on why it might not work in mine would be appreciated.

After looking at your notebook: one of the arguments in the constructor call for self.mha(…) had been incorrectly modified.

Restoring the correct self.mha(…) constructor code fixed the problem.

The error was there BEFORE I made any modifications to the self.mha constructor; in any case, I have tried various combinations of that same line of code, and the SAME ERROR keeps repeating.

If you have figured it out in your copy, can you please cut and paste the same (non-graded) part of the code that you think I have in error? Thanks.

Actually I kept trying different variations (no skill involved ;-( and it finally worked! Thanks TMosh for the tip on where to look, though.

The problem was that:

self.mha = MultiHeadAttention(num_heads=num_heads, …

…had been modified to read “num_heads = self”, which is incorrect.
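That also explains the TypeError above: MultiHeadAttention uses num_heads (together with the key dimension) to build the shapes of its internal EinsumDense weights, so passing self where an integer is expected only fails later, when TensorShape/Dimension rejects the EncoderLayer object. A rough sketch of what the constructor line should look like (the key_dim and dropout argument names here are assumptions about the notebook's template, not quoted from it):

# num_heads and embedding_dim are plain integers passed into EncoderLayer.__init__
self.mha = MultiHeadAttention(num_heads=num_heads,
                              key_dim=embedding_dim,
                              dropout=dropout_rate)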