It would be helpful to add an additional hint about the “mask”. It took me a while to notice that the mask needs to be passed to self.mha, since no error is raised if it is omitted.
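For illustration, a minimal sketch of the call with the mask included; the keyword names follow the standard Keras MultiHeadAttention API, and x and mask are the assignment's variable names:
attn_output = self.mha(query=x, value=x, key=x, attention_mask=mask)  # leaving attention_mask out runs without error, but no masking is applied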
Thanks for the suggestion. I’ll submit an issue for that.
HELP! I have followed your syntax suggestions and carefully reviewed the Transformer tutorial, but even following the format there does not fix the error below.
Is it how the mask is passed, as the previous post says? But I am following both your recommendation and the tutorial, to no avail! For 3 days I have been trying various combinations and permutations; this course shouldn't come down to some subtlety of syntax!
# START CODE HERE
# calculate self-attention using mha(~1 line). Dropout will be applied during training
attn_output = self.mha(x, x, x,mask) # Self attention (batch_size, input_seq_len, fully_connected_dim)
# apply layer normalization on sum of the input and the attention output to get the
# output of the multi-head attention layer (~1 line)
out1 = self.layernorm1(x + attn_output) # (batch_size, input_seq_len, fully_connected_dim)
# pass the output of the multi-head attention layer through a ffn (~1 line)
ffn_output = self.ffn(out1) # (batch_size, input_seq_len, fully_connected_dim)
# apply dropout layer to ffn output during training (~1 line)
ffn_output = self.dropout_ffn(ffn_output, training=training)
# apply layer normalization on sum of the output from multi-head attention and ffn output to get the
# output of the encoder layer (~1 line)
encoder_layer_out = self.layernorm2(out1 + ffn_output) # (batch_size, input_seq_len, fully_connected_dim)
# END CODE HERE
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-16-00617004b1af> in <module>
1 # UNIT TEST
----> 2 EncoderLayer_test(EncoderLayer)
~/work/W4A1/public_tests.py in EncoderLayer_test(target)
84 encoder_layer1 = target(4, 2, 8)
85 tf.random.set_seed(10)
---> 86 encoded = encoder_layer1(q, True, np.array([[1, 0, 1]]))
87
88 assert tf.is_tensor(encoded), "Wrong type. Output must be a tensor"
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
1010 with autocast_variable.enable_auto_cast_variables(
1011 self._compute_dtype_object):
-> 1012 outputs = call_fn(inputs, *args, **kwargs)
1013
1014 if self._activity_regularizer:
<ipython-input-15-77db2b2e670d> in call(self, x, training, mask)
39 # START CODE HERE
40 # calculate self-attention using mha(~1 line). Dropout will be applied during training
---> 41 attn_output = self.mha(x, x, x,mask) # Self attention (batch_size, input_seq_len, fully_connected_dim)
42 # apply layer normalization on sum of the input and the attention output to get the
43 # output of the multi-head attention layer (~1 line)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
1010 with autocast_variable.enable_auto_cast_variables(
1011 self._compute_dtype_object):
-> 1012 outputs = call_fn(inputs, *args, **kwargs)
1013
1014 if self._activity_regularizer:
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/layers/multi_head_attention.py in call(self, query, value, key, attention_mask, return_attention_scores, training)
463 # H = `size_per_head`
464 # `query` = [B, T, N ,H]
--> 465 query = self._query_dense(query)
466
467 # `key` = [B, S, N, H]
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
1006 with ops.name_scope_v2(name_scope):
1007 if not self.built:
-> 1008 self._maybe_build(inputs)
1009
1010 with autocast_variable.enable_auto_cast_variables(
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in _maybe_build(self, inputs)
2708 # operations.
2709 with tf_utils.maybe_init_scope(self):
-> 2710 self.build(input_shapes) # pylint:disable=not-callable
2711 # We must set also ensure that the layer is marked as built, and the build
2712 # shape is stored since user defined build functions may not be calling
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/layers/einsum_dense.py in build(self, input_shape)
152 constraint=self.kernel_constraint,
153 dtype=self.dtype,
--> 154 trainable=True)
155
156 if bias_shape is not None:
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in add_weight(self, name, shape, dtype, initializer, regularizer, trainable, constraint, use_resource, synchronization, aggregation, **kwargs)
637 synchronization=synchronization,
638 aggregation=aggregation,
--> 639 caching_device=caching_device)
640 if regularizer is not None:
641 # TODO(fchollet): in the future, this should be handled at the
/opt/conda/lib/python3.7/site-packages/tensorflow/python/training/tracking/base.py in _add_variable_with_custom_getter(self, name, shape, dtype, initializer, getter, overwrite, **kwargs_for_getter)
808 dtype=dtype,
809 initializer=initializer,
--> 810 **kwargs_for_getter)
811
812 # If we set an initializer and the variable processed it, tracking will not
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer_utils.py in make_variable(name, shape, dtype, initializer, trainable, caching_device, validate_shape, constraint, use_resource, collections, synchronization, aggregation, partitioner)
127 # TODO(apassos,rohanj) figure out how to remove collections from here so we
128 # can remove the V1.
--> 129 variable_shape = tensor_shape.TensorShape(shape)
130 return tf_variables.VariableV1(
131 initial_value=init_val,
/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/tensor_shape.py in __init__(self, dims)
756 """
757 if isinstance(dims, (tuple, list)): # Most common case.
--> 758 self._dims = [Dimension(d) for d in dims]
759 elif dims is None:
760 self._dims = None
/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/tensor_shape.py in <listcomp>(.0)
756 """
757 if isinstance(dims, (tuple, list)): # Most common case.
--> 758 self._dims = [Dimension(d) for d in dims]
759 elif dims is None:
760 self._dims = None
/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/tensor_shape.py in __init__(self, value)
204 TypeError("Dimension value must be integer or None or have "
205 "an __index__ method, got value '{0!r}' with type '{1!r}'"
--> 206 .format(value, type(value))), None)
207 if self._value < 0:
208 raise ValueError("Dimension %d must be >= 0" % self._value)
/opt/conda/lib/python3.7/site-packages/six.py in raise_from(value, from_value)
TypeError: Dimension value must be integer or None or have an __index__ method, got value '<__main__.EncoderLayer object at 0x7f17205d0a90>' with type '<class '__main__.EncoderLayer'>'
@rameshgopalan
I do not see any problems with the code you posted.
Tom Mosher / Dear Mentor:
PLEASE HELP review the code, and if there are no errors that a human can find, then override the autograder.
In particular, I have closely followed the openly posted Transformer application and completed ALL exercises through Exercise 8, but because of this autograder issue with Exercise 4, it gives me NO CREDIT for ANY exercise, even Exercises 1-3, which show 'All tests passed'.
PLEASE HELP RESOLVE this, because otherwise it is a WASTE OF YOUR TIME, just as it is of mine, to keep posting these messages with no one learning anything new.
Many thanks in advance!
Sorry, we can’t override the autograder.
There is apparently some error in a different part of your code.
Passing the unit tests does not prove your code is perfect. The unit tests don’t catch every error.
Can you please try calling it with the arguments named explicitly, like this?
attn_output = self.mha(query=x, value=x, key=x, attention_mask=mask)
@whitehat: I don’t think that’s strictly necessary; (x, x, x, mask) seems to work fine.
But it would not hurt to give it a try.
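For reference, the traceback above shows the layer's call signature as call(self, query, value, key, attention_mask, return_attention_scores, training), so the positional and keyword forms should map to the same arguments. An illustrative comparison using the assignment's variable names:
attn_output = self.mha(x, x, x, mask)                                  # positional: query, value, key, attention_mask
attn_output = self.mha(query=x, value=x, key=x, attention_mask=mask)   # same call with explicit keywords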
@rameshgopalan: I would not lean on the Transformer application that you mentioned. I have no idea how closely it applies to this course or to this set of tools.
When I copy your code into my notebook, it works fine and says “All tests passed”.
When you modify the notebook, be sure that you save it and then re-run all of the cells. The notebooks use a lot of global variables, and the values can easily get out of sequence when you edit some of the cells.
If you want me to look at your entire notebook, please download it and send it to me in a private message.
@whitehat
When I tested this - removing the “mask” argument from the call to self.mha(…) - I got an assert:
AssertionError: Wrong values when training=True
So I think the error was detected.
Oh, right, I got that assert too.
TMosh, I have already tried training=training and other combinations, but a GRADER IS ONLY USEFUL IF IT ACCURATELY TELLS YOU WHAT YOU DID WRONG. So I am going to have to keep posting the error until the grader either tells me what part of the code is failing (especially since a live mentor is not able to find anything wrong) or the autograder is overridden.
I realize it is a waste of time all around, but it is what it is.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-16-00617004b1af> in <module>
1 # UNIT TEST
----> 2 EncoderLayer_test(EncoderLayer)
~/work/W4A1/public_tests.py in EncoderLayer_test(target)
84 encoder_layer1 = target(4, 2, 8)
85 tf.random.set_seed(10)
---> 86 encoded = encoder_layer1(q, True, np.array([[1, 0, 1]]))
87
88 assert tf.is_tensor(encoded), "Wrong type. Output must be a tensor"
...
TypeError: Dimension value must be integer or None or have an __index__ method, got value '<__main__.EncoderLayer object at 0x7fc2ac0531d0>' with type '<class '__main__.EncoderLayer'>'
Tom Mosher, I have sent my code to you in a private message. I have followed every format suggestion you have rightly made.
Since you say the code works fine in your copy of the Jupyter notebook, any tips on why it might not work in mine would be appreciated.
After looking at your notebook, one of the function arguments in the constructor for self.mha(…) had been incorrectly modified.
Restoring the correct self.mha(…) constructor code fixed the problem.
The error was there BEFORE I made any modifications to the self.mha constructor. In any case, I have tried various combinations of the same line of code, and the SAME ERROR keeps repeating.
If you have figured it out in your browser, could you please cut and paste the same (non-graded) part of the code that you think is in error? Thanks.
Actually, I kept trying different variations (no skill involved ;-( ) and it finally worked! Thanks, TMosh, for the tip on where to look.
The problem was that:
self.mha = MultiHeadAttention(num_heads=num_heads, …
…had been modified to read “num_heads = self”, which is incorrect.
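For comparison, a hedged sketch of what the corrected constructor line might look like. Only num_heads=num_heads is confirmed by this thread; the key_dim and dropout keywords are assumptions based on the standard Keras MultiHeadAttention API, not a copy of the assignment's template:
# Sketch only: key_dim and dropout names are assumed, not taken from the assignment
self.mha = MultiHeadAttention(num_heads=num_heads,   # must be the integer num_heads, not self
                              key_dim=embedding_dim,
                              dropout=dropout_rate)
# Passing self (an EncoderLayer instance) as num_heads is what later surfaced as the
# "Dimension value must be integer or None" TypeError when the query dense layer was built.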