C5 W4 UNQ_C8 Transformer

I am getting an incompatible shapes error. It seems to be coming from mha1 in decoderLayers().

All tests passed throughout the exercise, so I have no idea what is causing the mismatch.

After several days of struggling through this week's assignment and scouring the forums here, I guess I'll add my voice to the seemingly overwhelming majority and say that this exercise is really poorly conceived. I realize checking for every possible mistake in each test function is difficult, but so is learning a difficult topic when the feedback is untrustworthy. Also, the little tips to read the Keras MHA documentation are really unhelpful, as nowhere in the link is the code structured anything like what we are being asked to write. It also seems fairly ironic that the lecture has little quips about how this topic can be picked up in one lecture because of how well the previous weeks have gone, when it is clear that this is not the case for quite a few people. I would suggest that you bite the bullet and break this one topic into multiple assignments and lectures. Yes, it's not as clean, but going into more detail in the lectures and assignments would, I think, level out the learning curve, which appears to skyrocket here compared to the rest of the specialization.

Full error:

InvalidArgumentError Traceback (most recent call last)
in
1 # UNIT TEST
----> 2 Transformer_test(Transformer, create_look_ahead_mask, create_padding_mask)

~/work/W4A1/public_tests.py in Transformer_test(target, create_look_ahead_mask, create_padding_mask)
276 enc_padding_mask,
277 look_ahead_mask,
--> 278 dec_padding_mask
279 )
280

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in call(self, *args, **kwargs)
1010 with autocast_variable.enable_auto_cast_variables(
1011 self._compute_dtype_object):
-> 1012 outputs = call_fn(inputs, *args, **kwargs)
1013
1014 if self._activity_regularizer:

in call(self, input_sentence, output_sentence, training, enc_padding_mask, look_ahead_mask, dec_padding_mask)
56 # call self.decoder with the appropriate arguments to get the decoder output
57 # dec_output.shape == (batch_size, tar_seq_len, fully_connected_dim)
---> 58 dec_output, attention_weights = self.decoder(x, enc_output, training, look_ahead_mask, dec_padding_mask)
59 print('attention_weights.shape',attention_weights.shape)
60 # pass decoder output through a linear layer and softmax (~2 lines)

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in call(self, *args, **kwargs)
1010 with autocast_variable.enable_auto_cast_variables(
1011 self._compute_dtype_object):
-> 1012 outputs = call_fn(inputs, *args, **kwargs)
1013
1014 if self._activity_regularizer:

in call(self, x, enc_output, training, look_ahead_mask, padding_mask)
68 # pass x and the encoder output through a stack of decoder layers and save the attention weights
69 # of block 1 and 2 (~1 line)
---> 70 x, block1, block2 = self.dec_layers[i](x, enc_output, training, look_ahead_mask, padding_mask)
71 print('xd71.shape',x.shape)
72 print('look_ahead_mask_d71.shape',look_ahead_mask.shape)

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in call(self, *args, **kwargs)
1010 with autocast_variable.enable_auto_cast_variables(
1011 self._compute_dtype_object):
-> 1012 outputs = call_fn(inputs, *args, **kwargs)
1013
1014 if self._activity_regularizer:

in call(self, x, enc_output, training, look_ahead_mask, padding_mask)
54 # calculate self-attention and return attention scores as attn_weights_block1.
55 # Dropout will be applied during training (~1 line).
---> 56 mult_attn_out1, attn_weights_block1 = self.mha1(x, x, x, look_ahead_mask, return_attention_scores=True) # (batch_size, target_seq_len, d_model)
57 print('xdl_54.shape',x.shape)
58 # apply layer normalization (layernorm1) to the sum of the attention output and the input (~1 line)

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in call(self, *args, **kwargs)
1010 with autocast_variable.enable_auto_cast_variables(
1011 self._compute_dtype_object):
-> 1012 outputs = call_fn(inputs, *args, **kwargs)
1013
1014 if self._activity_regularizer:

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/layers/multi_head_attention.py in call(self, query, value, key, attention_mask, return_attention_scores, training)
472
473 attention_output, attention_scores = self._compute_attention(
--> 474 query, key, value, attention_mask, training)
475 attention_output = self._output_dense(attention_output)
476

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/layers/multi_head_attention.py in _compute_attention(self, query, key, value, attention_mask, training)
436 query)
437
--> 438 attention_scores = self._masked_softmax(attention_scores, attention_mask)
439
440 # This is actually dropping out entire tokens to attend to, which might

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/layers/multi_head_attention.py in _masked_softmax(self, attention_scores, attention_mask)
399 attention_mask = array_ops.expand_dims(
400 attention_mask, axis=mask_expansion_axes)
--> 401 return self._softmax(attention_scores, attention_mask)
402
403 def _compute_attention(self,

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in call(self, *args, **kwargs)
1010 with autocast_variable.enable_auto_cast_variables(
1011 self._compute_dtype_object):
-> 1012 outputs = call_fn(inputs, *args, **kwargs)
1013
1014 if self._activity_regularizer:

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/layers/advanced_activations.py in call(self, inputs, mask)
326 # Since we are adding it to the raw scores before the softmax, this is
327 # effectively the same as removing these entirely.
--> 328 inputs += adder
329 if isinstance(self.axis, (tuple, list)):
330 if len(self.axis) > 1:

/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py in binary_op_wrapper(x, y)
1162 with ops.name_scope(None, op_name, [x, y]) as name:
1163 try:
-> 1164 return func(x, y, name=name)
1165 except (TypeError, ValueError) as e:
1166 # Even if dispatching the op failed, the RHS may be a tensor aware

/opt/conda/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py in wrapper(*args, **kwargs)
199 """Call target, and fall back on dispatchers if there is a TypeError."""
200 try:
--> 201 return target(*args, **kwargs)
202 except (TypeError, ValueError):
203 # Note: convert_to_eager_tensor currently raises a ValueError, not a

/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py in _add_dispatch(x, y, name)
1484 return gen_math_ops.add(x, y, name=name)
1485 else:
-> 1486 return gen_math_ops.add_v2(x, y, name=name)
1487
1488

/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/gen_math_ops.py in add_v2(x, y, name)
470 return _result
471 except _core._NotOkStatusException as e:
--> 472 _ops.raise_from_not_ok_status(e, name)
473 except _core._FallbackException:
474 pass

/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/ops.py in raise_from_not_ok_status(e, name)
6860 message = e.message + (" name: " + name if name is not None else "")
6861 # pylint: disable=protected-access
-> 6862 six.raise_from(core._status_to_exception(e.code, message), None)
6863 # pylint: enable=protected-access
6864

/opt/conda/lib/python3.7/site-packages/six.py in raise_from(value, from_value)

InvalidArgumentError: Incompatible shapes: [1,4,3,3] vs. [1,1,5,5] [Op:AddV2]
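For anyone comparing the two shapes: [1, 4, 3, 3] is the attention-score tensor inside mha1, i.e. (batch, num_heads, query_len, key_len), and [1, 1, 5, 5] is the broadcast look-ahead mask. So the tensor fed to mha1 has sequence length 3 while the mask was built for a length-5 target. Here is a minimal sketch, not the assignment code, that reproduces the same clash; the head count, model size, and mask builder are my own assumptions:

import tensorflow as tf

# Toy setup: 4 heads, model size 16, a length-3 sequence,
# and a look-ahead mask accidentally built for length 5.
mha = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=16)
x = tf.random.uniform((1, 3, 16))                            # (batch, seq_len=3, d_model)
good_mask = tf.linalg.band_part(tf.ones((1, 3, 3)), -1, 0)   # causal mask sized to match x
bad_mask = tf.linalg.band_part(tf.ones((1, 5, 5)), -1, 0)    # causal mask sized for length 5

out, scores = mha(x, x, x, attention_mask=good_mask, return_attention_scores=True)
print(scores.shape)                                          # (1, 4, 3, 3)

try:
    mha(x, x, x, attention_mask=bad_mask, return_attention_scores=True)
except (tf.errors.InvalidArgumentError, ValueError) as e:
    print(e)   # the shape mismatch surfaces here, e.g. Incompatible shapes: [1,4,3,3] vs. [1,1,5,5]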

What is x in your case?
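In other words, compare the sequence length of the tensor handed to self.decoder inside Transformer.call with the look-ahead mask: in the trace they disagree (3 vs. 5), which suggests the decoder is seeing the input-side tensor rather than the target sentence. A quick, hedged check with stand-in shapes taken from the error (the variable names below are illustrative, not the assignment's):

import tensorflow as tf

decoder_input = tf.random.uniform((1, 3, 16))   # what actually reached the decoder: length 3
look_ahead_mask = tf.ones((1, 5, 5))            # built for the target sentence: length 5

if decoder_input.shape[1] != look_ahead_mask.shape[-1]:
    print("Length mismatch:", decoder_input.shape[1], "vs", look_ahead_mask.shape[-1],
          "- mha1 will fail when it adds the mask to the attention scores.")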


Thanks for spotting that. Fixed.
