W4 - scaled_dot_product_attention

I’ve been trying to complete the function scaled_dot_product_attention(q, k, v, mask), but can’t get it right. I think the problem is somehow related to the value of dk, but I don’t know how to fix it.

Here’s the error:

InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-67-00665b20febb> in <module>
      1 # UNIT TEST
----> 2 scaled_dot_product_attention_test(scaled_dot_product_attention)

[snippet removed by mentor]

/opt/conda/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py in wrapper(*args, **kwargs)
    199     """Call target, and fall back on dispatchers if there is a TypeError."""
    200     try:
--> 201       return target(*args, **kwargs)
    202     except (TypeError, ValueError):
    203       # Note: convert_to_eager_tensor currently raises a ValueError, not a

/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py in sqrt(x, name)
   4901     A `tf.Tensor` of same size, type and sparsity as `x`.
   4902   """
-> 4903   return gen_math_ops.sqrt(x, name)
   4904 
   4905 

/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/gen_math_ops.py in sqrt(x, name)
  10040     try:
  10041       return sqrt_eager_fallback(
> 10042           x, name=name, ctx=_ctx)
  10043     except _core._SymbolicException:
  10044       pass  # Add nodes to the TensorFlow graph.

/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/gen_math_ops.py in sqrt_eager_fallback(x, name, ctx)
  10063   _attrs = ("T", _attr_T)
  10064   _result = _execute.execute(b"Sqrt", 1, inputs=_inputs_flat, attrs=_attrs,
> 10065                              ctx=ctx, name=name)
  10066   if _execute.must_record_gradient():
  10067     _execute.record_gradient(

/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     58     ctx.ensure_initialized()
     59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:
     62     if name is not None:

InvalidArgumentError: Value for attr 'T' of int32 is not in the list of allowed values: bfloat16, half, float, double, complex64, complex128
	; NodeDef: {{node Sqrt}}; Op<name=Sqrt; signature=x:T -> y:T; attr=T:type,allowed=[DT_BFLOAT16, DT_HALF, DT_FLOAT, DT_DOUBLE, DT_COMPLEX64, DT_COMPLEX128]> [Op:Sqrt]

According to the stack trace (thanks for sharing), tf.sqrt expects a non-integer argument. Please cast the parameter to a float scalar before taking the square root.
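To illustrate the point (a minimal sketch, not taken from the assignment): tf.sqrt only supports floating and complex dtypes, so an integer dimension has to be cast first.

```python
import tensorflow as tf

dk = 4  # e.g. the key dimension, a plain Python int (becomes int32)

# tf.sqrt(tf.constant(dk))  # would raise InvalidArgumentError:
#                           # "Value for attr 'T' of int32 is not in the list of allowed values"

# Casting to a float dtype first makes the op valid:
scaled = tf.sqrt(tf.cast(dk, tf.float32))
print(float(scaled))  # 2.0
```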

Indeed, this was a problem. But now I’m getting a shape error in tf.divide.

The beginning of the code:

matmul_qk = tf.matmul(q, tf.transpose(k))
dk = k.shape
scaled_attention_logits = tf.divide(matmul_qk, tf.sqrt(tf.cast(dk, tf.float32)))

Error:

InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-21-00665b20febb> in <module>
      1 # UNIT TEST
----> 2 scaled_dot_product_attention_test(scaled_dot_product_attention)

~/work/W4A1/public_tests.py in scaled_dot_product_attention_test(target)
     55     v = np.array([[0, 0], [1, 0], [1, 0], [1, 1]]).astype(np.float32)
     56 
---> 57     attention, weights = target(q, k, v, None)
     58     assert tf.is_tensor(weights), "Weights must be a tensor"
     59     assert tuple(tf.shape(weights).numpy()) == (q.shape[0], k.shape[1]), f"Wrong shape. We expected ({q.shape[0]}, {k.shape[1]})"

<ipython-input-20-b159c4ee3675> in scaled_dot_product_attention(q, k, v, mask)
     25     # scale matmul_qk
     26     dk = k.shape
---> 27     scaled_attention_logits = tf.divide(matmul_qk, tf.sqrt(tf.cast(dk, tf.float32)))
     28 
     29     # add the mask to the scaled tensor.

/opt/conda/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py in wrapper(*args, **kwargs)
    199     """Call target, and fall back on dispatchers if there is a TypeError."""
    200     try:
--> 201       return target(*args, **kwargs)
    202     except (TypeError, ValueError):
    203       # Note: convert_to_eager_tensor currently raises a ValueError, not a

/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py in divide(x, y, name)
    467       dtype = y.dtype.base_dtype if tensor_util.is_tensor(y) else None
    468       x = ops.convert_to_tensor(x, dtype=dtype)
--> 469     return x / y
    470 
    471 

/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py in binary_op_wrapper(x, y)
   1162     with ops.name_scope(None, op_name, [x, y]) as name:
   1163       try:
-> 1164         return func(x, y, name=name)
   1165       except (TypeError, ValueError) as e:
   1166         # Even if dispatching the op failed, the RHS may be a tensor aware

/opt/conda/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py in wrapper(*args, **kwargs)
    199     """Call target, and fall back on dispatchers if there is a TypeError."""
    200     try:
--> 201       return target(*args, **kwargs)
    202     except (TypeError, ValueError):
    203       # Note: convert_to_eager_tensor currently raises a ValueError, not a

/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py in truediv(x, y, name)
   1334     TypeError: If `x` and `y` have different dtypes.
   1335   """
-> 1336   return _truediv_python3(x, y, name)
   1337 
   1338 

/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py in _truediv_python3(x, y, name)
   1273       x = cast(x, dtype)
   1274       y = cast(y, dtype)
-> 1275     return gen_math_ops.real_div(x, y, name=name)
   1276 
   1277 

/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/gen_math_ops.py in real_div(x, y, name)
   7327       return _result
   7328     except _core._NotOkStatusException as e:
-> 7329       _ops.raise_from_not_ok_status(e, name)
   7330     except _core._FallbackException:
   7331       pass

/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/ops.py in raise_from_not_ok_status(e, name)
   6860   message = e.message + (" name: " + name if name is not None else "")
   6861   # pylint: disable=protected-access
-> 6862   six.raise_from(core._status_to_exception(e.code, message), None)
   6863   # pylint: enable=protected-access
   6864 

/opt/conda/lib/python3.7/site-packages/six.py in raise_from(value, from_value)

InvalidArgumentError: Incompatible shapes: [3,4] vs. [2] [Op:RealDiv]

Please note that d_k should be a scalar, not a shape tuple: k.shape gives the full shape of k, but you only want its last dimension. Use this hint to fix your code.

You may want to use tf.cast(x, dtype, name=None) to ensure you do not feed an int32 value into your calculation.
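Putting both hints together, here is a hedged sketch of the generic scaled dot-product attention formula (this is the standard formula from the literature, not the graded solution; the mask convention with 1 = keep, 0 = mask is an assumption, as the thread never shows the mask handling):

```python
import tensorflow as tf

def scaled_dot_product_attention_sketch(q, k, v, mask=None):
    """Generic attention sketch; variable names mirror the thread."""
    # (..., seq_len_q, seq_len_k); transpose_b avoids a separate tf.transpose call
    matmul_qk = tf.matmul(q, k, transpose_b=True)

    # d_k is the LAST dimension of k, cast to a float SCALAR (not the whole shape tuple)
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

    if mask is not None:
        # Assumed convention: mask == 1 keeps a position, 0 masks it out
        scaled_attention_logits += (1.0 - mask) * -1e9

    # softmax over the key axis so each query's weights sum to 1
    attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)
    output = tf.matmul(attention_weights, v)
    return output, attention_weights
```

With q of shape (3, 2) and k of shape (4, 2), as in the failing test, the weights come out as (3, 4) and each row sums to 1.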