C5_W4_A1_Transformer_Subclass_v1 Scaled_dot_product_attention

Hello,
I’m having major problems with this assignment, mainly because I’m not a TensorFlow guru and the TensorFlow online documentation is some of the worst I have seen in my 25 years as a software developer.

Currently I have these problems. This is my code:

{mentor edit: code removed}

Error:
assert tf.is_tensor(attention), "Output must be a tensor"

Please help. Also, could the assignment have a little more hand-holding where TF is concerned? Thank you and best regards.

Starting with:
In “matmul_qk = …”, you need to transpose the ‘k’ matrix.
From the instructions:
[image from the instructions: the scaled dot-product attention formula]
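For example, something along these lines (an illustration with made-up shapes, not the graded code from the notebook):

```python
import tensorflow as tf

# Made-up shapes for illustration: (batch, seq_len, depth)
q = tf.random.uniform((1, 3, 4))
k = tf.random.uniform((1, 3, 4))

# transpose_b=True computes QKᵀ, giving shape (batch, seq_len_q, seq_len_k)
matmul_qk = tf.matmul(q, k, transpose_b=True)
print(matmul_qk.shape)  # (1, 3, 3)
```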


In “attention_weights = …”, I recommend you use tf.nn.softmax() on the scaled_attention_logits.
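For instance (again just a sketch with made-up shapes):

```python
import tensorflow as tf

# Made-up logits of shape (batch, seq_len_q, seq_len_k)
scaled_attention_logits = tf.random.uniform((1, 3, 3))

# softmax normalizes over the last axis (seq_len_k), so each row of weights sums to 1
attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)
```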

In “output = …”, I recommend you use tf.linalg.matmul(), with the appropriate axis argument.

tf.linalg.matmul() has no axis argument.

Also, now when I do:
{mentor edit: code removed}

I get the error:
<ipython-input-26-1d9a3bc7cf92> in scaled_dot_product_attention(q, k, v, mask)
     31     if mask is not None:  # Don't replace this None
     32         # scaled_attention_logits += (1 - mask) * -1.0e9
---> 33         scaled_attention_logits += mask
     34
     35     # softmax is normalized on the last axis (seq_len_k) so that the scores

ValueError: non-broadcastable output operand with shape (3,4) doesn't match the broadcast shape (1,3,4)

tf.linalg.matmul() has no axis argument.

You are correct, my mistake.

The line of code you commented out is correct.
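To illustrate what that line does (a sketch with a made-up mask; the convention here, taken from your commented-out line, is 1 = keep, 0 = mask out):

```python
import tensorflow as tf

# Made-up logits and mask, both of shape (1, 3, 4)
scaled_attention_logits = tf.random.uniform((1, 3, 4))
mask = tf.constant([[[1., 1., 0., 0.],
                     [1., 1., 1., 0.],
                     [1., 1., 1., 1.]]])

# Masked positions (mask == 0) get a huge negative logit, so softmax drives them to ~0
scaled_attention_logits += (1. - mask) * -1.0e9
```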

I uncommented the line of code but I still get the broadcast error.

I am currently using this:

{mentor edit: code removed}

I was stuck on this part for a while. I used the tf.matmul() function and it works.


Wow.
Thanks. That totally fixed it.

Sorry, I missed catching the use of np.matmul().
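For future readers: np.matmul() returns a NumPy array rather than a tf.Tensor, which is why the in-place add with the (1, 3, 4) mask could not broadcast and why the tf.is_tensor(attention) assert fails. Using TensorFlow ops end to end avoids both problems. Here is a generic sketch of scaled dot-product attention in that style; it follows the structure of the public TensorFlow Transformer tutorial rather than the graded notebook verbatim, and it assumes the mask convention 1 = keep, 0 = mask out:

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask=None):
    # QKᵀ: (..., seq_len_q, seq_len_k); tf.matmul keeps everything a tf.Tensor
    matmul_qk = tf.matmul(q, k, transpose_b=True)

    # Scale by sqrt(d_k)
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

    # Assumed mask convention: 1 = keep, 0 = mask out
    if mask is not None:
        scaled_attention_logits += (1. - mask) * -1.0e9

    # Normalize over the last axis (seq_len_k)
    attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)

    # Weighted sum of the values: (..., seq_len_q, depth_v)
    output = tf.matmul(attention_weights, v)
    return output, attention_weights
```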