W4 A1 | Ex-3 | Scaled Dot Product Attention

Yes, I'm hitting this assertion: assert tf.is_tensor(attention), "Output must be a tensor".
I'm using dk = tf.cast(tf.shape(k)[0], tf.float32),
the mask as mask = tf.linalg.band_part(tf.ones((tf.shape(q)[0], tf.shape(k)[0])), -1, 0),
and inside the if condition scaled_attention_logits += ((1 - mask) * (-1e9)),
then tf.keras.activations.softmax,
and the output as tf.matmul.
Can you please help me correct this?

1 Like

I'm having the same issue. Please help!

Check that you multiply q by the transpose of k in the first step.
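In TensorFlow that step can be written with tf.matmul's transpose_b argument, which avoids an explicit transpose. A minimal sketch with toy tensors (the values here are made up for illustration):

```python
import tensorflow as tf

# Toy inputs: 3 queries and 4 keys, each of depth 2
q = tf.constant([[1., 0.], [0., 1.], [1., 1.]])
k = tf.constant([[1., 1.], [0., 1.], [1., 0.], [0., 0.]])

# Q * K^T: transpose_b=True transposes the last two dimensions of k
matmul_qk = tf.matmul(q, k, transpose_b=True)

# Scale by sqrt(dk), where dk is the depth (last dimension) of k
dk = tf.cast(tf.shape(k)[-1], tf.float32)
scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

print(scaled_attention_logits.shape)  # (3, 4): one row of logits per query
```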

3 Likes

I ran into the same error; it turned out I didn't transpose the matrix k. Using matmul_qk = tf.matmul(q, k, transpose_b=True) fixed it for me.

3 Likes

I was having the same issue.
Finally, changing from tf.nn.softmax to tf.keras.activations.softmax solved it.
I'm not sure what the difference is, but I'm posting it in case it helps someone.
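Putting the pieces from this thread together, here is a minimal sketch of the whole computation in the style of the TensorFlow Transformer tutorial. The function name, argument order, and the convention that mask is 1 where attention is allowed are assumptions, not the grader's reference solution:

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask=None):
    """Sketch of scaled dot-product attention; names and signature are assumed."""
    # Q * K^T -> (..., seq_len_q, seq_len_k)
    matmul_qk = tf.matmul(q, k, transpose_b=True)

    # Scale by sqrt(dk), the depth of the keys (last dimension of k)
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

    # Push masked-out positions (mask == 0) toward -infinity before the softmax
    if mask is not None:
        scaled_attention_logits += (1. - mask) * -1e9

    # Softmax over the key axis so each query's weights sum to 1
    attention_weights = tf.keras.activations.softmax(scaled_attention_logits, axis=-1)

    # Weighted sum of the values -> (..., seq_len_q, depth_v)
    output = tf.matmul(attention_weights, v)
    return output, attention_weights
```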

2 Likes

I am getting a wrong shape error:
---> 59 assert tuple(tf.shape(weights).numpy()) == (q.shape[0], k.shape[1]), f"Wrong shape. We expected ({q.shape[0]}, {k.shape[1]})"
     60 assert np.allclose(weights, [[0.2589478, 0.42693272, 0.15705977, 0.15705977],
     61                              [0.2772748, 0.2772748, 0.2772748, 0.16817567],

AssertionError: Wrong shape. We expected (3, 4)

Since I can add the mask to scaled_attention_logits, I don't think that tensor has the wrong shape, but since the assertion says it should be (q.shape[0], k.shape[1]), maybe it does. I did transpose k for the first matrix multiplication. The error doesn't actually say where in the function the problem occurs. Any ideas what I could be doing wrong?
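It's hard to say without seeing the code, but one way to narrow it down is to print the shape of each intermediate tensor and compare it with what the test expects. A quick sketch with hypothetical inputs of the shapes the assertion implies (3 queries, 4 keys; the depth of 4 is an arbitrary choice here):

```python
import numpy as np
import tensorflow as tf

# Hypothetical inputs: 3 queries and 4 keys of depth 4 (values are random placeholders)
q = tf.constant(np.random.rand(3, 4), dtype=tf.float32)
k = tf.constant(np.random.rand(4, 4), dtype=tf.float32)

matmul_qk = tf.matmul(q, k, transpose_b=True)
print(matmul_qk.shape)  # (3, 4): one row per query, one column per key

# The attention weights keep this shape, so if it prints (3, 3) or (4, 4) here,
# the transpose or the operand order is off and every later step inherits it.
```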

This thread has been cold for four years. Posting here was a bold strategy. I see you also created a new thread, and the conversation is continuing there.

I’ll close this one.