W4 A1 | Ex-3 | Scaled Dot Product Attention

Yes, I'm hitting this assertion: assert tf.is_tensor(attention), "Output must be a tensor".
I'm using dk = tf.cast(tf.shape(k)[0], tf.float32),
the mask as mask = tf.linalg.band_part(tf.ones((tf.shape(q)[0], tf.shape(k)[0])), -1, 0),
and inside the if condition scaled_attention_logits += ((1 - mask) * (-1e9)),
then tf.keras.activations.softmax,
and the output as tf.matmul.
Can you please help me correct this?

1 Like

I'm having the same issue. Please help!

Check that you multiply q by the transpose of k in the first step.
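In TensorFlow that step can be written with tf.matmul's transpose_b argument, which avoids an explicit transpose. A minimal sketch with toy tensors (the values here are made up for illustration):

```python
import tensorflow as tf

# Toy inputs: 3 queries and 4 keys, each of depth 2
q = tf.constant([[1., 0.], [0., 1.], [1., 1.]])
k = tf.constant([[1., 1.], [0., 1.], [1., 0.], [0., 0.]])

# Q * K^T: transpose_b=True transposes the last two dimensions of k
matmul_qk = tf.matmul(q, k, transpose_b=True)

# Scale by sqrt(dk), where dk is the depth (last dimension) of k
dk = tf.cast(tf.shape(k)[-1], tf.float32)
scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

print(scaled_attention_logits.shape)  # (3, 4): one row of logits per query
```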

3 Likes

I ran into the same error; it turned out I didn't transpose the matrix k. Using matmul_qk = tf.matmul(q, k, transpose_b=True) fixed it for me.

3 Likes

I was having the same issue.
Finally, changing from tf.nn.softmax to tf.keras.activations.softmax solved it.
I'm not sure what the difference is, but I'm posting it in case it helps someone.
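Putting the pieces from this thread together, here is a minimal sketch of the whole computation in the style of the TensorFlow Transformer tutorial. The function name, argument order, and the convention that mask is 1 where attention is allowed are assumptions, not the grader's reference solution:

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask=None):
    """Sketch of scaled dot-product attention; names and signature are assumed."""
    # Q * K^T -> (..., seq_len_q, seq_len_k)
    matmul_qk = tf.matmul(q, k, transpose_b=True)

    # Scale by sqrt(dk), the depth of the keys (last dimension of k)
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

    # Push masked-out positions (mask == 0) toward -infinity before the softmax
    if mask is not None:
        scaled_attention_logits += (1. - mask) * -1e9

    # Softmax over the key axis so each query's weights sum to 1
    attention_weights = tf.keras.activations.softmax(scaled_attention_logits, axis=-1)

    # Weighted sum of the values -> (..., seq_len_q, depth_v)
    output = tf.matmul(attention_weights, v)
    return output, attention_weights
```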

2 Likes

I am getting a wrong shape error:
---> 59 assert tuple(tf.shape(weights).numpy()) == (q.shape[0], k.shape[1]), f"Wrong shape. We expected ({q.shape[0]}, {k.shape[1]})"
     60 assert np.allclose(weights, [[0.2589478, 0.42693272, 0.15705977, 0.15705977],
     61                              [0.2772748, 0.2772748, 0.2772748, 0.16817567],

AssertionError: Wrong shape. We expected (3, 4)

Since I can add the mask to scaled_attention_logits, I don't think that tensor has the wrong shape, but since the assertion says it should be (q.shape[0], k.shape[1]), maybe it does. I did transpose k for the first matrix multiplication. The error doesn't actually say where in the function the problem occurs. Any ideas what I could be doing wrong?
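It's hard to say without seeing the code, but one way to narrow it down is to print the shape of each intermediate tensor and compare it with what the test expects. A quick sketch with hypothetical inputs of the shapes the assertion implies (3 queries, 4 keys; the depth of 4 is an arbitrary choice here):

```python
import numpy as np
import tensorflow as tf

# Hypothetical inputs: 3 queries and 4 keys of depth 4 (values are random placeholders)
q = tf.constant(np.random.rand(3, 4), dtype=tf.float32)
k = tf.constant(np.random.rand(4, 4), dtype=tf.float32)

matmul_qk = tf.matmul(q, k, transpose_b=True)
print(matmul_qk.shape)  # (3, 4): one row per query, one column per key

# The attention weights keep this shape, so if it prints (3, 3) or (4, 4) here,
# the transpose or the operand order is off and every later step inherits it.
```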

This thread has been cold for four years. Posting here was a bold strategy. I see you also created a new thread, and the conversation is continuing there.

I’ll close this one.