Week 4 Scaled Dot Product Attention

I’m getting an error saying that my weights are incorrect. I don’t understand where this is coming from, as I’ve scaled the mask by -1e9 before adding it to scaled_attention_logits, according to the hint.

Did you subtract it from 1?

Yes, but I realized I didn’t enclose the expression in parentheses before scaling it. Thanks for the reply.
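For anyone following along, here is a minimal sketch of the masking step being discussed, assuming a mask that uses 1 for positions to keep and 0 for positions to mask out (the shapes and values below are made up, not the assignment's data):

import tensorflow as tf

# Toy logits and mask, for illustration only.
scaled_attention_logits = tf.random.uniform((1, 3, 3))
mask = tf.constant([[[1., 1., 0.],
                     [1., 1., 0.],
                     [1., 1., 0.]]])

# Note the parentheses around (1 - mask) before scaling by -1e9,
# which is the detail mentioned above.
scaled_attention_logits += (1. - mask) * -1e9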

Hi, the problem may just be that I am a Python novice, but could anyone help me understand why the following (non-working) code (I tried k.shape[0] as well):

{mentor edit: code removed}

gives this error:


--> 74     rank = x.shape.rank
    75     if rank == 2:
    76         output = nn.softmax(x)

AttributeError: 'tuple' object has no attribute 'rank'

That’s supposed to be -1e9.

But the main problem is that you can't use np.divide() there. Both arguments of np.divide() should be identically sized vectors, but in this exercise np.sqrt(dk) is a scalar.

So np.divide() tries to turn that scalar into a vector by "automatic broadcasting". That is an error-prone process, and it doesn't work correctly in this case.

The simple method here is to just use the regular math division operator /.
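To illustrate the point about dividing by a scalar, here is a small standalone example with made-up values (not the assignment's data):

import numpy as np

# A matrix of toy attention logits and a scalar key dimension.
matmul_qk = np.array([[2.0, 4.0],
                      [6.0, 8.0]])
dk = 2

# The plain division operator handles matrix-by-scalar division cleanly.
scaled_attention_logits = matmul_qk / np.sqrt(dk)
print(scaled_attention_logits)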

Thanks Tom Mosher, but maybe I am still missing a syntax subtlety somewhere…

(Non-working code follows; please redact if needed)

{mentor edit: code removed}

Do not use “axis = dk” in the softmax layer.
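For reference, a minimal sketch of a softmax call with the axis left at its default (the last axis, i.e. across the keys); the exact function used in the notebook may differ:

import tensorflow as tf

# Toy scaled logits, for illustration only.
scaled_attention_logits = tf.constant([[[0.5, 1.0, 0.2],
                                         [0.1, 0.3, 0.9]]])

# Softmax normalizes over the last axis by default; do not pass axis=dk.
attention_weights = tf.keras.activations.softmax(scaled_attention_logits, axis=-1)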

Thanks Tom Mosher, that worked.

I have since posted a separate forum question on C5_W4_A1 Exercise 4 (Encoder Layer), possibly another syntax issue that this Python novice doesn't get. Any tips appreciated.

If your question isn’t about the dot product attention function, it shouldn’t be added to this thread.

I have a shape error when computing this step:

matmul_qk = tf…(q, k, transpose_b=True)
dk = np.shape(k)
scaled_attention_logits = matmul_qk / np.sqrt(dk)

Please help?
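A hedged sketch of the usual fix, not the official solution: following the earlier advice in this thread, dk should be a scalar (the size of the last dimension of k), not the whole shape tuple that np.shape(k) returns, so that the division is tensor-by-scalar. The shapes below are made up for illustration:

import tensorflow as tf

# Stand-in tensors with made-up shapes, for illustration only.
k = tf.random.uniform((2, 3, 4))
matmul_qk = tf.random.uniform((2, 3, 3))

dk = tf.cast(tf.shape(k)[-1], tf.float32)               # a scalar, not a tuple
scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)  # tensor / scalar broadcasts cleanly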