I’m getting an error saying that my weights are incorrect. I don’t understand where this is coming from, as I’ve scaled the mask by -1e9 before adding it to scaled_attention_logits, according to the hint.
Did you subtract it from 1?
Yes, but I realized I didn’t enclose the expression in parentheses before scaling it. Thanks for the reply.
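For anyone else who hits the same "weights are incorrect" message, here is a minimal NumPy sketch (toy values only, not the graded code; the variable names are just illustrative) of why the parentheses matter, since * binds more tightly than -:

```python
import numpy as np

# Toy mask: 1 = attend to this position, 0 = mask it out.
mask = np.array([[1.0, 1.0, 0.0]])
scaled_attention_logits = np.array([[0.5, 1.2, 0.3]])

# Without parentheses this is 1 - (mask * -1e9): the positions you want
# to KEEP get a huge positive offset and the masked one is untouched.
wrong = scaled_attention_logits + (1 - mask * -1e9)

# With parentheses only the masked-out position gets the large negative
# offset, so softmax drives its attention weight toward zero.
right = scaled_attention_logits + (1 - mask) * -1e9

print(wrong)  # approx [[1.0e+09 1.0e+09 1.3]]  (the opposite of the intent)
print(right)  # approx [[ 0.5  1.2 -1.0e+09]]
```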
Hi, the problem may be that I am a Python novice, but I'd appreciate any help understanding why this (not working!) code (I tried k.shape[0] as well):
{mentor edit: code removed}
gives this error:
…
---> 74     rank = x.shape.rank
     75     if rank == 2:
     76         output = nn.softmax(x)
AttributeError: 'tuple' object has no attribute 'rank'
That’s supposed to be -1e9.
But the main problem is that you can't use np.divide() there. Both arguments of np.divide() should be arrays of the same size, but in this exercise np.sqrt(dk) is a scalar.
So np.divide() tries to turn that scalar into an array by “automatic broadcasting”. That's an error-prone process, and it doesn't work correctly in this case.
The simple method here is to just use the regular math division operator /.
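For anyone else reading along, here is a minimal sketch (illustrative shapes and names, not the graded solution) of scaling a matmul result by a scalar with the plain / operator while keeping everything as TensorFlow tensors; mixing in NumPy at this step produces a NumPy array whose .shape is a plain tuple, which is likely where the 'tuple' object has no attribute 'rank' error came from:

```python
import tensorflow as tf

# Illustrative tensors only: batch of 1, three queries/keys of depth 4.
q = tf.random.uniform((1, 3, 4))
k = tf.random.uniform((1, 3, 4))

matmul_qk = tf.matmul(q, k, transpose_b=True)  # shape (1, 3, 3)

# dk should be the key depth (a single scalar), not the whole shape tuple.
dk = tf.cast(tf.shape(k)[-1], tf.float32)

# Plain / division broadcasts the scalar cleanly over the whole tensor,
# and the result stays a tf.Tensor, so a later softmax can read .shape.rank.
scaled_attention_logits = matmul_qk / tf.sqrt(dk)

print(scaled_attention_logits.shape)       # (1, 3, 3)
print(scaled_attention_logits.shape.rank)  # 3, because it is still a tf.Tensor
```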
Thanks Tom Mosher, but maybe I am still missing a syntax subtlety somewhere…
(Non-working code follows; please redact if needed.)
{mentor edit: code removed}
…
Do not use “axis = dk” in the softmax layer.
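In case the axis part trips up anyone else, here is a minimal sketch (toy values only) of applying softmax over the last axis of the logits, which is what attention needs so that each query's weights over the keys sum to 1:

```python
import tensorflow as tf

# Toy logits: 1 batch, 2 queries, 3 keys; one entry already masked to -1e9.
logits = tf.constant([[[0.5, 1.2, -1e9],
                       [0.1, 0.2,  0.3]]])

# Softmax over the last axis (the keys), not axis=dk or any other number.
weights = tf.keras.activations.softmax(logits, axis=-1)

print(tf.reduce_sum(weights, axis=-1))  # ~[[1. 1.]]: each query's weights sum to 1
print(weights[0, 0])                    # the -1e9 position gets essentially zero weight
```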
Thanks Tom Mosher, that worked.
I have since posted a question to the forum about C5_W4_A1 Exercise 4 (Encoder Layer); possibly a syntax issue that this Python novice doesn't get. Any tips appreciated.
If your question isn’t about the dot product attention function, it shouldn’t be added to this thread.
I have a shape error when computing this step:
matmul_qk = tf…(q, k, transpose_b=True)
dk = np.shape(k)
scaled_attention_logits = matmul_qk / np.sqrt(dk)
Please help?