Week 4 Scaled Dot Product Attention

I’m getting an error saying that my weights are incorrect. I don’t understand where this is coming from, as I’ve scaled the mask by -1e9 before adding it to scaled_attention_logits, according to the hint.

Did you subtract it from 1?

Yes, but I realized I didn’t enclose the expression in parentheses before scaling it. Thanks for the reply.
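For anyone following along, here is a minimal sketch of the masking step being discussed, assuming a mask that uses 1 for positions to keep and 0 for positions to mask out (the shapes and values below are made up, not the assignment's data):

import tensorflow as tf

# Toy logits and mask, for illustration only.
scaled_attention_logits = tf.random.uniform((1, 3, 3))
mask = tf.constant([[[1., 1., 0.],
                     [1., 1., 0.],
                     [1., 1., 0.]]])

# Note the parentheses around (1 - mask) before scaling by -1e9,
# which is the detail mentioned above.
scaled_attention_logits += (1. - mask) * -1e9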

Hi, the problem may just be that I am a Python novice, but could anyone help me understand why the following (non-working) code (I tried k.shape[0] as well):

{mentor edit: code removed}

gives this error:


--> 74     rank = x.shape.rank
    75     if rank == 2:
    76         output = nn.softmax(x)

AttributeError: 'tuple' object has no attribute 'rank'

That’s supposed to be -1e9.

But the main problem is that you can't use np.divide() there. Both arguments of np.divide() should be identically sized vectors, but in this exercise np.sqrt(dk) is a scalar.

So np.divide() tries to turn that scalar into a vector by "automatic broadcasting". That is an error-prone process, and it doesn't work correctly in this case.

The simple method here is to just use the regular math division operator /.
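To illustrate the point about dividing by a scalar, here is a small standalone example with made-up values (not the assignment's data):

import numpy as np

# A matrix of toy attention logits and a scalar key dimension.
matmul_qk = np.array([[2.0, 4.0],
                      [6.0, 8.0]])
dk = 2

# The plain division operator handles matrix-by-scalar division cleanly.
scaled_attention_logits = matmul_qk / np.sqrt(dk)
print(scaled_attention_logits)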

Thanks Tom Mosher, but maybe I am still missing a syntax subtlety somewhere…

(Non-working code follows; please redact if needed)

{mentor edit: code removed}

Do not use “axis = dk” in the softmax layer.
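For reference, a minimal sketch of a softmax call with the axis left at its default (the last axis, i.e. across the keys); the exact function used in the notebook may differ:

import tensorflow as tf

# Toy scaled logits, for illustration only.
scaled_attention_logits = tf.constant([[[0.5, 1.0, 0.2],
                                         [0.1, 0.3, 0.9]]])

# Softmax normalizes over the last axis by default; do not pass axis=dk.
attention_weights = tf.keras.activations.softmax(scaled_attention_logits, axis=-1)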

Thanks Tom Mosher, that worked.

I have since posted a separate forum question on C5_W4_A1 Exercise 4 (Encoder Layer), possibly another syntax issue that this Python novice doesn't get. Any tips appreciated.

If your question isn’t about the dot product attention function, it shouldn’t be added to this thread.

I have a shape error when computing this step:

matmul_qk = tf…(q, k, transpose_b=True)
dk = np.shape(k)
scaled_attention_logits = matmul_qk / np.sqrt(dk)

Please help?
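A hedged sketch of the usual fix, not the official solution: following the earlier advice in this thread, dk should be a scalar (the size of the last dimension of k), not the whole shape tuple that np.shape(k) returns, so that the division is tensor-by-scalar. The shapes below are made up for illustration:

import tensorflow as tf

# Stand-in tensors with made-up shapes, for illustration only.
k = tf.random.uniform((2, 3, 4))
matmul_qk = tf.random.uniform((2, 3, 3))

dk = tf.cast(tf.shape(k)[-1], tf.float32)               # a scalar, not a tuple
scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)  # tensor / scalar broadcasts cleanly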