Week 4: Scaled Dot Product Attention

eugeniad · July 1, 2021, 3:38am

I’m trying to pass scaled_dot_product_attention_test(). I got this error, and I figured that the cause is my dk, which is set to dk = k.shape[-2]. I got the code to work by casting dk to tf.float32 with tf.cast(), but this seems unnecessarily complicated, so I’m wondering whether I missed something or am not doing this the way it’s intended.

Also, since dk is supposed to be the dimension of the keys, does that mean it’s supposed to be ‘seq_len_k’ and not ‘depth’? I’m very confused, since switching between k.shape[-1] and k.shape[-2] didn’t change anything (the test still passed).

TMosh · July 1, 2021, 4:00am

Personally, I used:
(dk, col) = np.shape(k)

I don’t like using negative index values, it just bugs me.

TMosh · July 1, 2021, 4:01am

And your image doesn’t capture the entire stack trace, that’s usually helpful.

eugeniad · July 1, 2021, 4:03am

The reason I used a negative index is that I was under the impression that we don’t know how many elements are in the shape of k (since the hint said: key shape == (…, seq_len_k, depth) ). Wouldn’t (dk, col) = np.shape(k) only work under the assumption that calling .shape returns exactly 2 numbers?

Correct me if I’m wrong, I’m very new to Python.
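
The point about unpacking versus negative indexing can be checked with a small NumPy sketch. The array names and shapes here are made up for illustration and are not taken from the assignment:

```python
import numpy as np

k2d = np.zeros((4, 6))       # hypothetical keys: (seq_len_k, depth)
k3d = np.zeros((2, 4, 6))    # hypothetical keys: (batch, seq_len_k, depth)

# Tuple unpacking only works when the shape has exactly two entries.
rows, cols = np.shape(k2d)
try:
    rows, cols = np.shape(k3d)
    unpack_ok = True
except ValueError:
    unpack_ok = False        # "too many values to unpack"

# Negative indices count from the end, so leading axes do not matter.
last_2d, last_3d = k2d.shape[-1], k3d.shape[-1]
second_last_2d, second_last_3d = k2d.shape[-2], k3d.shape[-2]

print(unpack_ok, last_2d, last_3d, second_last_2d, second_last_3d)
# False 6 6 4 4
```

So with a shape hint like (…, seq_len_k, depth), indexing from the end keeps working whether or not there is a batch axis in front.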

eugeniad · July 1, 2021, 4:08am

I’m terribly confused. I went back to my code with just dk = k.shape[-2] and ran it so I could get the error again and update my picture to show the entire stack trace, but now it apparently works without having to do the tf.cast() to float?

I tried restarting the kernel and clearing all output, and yup, it still worked. Wow. I have absolutely no idea why it suddenly worked without tf.cast(), so if you have any insight, that would be amazing.

TMosh · July 1, 2021, 4:21am

I have no idea, sorry. There are too many potential variations to even contemplate.

TMosh · July 1, 2021, 4:21am

Yes, you’re probably right.

eugeniad · July 1, 2021, 4:27am

No worries, thank you for your help!

beckyjeff · April 12, 2022, 3:59pm

I’m also a bit confused about what dk should be set to. I thought maybe the product of the dimensions of both final axes, but again, I’m not sure.

Kic · April 13, 2022, 9:01am

Hi @beckyjeff ,

dk is the dimension of each key vector, which is the number of elements in one row of the keys matrix (the ‘depth’ in the shape hint), not the number of keys. You can do k.shape[-1] to get that value.
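
For reference, in the usual formula softmax(Q Kᵀ / sqrt(d_k)) V, d_k is generally taken to be the length of each key vector, i.e. the size of the last axis of k. Below is a minimal NumPy sketch under that assumption. The function name and example shapes are made up for illustration, and this is not the assignment’s TensorFlow reference solution:

```python
import numpy as np

def scaled_dot_product_attention_np(q, k, v):
    """Illustrative NumPy version (hypothetical helper, no masking)."""
    # d_k: length of each key vector = size of the last axis of k
    dk = k.shape[-1]
    # (..., seq_len_q, depth) @ (..., depth, seq_len_k)
    #   -> (..., seq_len_q, seq_len_k)
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(dk)
    # numerically stable softmax over the key axis
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
q = rng.normal(size=(2, 3, 8))   # (batch, seq_len_q, depth)
k = rng.normal(size=(2, 5, 8))   # (batch, seq_len_k, depth)
v = rng.normal(size=(2, 5, 4))   # (batch, seq_len_k, depth_v)

output, weights = scaled_dot_product_attention_np(q, k, v)
print(output.shape, weights.shape)   # (2, 3, 4) (2, 3, 5)
print(k.shape[-1], k.shape[-2])      # 8 5
```

Note that whenever seq_len_k and depth happen to be equal, k.shape[-1] and k.shape[-2] return the same number, so a test that uses square key matrices cannot tell the two choices apart.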
