Week 4: Scaled Dot Product Attention

I'm trying to run scaled_dot_product_attention_test(). I got this error, and I figured the cause is my dk, which I set as dk = k.shape[-2]. I got the code to work by doing a tf.cast() to tf.float32, but this seems unnecessarily complicated, so I'm wondering if I missed something or am not doing this the way it's intended.

Also, since dk is supposed to be the dimension of the keys, that means it's supposed to be 'seq_len_k', and not 'depth', right? I'm very confused, since using k.shape[-1] versus k.shape[-2] didn't change anything (the test still passed either way).
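One possible explanation for why both indices passed (this is a guess about the test data, not something confirmed by the course): if the test's k tensor happens to have seq_len_k equal to depth, then scaling by sqrt(k.shape[-1]) and by sqrt(k.shape[-2]) produces identical results. A quick NumPy check:

```python
import numpy as np

# Hypothetical shapes: if the test uses a "square" k (seq_len_k == depth),
# both choices of dk give the same scaling factor.
k_square = np.zeros((4, 4))  # seq_len_k == depth == 4
k_rect = np.zeros((5, 4))    # seq_len_k = 5, depth = 4

print(np.sqrt(k_square.shape[-1]) == np.sqrt(k_square.shape[-2]))  # True
print(np.sqrt(k_rect.shape[-1]) == np.sqrt(k_rect.shape[-2]))      # False
```

So a test built on square matrices wouldn't be able to tell the two apart, even though only one is correct.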

Personally, I used:
(dk, col) = np.shape(k)

I don’t like using negative index values; they just bug me.

Also, your image doesn’t capture the entire stack trace; seeing the whole trace is usually helpful.

The reason I used it was that I was under the impression we don’t know how many elements are in the shape of k (since the hint said: key shape == (…, seq_len_k, depth)). Wouldn’t (dk, col) = np.shape(k) only work under the assumption that calling .shape returns exactly 2 numbers?
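That impression is correct: two-element tuple unpacking only works for a 2-D array, while negative indexing works no matter how many leading axes there are. A small demonstration (the batch size and sequence lengths here are made up for illustration):

```python
import numpy as np

# Hypothetical key tensor with a leading batch axis:
# shape == (batch, seq_len_k, depth)
k = np.zeros((2, 5, 4))

# Negative indexing works regardless of how many leading axes exist.
seq_len_k = k.shape[-2]  # 5
depth = k.shape[-1]      # 4

# Two-element tuple unpacking assumes the array is exactly 2-D,
# so it raises a ValueError on this 3-D tensor.
try:
    (dk, col) = np.shape(k)
except ValueError:
    print("unpacking failed: shape has 3 elements, not 2")
```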

Correct me if I’m wrong; I’m very new to Python.

I’m terribly confused. I went back to my code with just dk = k.shape[-2] and ran it again so I could reproduce the error and update my picture to show the entire stack trace, but apparently it now works completely without the tf.cast() to float?

I tried restarting the kernel and clearing all output, and it still works. Wow. I have absolutely no idea why it suddenly worked without tf.cast(), so if you have any insight, that would be amazing.

I have no idea, sorry. There are too many potential variations to even contemplate.

Yes, you’re probably right.

No worries, thank you for your help!

I’m also a bit confused about what dk should be set to. I thought maybe it’s the product of the dimensions of the two final axes, but again, I’m not sure.

Hi @beckyjeff ,

dk is the dimensionality of each key vector, i.e., the size of the last axis of the keys matrix (the 'depth' in the hint's key shape == (…, seq_len_k, depth)), not the number of keys. You can use k.shape[-1] (or tf.shape(k)[-1]) to get that value; the Transformer paper scales the dot products by 1/sqrt(dk) with exactly this dk.
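To make the shapes concrete, here is a minimal NumPy sketch of scaled dot-product attention (the assignment uses TensorFlow and its own masking/test harness, so this is an illustration of the shape logic, not the graded solution; the softmax is hand-rolled):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """q: (..., seq_len_q, depth), k: (..., seq_len_k, depth),
    v: (..., seq_len_k, depth_v)."""
    # dk is the depth of each key vector: the size of the LAST axis of k.
    dk = k.shape[-1]
    # Raw attention scores: (..., seq_len_q, seq_len_k), scaled by sqrt(dk).
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(dk)
    # Numerically stable softmax over the last axis (the keys).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Weighted sum of values: (..., seq_len_q, depth_v).
    return weights @ v

q = np.random.rand(2, 3, 4)  # batch=2, seq_len_q=3, depth=4
k = np.random.rand(2, 5, 4)  # seq_len_k=5, depth=4
v = np.random.rand(2, 5, 6)  # seq_len_k=5, depth_v=6
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # (2, 3, 6)
```

Note that the output shape depends on seq_len_q and depth_v, while seq_len_k only appears in the intermediate attention weights; dk = depth is what goes under the square root.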