Hey all, on exercise 3 (scaled_dot_product_attention) I keep getting the assertion error "Wrong unmasked weights".
I've checked that the right dimension of k is used to compute dk, that the softmax normalizes along the right axis, that the mask addition is fine, etc.
Any other ideas for places to check for errors?
Hint 1:
When computing matmul_qk, pay attention to the right-hand operand of the matrix multiplication: it needs to be transposed. tf.matmul has a boolean flag that can help you with this.
Hint 2:
Here’s the shape of k:
k – key shape == (…, seq_len_k, depth)
You want to use the key dimension, not details such as batch size, when calculating dk. Watching the lectures might help you understand this better.
Hi! I'm getting a really similar error, and I can't find my mistake. k.shape[-1] is the depth, which I am casting to float32 and passing through tf.sqrt to get the scaling term.
I also can't find an error in my matmul operation; I'm setting transpose_b=True. Can I have another hint?
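For anyone still stuck on the mechanics rather than the exercise-specific code, here's a minimal NumPy sketch of the computation the hints describe. The swapaxes call mirrors what tf.matmul(q, k, transpose_b=True) does, dk comes from the last axis of k (the depth, not batch or sequence length), and softmax runs over the last axis (the key axis). The function name and the mask convention (1 marks a masked position, added as a large negative number before softmax) are assumptions, not necessarily the course's exact starter code:

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    # matmul_qk: (..., seq_len_q, seq_len_k).
    # Transposing the last two axes of k mirrors tf.matmul(q, k, transpose_b=True).
    matmul_qk = q @ np.swapaxes(k, -1, -2)

    # dk is the depth -- the LAST axis of k -- not the batch size or seq_len.
    dk = np.float32(k.shape[-1])
    scaled = matmul_qk / np.sqrt(dk)

    if mask is not None:
        # Additive mask (assumed convention): 1 = masked, pushed to a large
        # negative value so softmax sends that weight to ~0.
        scaled = scaled + (mask * -1e9)

    # Softmax over the last axis, i.e. across the keys for each query.
    e = np.exp(scaled - scaled.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)

    return weights @ v, weights
```

A quick sanity check you can run on your own version: the attention weights should sum to 1 along the last axis, and masked key positions should get (near-)zero weight.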