First off, I tried making a "week 1" tag for this post, but that somehow didn't seem possible?
I'm confused about the cross attention step in the Coursera assignment.
Specifically, because the target is shifted to the right, it has a different sequence length than the context / attention output. Passing target and context directly as inputs to the MHA layer, I get the error:
ValueError: Exception encountered when calling layer 'cross_attention_15' (type CrossAttention).
Inputs have incompatible shapes. Received shapes (15, 256) and (14, 256)
Call arguments received by layer 'cross_attention_15' (type CrossAttention):
• context=tf.Tensor(shape=(64, 14, 256), dtype=float32)
• target=tf.Tensor(shape=(64, 15, 256), dtype=float32)
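For what it's worth, my understanding is that in scaled dot-product attention the output length follows the query (the target), not the keys/values (the context), so the two sequence lengths shouldn't need to match inside the attention computation itself. Here's a minimal NumPy sketch of just that shape logic (placeholder values, not the assignment code):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d = 256
target = rng.standard_normal((15, d))   # query: shifted target, 15 tokens
context = rng.standard_normal((14, d))  # keys/values: context, 14 tokens

# One score per (query position, context position) pair -> shape (15, 14)
scores = target @ context.T / np.sqrt(d)

# Weighted sum over context positions -> output length follows the query
attn_out = softmax(scores) @ context

print(attn_out.shape)  # (15, 256)
```

So the (15, 256) attention output should line up with the (15, 256) target for the residual add without any slicing, and the mismatch would only show up if target and context themselves get combined directly.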
I tried to remedy this by taking only target[:,1:,:], which skips the first token. That gets past the first quick check without an error, but fails the unit test (error below):
ValueError: Exception encountered when calling layer 'cross_attention_20' (type CrossAttention).
Inputs have incompatible shapes. Received shapes (13, 256) and (14, 512)
Call arguments received by layer 'cross_attention_20' (type CrossAttention):
• context=tf.Tensor(shape=(64, 14, 512), dtype=float32)
• target=tf.Tensor(shape=(64, 14, 256), dtype=float32)
I'm thoroughly confused about how to handle this mismatch, and I feel like I'm overlooking something simple that I just can't find.
Thanks in advance for whatever support you can provide!