I'm getting stuck with the Cross_attention function.
I understand I cannot share much info, but let me at least share the output I get versus what I should get.
Honestly, I have no idea what's happening, and if we can't share any code here, how can we get any feedback and make any progress? So far I'm quite disappointed by the documentation and the videos.
Tensor of contexts has shape: (64, 15, 256)
Tensor of translations has shape: (64, 14, 256)
Tensor of attention scores has shape: (64, 14, 256)
Expected Output
Tensor of contexts has shape: (64, 14, 256)
Tensor of translations has shape: (64, 15, 256)
Tensor of attention scores has shape: (64, 15, 256)
```
File /tf/w1_unittest.py:316, in test_decoder..g()
    313 cases.append(t)
    315 t = test_case()
--> 316 if not isinstance(decoder.attention, CrossAttention):
    317     t.failed = True
    318     t.msg = "Incorrect type of attention layer"

AttributeError: 'Decoder' object has no attribute 'attention'
```
Welcome to our community, and I’m sorry to hear about the challenges you’re facing. Please be assured that our course mentors are dedicated to assisting you and will provide feedback on your queries, as one has responded to you already. Our community upholds the principles of effective learning and maintaining academic integrity, as outlined in our guidelines. In cases where mentors require a closer look at your code to offer more tailored assistance, they will reach out to you directly via private message.
As for Exercise 3, the error indicates that your Decoder implementation is missing self.attention, which you should have defined in the __init__ method.
Between your LSTM "calls" (they are not actually "calls"; they are layer instances that are saved when you initialize the Decoder class) there is the CrossAttention layer, which you have to instantiate:
```python
...
# The attention layer
self.attention = None(None)
...
```
In other words, you should save a CrossAttention instance in the attribute self.attention; what the unit test is telling you is that it cannot find that attribute.
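To make the failure mode concrete, here is a minimal, generic sketch of the pattern (the class and argument names are illustrative stand-ins, not the assignment's actual API or solution): the unit test accesses `decoder.attention`, so the layer instance must be assigned to that attribute inside `__init__`.

```python
class CrossAttention:
    """Stand-in for the course's CrossAttention layer (illustrative only)."""
    def __init__(self, units):
        self.units = units

class BrokenDecoder:
    def __init__(self, units):
        # The attention layer is never assigned, so accessing
        # decoder.attention raises AttributeError.
        pass

class FixedDecoder:
    def __init__(self, units):
        # Save the instance as an attribute so the unit test
        # (and your own call method) can find it.
        self.attention = CrossAttention(units)

print(isinstance(FixedDecoder(256).attention, CrossAttention))  # True

try:
    BrokenDecoder(256).attention
except AttributeError as e:
    print(e)  # 'BrokenDecoder' object has no attribute 'attention'
```

The same principle applies to every sub-layer of the Decoder: instantiate it once in `__init__`, then use it in the forward pass.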
query is what you're looking for (like: "Hey, I have the translation up to this point; which token should come next?").
key is what could be a match (like: "Hey, these are the tokens in the original sentence; let's see which align best.").
value is the meaning to carry into the translation (like: "OK, let's take the best-aligned candidates, sum their meanings, and pass the result on for the next token prediction.").
So, in a similar vein, "the query should be the translation and the value the encoded sentence to translate".
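This also explains the shape mismatch in your output: the attention output always has the sequence length of the query, so if the query and value are swapped, the printed shapes are swapped too. A small NumPy sketch of scaled dot-product attention (not the course's TensorFlow implementation, just the same math) makes this visible:

```python
import numpy as np

def cross_attention(query, key, value):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
    d = query.shape[-1]
    scores = query @ key.transpose(0, 2, 1) / np.sqrt(d)   # (batch, Tq, Tk)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # softmax over keys
    return weights @ value                                 # (batch, Tq, d)

contexts = np.zeros((64, 14, 256))      # encoded source sentence
translations = np.zeros((64, 15, 256))  # translation so far

# query = translations; key = value = contexts
out = cross_attention(translations, contexts, contexts)
print(out.shape)  # (64, 15, 256): the output length follows the query
```

If you instead pass `contexts` as the query and `translations` as the value, you get `(64, 14, 256)`, which is exactly the incorrect shape in your output above.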