Hi,
My code is working, but I think there may be an approximation error. You can see it in the photo below:
I've checked the code and I really don't know how to rewrite it any other way.
Thanks in advance
Solved! Just forgot to add a -1 to the mask.
I happened to have the same problem. This is the corrected masking line:
scaled_attention_logits += (1-mask)*-1e9
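For anyone landing here later, here is a minimal sketch of where that line sits in scaled dot-product attention. It assumes the mask convention implied by the fix above (1 = position may be attended to, 0 = position is blocked). With that convention, `(1 - mask) * -1e9` pushes only the blocked logits toward minus infinity, so softmax gives them roughly zero weight. Using `mask * -1e9` with this convention would block exactly the wrong positions. The original code was presumably TensorFlow. This sketch uses plain NumPy instead, and the function names `softmax` and `scaled_dot_product_attention` are hypothetical, not taken from the thread.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability before exponentiating
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def scaled_dot_product_attention(q, k, v, mask=None):
    """Hypothetical NumPy sketch. Assumes mask uses 1 = attend, 0 = block."""
    d_k = q.shape[-1]
    # Similarity scores, scaled by sqrt(d_k)
    scaled_attention_logits = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)
    if mask is not None:
        # Blocked positions (mask == 0) get a huge negative logit,
        # so their softmax weight becomes effectively zero.
        scaled_attention_logits += (1 - mask) * -1e9
    attention_weights = softmax(scaled_attention_logits, axis=-1)
    output = attention_weights @ v
    return output, attention_weights


# Usage: a look-ahead (causal) mask, where row i may only attend to positions <= i
q = k = np.eye(3)
v = np.arange(9.0).reshape(3, 3)
mask = np.tril(np.ones((3, 3)))
out, w = scaled_dot_product_attention(q, k, v, mask)
print(np.round(w, 3))
```

Every row of `w` sums to 1, and everything above the diagonal is zero. If your mask instead uses 1 for blocked positions, the original `mask * -1e9` form is the right one. That difference is likely why the two versions get mixed up.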