In the Sequence Models course, Week 4 assignment, Exercise 3:
{moderator edit: code removed}
Here the unit tests pass even though I am using squeeze(0). This is because mask.shape = (1, 3, 4) while scaled_attention_logits.shape = (3, 4), so I had to drop the leading dimension to be able to add (and broadcast) the two arrays.
Without the squeeze, I get the following error:
ValueError: non-broadcastable output operand with shape (3,4) doesn’t match the broadcast shape (1,3,4)
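A minimal NumPy sketch of what I believe is happening; the shapes and the masking line are assumptions based on the standard scaled dot-product attention pattern, not the graded code:

```python
import numpy as np

# Assumed shapes from the failing test: logits (3, 4), mask (1, 3, 4)
scaled_attention_logits = np.zeros((3, 4))
mask = np.ones((1, 3, 4))

# In-place addition cannot broadcast the (3, 4) output up to (1, 3, 4):
# scaled_attention_logits += (1. - mask) * -1e9
# -> ValueError: non-broadcastable output operand with shape (3,4)
#    doesn't match the broadcast shape (1,3,4)

# Dropping the leading singleton dimension lines the shapes up:
scaled_attention_logits += (1. - np.squeeze(mask, axis=0)) * -1e9
print(scaled_attention_logits.shape)  # (3, 4)
```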
Posting code in a public topic is discouraged and can get your account suspended. It's okay to share the stack trace in a public post and to send your code to a mentor via direct message. Please clean up the post.
Since scaled_attention_logits needs dimensions compatible with v for the final dot product, applying squeeze to the mask changes its shape from (1, 3, 4) to (3, 4).
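To illustrate the compatibility with v, here is a small sketch with made-up shapes (seq_len_q = 3, seq_len_k = 4, depth_v = 2, all assumptions for illustration). Keeping the logits at (3, 4) lets the final matmul with v go through:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Made-up shapes: 3 queries, 4 keys, value depth 2
scaled_attention_logits = np.random.randn(3, 4)   # (seq_len_q, seq_len_k)
mask = np.ones((1, 3, 4))                         # leading singleton dimension
v = np.random.randn(4, 2)                         # (seq_len_k, depth_v)

# Squeeze the mask so it matches the (3, 4) logits
scaled_attention_logits += (1. - np.squeeze(mask, axis=0)) * -1e9
attention_weights = softmax(scaled_attention_logits, axis=-1)  # (3, 4)

output = attention_weights @ v   # (3, 2): compatible for the dot product with v
print(output.shape)
```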