C5W4A1 Exercise 3: Transformers Architecture with TensorFlow, scaled_dot_product_attention

In the Sequence Models course, Week 4, Assignment 1, Exercise 3:

{moderator edit: code removed}

The unit tests passed even though I am using squeeze(0). This is because mask.shape = (1, 3, 4) while scaled_attention_logits.shape = (3, 4), so I had to drop the mask's first dimension to be able to add (and broadcast) the two arrays.

Without squeeze, I get the following error:

ValueError: non-broadcastable output operand with shape (3,4) doesn't match the broadcast shape (1,3,4)
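For reference, here is a minimal NumPy snippet that reproduces the error. The shapes are taken from this post, not from the removed code; the variable names are only illustrative.

```python
import numpy as np

# Hypothetical shapes from the post: logits (3, 4), mask (1, 3, 4).
scaled_attention_logits = np.zeros((3, 4))
mask = np.ones((1, 3, 4))

# An out-of-place add broadcasts fine; the result has shape (1, 3, 4).
out = scaled_attention_logits + (1.0 - mask) * -1e9
print(out.shape)  # (1, 3, 4)

# An in-place += must write that (1, 3, 4) result back into the
# existing (3, 4) array, which NumPy refuses:
try:
    scaled_attention_logits += (1.0 - mask) * -1e9
except ValueError as e:
    print(e)  # non-broadcastable output operand with shape (3,4) ...
```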

What’s wrong with my code?

Posting code in a public topic is discouraged and can get your account suspended. It's okay to share a stack trace in a public post and to send code to a mentor via direct message. Please clean up the post.

Here’s the community user guide to get started.

Since scaled_attention_logits should have dimensions compatible with v for the dot product, using squeeze on the mask changes its shape from (1, 3, 4) to (3, 4).

Squeeze is not necessary. It isn’t discussed in the instructions either.

The instructions mention using tf.matmul(), not np.dot().
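To make the difference concrete, here is a minimal sketch (not the graded solution; the shapes are assumed from this thread). TensorFlow's += on a tensor is out-of-place, so the (1, 3, 4) mask broadcasts against the (3, 4) logits without any squeeze, and tf.matmul then broadcasts the leading batch dimension when multiplying by v:

```python
import tensorflow as tf

# Assumed shapes, mirroring the post: logits (3, 4), mask (1, 3, 4), v (4, 2).
scaled_attention_logits = tf.zeros((3, 4))
mask = tf.ones((1, 3, 4))
v = tf.random.normal((4, 2))

# += rebinds the name to a new tensor, so broadcasting succeeds and the
# masked logits come out with shape (1, 3, 4) -- no squeeze needed.
scaled_attention_logits += (1.0 - mask) * -1e9
print(scaled_attention_logits.shape)  # (1, 3, 4)

attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)

# tf.matmul (TF 2.x) broadcasts batch dimensions: (1, 3, 4) @ (4, 2) -> (1, 3, 2).
output = tf.matmul(attention_weights, v)
print(output.shape)  # (1, 3, 2)
```

This is why squeeze appears to "fix" the NumPy version: np.dot plus an in-place += forces the mask down to the logits' shape, whereas the tf.matmul-based version never needs to.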


Sorry about posting code, will not happen again.

Thanks, this solved the issue.