In the Sequence Models course, Week 4 assignment, Exercise 3:
{moderator edit: code removed}
Here the unit tests pass even though I am using squeeze(0). This is because mask.shape = (1, 3, 4) while scaled_attention_logits.shape = (3, 4), so I had to drop the leading dimension to be able to add (and broadcast) the two arrays.
Without the squeeze, I get the following error:
ValueError: non-broadcastable output operand with shape (3,4) doesn’t match the broadcast shape (1,3,4)
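A minimal NumPy sketch of what I believe is happening; the shapes and the masking line are assumptions based on the standard scaled dot-product attention pattern, not the graded code:

```python
import numpy as np

# Assumed shapes from the failing test: logits (3, 4), mask (1, 3, 4)
scaled_attention_logits = np.zeros((3, 4))
mask = np.ones((1, 3, 4))

# In-place addition cannot broadcast the (3, 4) output up to (1, 3, 4):
# scaled_attention_logits += (1. - mask) * -1e9
# -> ValueError: non-broadcastable output operand with shape (3,4)
#    doesn't match the broadcast shape (1,3,4)

# Dropping the leading singleton dimension lines the shapes up:
scaled_attention_logits += (1. - np.squeeze(mask, axis=0)) * -1e9
print(scaled_attention_logits.shape)  # (3, 4)
```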
Posting code in a public topic is discouraged and can get your account suspended. It's okay to share the stack trace in a public post and to send your code to a mentor via direct message. Please clean up the post.
Since scaled_attention_logits needs dimensions compatible with v for the final dot product, applying squeeze to the mask changes its shape from (1, 3, 4) to (3, 4).
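To illustrate the compatibility with v, here is a small sketch with made-up shapes (seq_len_q = 3, seq_len_k = 4, depth_v = 2, all assumptions for illustration). Keeping the logits at (3, 4) lets the final matmul with v go through:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Made-up shapes: 3 queries, 4 keys, value depth 2
scaled_attention_logits = np.random.randn(3, 4)   # (seq_len_q, seq_len_k)
mask = np.ones((1, 3, 4))                         # leading singleton dimension
v = np.random.randn(4, 2)                         # (seq_len_k, depth_v)

# Squeeze the mask so it matches the (3, 4) logits
scaled_attention_logits += (1. - np.squeeze(mask, axis=0)) * -1e9
attention_weights = softmax(scaled_attention_logits, axis=-1)  # (3, 4)

output = attention_weights @ v   # (3, 2): compatible for the dot product with v
print(output.shape)
```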