Hi,

What should be the size of the tensor attention_weights in the function scaled_dot_product_attention_test?

Should it be (3,1,14) or (3,1,3,4)?

Thanks, I am stuck!

Probably you should have posted this in Course 5, not Course 4.

I get (3, 4).

Yes, you are right, it should be Course 5.

I think my problem comes from the function create_padding_mask, which adds extra dimensions with

return seq[:, tf.newaxis, tf.newaxis, :]

as the last line

I am not sure if I coded this or if it was there already…

Do you have this as well?

The create_padding_mask() function was provided in the notebook. You did not have to modify it.
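
For reference, here is a small sketch of what that kind of return line does to the shape. It uses NumPy instead of TensorFlow (np.newaxis behaves like tf.newaxis for this indexing), and the token values and the "1.0 means real token" convention are assumptions for illustration, not taken from the notebook.

```python
import numpy as np

# Hypothetical batch of 3 padded sequences of length 4 (0 = padding token).
seq = np.array([[7, 6, 0, 0],
                [1, 2, 3, 0],
                [5, 0, 0, 0]])

# Assumed convention: 1.0 for a real token, 0.0 for padding.
mask_2d = (seq != 0).astype(np.float32)

# Same indexing pattern as the quoted return line, with np.newaxis
# in place of tf.newaxis: two size-1 axes are inserted in the middle.
mask_4d = mask_2d[:, np.newaxis, np.newaxis, :]

print(mask_2d.shape)  # (3, 4)
print(mask_4d.shape)  # (3, 1, 1, 4)
```

So a mask built this way is 4-D, and anything it is combined with will be broadcast up to 4-D as well.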

ok, that’s what I thought.

Then I don’t understand where the error for my scaled_dot_product_attention comes from.

It contains the right values but it seems like the shape of the tensor (3,1,3,2) is not correct.

attention_weights uses tf.keras.activations.softmax(…).

When you get your code working, please edit your replies that contain your code and delete the code. That clears you with the course Honor Code.

And I think axis should be dk, not -1. I’m not sure whether -1 works there in all cases.

But, if you do that, dk should not include the square root. So you’d need to modify your code for the scaled_attention_logits.

Thanks.

But it still is not working.

Can you confirm that the shape of the output of scaled_dot_product_attention should be (3,1,3,2) for the test scaled_dot_product_attention_test?

No.

I get (3,2) for the output shape.

My bad, I was using the mask I had created earlier instead of the mask passed into that function.
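
For anyone who hits the same shapes, here is a rough sketch of why the wrong mask produces them. It only tracks shapes, so the actual masking and softmax are left out (they do not change the shape), and NumPy is used in place of TensorFlow since the broadcasting rules are the same here. The array sizes are assumptions chosen to match the shapes reported in this thread.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins with assumed sizes: logits plays the role of the scaled
# q-times-k-transpose scores, v is the value matrix.
logits = rng.normal(size=(3, 4))
v = rng.normal(size=(4, 2))

mask_2d = np.ones((3, 4))        # a mask with the same rank as the logits
mask_4d = np.ones((3, 1, 1, 4))  # a mask with two extra size-1 axes

# Combining the logits with a mask broadcasts both to a common shape.
weights_like_2d = logits + mask_2d
weights_like_4d = logits + mask_4d

print(weights_like_2d.shape)        # (3, 4)
print(weights_like_4d.shape)        # (3, 1, 3, 4)

# The matrix product with v then inherits the extra leading axes.
print((weights_like_2d @ v).shape)  # (3, 2)
print((weights_like_4d @ v).shape)  # (3, 1, 3, 2)
```

The values along the last axis can still look right in the 4-D case, which is why the numbers matched while the shape check failed.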