I was working on C5W3’s first programming assignment and found a behavior of the Dot layer that I don’t understand. For example:
a.shape = (10, 30, 64)
b.shape = (10, 30, 1)
Dot(axes=1)([b, a]).shape = (10, 1, 64)
Dot(axes=1)([a, b]).shape = (10, 64, 1)

I am confused about why it is computed that way. Per my understanding, b would be broadcast and the result of the operation would be the same regardless of the input order. Could anyone help me figure it out? Thanks a lot!
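For context, here is a minimal NumPy sketch of what I believe Dot(axes=1) computes (assuming standard Keras semantics: no broadcasting, it contracts axis 1 of both inputs per batch, and the output axes follow the input order, so swapping the inputs transposes each batch slice):

```python
import numpy as np

# Example tensors with the shapes from the question above.
a = np.random.rand(10, 30, 64)
b = np.random.rand(10, 30, 1)

# Dot(axes=1)([x, y]) contracts axis 1 of both inputs per batch sample:
# out[n] = x[n].T @ y[n], giving shape
# (batch, x's remaining axes, y's remaining axes).
ba = np.einsum('bij,bik->bjk', b, a)  # like Dot(axes=1)([b, a])
ab = np.einsum('bij,bik->bjk', a, b)  # like Dot(axes=1)([a, b])

print(ba.shape)  # (10, 1, 64)
print(ab.shape)  # (10, 64, 1)

# Swapping the inputs transposes each batch slice, not broadcasting:
print(np.allclose(ab, np.transpose(ba, (0, 2, 1))))  # True
```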

Thanks for your reply! I was working on C5W3’s programming assignment Neural_machine_translation_with_attention_v4a. In the function one_step_attention, I found that changing the order of the inputs to Keras’s Dot layer changes the dimensions of the result. Sorry for the confusion!