C5W3 - Neural machine translation exercise: the Dot layer

I have a small question regarding the Dot layer in Exercise 1 (one_step_attention): why does the order of the two tensors have to be [alphas, a]?

My understanding of this:
a.shape = (10, 30, 64)
alphas.shape = (10, 30, 1)
context.shape = (10, 1, 64)

So I think when we set context = dotor([alphas, a]), the third axis of alphas is broadcast against the 64 values in the third axis of a, the two are multiplied elementwise, and the result is then summed over the second axis. That is why context comes out with shape (10, 1, 64).
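To check the shapes, here is a minimal sketch with dummy random data, assuming dotor is a Keras Dot(axes=1) layer (the batch size 10, Tx = 30, and feature size 64 are just the example numbers above):

```python
import numpy as np
import tensorflow as tf

# dummy tensors with the shapes discussed above
alphas = tf.constant(np.random.rand(10, 30, 1), dtype=tf.float32)   # attention weights
a      = tf.constant(np.random.rand(10, 30, 64), dtype=tf.float32)  # Bi-LSTM activations

# Dot(axes=1) contracts axis 1 (the Tx axis) of both inputs
dotor = tf.keras.layers.Dot(axes=1)

context = dotor([alphas, a])
print(context.shape)          # (10, 1, 64)

swapped = dotor([a, alphas])
print(swapped.shape)          # (10, 64, 1)
```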

If we instead set context = dotor([a, alphas]), the shape of context becomes (10, 64, 1). What exact operations does the Dot layer perform to produce this shape?

Hi realnoob,

Dot calls batch_dot, where you can find the operations performed.
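Roughly, batch_dot with axes=1 contracts the second (Tx) axis of both inputs and keeps the remaining axes in the order the inputs were given, so swapping the inputs just transposes the last two axes of the result. Here is a rough NumPy sketch (my own illustration of the equivalent computation, not the Keras source):

```python
import numpy as np

alphas = np.random.rand(10, 30, 1)
a      = np.random.rand(10, 30, 64)

# dotor([alphas, a]): per example, alphas^T (1, 30) @ a (30, 64) -> (1, 64)
context = np.einsum('bti,btj->bij', alphas, a)   # shape (10, 1, 64)

# dotor([a, alphas]): per example, a^T (64, 30) @ alphas (30, 1) -> (64, 1)
swapped = np.einsum('bti,btj->bij', a, alphas)   # shape (10, 64, 1)

# same numbers either way, just with the last two axes transposed
assert np.allclose(context, np.transpose(swapped, (0, 2, 1)))
```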