I have a quick question regarding the `Dot` layer in exercise 1 - `one_step_attention`: why does the order of the two tensors have to be `[alphas, a]`?

My understanding of this:

`a.shape = (10, 30, 64)`

`alphas.shape = (10, 30, 1)`

`context.shape = (10, 1, 64)`

So I think that when we let `context = dotor([alphas, a])`, the third axis of `alphas` is broadcast and multiplied with the 64 values along the third axis of `a`, and then all values along the second axis are summed up. That's why we get shape `(10, 1, 64)` for `context`.
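To sanity-check this, here is a small sketch I ran outside the notebook (assuming `dotor` is the `Dot(axes=1)` layer defined earlier in the assignment, and using random toy tensors of the same shapes); at least for these values, the broadcast-multiply-then-sum interpretation matches what the layer returns:

```python
import numpy as np
import tensorflow as tf

# toy tensors with the shapes quoted above (batch of 10, 30 steps, 64 features)
a = np.random.rand(10, 30, 64).astype("float32")       # hidden states
alphas = np.random.rand(10, 30, 1).astype("float32")   # attention weights

# assuming dotor = Dot(axes=1), i.e. axis 1 of both inputs is contracted
context = tf.keras.layers.Dot(axes=1)([alphas, a])
print(context.shape)  # (10, 1, 64)

# my interpretation: broadcast-multiply along the last axis, then sum over axis 1
manual = np.sum(alphas * a, axis=1, keepdims=True)
print(np.allclose(context.numpy(), manual, atol=1e-5))  # True
```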

If we let `context = dotor([a, alphas])` instead, the shape of `context` will be `(10, 64, 1)`. What exact operations are done in the `Dot` layer to produce this shape?
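For reference, here is the same kind of check for the reversed order (again assuming `dotor` is `Dot(axes=1)`); empirically the result just looks like the first result transposed over the last two axes, but I would like to confirm what the layer is actually computing:

```python
import numpy as np
import tensorflow as tf

a = np.random.rand(10, 30, 64).astype("float32")
alphas = np.random.rand(10, 30, 1).astype("float32")

# both argument orders, again assuming dotor = Dot(axes=1)
context = tf.keras.layers.Dot(axes=1)([alphas, a])        # shape (10, 1, 64)
reversed_ctx = tf.keras.layers.Dot(axes=1)([a, alphas])   # shape (10, 64, 1)

# the reversed order appears to be the same numbers transposed over the last
# two axes, i.e. output[b, i, j] = sum over k of a[b, k, i] * alphas[b, k, j]
print(reversed_ctx.shape)  # (10, 64, 1)
print(np.allclose(reversed_ctx.numpy(),
                  np.transpose(context.numpy(), (0, 2, 1)), atol=1e-5))  # True
```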