Self-Attention formula

In the Self-Attention video, it is said that in the Transformer architecture the self-attention representation of the t-th word, whose embedding is denoted x^{<t>}, is given by

A(q,K,V) = \sum_i \frac{\exp(q \cdot k^{<i>})}{\sum_j \exp(q \cdot k^{<j>})}v^{<i>}

Is the element q in the formula a vector such that q = q^{<t>}? And is A(q,K,V) a vector as well?

As far as I understand, q^{<t>}, k^{<t>} and v^{<t>} are vectors, each a linear transformation of the embedding x^{<t>}. The query q = q^{<t>} is then dotted with each of the keys k^{<j>}, so that q \cdot k^{<j>} is a scalar. After that, the values v^{<i>} associated with each embedding are summed, each weighted by its softmax coefficient, so the resulting A(q,K,V) should be a vector of the same dimension as v^{<t>}. Is that right?
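
To check that reasoning concretely, here is a minimal NumPy sketch of the formula above. The dimensions and projection matrices are made up for illustration only; they are not taken from the course.

```python
# Quick shape check of A(q, K, V) for a single query q = q^{<t>}.
# All dimensions below are illustrative assumptions.
import numpy as np

T, d_x, d_k, d_v = 5, 8, 4, 6            # sequence length and assumed dimensions
rng = np.random.default_rng(0)

X = rng.normal(size=(T, d_x))            # embeddings x^{<1>}, ..., x^{<T>} as rows
W_Q = rng.normal(size=(d_x, d_k))        # learned projections (random here)
W_K = rng.normal(size=(d_x, d_k))
W_V = rng.normal(size=(d_x, d_v))

Q, K, V = X @ W_Q, X @ W_K, X @ W_V      # q^{<i>}, k^{<i>}, v^{<i>} stacked as rows

t = 2
q = Q[t]                                  # q = q^{<t>}, a vector of shape (d_k,)
scores = K @ q                            # q . k^{<j>} for every j: shape (T,), each entry a scalar
weights = np.exp(scores) / np.exp(scores).sum()   # softmax over the T keys
A = weights @ V                           # weighted sum of values: shape (d_v,)

print(q.shape, scores.shape, A.shape)     # (4,) (5,) (6,) -> A has the same dimension as v^{<t>}
```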

Could you also solve def scaled_dot_product_attention in the assignment and answer the same question about shapes there?
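
For what it's worth, here is my rough NumPy sketch of what I think that function computes, written only to reason about shapes. It is not the assignment's TensorFlow template; the function signature and the mask convention are my own assumptions.

```python
# Rough sketch of scaled dot-product attention (shapes only).
# Signature and mask convention are assumptions, not the assignment's code.
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Q: (T_q, d_k), K: (T_k, d_k), V: (T_k, d_v) -> output: (T_q, d_v)."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # (T_q, T_k): every q^{<t>} dotted with every k^{<j>}
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # assumed convention: False means "do not attend"
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax over the keys
    return weights @ V                          # (T_q, d_v): one attention vector per query

# Shape check with made-up dimensions:
T, d_k, d_v = 5, 4, 6
rng = np.random.default_rng(0)
out = scaled_dot_product_attention(rng.normal(size=(T, d_k)),
                                   rng.normal(size=(T, d_k)),
                                   rng.normal(size=(T, d_v)))
print(out.shape)   # (5, 6): row t is A(q^{<t>}, K, V), same dimension as v^{<t>}
```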