In the lecture videos on self attention we had:

In the multi-head attention video we had:

Are the weights W^{<Q>}, W^{<K>}, W^{<V>} and W_i^{<Q>}, W_i^{<K>}, W_i^{<V>} different sets of weights?

Yes, they are different sets of weights.
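To make that concrete, here is a minimal NumPy sketch in which the single-head weights W^{<Q>}, W^{<K>}, W^{<V>} and the per-head weights W_i^{<Q>}, W_i^{<K>}, W_i^{<V>} are stored as separate parameter arrays. The sizes, the variable names, and the random initialisation are all made up for illustration. The sketch also assumes that each head applies its own weights directly to the inputs x, as in the original Transformer formulation, and it leaves out the final output projection. It is not taken from the lecture and may not match the lecture's exact wiring.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention; each row is one sequence position t.
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(0)
T, d_model, d_head, n_heads = 4, 8, 2, 3  # hypothetical sizes
X = rng.normal(size=(T, d_model))         # one row x^{<t>} per position

# Single-head self-attention: one set of weights W^Q, W^K, W^V.
W_Q = rng.normal(size=(d_model, d_model))
W_K = rng.normal(size=(d_model, d_model))
W_V = rng.normal(size=(d_model, d_model))
single_head = attention(X @ W_Q, X @ W_K, X @ W_V)

# Multi-head attention: a separate set W_i^Q, W_i^K, W_i^V for every head i.
# Assumption: these are applied directly to X, not to the single-head q, k, v.
head_weights = [
    {name: rng.normal(size=(d_model, d_head)) for name in ("Q", "K", "V")}
    for _ in range(n_heads)
]
heads = [attention(X @ w["Q"], X @ w["K"], X @ w["V"]) for w in head_weights]
multi_head = np.concatenate(heads, axis=-1)  # output projection omitted
```

Each head has its own three matrices, so with `n_heads = 3` there are nine per-head matrices in addition to the three single-head ones, and none of them share parameters.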

So are the weights from self-attention applied first to get q, k, v for each t, and are those then the input to multi-head attention?

Sorry, I don’t know. I’m not an expert in the attention method.

Answered in