In the lecture videos on self attention we had:
In the multi-head attention video we had:
Are the weights W^{<Q>}, W^{<K>}, W^{<V>} and W_i^{<Q>}, W_i^{<K>}, W_i^{<V>} different sets of weights?
Yes, they are different sets of weights.
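To make "different sets" concrete, here is a minimal NumPy sketch. It assumes the formulation from the original Transformer paper, where each head applies its own projections to the input; the lecture notation may arrange things differently. The sizes, the random initialization, and all variable names are hypothetical, and rows are positions t, so projections are written `x @ W` rather than `W x`.

```python
import numpy as np

def softmax(scores):
    # Row-wise softmax, shifted by the row max for numerical stability.
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention over a sequence of T positions.
    d_k = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d_k)) @ v

rng = np.random.default_rng(0)
T, d_model, n_heads = 4, 8, 2        # hypothetical sizes
d_head = d_model // n_heads
x = rng.normal(size=(T, d_model))    # one row per position t

# Single-head self-attention: one set of weights W^Q, W^K, W^V.
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
single_out = attention(x @ W_q, x @ W_k, x @ W_v)

# Multi-head attention: every head i owns its own W_i^Q, W_i^K, W_i^V.
head_weights = [
    tuple(rng.normal(size=(d_model, d_head)) for _ in range(3))
    for _ in range(n_heads)
]
heads = [
    attention(x @ Wq_i, x @ Wk_i, x @ Wv_i)
    for Wq_i, Wk_i, Wv_i in head_weights
]
W_o = rng.normal(size=(d_model, d_model))      # output projection W^O
multi_out = np.concatenate(heads, axis=-1) @ W_o
```

Every matrix above is a separate parameter: the single-head layer has three, while the multi-head layer has three per head plus `W_o`, and none of them are shared.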
So are the self-attention weights applied first to get q, k, v for each t, and are those then fed into multi-head attention?
Sorry, I don’t know. I’m not an expert on the attention mechanism.
Answered in