Can someone help me understand why changing the number of heads does not affect the output dimension? As far as I know, the lecture states that the heads are concatenated, which should impact the output dimension. Thank you in advance!
The key is that each head does not operate on the full model dimension. With model dimension d_model and h heads, each head projects q, k, and v down to dimension d_k = d_model / h, computes attention in that smaller space, and produces an output of size d_model / h. Concatenating the h head outputs then gives h * (d_model / h) = d_model, so the concatenated result is always d_model wide no matter how many heads you use. A final output projection W_O (d_model x d_model) mixes the heads together. Changing h just changes how the d_model dimensions are split among heads, not the total output dimension.
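A quick way to see this empirically, a minimal sketch using PyTorch's `nn.MultiheadAttention` (assuming PyTorch here since the lecture framework wasn't specified; the dimensions are made up for illustration):

```python
import torch
import torch.nn as nn

d_model = 64
x = torch.randn(2, 10, d_model)  # (batch, seq_len, d_model)

for num_heads in (1, 4, 8):
    # embed_dim must be divisible by num_heads; each head gets d_model // num_heads dims
    attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=num_heads, batch_first=True)
    out, _ = attn(x, x, x)  # self-attention: query = key = value = x
    print(num_heads, out.shape)  # shape is (2, 10, 64) for every head count
```

Each run prints the same output shape because the per-head dimension shrinks as the head count grows, so the concatenation always adds back up to d_model.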