Key_dim Multi Head attention

Anbu · May 8, 2022, 3:13pm

Hi Mentor,

Can you please help me understand where the arguments below play a role in the transformer architecture?

key_dim → Size of each attention head for query and key.

value_dim → Size of each attention head for value.

Jaime_Gonzalez · May 9, 2022, 9:44am

Let me take a look and I’ll get back to you.

Meanwhile, I recommend you take a look at the original Transformer paper, section 3.2, where keys and values are discussed to explain how an ‘attention’ function works.

Paper: https://arxiv.org/pdf/1706.03762.pdf

Jaime_Gonzalez · May 9, 2022, 11:05am

Hi again @Anbu

Pondering over your question, my best answer as to how keys and values fit into a transformer, though ambiguous, is that given in the original transformers paper:

“An attention function can be described as mapping a query and a set of key-value pairs to an output,
where the query, keys, values, and output are all vectors. The output is computed as a weighted sum
of the values, where the weight assigned to each value is computed by a compatibility function of the
query with the corresponding key”
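To tie that quote to the two arguments you asked about, here is a minimal NumPy sketch of a single attention head. It is only an illustration under assumptions, not the Keras implementation: the sizes (seq_len, d_model, key_dim = 3, value_dim = 5) are made up, and the projection matrices W_q, W_k, W_v are random stand-ins for learned weights. The point is where each size shows up: key_dim is the width of the projected queries and keys, and value_dim is the width of the projected values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
seq_len, d_model = 4, 8
key_dim, value_dim = 3, 5   # per-head sizes, named after the Keras arguments

# One sequence of token embeddings.
x = rng.normal(size=(seq_len, d_model))

# Stand-ins for the learned projections of a single head (random here).
W_q = rng.normal(size=(d_model, key_dim))
W_k = rng.normal(size=(d_model, key_dim))
W_v = rng.normal(size=(d_model, value_dim))

Q = x @ W_q   # shape (seq_len, key_dim)
K = x @ W_k   # shape (seq_len, key_dim)
V = x @ W_v   # shape (seq_len, value_dim)

# "Compatibility function": scaled dot product of each query with each key.
scores = Q @ K.T / np.sqrt(key_dim)   # shape (seq_len, seq_len)

# Softmax over the keys turns the scores into weights that sum to 1 per query.
scores = scores - scores.max(axis=-1, keepdims=True)
weights = np.exp(scores)
weights = weights / weights.sum(axis=-1, keepdims=True)

# "Weighted sum of the values": one output vector per query.
head_output = weights @ V   # shape (seq_len, value_dim)

print(Q.shape, K.shape, V.shape, head_output.shape)
```

Queries and keys must share a size because they are compared with a dot product, which is why one argument (key_dim) covers both. The values are only averaged, so their size can differ. As far as I know, in tf.keras.layers.MultiHeadAttention value_dim falls back to key_dim when it is not set, and the concatenated heads are projected back to the query's feature size by default.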

Have you completed the transformers programming assignment? This may help you fit things together in context.

Otherwise, could you be more specific with your question? At the moment it seems too broad to answer. It may help to know where the question arose: which exercise were you doing, or which minute of which video lecture were you watching?

Anbu · May 9, 2022, 5:02pm

Sir, I will check and get back to you. Can you please help with the thread below?

community.deeplearning.ai/t/query-input-last-dimension/125879
