C5W4A1: Exercise 4 EncoderLayer: How to Read the TensorFlow Documentation for MultiHeadAttention

Well, I usually learn from example code, and maybe I’m missing something from reading the documentation below. Searching through the forum, all I see is people pointing to the link below, but it’s still not helping me understand how to form the syntax for query, value, and key. Can someone explain how to read the documentation better?

At first it looks obvious, since it shows me the arguments that I can pass to tf.keras.layers.MultiHeadAttention(),

but I don’t see how query, value, and key are used. I see them listed under the “call arguments” section, but why aren’t they in the example?

tf.keras.layers.MultiHeadAttention(
    num_heads,
    key_dim,
    value_dim=None,
    dropout=0.0,
    use_bias=True,
    output_shape=None,
    attention_axes=None,
    kernel_initializer='glorot_uniform',
    bias_initializer='zeros',
    kernel_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    bias_constraint=None,
    **kwargs
)

What’s the difference between an “argument” and a “call argument”?

So, I just guessed and stuffed the call arguments in there anyway, and it seems to pass the syntax check, but if someone can explain how to set up the arguments for MultiHeadAttention, that would help.

Here’s my attempt; hide it if it violates the terms:
attn_output = self.mha(attention_mask=mask, query=x, value=x, key=x)
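
For context, here’s roughly how I have it wired into the EncoderLayer (just a sketch; the num_heads and embedding_dim values are placeholders I picked, not necessarily the assignment’s):

import tensorflow as tf

class EncoderLayer(tf.keras.layers.Layer):
    def __init__(self, embedding_dim=128, num_heads=8, **kwargs):
        super().__init__(**kwargs)
        # constructor arguments: configure the layer once
        self.mha = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embedding_dim)

    def call(self, x, mask=None):
        # call arguments: the tensors passed on each forward pass
        return self.mha(query=x, value=x, key=x, attention_mask=mask)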

The documentation for Keras is terrible.

Often you must also read the documentation for the parent class.
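
Concretely: everything in the signature you pasted is a constructor argument; it configures the layer itself (how many heads, what projection sizes, which initializers). The “call arguments” (query, value, key, attention_mask, and so on) are the tensors you pass each time you invoke the layer instance, which is dispatched through the parent Layer’s __call__. A minimal standalone sketch (the specific num_heads and key_dim values here are arbitrary):

import tensorflow as tf

# Arguments: passed to the constructor; they configure the layer itself
mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=64)

# Call arguments: passed when the layer instance is called on tensors
x = tf.random.uniform((1, 10, 64))  # (batch, seq_len, features)
out = mha(query=x, value=x, key=x)  # self-attention: all three are x
print(out.shape)                    # (1, 10, 64)

Notice that query, value, and key never appear in the constructor; they only exist at call time. That is why your self.mha(query=x, value=x, key=x, attention_mask=mask) is the right pattern for self-attention.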