C5W4A1: Exercise 4 EncoderLayer: How to Read the TensorFlow Documentation for MultiHeadAttention

Well, I usually learn from example code, and maybe I’m missing something from reading the documentation below. Searching through the forum, all I see is people pointing to the link below, but it’s still not helping me understand how to form the syntax for query, value, and key. Can someone explain how to read the documentation better?

At first it looks obvious, since it shows me the arguments that I can pass to tf.keras.layers.MultiHeadAttention(),

but I don’t see how query, value, and key are used. I see them listed under the “call arguments” section, but why aren’t they in the example?

tf.keras.layers.MultiHeadAttention(
    num_heads,
    key_dim,
    value_dim=None,
    dropout=0.0,
    use_bias=True,
    output_shape=None,
    attention_axes=None,
    kernel_initializer='glorot_uniform',
    bias_initializer='zeros',
    kernel_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    bias_constraint=None,
    **kwargs
)

What’s the difference between an “argument” and a “call argument”?

So, I just guessed and stuffed the call arguments in there anyway, and it seems to pass the syntax check, but if someone can explain how to set up the arguments for MultiHeadAttention, that would help.

Here’s my attempt; hide it if it violates the terms:
attn_output = self.mha(attention_mask=mask, query=x, value=x, key=x)
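
For context, here’s roughly how I have it wired into the EncoderLayer (just a sketch; the num_heads and embedding_dim values are placeholders I picked, not necessarily the assignment’s):

import tensorflow as tf

class EncoderLayer(tf.keras.layers.Layer):
    def __init__(self, embedding_dim=128, num_heads=8, **kwargs):
        super().__init__(**kwargs)
        # constructor arguments: configure the layer once
        self.mha = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embedding_dim)

    def call(self, x, mask=None):
        # call arguments: the tensors passed on each forward pass
        return self.mha(query=x, value=x, key=x, attention_mask=mask)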

The documentation for Keras is terrible.

Often you must also read the documentation for the parent class.
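
Concretely: everything in the signature you pasted is a constructor argument; it configures the layer itself (how many heads, what projection sizes, which initializers). The “call arguments” (query, value, key, attention_mask, and so on) are the tensors you pass each time you invoke the layer instance, which is dispatched through the parent Layer’s __call__. A minimal standalone sketch (the specific num_heads and key_dim values here are arbitrary):

import tensorflow as tf

# Arguments: passed to the constructor; they configure the layer itself
mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=64)

# Call arguments: passed when the layer instance is called on tensors
x = tf.random.uniform((1, 10, 64))  # (batch, seq_len, features)
out = mha(query=x, value=x, key=x)  # self-attention: all three are x
print(out.shape)                    # (1, 10, 64)

Notice that query, value, and key never appear in the constructor; they only exist at call time. That is why your self.mha(query=x, value=x, key=x, attention_mask=mask) is the right pattern for self-attention.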