DLS 5 Week 4 Ex 4

I am not sure whether anyone else has run into this problem, but I am having trouble with the definition of the MHA layer and the values it expects.
This is the definition:
self.mha = MultiHeadAttention(num_heads=num_heads, key_dim=embedding_dim, dropout=dropout_rate)
When I reviewed the documentation of “MultiHeadAttention” it stated that -

Basically, the expected value of the first two arguments is an integer. I have tried using the shape of X in the function call in many different ways, and I always get the same error.

In addition, after reading some of the threads on this exercise in Discord, the recurring advice is to just use X for the function call, but that doesn’t work for me. Is there anyone who can shed some light on this function call?


The Keras documentation is extremely thin.
This later paragraph gives you the info you need:

This is an implementation of multi-headed attention as described in the paper “Attention is all you Need” (Vaswani et al., 2017). If query, key, value are the same, then this is self-attention. Each timestep in query attends to the corresponding sequence in key, and returns a fixed-width vector.

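So when you call the layer, you need to specify the query, key, and value arguments (all should be ‘x’), along with the mask. The integer arguments (num_heads, key_dim) belong to the constructor, not to the call.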
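Here is a minimal sketch of the difference between constructing the layer and calling it. It is not the assignment solution. The sizes, the dropout rate, and the all-ones mask are placeholder assumptions chosen only so the snippet runs on its own; in the exercise these come from the surrounding class.

```python
import tensorflow as tf
from tensorflow.keras.layers import MultiHeadAttention

# Hypothetical sizes, for illustration only (not taken from the assignment).
batch_size, seq_len, embedding_dim, num_heads = 2, 5, 16, 4

# Constructor: takes integer hyperparameters (this is what the docs describe).
mha = MultiHeadAttention(num_heads=num_heads, key_dim=embedding_dim, dropout=0.1)

# Call: takes tensors. Positional order is (query, value, key);
# passing x for all three makes this self-attention.
x = tf.random.uniform((batch_size, seq_len, embedding_dim))

# Placeholder mask of shape (batch, query_len, key_len); True means "may attend".
mask = tf.ones((batch_size, seq_len, seq_len), dtype=tf.bool)

attn_output = mha(x, x, x, attention_mask=mask, training=False)

# The output keeps the query's shape: (batch_size, seq_len, embedding_dim).
print(attn_output.shape)
```

The error in the question usually comes from passing shapes or integers at call time. The integers go to the constructor once, and the tensor x goes to the call.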