DLS 5 Week 4 Ex 4

I am not sure if anyone else ran into this problem, but I am having trouble with the definition of the MHA layer and the arguments it needs to receive -
This is the definition -
self.mha = MultiHeadAttention(num_heads=num_heads, key_dim=embedding_dim, dropout=dropout_rate)
When I reviewed the documentation of “MultiHeadAttention”, it stated that the expected type of the first two arguments is an integer. I have tried to use the shape of X to call the layer in many different ways, and it's always the same error.

In addition, after reading some of the threads on this exercise on Discord, the recurring advice is to just use X in the function call, but that just doesn't work… Can anyone shed some light on this function call?

Thanks

The Keras documentation is extremely thin.
This later paragraph gives you the info you need:

This is an implementation of multi-headed attention as described in the paper “Attention is all you Need” (Vaswani et al., 2017). If query, key, value are the same, then this is self-attention. Each timestep in query attends to the corresponding sequence in key, and returns a fixed-width vector.

So you need to specify the query, key, and value arguments (all should be ‘x’), along with the mask.
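Here is a minimal, self-contained sketch of what that call looks like (the shapes, values, and variable names below are placeholders for illustration, not the assignment's code):

import tensorflow as tf

# Placeholder hyperparameters; the assignment supplies its own values.
num_heads = 2
embedding_dim = 8
batch_size = 1
seq_len = 5

# Construction time: num_heads and key_dim are plain integers.
mha = tf.keras.layers.MultiHeadAttention(num_heads=num_heads, key_dim=embedding_dim)

x = tf.random.uniform((batch_size, seq_len, embedding_dim))  # input of shape (batch, seq_len, embedding_dim)
mask = tf.ones((batch_size, seq_len, seq_len))               # padding mask: 1 = attend, 0 = ignore

# Call time: self-attention, so query, key, and value are all the same tensor x.
attn_output = mha(query=x, value=x, key=x, attention_mask=mask)
print(attn_output.shape)  # (1, 5, 8), same shape as x

The key point is that num_heads and key_dim are only passed once, as integers, when the layer is constructed; the tensors (and the mask) are passed later, when the layer is called on the data. Since all three attention inputs are the same tensor, the call inside the EncoderLayer is effectively self.mha(x, x, x, attention_mask=mask).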