Dear Mentor,
Why do we need to pass a mask to the encoder layer class?
Are you referring to Course 5 Week 4 Assignment 1?
Yes sir, why does the mask need to be passed to the encoder layer?
I don’t think it is really necessary. Certainly the reference paper doesn’t use a mask during encoding.
Thank you, sir.
Also, one more doubt: why are we passing query = value = key = X? Why are we not computing query = W * X, value = W * X, key = W * X?
I believe the weight product happens inside the MultiHeadAttention layer itself, which applies its own learned projections to the query, value, and key inputs.
The MHA documentation says that for self-attention, the input data X is used for Q, K, and V.
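To illustrate why passing X three times can still give different Q, K, and V, here is a minimal NumPy sketch of a single-head attention layer that owns its projection weights. This is not the Keras implementation: the class `TinySelfAttention`, the names `W_q`, `W_k`, `W_v`, and the sizes are all hypothetical, chosen only for illustration. The argument order (query, value, key) is meant to mirror how the layer is called in the assignment.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class TinySelfAttention:
    """Hypothetical single-head attention layer (illustration only)."""

    def __init__(self, d_model, d_k, seed=0):
        rng = np.random.default_rng(seed)
        # The layer holds three separate projection matrices.
        # In a real layer these would be learned during training.
        self.W_q = rng.normal(size=(d_model, d_k))
        self.W_k = rng.normal(size=(d_model, d_k))
        self.W_v = rng.normal(size=(d_model, d_k))

    def __call__(self, query, value, key):
        # The caller passes raw inputs; the projections happen here.
        Q = query @ self.W_q
        K = key @ self.W_k
        V = value @ self.W_v
        # Scaled dot-product attention.
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        weights = softmax(scores, axis=-1)
        return weights @ V, weights

# Self-attention: the same X is used for query, value, and key.
X = np.random.default_rng(1).normal(size=(4, 8))  # 4 tokens, d_model = 8
attn = TinySelfAttention(d_model=8, d_k=3)
out, weights = attn(X, X, X)
print(out.shape)             # one d_k-sized vector per token
print(weights.sum(axis=-1))  # each row of attention weights sums to 1
```

So even though the call looks like query = value = key = X, the three projected matrices differ because each one goes through its own weight matrix inside the layer.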