Encoder Layer Mask Purpose

Dear Mentor,

Why do we need to pass a mask to the encoder layer class?

Are you referring to Course 5 Week 4 Assignment 1?

Yes sir. Why does the mask need to be passed to the encoder layer?

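I don’t think it is strictly necessary for the architecture itself. Certainly the reference paper doesn’t describe a mask during encoding. In practice, the mask passed to the encoder is a padding mask: when sequences in a batch are padded to the same length, it stops self-attention from attending to the padded positions.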
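To make the padding-mask idea concrete, here is a minimal sketch in plain Python. It is not the assignment's code. The function name `masked_softmax`, the example scores, and the mask convention (1 = real token, 0 = padding) are all assumptions chosen for illustration.

```python
import math

def masked_softmax(scores, mask):
    """Softmax over one row of attention scores.

    Assumed convention: mask[i] == 1 marks a real token, 0 marks padding.
    Padded positions get a large negative score, so their weight is ~0.
    """
    masked = [s + (1 - m) * -1e9 for s, m in zip(scores, mask)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores of one query against 4 key positions.
# The last two positions are padding.
scores = [2.0, 1.0, 3.0, 0.5]
padding_mask = [1, 1, 0, 0]

weights_unmasked = masked_softmax(scores, [1, 1, 1, 1])
weights_masked = masked_softmax(scores, padding_mask)

print("without mask:", [round(w, 3) for w in weights_unmasked])
print("with mask:   ", [round(w, 3) for w in weights_masked])
```

Without the mask, the padded third position has the highest score and takes most of the attention weight. With the mask, all of the weight is shared between the two real tokens.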

Thank you, sir.

Also, one more doubt: why are we passing query = value = key = X? Why aren't we computing query = W * X, value = W * X, key = W * X?

I believe the weight product happens inside the MultiHeadAttention layer itself, which holds its own learned projection weights for the query, key, and value.

The MHA documentation says that for self-attention, the input data X is used for Q, K, and V.
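As a rough illustration of why passing X three times is enough, here is a single-head sketch in plain Python. The class `TinySelfAttention`, the helper functions, and the weight values are hypothetical and are not the Keras implementation. The call order (query, value, key) simply mirrors the Keras MultiHeadAttention call. The point is only that the layer applies its own separate weights to each input internally.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

class TinySelfAttention:
    """Single-head attention that owns its projection weights (hypothetical)."""

    def __init__(self, w_q, w_k, w_v):
        self.w_q, self.w_k, self.w_v = w_q, w_k, w_v

    def __call__(self, query, value, key):
        # The projections happen here, inside the layer, not in the caller.
        q = matmul(query, self.w_q)
        k = matmul(key, self.w_k)
        v = matmul(value, self.w_v)
        d_k = len(k[0])
        k_t = [list(col) for col in zip(*k)]
        scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
        weights = [softmax(row) for row in scores]
        return matmul(weights, v)

# Made-up input: 3 tokens with model dimension 2.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Made-up weights standing in for learned parameters.
w_q = [[0.5, -0.1], [0.2, 0.3]]
w_k = [[0.1, 0.4], [-0.3, 0.2]]
w_v = [[1.0, 0.0], [0.0, 2.0]]

attn = TinySelfAttention(w_q, w_k, w_v)
out = attn(x, x, x)  # self-attention: the same X is passed for query, value, key
print(out)
```

So the caller passes the same X three times, but Q, K, and V still end up different because each goes through its own weight matrix inside the layer.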