C4_W2_Assignment_UNQ_C3

utkarsh_shukla2 · November 22, 2022, 12:45pm

Why is the mask size suggested to be (1 x L_q x L_q)? Isn’t the mask supposed to be (1 x L_q x L_v)? Can someone please clarify this?

reinoudbosch · November 29, 2022, 12:59am

Hi utkarsh_shukla2,

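Have another look at the video ‘Masked Self Attention’. At 2:04 you can see that the mask gets added to the scaled dot product (Q \cdot K^T)/\sqrt{d_k}, i.e. the query times the transposed key, divided by the square root of the encoding dimension of the key. In general this product has shape L_q by L_k, but in self-attention the queries and keys come from the same sequence, so L_k = L_q and the product is L_q by L_q. The mask is added elementwise, so it should have the same shape.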
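If it helps to see the shapes concretely, here is a minimal NumPy sketch of that step. It is not the assignment's code: the function names `causal_mask` and `masked_self_attention`, the toy sizes, and the use of -1e9 as a stand-in for minus infinity are all my own assumptions for illustration. The leading 1 in the mask shape simply broadcasts over the batch dimension.

```python
import numpy as np

def causal_mask(L_q):
    # Hypothetical helper: additive mask of shape (1, L_q, L_q).
    # 0 on and below the diagonal, -1e9 above it (future positions).
    return np.triu(np.full((1, L_q, L_q), -1e9), k=1)

def masked_self_attention(x, mask):
    # Self-attention: queries, keys and values all come from the same x,
    # which has shape (batch, L_q, d_k).
    q = k = v = x
    d_k = q.shape[-1]
    # (batch, L_q, d_k) @ (batch, d_k, L_q) -> (batch, L_q, L_q)
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)
    # The (1, L_q, L_q) mask broadcasts over the batch dimension.
    scores = scores + mask
    # Numerically stable softmax over the key axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return scores, weights, weights @ v

batch, L_q, d_k = 2, 4, 8
x = np.random.default_rng(0).normal(size=(batch, L_q, d_k))
mask = causal_mask(L_q)
scores, weights, out = masked_self_attention(x, mask)
print(mask.shape, scores.shape, out.shape)
```

The mask only lines up with the scores because both of the last two axes have length L_q. With keys from a different sequence, the scores would be L_q by L_k and the mask would have to follow that shape instead.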
