C5W4: Question about padding mask comparison

I was working on week 4's first programming assignment and reached the part that compares the result of softmax with and without the padding mask. I noticed that the shapes of these two results are different, and I'm wondering how to understand this.

I think I've figured it out. Each row of the mask is applied to the whole of x, which creates 3 different copies of x, each one masked by a different row of the mask.
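Here is a minimal NumPy sketch of what I think is going on. It assumes the padding mask has an extra middle axis, i.e. shape (batch, 1, seq_len), and that token id 0 means padding; `create_padding_mask` below is my own hypothetical stand-in, not the graded assignment code, and the example values are made up. NumPy broadcasting follows the same rules as TensorFlow, so adding a (3, 5) array to a (3, 1, 5) array gives a (3, 3, 5) result.

```python
import numpy as np

def create_padding_mask(token_ids):
    # Hypothetical stand-in: 1.0 for real tokens, 0.0 for padding (id 0)
    seq = 1.0 - (token_ids == 0).astype(np.float32)
    # Add a middle axis: (batch, seq_len) -> (batch, 1, seq_len)
    return seq[:, np.newaxis, :]

def softmax(z, axis=-1):
    # Numerically stable softmax over the last axis
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

x = np.array([[7., 6., 0., 0., 1.],
              [1., 2., 3., 0., 0.],
              [0., 0., 0., 4., 5.]], dtype=np.float32)

mask = create_padding_mask(x)        # shape (3, 1, 5)
unmasked = softmax(x)                # shape (3, 5)
# (3, 5) + (3, 1, 5) broadcasts to (3, 3, 5):
# masked[i, j] is row j of x with the padded positions of mask row i pushed to -1e9
masked = softmax(x + (1.0 - mask) * -1.0e9)

print(unmasked.shape)  # (3, 5)
print(masked.shape)    # (3, 3, 5)
```

So the extra leading axis indexes which mask row was used: every row of x gets masked once per mask row, which is where the 3 copies come from.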
