I think I’ve spotted a mistake in Section 2.1 of the Week 4 activity on Transformers.
The activity explains that multiplying (1 - mask) by -1e9 and adding it to the sample input sequences essentially sets the zeros to negative infinity, and it asks us to notice the difference when taking the softmax of the original sequence and the masked sequence.
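For reference, I believe the relevant cell looks roughly like the sketch below. I'm reconstructing it from the printed outputs further down, so the exact sample values and the body of create_padding_mask() are my assumptions rather than a copy of the notebook.

import tensorflow as tf

def create_padding_mask(seq):
    # 1 where the token is real, 0 where it is padding (a zero); note the extra axis
    mask = 1 - tf.cast(tf.math.equal(seq, 0), tf.float32)
    return mask[:, tf.newaxis, :]

x = tf.constant([[7., 6., 0., 0., 1.],
                 [1., 2., 3., 0., 0.],
                 [0., 0., 0., 4., 5.]])

# softmax of the original sequences vs. the masked sequences
print(tf.keras.activations.softmax(x))
print(tf.keras.activations.softmax(x + (1 - create_padding_mask(x)) * -1.0e9))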
The example produces masked outputs with a different dimensionality, which confused me for a while. I believe the cause is that create_padding_mask() adds an extra dimension:
tf.Tensor(
[[7.2876644e-01 2.6809821e-01 6.6454901e-04 6.6454901e-04 1.8064314e-03]
[8.4437378e-02 2.2952460e-01 6.2391251e-01 3.1062774e-02 3.1062774e-02]
[4.8541026e-03 4.8541026e-03 4.8541026e-03 2.6502505e-01 7.2041273e-01]], shape=(3, 5), dtype=float32)
tf.Tensor(
[[[7.2973627e-01 2.6845497e-01 0.0000000e+00 0.0000000e+00 1.8088354e-03]
[2.4472848e-01 6.6524094e-01 0.0000000e+00 0.0000000e+00 9.0030573e-02]
[6.6483547e-03 6.6483547e-03 0.0000000e+00 0.0000000e+00 9.8670328e-01]]
[[7.3057163e-01 2.6876229e-01 6.6619506e-04 0.0000000e+00 0.0000000e+00]
[9.0030573e-02 2.4472848e-01 6.6524094e-01 0.0000000e+00 0.0000000e+00]
[3.3333334e-01 3.3333334e-01 3.3333334e-01 0.0000000e+00 0.0000000e+00]]
[[0.0000000e+00 0.0000000e+00 0.0000000e+00 2.6894143e-01 7.3105860e-01]
[0.0000000e+00 0.0000000e+00 0.0000000e+00 5.0000000e-01 5.0000000e-01]
[0.0000000e+00 0.0000000e+00 0.0000000e+00 2.6894143e-01 7.3105860e-01]]], shape=(3, 3, 5), dtype=float32)
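If I'm reading this right, the shape change is just broadcasting: the input is (3, 5) while the mask coming back from create_padding_mask() appears to be (3, 1, 5), and adding the two stretches the result to (3, 3, 5). Here is a quick experiment with dummy tensors of those shapes (my own check, not from the notebook):

import tensorflow as tf

x = tf.zeros((3, 5))        # sample input: batch of 3 sequences of length 5
mask = tf.zeros((3, 1, 5))  # what create_padding_mask() appears to return: (batch, 1, seq_len)

# Broadcasting aligns shapes from the right, so (3, 5) is treated as (1, 3, 5)
# and tiled against (3, 1, 5), giving a (3, 3, 5) result.
print((x + mask).shape)     # (3, 3, 5)

If that is the cause, I assume the extra axis is there so the same mask can broadcast over attention scores of shape (batch, seq_len_q, seq_len_k) inside scaled dot-product attention, and the (3, 3, 5) shape in this demo is just a side effect of adding it directly to the (3, 5) input.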
My workaround was to index into the result from create_padding_mask(), roughly as sketched below.
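This is from memory, so the exact indexing is a guess; it reuses x and create_padding_mask() from the sketch above:

# hypothetical reconstruction of my workaround: drop the extra axis before adding
masked = tf.keras.activations.softmax(x + (1 - create_padding_mask(x)[:, 0, :]) * -1.0e9)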
Correction: I just reviewed the output again, and my workaround is unhelpful.
Any thoughts on the cause of the change in dimensionality would be appreciated!